Since algorithm DP cannot be used for large instances due to its high computational complexity, we design the following simple heuristics running in \(O(m(m+n))\) time, which build dataset sequences step by step, using greedy methods.
1. Algorithm gTime always chooses to send the dataset that will be transferred in the shortest time.
2. Algorithm gRate selects the dataset that will be sent at the best average communication rate.
3. Algorithm gSlowtime chooses the dataset for which the time when data are transferred over a loaded link will be the shortest.
In all three heuristics, ties are broken by selecting the larger dataset and, if the dataset sizes are also equal, the dataset with the lower number. We also implement algorithm Rnd, which constructs a random dataset sequence and computes the corresponding makespan in \(O(m+n)\) time.
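The three selection rules and the tie-breaking order can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: each link is assumed to be given as a sorted list of disjoint loaded intervals, a free link is assumed to send one unit of data per unit of time and a loaded link to be slower by the factor \(\delta \), and datasets are sent one after another. The names `transfer` and `greedy` are hypothetical, and the sketch makes no attempt to reach the \(O(m(m+n))\) bound.

```python
import math
from fractions import Fraction

def transfer(start, size, loaded, delta):
    """Return (end_time, loaded_time) of a transfer of `size` units begun at `start`.

    `loaded` is a sorted list of disjoint loaded intervals [a, b); b may be math.inf.
    """
    t, left, slow = start, size, 0
    for a, b in loaded:
        if b <= t:                 # interval lies entirely in the past
            continue
        if a > t:                  # free stretch [t, a): unit speed (assumption)
            if left <= a - t:
                return t + left, slow
            left -= a - t
            t = a
        need = left * delta        # loaded stretch [t, b): speed 1/delta (assumption)
        if need <= b - t:
            return t + need, slow + need
        left -= (b - t) / delta
        slow += b - t
        t = b
    return t + left, slow          # the link stays free after the last interval

def greedy(sizes, links, delta, rule):
    """Build a dataset sequence; rule is 'gTime', 'gRate' or 'gSlowtime'."""
    t, todo, order = 0, set(range(len(sizes))), []
    while todo:
        def key(i):
            end, slow = transfer(t, sizes[i], links[i], delta)
            score = {"gTime": end - t,                    # shortest transfer time
                     "gRate": -sizes[i] / (end - t),      # best average rate
                     "gSlowtime": slow}[rule]             # least time on a loaded link
            return (score, -sizes[i], i)   # ties: larger dataset, then lower number
        best = min(todo, key=key)
        t, _ = transfer(t, sizes[best], links[best], delta)
        todo.remove(best)
        order.append(best)
    return order, t
```

Passing \(\delta \) as a `fractions.Fraction` keeps the scores exact, so that ties are detected reliably; with floating-point values, two scores that are equal on paper might differ by rounding.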
Finally, we propose a group of local search algorithms with neighborhoods based on dataset swaps. Each of these algorithms starts with a schedule generated by one of the simple heuristics and then applies the following local search procedure. For each pair of datasets, we check whether swapping their positions in the current sequence shortens the schedule. The swap that results in the shortest schedule is executed, and the search continues until no swap yields an improvement. These algorithms are called gTimeLocal, gRateLocal, gSlowtimeLocal and RndLocal, where the beginning of each name refers to the heuristic used to construct the initial schedule.
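The swap-based procedure can be sketched as a best-improvement local search. The code below is an illustration only: it assumes the same simplified link model as the proofs in this section (a sorted list of disjoint loaded intervals per link, unit speed on a free link, slowdown by the factor \(\delta \) on a loaded link, datasets sent one after another), and the names `transfer_end`, `makespan` and `local_search` are hypothetical.

```python
import math
from itertools import combinations

def transfer_end(start, size, loaded, delta):
    """End time of a transfer of `size` units begun at `start` (assumed link model)."""
    t, left = start, size
    for a, b in loaded:            # sorted, disjoint loaded intervals [a, b)
        if b <= t:
            continue
        if a > t:                  # free stretch [t, a)
            if left <= a - t:
                return t + left
            left -= a - t
            t = a
        if left * delta <= b - t:  # transfer ends inside the loaded stretch
            return t + left * delta
        left -= (b - t) / delta
        t = b
    return t + left

def makespan(order, sizes, links, delta):
    """Schedule length when datasets are sent one after another in `order`."""
    t = 0
    for i in order:
        t = transfer_end(t, sizes[i], links[i], delta)
    return t

def local_search(order, sizes, links, delta):
    """Repeat the best improving swap until no swap shortens the schedule."""
    order = list(order)
    best = makespan(order, sizes, links, delta)
    while True:
        best_swap = None
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]      # try the swap
            length = makespan(order, sizes, links, delta)
            order[i], order[j] = order[j], order[i]      # undo it
            if length < best:
                best, best_swap = length, (i, j)
        if best_swap is None:
            return order, best
        i, j = best_swap
        order[i], order[j] = order[j], order[i]          # execute the best swap
```

Each pass evaluates all \(m(m-1)/2\) swaps from scratch, which is simple but not tuned for speed; the search terminates because the schedule length strictly decreases with every executed swap.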
We will now analyze theoretical performance guarantees of heuristics gTime, gRate and gSlowtime. Obviously, local search algorithms deliver solutions that are not worse than the initial schedules obtained by the corresponding simple heuristics.
Proposition 6
The approximation ratio of any algorithm is at most \(\delta \).
Proof
The maximum possible time of transferring the data is \(\delta \sum _{i=1}^m \alpha _i\). The optimum schedule is not shorter than \(\sum _{i=1}^m \alpha _i\). The claim follows. \(\square \)
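The two bounds used in this proof can be chained explicitly. The chain below is a sketch that assumes, as the instances in this section do, that a free link sends one unit of data per unit of time, that a loaded link is slower by the factor \(\delta \), and that the schedule contains no idle time. Writing \(T\) for the length of an arbitrary such schedule and \(T^{*}\) for the optimum length, we get
$$\begin{aligned} \sum _{i=1}^m \alpha _i \le T^{*} \le T \le \delta \sum _{i=1}^m \alpha _i \le \delta T^{*}, \end{aligned}$$
so that \(T/T^{*}\le \delta \).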
Proposition 7
Algorithms gTime and gSlowtime have tight approximation ratios equal to \(\delta \).
Proof
We will analyze the following instance. Let \(m\ge 2\). We have one big dataset of size \(\alpha _1=(m-1)\delta + 2\) and \(m-1\) small datasets with \(\alpha _2=\cdots =\alpha _m=1\). The first communication link is free in interval \([0, (m-1)\delta )\) and loaded in interval \([(m-1)\delta , \infty )\). Communication links \(2,\dots ,m\) follow the opposite pattern: they are loaded in interval \([0, (m-1)\delta )\) and free in interval \([(m-1)\delta , \infty )\).
The maximum time that may be necessary to transfer a small dataset is \(\delta \), whereas the time of transferring the big dataset is at least \((m-1)\delta + 2> \delta \). Hence, algorithm gTime always prefers sending a small dataset to sending the big one.
As the first link is free only in an interval of length \((m-1)\delta \), the big dataset must be sent over a loaded link for at least \(2\delta \) time units. The maximum possible slowdown time for a small dataset is \(\delta \). Thus, algorithm gSlowtime also prefers small datasets.
Therefore, both algorithms gTime and gSlowtime will first send all small datasets in total time \((m-1)\delta \), and then dataset \(D_1\) in time \(\delta \alpha _1=\delta ((m-1)\delta + 2)\). This results in schedule length
$$\begin{aligned} T=(m-1)\delta (\delta +1)+2\delta . \end{aligned}$$
(5)
However, in the optimum schedule, dataset \(D_1\) is transferred first, in time \((m-1)\delta +2\delta \), and then the \(m-1\) small datasets are sent in time 1 each. Hence, the minimum schedule length is
$$\begin{aligned} T^{*}=(m-1)(\delta +1)+2\delta . \end{aligned}$$
(6)
Since
$$\begin{aligned} \lim _{m\rightarrow \infty } \frac{T}{T^{*}}= \lim _{m\rightarrow \infty } \frac{(m-1)\delta (\delta +1)+2\delta }{(m-1)(\delta +1)+2\delta }= \delta , \end{aligned}$$
(7)
the approximation ratios of algorithms gTime and gSlowtime are not smaller than \(\delta \). By Proposition 6, the claim follows. \(\square \)
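The convergence of the ratio \(T/T^{*}\) to \(\delta \) can also be checked numerically. The short script below only evaluates the closed-form expressions (5) and (6); the value \(\delta =2\) and the function name `ratio` are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def ratio(m, delta):
    """T / T* for the instance above, using Eqs. (5) and (6)."""
    t_greedy = (m - 1) * delta * (delta + 1) + 2 * delta   # Eq. (5)
    t_opt = (m - 1) * (delta + 1) + 2 * delta              # Eq. (6)
    return Fraction(t_greedy) / Fraction(t_opt)

delta = Fraction(2)                                        # illustrative value
ratios = [ratio(m, delta) for m in (2, 10, 100, 10_000)]
for r in ratios:
    print(float(r), float(delta - r))                      # the gap to delta shrinks
```

The gap equals \(\delta -T/T^{*}=2\delta (\delta -1)/T^{*}\), which is positive for \(\delta >1\) and tends to 0 as \(m\) grows.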
Proposition 8
Algorithm gRate has a tight approximation ratio of \(\delta \).
Proof
Let k be a number such that \(k\delta \) is an integer greater than 2, and let \(m=k\delta +1\). Suppose we have to collect one big dataset of size \(\alpha _1=k\) and \(k\delta \) small datasets of size \(\alpha _2=\cdots =\alpha _{k\delta +1}=1\). The first link is loaded in interval \([0,(k-1)\delta )\). Links \(2,\dots ,k\delta +1\) are loaded in intervals \([0,\delta )\) and \([(k-1)\delta +1,\infty )\). At time 0, each small dataset can be sent at average speed \(1/\delta \), and the average speed for sending the big dataset is \(k/((k-1)\delta +1)> 1/\delta \). Thus, algorithm gRate sends dataset \(D_1\) first and finishes this transfer at time \((k-1)\delta +1\). Hence, all \(k\delta \) small datasets are sent over loaded links in time \(\delta \) each, and the resulting schedule length is
$$\begin{aligned} T=(k-1)\delta +1 + k\delta ^2= k\delta (\delta +1) - \delta +1. \end{aligned}$$
(8)
In the optimum schedule, the small datasets are sent first. The free intervals on the respective links allow transferring data of total size \((k-2)\delta +1\). Thus, parts of small datasets of total size \(2\delta -1\) have to be sent over loaded links. Consequently, the transfer of all small datasets finishes at time
$$\begin{aligned} t=(k-2)\delta +1 +(2\delta -1)\delta > (k-1)\delta +1, \end{aligned}$$
(9)
and the big dataset is sent over a free link, in time k. Hence, the optimum makespan is
$$\begin{aligned} T^{*}=t+k=k(\delta +1) + (2\delta -1)(\delta -1). \end{aligned}$$
(10)
As
$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{T}{T^{*}}= \lim _{k\rightarrow \infty } \frac{k\delta (\delta +1) - \delta +1}{k(\delta +1) + (2\delta -1)(\delta -1)}= \delta , \end{aligned}$$
(11)
the approximation ratio of gRate is at least \(\delta \). By Proposition 6, we obtain the claim. \(\square \)
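Expressions (8) and (10) can be cross-checked by simulating the instance for concrete parameters. The sketch below is an illustration under stated assumptions: links are modeled by sorted lists of disjoint loaded intervals, a free link sends one unit of data per unit of time, a loaded link is slower by the factor \(\delta \), and datasets are sent one after another. The values \(k=4\), \(\delta =2\) and all function names are hypothetical choices.

```python
import math

def transfer_end(start, size, loaded, delta):
    """End time of a transfer of `size` units begun at `start` (assumed link model)."""
    t, left = start, size
    for a, b in loaded:            # sorted, disjoint loaded intervals [a, b)
        if b <= t:
            continue
        if a > t:                  # free stretch [t, a)
            if left <= a - t:
                return t + left
            left -= a - t
            t = a
        if left * delta <= b - t:  # transfer ends inside the loaded stretch
            return t + left * delta
        left -= (b - t) / delta
        t = b
    return t + left

def makespan(order, sizes, links, delta):
    t = 0
    for i in order:
        t = transfer_end(t, sizes[i], links[i], delta)
    return t

def prop8_instance(k, delta):
    """Instance from the proof; k * delta is assumed to be an integer > 2."""
    n_small = int(k * delta)
    sizes = [k] + [1] * n_small
    big_link = [(0, (k - 1) * delta)]
    small_link = [(0, delta), ((k - 1) * delta + 1, math.inf)]
    return sizes, [big_link] + [small_link] * n_small

k, delta = 4, 2
sizes, links = prop8_instance(k, delta)
m = len(sizes)
t_grate = makespan(list(range(m)), sizes, links, delta)        # D_1 first, as gRate does
t_opt = makespan(list(range(1, m)) + [0], sizes, links, delta) # small datasets first
# The small datasets are interchangeable, so only the position of D_1 matters.
t_min = min(
    makespan(list(range(1, p + 1)) + [0] + list(range(p + 1, m)), sizes, links, delta)
    for p in range(m)
)
print(t_grate, t_opt, t_min)
```

For \(k=4\) and \(\delta =2\), the simulation gives schedule length 23 for the sequence built by gRate and 15 for the sequence that sends the small datasets first, in agreement with (8) and (10); no position of \(D_1\) yields a shorter schedule.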
We showed that the tight approximation ratios of algorithms gTime, gRate and gSlowtime are as large as possible, i.e., they match the general upper bound \(\delta \) from Proposition 6. Still, the average performance of these algorithms may be much better. We will evaluate it experimentally in Sect. 7.
It is interesting to compare the algorithms with each other, and determine whether there exist domination relations between them. In the remainder of this section, we show that this is not the case. For each of the three heuristics analyzed here, there exist instances which it solves to the optimum, while the remaining two algorithms do not.
Proposition 9
There exist instances which are solved to the optimum by gRate, but not by gTime or gSlowtime.
Proof
Let us analyze the instance described in the proof of Proposition 7. We already know that the solutions returned by gTime and gSlowtime are far from the optimum, especially when m is large. At time 0, any small dataset will be sent at average communication speed \(1/\delta \), and the large dataset \(D_1\) will be transferred at an average speed larger than \(1/\delta \). Thus, algorithm gRate starts by sending dataset \(D_1\), and in consequence, it constructs the optimum schedule. \(\square \)
Proposition 10
There exist instances which are solved to the optimum by gTime, but not by gRate or gSlowtime.
Proof
Let \(m=2\), \(\alpha _1=1\), \(\alpha _2=2\). Suppose that the first link is always loaded, and the second link is loaded in interval \([0,\delta )\), and free afterward. The average communication speed of a transfer starting at time 0 is \(1/\delta \) if \(D_1\) is sent, and \(2/(\delta +1)\) if \(D_2\) is chosen. Since \(2/(\delta +1) > 1/\delta \), gRate sends dataset \(D_2\) first. As both datasets would be sent over a loaded link for time \(\delta \) if their transfers started at time 0, algorithm gSlowtime chooses the larger dataset, which is \(D_2\). Thus, both gRate and gSlowtime construct dataset sequence \((D_2,D_1)\), which gives a schedule of length \(2\delta +1\).
At time 0, the time necessary to transfer \(D_1\) is \(\delta \), and the time needed for transferring \(D_2\) is \(\delta +1\). Hence, algorithm gTime constructs the optimum schedule \((D_1,D_2)\) of length \(\delta +2< 2\delta +1\). \(\square \)
Proposition 11
There exist instances which are solved to the optimum by gSlowtime, but not by gRate or gTime.
Proof
Let \(m=3\), \(\alpha _1=2\), \(\alpha _2=5\), \(\alpha _3=4\), \(\delta <1.5\). Suppose that the first link is free only in interval [0, 1), and the second link is free only in interval \([2\delta ,2\delta +3)\). The third link is always loaded. If the respective transfers start at time 0, the average communication speeds for sending datasets \(D_1,D_2,D_3\) will be \(2/(\delta +1),5/(2\delta +3), 1/\delta \), respectively. Hence, algorithm gRate will start with sending dataset \(D_2\) in interval \([0,2\delta +3)\). After time \(t=2\delta +3\) the links of \(P_1\) and \(P_3\) are always loaded. Hence, their communication speeds are the same, and by the tie-breaking rule gRate will first transfer \(D_3\), which is larger than \(D_1\), and finish with \(D_1\). Thus, the dataset sequence constructed by gRate is \((D_2,D_3,D_1)\), and the corresponding schedule length equals
$$\begin{aligned} T_\mathrm{{gRate}}=2\delta +3+4\delta +2\delta =8\delta +3. \end{aligned}$$
(12)
If the respective transfers start at time 0, datasets \(D_1,D_2,D_3\) will be sent in times \(1+\delta \), \(2\delta +3\), \(4\delta \), respectively. The resulting slowdown times will be \(\delta \), \(2\delta \), \(4\delta \), respectively. Thus, algorithms gTime and gSlowtime send \(D_1\) first. This transfer finishes at time \(1+\delta \). At this moment, the time required for the transfer of \(D_2\) is \(2\delta +3\), and the transfer of \(D_3\) would take time \(4\delta \). Since \(\delta <1.5\), algorithm gTime selects \(D_3\), whose transfer is completed at time \(1+5\delta > 2\delta +3\). Thus, the whole dataset \(D_2\) will be sent over a loaded link. The sequence constructed by gTime is \((D_1,D_3,D_2)\), which gives schedule length
$$\begin{aligned} T_\mathrm{{gTime}}=1+\delta +4\delta +5\delta =10\delta +1. \end{aligned}$$
(13)
After transferring dataset \(D_1\), algorithm gSlowtime has to choose between sending \(D_2\) and \(D_3\), starting at time \(\delta +1\). The respective times of transfer over loaded links equal \(2\delta \) and \(4\delta \). Hence, gSlowtime selects dataset \(D_2\). The obtained schedule is \((D_1,D_2,D_3)\), and its makespan is
$$\begin{aligned} T_\mathrm{{gSlowtime}}=1+\delta +3+2\delta +4\delta =7\delta +4. \end{aligned}$$
(14)
Since \(\delta >1\), the schedule constructed by gSlowtime is shorter than the ones obtained by gTime and gRate. It is also easy to check that the solution delivered by gSlowtime is optimal. \(\square \)
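The optimality claim can be checked by enumerating all six dataset sequences for a concrete value of \(\delta \). The script below is a sketch under stated assumptions: links are modeled by sorted lists of disjoint loaded intervals, a free link sends one unit of data per unit of time, a loaded link is slower by the factor \(\delta \), and datasets are sent one after another. The value \(\delta =5/4\) and the function names are illustrative choices.

```python
import math
from fractions import Fraction
from itertools import permutations

def transfer_end(start, size, loaded, delta):
    """End time of a transfer of `size` units begun at `start` (assumed link model)."""
    t, left = start, size
    for a, b in loaded:            # sorted, disjoint loaded intervals [a, b)
        if b <= t:
            continue
        if a > t:                  # free stretch [t, a)
            if left <= a - t:
                return t + left
            left -= a - t
            t = a
        if left * delta <= b - t:  # transfer ends inside the loaded stretch
            return t + left * delta
        left -= (b - t) / delta
        t = b
    return t + left

def makespan(order, sizes, links, delta):
    t = 0
    for i in order:
        t = transfer_end(t, sizes[i], links[i], delta)
    return t

delta = Fraction(5, 4)                       # any value with 1 < delta < 1.5
sizes = [2, 5, 4]                            # D_1, D_2, D_3 (0-based indices below)
links = [
    [(1, math.inf)],                                  # free only in [0, 1)
    [(0, 2 * delta), (2 * delta + 3, math.inf)],      # free only in [2d, 2d + 3)
    [(0, math.inf)],                                  # always loaded
]
lengths = {p: makespan(p, sizes, links, delta) for p in permutations(range(3))}
best = min(lengths, key=lengths.get)
for p, length in sorted(lengths.items(), key=lambda item: item[1]):
    print([f"D_{i + 1}" for i in p], length)
```

For \(\delta =5/4\), the shortest schedule is \((D_1,D_2,D_3)\) with length \(7\delta +4=51/4\), while \((D_2,D_3,D_1)\) and \((D_1,D_3,D_2)\) have lengths \(8\delta +3=13\) and \(10\delta +1=27/2\), matching (12)-(14).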