Introduction

In most single-machine studies in the literature, researchers have commonly assumed that job processing times, ready times, and due dates are fixed numbers, and setup times are often ignored altogether. However, real production systems involve significant uncertainties. For example, machines may break down, operators may lack skill, the working facility may change, jobs may arrive at random, and other complex external factors can lead to job cancellations or altered tool quality [12]. In light of these facts, job processing times cannot always be assumed to be fixed numbers. On the other hand, considerable economic costs can be saved when setup times are explicitly considered in scheduling decisions in various industrial environments [14].

To address the uncertainty observed in production processes, researchers have modeled job processing times, due dates, or ready times as random parameters, or have allowed machines to break down randomly. Stochastic scheduling problems have thus attracted much attention in the research community; authors apply probability distributions to model the uncertain factors and seek optimality in an average sense. For stochastic single-machine or flowshop studies with stochastic sequence-dependent setup times or with learning effects, readers may refer to Li [16], Ertem et al. [6], Urgo and Vancza [23], Lusby et al. [17], and Fu et al. [7,8,9], among others. In another stream of work on uncertainty, some authors treat job processing times as statistical data whose variance may be large and for which the underlying distribution cannot be reliably inferred. Moreover, in some settings the worst-case performance of a system matters more than its average-case performance. To address this situation, Kouvelis and Yu [15] introduced a robustness approach to hedge against the worst-case contingency. For practical applications in which the uncertainty is expressed through scenarios, readers may refer to Aloulou and Della Croce [3], Aissi et al. [4], Yang and Yu [29], and Kasperski and Zielinski [12], among others.

Meanwhile, the literature shows that setup times are usually assumed to be either sequence independent or sequence dependent. In the former case, setup times are ignored or absorbed into the job processing times. In the latter case, setup times depend not only on the job currently being scheduled but also on the previously scheduled jobs. This study focuses on a particular form of the latter case in which the setup time of a job is proportional to the total processing time of the already scheduled jobs, termed past-sequence-dependent (p-s-d) setup times. This situation occurs in high-tech manufacturing in which a batch of jobs consists of a group of electronic components mounted together on an integrated circuit (IC) board [14]. Following this approach, Wang [24,25,26] and Cheng et al. [5] presented polynomial-time solutions for single-machine scheduling models with p-s-d setup times and exponential or time-dependent learning effects. Wang and Li [27] examined several single-machine p-s-d setup-time scheduling problems with general position-dependent and time-dependent learning effects. Soroush [20] introduced new exact and polynomial-time solution strategies to minimize the sum of earliness penalties subject to no tardy jobs on a single machine. Soroush [21] addressed a bicriteria single-machine scheduling problem with job-dependent p-s-d setup times and job-dependent position-based learning effects, in which the objective is a linear composite of a pair of performance criteria such as the makespan, the total completion time, the total lateness, the total absolute differences in completion times, and the sum of earliness, tardiness, and common due-date penalties. For further discussion of variants of scheduling models with p-s-d setup times, readers may refer to the third comprehensive survey by Allahverdi [2]. For more recent single-machine studies with different settings, readers may refer to Adulyasak et al. [1], Yin et al. [31,32,33,34,35], Xiong et al. [36], Kamran et al. [13], and Gao et al. [10], among others.

Motivated by the many applications of past-sequence-dependent setup times in Wang [24,25,26] and Cheng et al. [5] and by a semi-automatic assembly example of scenario-dependent processing times in Wu et al. [29], and following the robustness idea of Kouvelis and Yu [15] and the p-s-d idea of Koulamas and Kyparisis [14], this study introduces a single-machine scheduling problem with p-s-d setup times and scenario-dependent processing times in which the objective is to minimize the maximum total completion time over the scenarios. To solve this problem, we derive a lower bound and a dominance property to be used in a branch-and-bound algorithm for small-size instances. We then propose four new neighborhood search methods, embedded in the cloud theory-based simulated annealing algorithm [22], and compare them with the common two-point swap method on both small and large instances. The contributions of this article are threefold: (1) we introduce a single-machine scheduling problem with p-s-d setup times and scenario-dependent processing times whose objective function is the same as that of Yang and Yu [29]; (2) to solve this model, we derive a lower bound and a dominance property used in a branch-and-bound algorithm that finds optimal solutions for small numbers of jobs; and (3) we propose five variants of the cloud theory-based simulated annealing (CSA) algorithm, including new neighborhood search methods, for both small and large numbers of jobs.

The rest of this study is organized as follows. The next section introduces the problem formulation. The following section presents the dominance property and a lower bound used in the branch-and-bound algorithm. The next section introduces five neighborhood search methods and proposes five variants of the CSA algorithm. The following section tunes the parameter values of the CSA algorithms. The next section tests and reports the performance of all proposed methods. The last section offers conclusions and suggestions for future research.

Problem description

The problem under consideration can be formulated as follows. Consider n jobs to be processed on a single machine, all available at time zero. No job can be preempted or interrupted once its processing starts, the machine is not allowed to be idle, and it can process only one job at a time. Further, economic savings are achieved when setup times are explicitly included in scheduling decisions. On the other hand, many real systems frequently face uncertainties such as changes in the working environment, machine breakdowns, unstable worker performance, and so on, so job processing times cannot be treated as fixed numbers. Thus, following the ideas of both Koulamas and Kyparisis [14] and Yang and Yu [29], we define the scenario-dependent processing time of job Jj scheduled in the kth position, which depends on the sum of the processing times of all already scheduled jobs, as \(p_{jk}^{\left( s \right)} = s_{\left[ k \right]}^{\left( s \right)} + p_{j}^{\left( s \right)}\) for scenario s = 1 or 2 on the machine, where \(s_{\left[ 1 \right]}^{\left( s \right)} = 0\), \(s_{\left[ k \right]}^{\left( s \right)} = b^{k - 1} \mathop \sum \nolimits_{i = 1}^{k - 1} p_{\left[ i \right]}^{\left( s \right)}\), k = 1, 2,…, n, [·] denotes the scheduled job position, and b, called the learning rate, is a constant with \(0 < b \le 1\).

Suppose \({\Omega }\) denotes the set of all permutations of the given n jobs. For a schedule \(\sigma\), let \(t^{\left( s \right)}\) be the starting time of the job scheduled in the kth position of \(\sigma\) under scenario s, with the initial condition \(t^{\left( s \right)} = 0\) for the job in the first position. Then the completion time of job Jj at sequence position k in schedule \(\sigma\) is \( C_{\left[ k \right]}^{\left( s \right)} \left( \sigma \right) = t^{\left( s \right)} + s_{\left[ k \right]}^{\left( s \right)} + p_{\left[ k \right]}^{\left( s \right)}\). The robust total completion time of σ is then defined as \({\text{RpsdTC}}\left( \sigma \right) = \mathop {\max }\nolimits_{s = 1,2} \mathop \sum \nolimits_{1 \le k \le n} C_{\left[ k \right]}^{\left( s \right)} \left( \sigma \right)\). In standard machine scheduling notation [18], the proposed problem can be denoted by 1/psd, \(p_{jk}^{s}\)/\(\mathop {\max }\nolimits_{s = 1,2} \mathop \sum \nolimits_{1 \le k \le n} C_{\left[ k \right]}^{\left( s \right)} \left( \sigma \right)\). The goal is to find a sequence \(\sigma^{*}\) such that \({\text{RpsdTC}}\left( {\sigma^{*} } \right) = \min_{\sigma \in \Omega } \left\{ {{\text{RpsdTC}}\left( \sigma \right)} \right\}\).
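To make the objective concrete, the following is a minimal sketch in Python (not the authors' Fortran implementation; the function name and data layout are ours) that computes RpsdTC(σ) directly from the definitions above.

```python
# Sketch: compute RpsdTC(sigma) = max_{s=1,2} sum_k C_[k]^(s) under p-s-d setups.
def rpsd_tc(seq, p, b):
    """seq: job indices in processing order (0-indexed); p[s][j]: processing
    time of job j under scenario s; b: learning rate with 0 < b <= 1."""
    totals = []
    for ps in p:                              # one pass per scenario s = 1, 2
        c = total = prior = 0.0
        for k, j in enumerate(seq, start=1):
            setup = (b ** (k - 1)) * prior    # s_[k] = b^(k-1) * sum of prior p
            c += setup + ps[j]                # C_[k] = C_[k-1] + s_[k] + p_[k]
            prior += ps[j]                    # running sum of scheduled p_[i]
            total += c
        totals.append(total)
    return max(totals)                        # worst case over the scenarios

# Tiny example: three jobs, two scenarios, b = 0.5
p = [[3, 1, 2], [2, 4, 1]]
print(rpsd_tc([1, 2, 0], p, 0.5))
```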

Solution methodology

Since the same problem without the p-s-d consideration has been proven NP-hard by Yang and Yu [29], the problem studied here is also NP-hard. We therefore derive a dominance lemma and a lower bound to be used in a branch-and-bound method for finding the optimal objective value. Given two sequences \(\sigma_{1} = \left( {\delta ,i,j,\delta^{\prime}} \right)\) and \(\sigma_{2} = \left( {\delta ,j,i,\delta^{\prime}} \right)\), in which \( \delta\) and \( \delta^{\prime}\) denote two partial sequences, we say that \(\sigma_{1}\) dominates \(\sigma_{2}\) if the following conditions hold.

$$ \begin{aligned} &{\max}\left\{ {C_{i}^{\left( 1 \right)} \left( {\sigma_{1} } \right) + C_{j}^{\left( 1 \right)} \left( {\sigma_{1} } \right),C_{i}^{\left( 2 \right)} \left( {\sigma_{1} } \right) + C_{j}^{\left( 2 \right)} \left( {\sigma_{1} } \right)} \right\} \hfill \\ &\quad < {\max}\left\{ {C_{j}^{\left( 1 \right)} \left( {\sigma_{2} } \right) + C_{i}^{\left( 1 \right)} \left( {\sigma_{2} } \right),C_{j}^{\left( 2 \right)} \left( {\sigma_{2} } \right) + C_{i}^{\left( 2 \right)} \left( {\sigma_{2} } \right)} \right\}, \hfill \\ &C_{j}^{\left( 1 \right)} \left( {\sigma_{1} } \right) < C_{i}^{\left( 1 \right)} \left( {\sigma_{2} } \right),\;{\text{and}} \hfill \\ &C_{j}^{\left( 2 \right)} \left( {\sigma_{1} } \right) < C_{i}^{\left( 2 \right)} \left( {\sigma_{2} } \right). \hfill \\ \end{aligned} $$

The details of Lemma 1 are given as follows:

Lemma 1

Given two jobs Ji and Jj to be scheduled in adjacent positions, there is an optimal sequence in which job Ji is scheduled immediately before job Jj if \(p_{i}^{\left( 1 \right)} < p_{j}^{\left( 1 \right)}\) and \(p_{i}^{\left( 2 \right)} < p_{j}^{\left( 2 \right)}\) hold.

Proof

Suppose \(\sigma_{1} = \left( {\delta ,i,j,\delta^{\prime}} \right)\) and \(\sigma_{2} = \left( {\delta ,j,i,\delta^{\prime}} \right)\), where r − 1 jobs are scheduled in \(\delta\), so that the interchanged pair occupies positions r and r + 1. We show that \(\sigma_{1}\) dominates \(\sigma_{2}\). Let \(t^{\left( s \right)}\) be the starting time of the job scheduled in the rth position in \({ }\sigma_{1}\) and \(\sigma_{2}\), and let \(s_{\left[ k \right]}^{\left( s \right)} \left( \sigma \right) = b^{k - 1} \mathop \sum \nolimits_{i = 1}^{k - 1} p_{\left[ i \right]}^{\left( s \right)}\). Note that \(s_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{1} } \right) = b^{r - 1} \mathop \sum \nolimits_{i = 1}^{r - 1} p_{\left[ i \right]}^{\left( s \right)} = s_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{2} } \right)\), since both sequences share \(\delta\).

According to the definition of the completion time of a job, we have:

$$ \begin{aligned} C_{i}^{\left( s \right)} \left( {\sigma_{1} } \right) &= t^{\left( s \right)} + s_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{1} } \right) + p_{i}^{\left( s \right)} , \hfill \\ C_{j}^{\left( s \right)} \left( {\sigma_{1} } \right) &= C_{i}^{\left( s \right)} \left( {\sigma_{1} } \right) + s_{{\left[ {r + 1} \right]}}^{\left( s \right)} \left( {\sigma_{1} } \right) + p_{j}^{\left( s \right)} \hfill \\ &= C_{i}^{\left( s \right)} \left( {\sigma_{1} } \right) + bs_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{1} } \right) + b^{r} p_{i}^{\left( s \right)} + p_{j}^{\left( s \right)} , \hfill \\ C_{j}^{\left( s \right)} \left( {\sigma_{2} } \right) &= t^{\left( s \right)} + s_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{2} } \right) + p_{j}^{\left( s \right)} , \hfill \\ \end{aligned} $$

and

$$ C_{i}^{\left( s \right)} \left( {\sigma_{2} } \right) = C_{j}^{\left( s \right)} \left( {\sigma_{2} } \right) + s_{{\left[ {r + 1} \right]}}^{\left( s \right)} \left( {\sigma_{2} } \right) + p_{i}^{\left( s \right)} = C_{j}^{\left( s \right)} \left( {\sigma_{2} } \right) + bs_{\left[ r \right]}^{\left( s \right)} \left( {\sigma_{2} } \right) + b^{r} p_{j}^{\left( s \right)} + p_{i}^{\left( s \right)} . $$

Based on the above, we have

$$ C_{i}^{\left( s \right)} \left( {\sigma_{2} } \right) - C_{j}^{\left( s \right)} \left( {\sigma_{1} } \right) = b^{r} \left( {p_{j}^{\left( s \right)} - p_{i}^{\left( s \right)} } \right) > 0,\quad {\text{for}}\;s = 1, 2. $$

In addition,

$$ C_{j}^{\left( s \right)} \left( {\sigma_{2} } \right) + C_{i}^{\left( s \right)} \left( {\sigma_{2} } \right) - C_{i}^{\left( s \right)} \left( {\sigma_{1} } \right) - C_{j}^{\left( s \right)} \left( {\sigma_{1} } \right) = \left( {1 + b^{r} } \right)\left( {p_{j}^{\left( s \right)} - p_{i}^{\left( s \right)} } \right) > 0. $$

Hence \(\max_{s = 1,2} \left\{ {C_{i}^{\left( s \right)} \left( {\sigma_{1} } \right) + C_{j}^{\left( s \right)} \left( {\sigma_{1} } \right)} \right\} < \max_{s = 1,2} \left\{ {C_{j}^{\left( s \right)} \left( {\sigma_{2} } \right) + C_{i}^{\left( s \right)} \left( {\sigma_{2} } \right)} \right\}\), which, together with the two inequalities above, establishes the dominance conditions and completes the proof.
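As a quick numerical sanity check of Lemma 1, the following sketch (with made-up random data, reusing rpsd_tc from the sketch above) verifies that placing the agreeably smaller job first in an adjacent pair never increases the objective.

```python
import random

# Sketch: empirical check of the adjacent-interchange dominance of Lemma 1.
def check_lemma1(n=6, b=0.5, trials=200):
    for _ in range(trials):
        p = [[random.randint(1, 50) for _ in range(n)] for _ in range(2)]
        seq = list(range(n))
        random.shuffle(seq)
        for pos in range(n - 1):
            i, j = seq[pos], seq[pos + 1]
            if p[0][i] < p[0][j] and p[1][i] < p[1][j]:
                swapped = seq[:]
                swapped[pos], swapped[pos + 1] = j, i
                # i before j (smaller in both scenarios) is never worse
                assert rpsd_tc(seq, p, b) <= rpsd_tc(swapped, p, b)
    return True

print(check_lemma1())
```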

Next, a lower bound is derived to prune the branching tree of the branch-and-bound method. Let r denote the number of scheduled jobs and \(n_{1}\) the number of unscheduled jobs, where n = r + \(n_{1}\), and let \(p_{{\left( {r + 1} \right)}}^{\left( s \right)} \le p_{{\left( {r + 2} \right)}}^{\left( s \right)} \le \cdots \le p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)}\) denote the non-decreasing order of {\(p_{r + 1}^{\left( s \right)} , p_{r + 2}^{\left( s \right)} , \ldots ,p_{{r + n_{1} }}^{\left( s \right)}\)} over the unscheduled jobs, s = 1, 2. By applying the approach of Soroush [20, 21], Wang and Li [27], and Yang and Yu [29], a lower bound can be derived as follows:

For the completion times of the jobs scheduled in the subsequent positions (r + 1),…, (r + \(n_{1}\)), we have:

$$ \begin{aligned} C_{{\left[ {r + 1} \right]}}^{\left( s \right)} &= t^{\left( s \right)} + b^{r} \mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} + p_{{\left[ {r + 1} \right]}}^{\left( s \right)} \ge t^{\left( s \right)} + b^{r} \mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} + p_{{\left( {r + 1} \right)}}^{\left( s \right)} , \hfill \\ C_{{\left[ {r + 2} \right]}}^{\left( s \right)} &= C_{{\left[ {r + 1} \right]}}^{\left( s \right)} + b^{r + 1} \left[ {\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} + p_{{\left[ {r + 1} \right]}}^{\left( s \right)} } \right] + p_{{\left[ {r + 2} \right]}}^{\left( s \right)} \hfill \\ &\ge t^{\left( s \right)} + \left( {b^{r} + b^{r + 1} } \right)\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} + \left( {1 + b^{r + 1} } \right)p_{{\left( {r + 1} \right)}}^{\left( s \right)} + p_{{\left( {r + 2} \right)}}^{\left( s \right)} , \hfill \\ & \cdots \hfill \\ C_{{\left[ {r + n_{1} } \right]}}^{\left( s \right)} &\ge t^{\left( s \right)} + \left( {\mathop \sum \limits_{i = 0}^{{n_{1} - 1}} b^{r + i} } \right)\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} + \left( {1 + \mathop \sum \limits_{i = 1}^{{n_{1} - 1}} b^{r + i} } \right)p_{{\left( {r + 1} \right)}}^{\left( s \right)}\\ & \quad + \cdots + \left( {1 + b^{{r + n_{1} - 1}} } \right)p_{{\left( {r + n_{1} - 1} \right)}}^{\left( s \right)} + p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)} . \hfill \\ \end{aligned} $$

According to the above developments, we have

$$ \begin{aligned} \mathop \sum \limits_{k = 1}^{n} C_{\left[ k \right]}^{\left( s \right)} &\ge n_{1} t^{\left( s \right)} + \left( {\mathop \sum \limits_{i = 1}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} \\ & \quad + \left( {n_{1} + \mathop \sum \limits_{i = 2}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 1} \right)}}^{\left( s \right)} \hfill \\ &\quad + \left( {n_{1} - 1 + \mathop \sum \limits_{i = 3}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 2} \right)}}^{\left( s \right)} \\ &\quad + \cdots + \left( {2 + b^{{r + n_{1} - 1}} } \right)p_{{\left( {r + n_{1} - 1} \right)}}^{\left( s \right)} + p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)} ,\;s = 1,2. \hfill \\ \end{aligned} $$

Therefore, since the maximum of the two scenario totals is at least their average, we have:

$$ \begin{aligned} \max_{s = 1,2} \left\{ {\mathop \sum \limits_{k = 1}^{n} C_{\left[ k \right]}^{\left( s \right)} } \right\} &\ge \frac{1}{2}\mathop \sum \limits_{s = 1}^{2} \left\{ {n_{1} t^{\left( s \right)} + \left( {\mathop \sum \limits_{i = 1}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} } \right.\\ & \quad + \left( {n_{1} + \mathop \sum \limits_{i = 2}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 1} \right)}}^{\left( s \right)} \hfill \\ &\quad + \left( {n_{1} - 1 + \mathop \sum \limits_{i = 3}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 2} \right)}}^{\left( s \right)} \\ &\quad \left. { + \cdots + \left( {2 + b^{{r + n_{1} - 1}} } \right)p_{{\left( {r + n_{1} - 1} \right)}}^{\left( s \right)} + p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)} } \right\}. \hfill \\ \end{aligned} $$

Following the lemma of Hardy et al. [11], the bound is minimized by matching the elements of the vector {\(p_{{\left( {r + 1} \right)}}^{\left( s \right)} , \ldots ,p_{{\left( {r + n_{1} - 1} \right)}}^{\left( s \right)} ,p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)}\)} in the opposite order of the elements of the corresponding weight vector \(\left\{ {\left( {n_{1} + \mathop \sum \nolimits_{i = 2}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right), \ldots ,\left( {2 + b^{{r + n_{1} - 1}} } \right), 1} \right\}\), i.e., the largest weights are paired with the smallest processing times. Thus, the proposed lower bound (lbdd) is obtained as follows:

$$ \begin{aligned} {\text{lbdd}}& = \frac{1}{2}\mathop \sum \limits_{s = 1}^{2} \left\{ {n_{1} t^{\left( s \right)} + \left( {\mathop \sum \limits_{i = 1}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)\mathop \sum \limits_{k = 1}^{r} p_{\left[ k \right]}^{\left( s \right)} } \right.\\ & \quad + \left( {n_{1} + \mathop \sum \limits_{i = 2}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 1} \right)}}^{\left( s \right)} \hfill \\ &\quad + \left( {n_{1} - 1 + \mathop \sum \limits_{i = 3}^{{n_{1} }} \left( {n_{1} - i + 1} \right)b^{r + i - 1} } \right)p_{{\left( {r + 2} \right)}}^{\left( s \right)} \\ & \quad \left. { + \cdots + \left( {2 + b^{{r + n_{1} - 1}} } \right)p_{{\left( {r + n_{1} - 1} \right)}}^{\left( s \right)} + p_{{\left( {r + n_{1} } \right)}}^{\left( s \right)} } \right\}. \hfill \\ \end{aligned} $$
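A sketch of evaluating lbdd at a branch-and-bound node follows directly from the derivation above (the function name and data layout are ours; t[s] is the start time of position r + 1 under scenario s):

```python
# Sketch: the lower bound lbdd at a node with r scheduled and n1 unscheduled jobs.
def lbdd(t, sched_p, unsched_p, b, r):
    """t[s]: start time of position r+1; sched_p[s] / unsched_p[s]: processing
    times of the scheduled / unscheduled jobs under scenario s."""
    n1 = len(unsched_p[0])
    total = 0.0
    for s in range(2):
        q = sorted(unsched_p[s])          # p_(r+1) <= ... <= p_(r+n1)
        val = n1 * t[s]
        val += sum((n1 - i + 1) * b ** (r + i - 1)
                   for i in range(1, n1 + 1)) * sum(sched_p[s])
        for j in range(1, n1 + 1):        # weight of p_(r+j), largest first
            w = (n1 - j + 1) + sum((n1 - i + 1) * b ** (r + i - 1)
                                   for i in range(j + 1, n1 + 1))
            val += w * q[j - 1]
        total += val
    return total / 2.0                    # half the sum of the two scenario bounds
```

Sorting the unscheduled processing times in non-decreasing order pairs the largest weights with the smallest processing times, as the Hardy et al. [11] argument requires.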

Five neighborhood search methods and a cloud theory-based simulated annealing

To save computational cost, we provide a cloud theory-based simulated annealing (CSA) algorithm [22] to search for near-optimal solutions to the studied problem. For other variants of the simulated annealing algorithm, readers may refer to Sayed and Hassanien [19]. Running the CSA algorithm involves four common parameters: the initial temperature (\(T_{{\text{i}}}\)), the final temperature (\(T_{{\text{f}}}\)), the cooling factor (λ), and the number of improvement repetitions (\(N_{{\text{r}}}\)) at each temperature level. The first neighborhood search method, named NS1, switches two randomly selected jobs, Ji and Jj, where 1 ≤ i < j ≤ n, up to Nr times at each temperature level. To enlarge the diversity of the neighborhood search, we propose four more neighborhood search methods (named NS2, NS3, NS4, and NS5) to embed in the CSA algorithm. The five neighborhood search methods are described as follows:

  • NS1: switch two randomly selected jobs, Ji and Jj, where 1 ≤ i < j ≤ n.

  • NS2: randomly select two jobs, Ji and Jj, where 1 ≤ i < j < n, and swap each of them with its immediately succeeding job.

  • NS3: randomly select two jobs, Ji and Jj, where 1 < i < j ≤ n, and swap each of them with its immediately preceding job.

  • NS4: randomly select two jobs, Ji and Jj, where 1 ≤ i < i + 1 < j ≤ n, swap the earlier job with its immediately succeeding job, and swap the later job with its immediately preceding job.

  • NS5: randomly select two jobs, Ji and Jj, where 1 < i < j < n, swap the earlier job with its immediately preceding job, and swap the later job with its immediately succeeding job.

Figure 1 illustrates the five neighborhood search methods for n = 8. For each method, the first row is an incumbent schedule of jobs, in which the circled jobs are the two jobs selected by the corresponding search method, and the second row is the schedule after applying the search method to the incumbent schedule. A sketch of the five moves is given after the figure.

Fig. 1 Illustration of the five neighborhood search methods
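The following is a minimal sketch of the five moves in Python (our reading of NS1–NS5, not the authors' Fortran code), operating on a 0-indexed permutation; where the two selected positions interact (e.g., adjacent selections in NS2), the two swaps are simply applied in order.

```python
import random

def _swap(s, a, b):
    s[a], s[b] = s[b], s[a]

def neighbor(seq, method):
    """Return a neighbor of seq under NS1..NS5 (assumes n is large enough
    for the index constraints of the chosen move)."""
    s, n = seq[:], len(seq)
    if method == 1:                       # NS1: swap two random jobs
        i, j = sorted(random.sample(range(n), 2))
        _swap(s, i, j)
    elif method == 2:                     # NS2: each swaps with its successor
        i, j = sorted(random.sample(range(n - 1), 2))
        _swap(s, i, i + 1); _swap(s, j, j + 1)
    elif method == 3:                     # NS3: each swaps with its predecessor
        i, j = sorted(random.sample(range(1, n), 2))
        _swap(s, i, i - 1); _swap(s, j, j - 1)
    elif method == 4:                     # NS4: earlier job right, later job left
        i, j = sorted(random.sample(range(n), 2))
        while j - i < 2:                  # enforce i + 1 < j
            i, j = sorted(random.sample(range(n), 2))
        _swap(s, i, i + 1); _swap(s, j, j - 1)
    else:                                 # NS5: earlier job left, later job right
        i, j = sorted(random.sample(range(1, n - 1), 2))
        _swap(s, i, i - 1); _swap(s, j, j + 1)
    return s
```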

The five neighborhood search methods are embedded in the CSA algorithm, yielding the variants CSA1, CSA2, CSA3, CSA4, and CSA5, respectively. As for the initial seed used in the five variants, we take the average of the two scenario processing times of each job, (\(p_{j}^{\left( 1 \right)} + p_{j}^{\left( 2 \right)}\))/2, and sort the n jobs in non-decreasing order of this average; the resulting sequence is the common initial seed of the five CSAs. Recall that the objective values of two sequences \(\sigma \) and \({ }\sigma_{t}\) are \({\text{RpsdTC}}\left( \sigma \right) = \mathop {\max }\nolimits_{s = 1,2} \mathop \sum \nolimits_{1 \le k \le n} C_{\left[ k \right]}^{\left( s \right)} \left( \sigma \right)\) and \( {\text{RpsdTC}}\left( {\sigma_{t} } \right) = \mathop {\max }\nolimits_{s = 1,2} \mathop \sum \nolimits_{1 \le k \le n} C_{\left[ k \right]}^{\left( s \right)} \left( {\sigma_{t} } \right)\), and let \(d = \left( {{\text{RpsdTC}}\left( {\sigma_{t} } \right) - {\text{RpsdTC}}\left( \sigma \right)} \right)/{\text{RpsdTC}}\left( \sigma \right)\). The procedure of the proposed CSAs is described as follows:

[figure a: algorithm listing of the CSA procedure]
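Since the listing is reproduced as an image, the following is a minimal sketch of the CSA skeleton as we read it: a standard simulated annealing loop whose acceptance temperature is perturbed by a normal cloud model with expected value equal to the current temperature. The entropy and hyper-entropy ratios (en_ratio, he_ratio) are our assumed parameters, and the exact cloud generator of [22] may differ; rpsd_tc and neighbor are reused from the sketches above.

```python
import math
import random

def csa(seed_seq, p, b, method, Ti=0.5, Tf=1e-8, lam=0.99, Nr=26,
        en_ratio=0.1, he_ratio=0.01):
    cur, best = seed_seq[:], seed_seq[:]
    t = Ti
    while t > Tf:
        for _ in range(Nr):               # Nr improvement repetitions per level
            cand = neighbor(cur, method)
            base = rpsd_tc(cur, p, b)
            d = (rpsd_tc(cand, p, b) - base) / base   # relative change d
            # normal cloud model: Ex = t, En ~ N(en_ratio*t, he_ratio*t)
            en = random.gauss(en_ratio * t, he_ratio * t)
            t_cloud = abs(random.gauss(t, abs(en)))
            if d < 0 or random.random() < math.exp(-d / max(t_cloud, 1e-12)):
                cur = cand
                if rpsd_tc(cur, p, b) < rpsd_tc(best, p, b):
                    best = cur[:]
        t *= lam                          # geometric cooling with factor lambda
    return best

# Usage with the common initial seed (non-decreasing average processing time):
n = 10
p = [[random.randint(1, 50) for _ in range(n)],
     [random.randint(1, 100) for _ in range(n)]]
seed = sorted(range(n), key=lambda j: (p[0][j] + p[1][j]) / 2)
print(rpsd_tc(csa(seed, p, b=0.5, method=1), p, 0.5))
```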

Tuning the parameter values of the CSAs

This section uses 100 randomly generated instances in a one-factor-at-a-time manner to calibrate the parameters of the proposed CSAs. The parameters to be tuned are the initial temperature (Ti), the cooling rate (λ), and the number of improvements (Nr) at each temperature level. During the tuning procedure, we set the number of jobs at n = 10 and the final temperature at Tf = \(10^{-8}\). We generated the processing times \(p_{j}^{\left( 1 \right)}\) from the uniform distribution U(1, 50) for scenario one (s = 1) and \(p_{j}^{\left( 2 \right)}\) from the uniform distribution U(1, 100) for scenario two (s = 2). The learning rate b was set at 0.5 in each trial. The measurement adopted is the average error percentage (AEP), i.e., AEP = [(C − O*)/O*] × 100 [%], where C is the total completion time found by CSA1 and O* is the optimal total completion time found by the branch-and-bound method. For simplicity, the parameters determined for CSA1 are adopted in the other four CSAs. The results are summarized in Fig. 2a–c.

Fig. 2 Tuning the parameters of the CSA algorithms

To tune the first parameter Ti, we fixed λ = 0.98, Nr = 5, and Tf = \(10^{-8}\), and tested values of Ti from 0.00005 to 0.9999 in increments of 0.00005. Figure 2a shows that the AEP reached a relatively low point at Ti = 0.5; hence Ti = 0.5 was adopted in the CSAs.

To tune the second parameter λ, we fixed Ti = 0.5, Nr = 5, and Tf = \(10^{-8}\), and examined λ from 0.05 to 0.95 in increments of 0.05. As the AEPs gradually declined as λ grew, we extended the range to 0.99 and examined λ from 0.90 to 0.99 in increments of 0.01. Figure 2b shows that the AEPs gradually declined as λ increased and reached a low point near λ = 0.99. Thus, λ = 0.99 was adopted for the subsequent experiments.

As for the last parameter (Nr), we fixed Ti = 0.5, λ = 0.99, and Tf = \(10^{-8}\), and tested Nr from 5 to 30 in increments of one. Figure 2c shows a relative lowest point at Nr = 26; accordingly, Nr was set at 26 for the subsequent simulation tests.

In light of the above, the parameters Ti = 0.5, λ = 0.99, Nr = 26, and Tf = \(10^{-8}\) were adopted in the following simulation tests for all five CSA algorithms.

Computational simulation experiments and analysis

This section presents two sets of computational experiments to evaluate the performance of the proposed branch-and-bound method and the five CSA algorithms (CSA1, CSA2, …, CSA5). All programs were coded in Compaq Visual Fortran 6.6 and run on a PC with a 2.66-GHz Intel(R) Core(TM)2 Duo E8200 CPU and 1.99 GB RAM under Windows XP. The test instances were generated from combinations of the type of processing times and the value of the learning rate (b). The three types of processing times are described as follows:

  • Type 1: we consider the case where the range of variation of the processing times for scenario 1 is greater than that for scenario 2; the processing times were generated from the uniform distribution U(1, 100) for scenario 1 and U(1, 50) for scenario 2, respectively.

  • Type 2: we consider the case where the range of variation of the processing times for scenario 1 is less than that for scenario 2; the processing times were generated from the uniform distribution U(1, 50) for scenario 1 and U(1, 100) for scenario 2, respectively.

  • Type 3: we consider the case where the ranges of variation of the processing times for the two scenarios are equal; the processing times were generated from the uniform distribution U(1, 100) for both scenarios.

Three learning rate values were used: b = 0.25, 0.5, and 0.75. For the small-job set, the number of jobs was set at n = 8, 10, and 12, and 100 instances were randomly generated for each case, giving 2700 (n × type × b × 100 = 3 × 3 × 3 × 100) problem instances in total. The branch-and-bound algorithm was set to skip to the next instance if the number of nodes exceeded \(10^{8}\). A sketch of the instance generation is given below.
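The generation of a test instance can be sketched as follows (uniform integer draws; our reading of the three types, with hypothetical names):

```python
import random

# (scenario-1 max, scenario-2 max) for each type of processing times
RANGES = {1: (100, 50), 2: (50, 100), 3: (100, 100)}

def make_instance(n, inst_type):
    hi1, hi2 = RANGES[inst_type]
    return [[random.randint(1, hi1) for _ in range(n)],
            [random.randint(1, hi2) for _ in range(n)]]

# The 27 small-n cells: n x type x b, each with 100 random instances
cases = [(n, t, b) for n in (8, 10, 12)
         for t in (1, 2, 3) for b in (0.25, 0.5, 0.75)]
```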

The large-job experiments used the same settings as the small-job experiments, except that n was set at 80, 100, and 120. Again, 100 instances were randomly generated for each case, so 2700 problem instances were tested as well.

Small-n simulation results analysis

The criterion AEP (average error percentage) over the test instances was employed to evaluate how well the five CSAs (CSA1, CSA2, CSA3, CSA4, CSA5) find near-optimal solutions. The AEP was defined as \(100\left[ {\left( {C_{k} - O_{k} } \right)/O_{k} } \right]\left[ \% \right]\), where \(C_{k}\) is the objective value obtained by each CSA algorithm and \(O_{k}\) is the optimal value obtained by the branch-and-bound method.

Table 1 displays the capability of the branch-and-bound method. The average CPU times (in seconds), presented in columns 6 and 7 of Table 1, clearly increase as the number of jobs n grows, and the mean number of nodes, displayed in column 4, increases with n as well. In addition, most of the generated test instances (2696 out of 2700) were solved within \(10^{8}\) nodes; the four exceptions occurred at n = 12.

Table 1 Performance of the branch-and-bound method

Table 2 summarizes the numbers of nodes and the CPU times by the factors n, type, and learning rate (b). As the number of jobs (n) increased, both the mean number of nodes and the mean CPU time grew, consistent with the NP-hardness of the problem.

Table 2 Summary of the performance of branch-and-bound method

Concerning the performance of the proposed five CSA algorithms, Table 3 records the AEPs. The AEP of CSA1 increased slightly as n increased, whereas the AEPs of the other CSAs decreased slightly. Overall, the mean AEPs over n = 8, 10, and 12 are 0.07, 0.11, 0.11, 0.11, and 0.11 for CSA1, CSA2, CSA3, CSA4, and CSA5, respectively; CSA1 performed better than the other four CSAs.

Table 3 Performance of CSA algorithms for n = 8, 10, 12

Table 4 gives the mean AEPs for the different values of n, type, and learning rate (b). It shows that the AEP increased as the learning rate b, the p-s-d setup-time control factor, varied from 0.25 through 0.50 to 0.75. In terms of the types of processing times, all five CSA algorithms performed best on type 3 instances and worst on type 1 instances. Moreover, Fig. 3 shows the distributions of the AEPs for the five CSA algorithms. The CPU times were all below 0.1 s.

Table 4 Mean AEP of n, type, and b for CSA algorithms (small n)
Fig. 3 AEP boxplots of the CSA algorithms

To examine the differences among the five CSA algorithms, an ANOVA (analysis of variance) on the AEPs was run in SAS (version 9.4). The normality assumption of the linear model was violated: the p value of the Shapiro–Wilk normality test was less than 0.0001 (statistic 0.8913). Accordingly, a non-parametric method was applied for the multiple comparison of the five CSA algorithms. Based on the ranks of the AEPs computed from 100 instances in each of the 27 (n·type·b = 3·3·3) blocks of test instances, a Friedman test was run; the p value was 0.0058 (Chi-square 14.539 with 4 degrees of freedom). At the commonly used significance level α = 0.05, the test confirmed that the AEP samples did not come from the same distribution.
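For readers who want to replicate the test outside SAS, a sketch with SciPy follows; the aep array below is a random placeholder standing in for the 5 × 27 matrix of block AEP values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
aep = rng.random((5, 27))                   # placeholder: 5 CSAs x 27 blocks
stat, pval = stats.friedmanchisquare(*aep)  # one sample of 27 values per CSA
print(f"chi-square = {stat:.3f}, p = {pval:.4f}")
```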

To compare the pairwise differences among the five CSA algorithms, we adopted the Wilcoxon–Nemenyi–McDonald–Thompson (WNMT) procedure (see chapter 7 of Hollander et al. [12]). Table 5 shows the sums of ranks of the AEPs across the 27 blocks for the five CSA algorithms. The rank sums of CSA1 to CSA5 were 58.0, 101.5, 78.0, 85.5, and 82.0, respectively. Only one of the ten pairs, CSA1 vs. CSA2, differed statistically at the α = 0.05 level. CSA1 was the best overall performer for small numbers of jobs.

Table 5 Results of the WNMT pairwise comparison of performance of CSA algorithms

Large-n simulation results analysis

For the tests on large numbers of jobs, n was set at 80, 100, and 120, and 100 instances were randomly generated for each combination of n, type, and learning rate (b), for a total of 2700 problem instances. Since optimal solutions are unavailable at this size, the criterion average relative percent deviation (RPD) was employed to evaluate the relative performance of the five proposed CSA algorithms. The RPD is defined as \(100\left[ {\left( {A_{k} - A_{*} } \right)/A_{*} } \right]\left[ \% \right]\), where \(A_{k}\) is the objective value obtained by each algorithm and \(A_{*}\) is the minimum among the \(A_{k}\) values of the five CSAs. Table 6 displays the average RPDs for the five CSAs, while Fig. 4 presents the distributions of the RPDs. Overall, Table 6 shows that CSA3, CSA4, and CSA5 produced the lowest RPDs, regardless of n. Table 7 presents the mean RPDs for the different levels of n, type, and b. In terms of the learning rate, Table 7 shows that the RPDs increased as b increased from 0.25 through 0.50 to 0.75. In terms of type, all five CSAs produced the largest RPD values.

Table 6 Performance of CSA algorithms for n = 80, 100, 120
Fig. 4 RPD boxplots of the CSA algorithms

Table 7 Mean RPD of n, type, and b for CSA algorithms (large n)

To examine the differences among the five CSA algorithms for large n, an ANOVA on the RPDs was run. The normality assumption of the linear model was again violated: the p value of the Shapiro–Wilk normality test was less than 0.0001 (statistic 0.5865). Accordingly, the non-parametric method was applied for the multiple comparison of the five CSA algorithms. Based on the ranks of the RPDs computed from 100 instances in each of the 27 blocks of test instances, the Friedman test gave a p value of less than 0.0001 (Chi-square 54.84 with 4 degrees of freedom). At the significance level α = 0.05, the test confirmed that the RPD samples did not come from the same distribution, since the p value was far below α. Comparing the pairwise differences among the five CSA algorithms with the WNMT test, column 4 of Table 5 shows the sums of ranks of the RPDs across the 27 blocks for the five algorithms. The performances of the five CSA algorithms can be divided into two subgroups at the 0.05 level of significance. As shown in column 5 of Table 5, CSA2, CSA3, CSA4, and CSA5, with rank sums of 66.0, 74.0, 71.0, and 64.0, respectively, formed the better-performing group, while CSA1, with a rank sum of 130.0, formed the worse-performing group. CSA1 performed statistically significantly worse than the other algorithms on large-size problem instances.

Conclusions and suggestions

The single-machine scheduling problem has been studied extensively in many variants that consider diverse processing environments, learning effects, or time-dependent scheduling; this is easily verified with a Google Scholar search. Even during the first half of 2020, more than 30 papers related to the keywords "single machine scheduling problem" were published. This research proposes a new variant based on an industrial setting that includes machine setup times and uncertainty, characteristics whose complexity demands a metaheuristic solution approach. This work developed a lower bound and a dominance property, embedded together in a branch-and-bound method, to solve small-size problems exactly. We also provided a cloud theory-based simulated annealing (CSA) algorithm with five neighborhood search methods to search for near-optimal solutions to relatively large problems. The computational results in Tables 1, 3, and 6 show that the branch-and-bound method and the five CSA algorithms work well. Note that the proposed algorithms are not compared with other heuristics or metaheuristics because this problem is new and unexplored. In summary, this research makes three contributions. (1) We introduce a single-machine scheduling problem with past-sequence-dependent setup times and scenario-dependent processing times in which the objective is to minimize the maximum total completion time over the scenarios. (2) To solve this model, we derive a lower bound and a dominance property used in a branch-and-bound algorithm for the optimal solution. (3) We propose five variants of the cloud theory-based simulated annealing (CSA) algorithm, including new neighborhood search methods. However, the current approach assumes a fixed characterization of the uncertainty, namely two scenarios; the resulting schedule may not be robust under other types of uncertainty, for example when the uncertainty follows a continuous distribution. The main focus of this work is on designing the five CSA algorithms, each of which uses one of the five neighborhoods. It would be an interesting future study to determine whether CSA can obtain better results by adopting the five neighborhoods simultaneously.