Introduction

The hybrid flow shop scheduling problem (HFSP) is a branch of the classical permutation flow shop scheduling problem and plays an indispensable role in modern manufacturing. A hybrid flow shop comprises two or more processing stages, with multiple parallel machines in at least one stage, and arises in real-world settings such as steel, textile, chemical, pharmaceutical, and electronics production [1].

The typical HFSP assumes that a job visits each production stage only once. However, this assumption is inconsistent with many real industries, in which jobs may re-enter the same stage several times for processing. This phenomenon is very common in semiconductor and wafer manufacturing: to guarantee product quality, each part is inspected one or more times and, if necessary, reprocessed. Such a structure is referred to as a reentrant hybrid flow shop [2].

According to the storage strategy, the machine environment of the HFSP can be classified as zero-wait [3], blocking [4], limited buffers [5], or infinite buffers [6]. Prior research has most frequently focused on the HFSP with infinite buffers; however, studies of the HFSP with limited buffers keep emerging. Specifically, if all buffers are occupied, a completed job is blocked on the incumbent machine until a machine at the downstream stage becomes available. Such a situation often arises in the steelmaking, automotive, and petrochemical industries [7].

In a real manufacturing system, tool wear and differences in equipment service life may cause different processing times among the parallel machines at the same stage. Generally, newer machines require less time to complete the same operation [8]. Additionally, operations such as tool replacement and equipment cleaning are often executed between different jobs. That is, a setup time appears between two successive jobs; it depends on the job sequence and is independent of the processing time [9, 10].

In this paper, we focus on the reentrant HFSP with sequence-dependent setup times and limited buffers (referred to as RHFSP-SDST&LB). Each stage is composed of multiple non-identical parallel machines. We formulate a mathematical model with the criterion of minimizing the total weighted completion time. Since Gupta [11] proved that even the two-stage HFSP with parallel machines at only one stage is NP-hard, the more complex RHFSP-SDST&LB is also NP-hard. It is therefore natural to develop an effective method to solve it. The main contributions of this work are summarized as follows:

  1. We formulate a mathematical programming model of the RHFSP-SDST&LB for the first time, aiming to minimize the total weighted completion time.

  2. A cooperative adaptive genetic algorithm, named CAGA, is proposed. In the CAGA, a dual-chain coding scheme and a staged-hierarchical approach are adopted for encoding and decoding, respectively. Two initialization schemes, based on multiple heuristic rules and a dynamic adjustment strategy, are designed to construct the initial population.

  3. To enhance the exploration ability of the CAGA, several improvements are made: (a) a GA-based cooperative evolutionary procedure is introduced to explore more promising regions; (b) an adaptive adjustment strategy based on an ant colony search (ACS) algorithm and a modified greedy heuristic (GH) is implemented to balance the local and global search abilities; and (c) a reset operation with a dynamic variable step-size factor is performed to update unimproved solutions.

  4. To evaluate the performance of the CAGA, we conduct extensive computational and statistical experiments. The results demonstrate that the proposed algorithm finds better objective values within reasonable computational time, especially for large-scale problems.

The remainder of this paper is organized as follows: The next section introduces previous related research efforts. "Problem statement and formulation" briefly defines the RHFSP-SDST&LB, followed by notations and model formulation. The proposed CAGA is provided in detail in "Cooperative adaptive genetic algorithm". "Experiment and result analysis" presents the experimental tests and reports the computational results. The last section gives conclusions and future work.

Literature review

In this section, we review the relevant literature on the HFSP. The literature can be divided into three groups: the reentrant HFSP (RHFSP), the HFSP with limited buffers (HFSP-LB), and the HFSP with sequence-dependent setup times (HFSP-SDST). Our review focuses mainly on the problem models and heuristic approaches, as these are also what we study in this paper.

Recently, the RHFSP with identical parallel machines has attracted wide attention from investigators in both academia and industry. To solve the two-stage problem with a single machine at stage one and multiple machines at stage two, Sangsawang et al. [12] applied a hybrid GA and a hybrid particle swarm optimization (PSO) to the blocking problem for minimizing makespan. For the multi-stage RHFSP with the makespan criterion, Hekmatfar et al. [13] proposed a hybrid GA for the partial reentrant problem with setup times, and Chamnanlor et al. [14] considered a time window constraint and designed a hybrid of the ACS algorithm and GA. Zhou et al. [15] suggested a hybrid differential evolution algorithm to minimize the total weighted completion time. Cho et al. [16] and Ying et al. [17], respectively, presented a Pareto GA and an iterative Pareto greedy algorithm to minimize the bi-objective of makespan and total tardiness. Lin et al. [18] utilized a hybrid harmony search and GA to minimize makespan and mean flowtime under limited buffers and stockers. Cho and Jeong [19] investigated a two-level method to minimize total delayed customer demand and maximize throughput. Geng et al. [20] considered the reentrant problem with a machine on/off control strategy and applied a modified multi-verse optimizer to minimize makespan, tardiness, and idle energy consumption. There are also some studies on the RHFSP with unrelated parallel machines related to our problem. With the minimum makespan objective, Eskandari and Hosseinzadeh [21] suggested heuristic algorithms based on dispatch rules and variable neighborhood search (VNS) under sequence-dependent setup times; Kim and Lee [22] minimized makespan and total tardiness via two heuristic algorithms based on NEH and CDS.

As for the HFSP-LB, in the uniform parallel machine environment, Yu and Li [23] transformed the two-stage problem into a no-wait three-stage HFSP and developed a heuristic algorithm to minimize completion time. Abyaneh and Zandieh [24] considered sequence-dependent setup times and applied an improved algorithm to minimize makespan and total tardiness. Rooeinfar et al. [25] presented a hybrid algorithm combining GA, simulated annealing, and PSO to minimize total flowtime. Han et al. [26] utilized an improved imperialist competition algorithm to optimize makespan. Jiang and Zhang [27] studied total weighted tardiness and non-processing energy with energy-oriented constraints and presented a multi-objective optimization algorithm. Lin et al. [18] investigated a hybrid method for the reentrant problem. Regarding the unrelated parallel machine environment, Rabiee et al. [28] adopted a hybrid meta-heuristic for the no-wait problem with makespan minimization. Almeder and Hartl [29] considered a two-stage stochastic problem with a single machine at stage 1 and two parallel machines at stage 2, and introduced a VNS-based meta-heuristic to minimize the average expected utilization of machine 1, the average expected utilization of the buffer, and the average expected completion time. Soltani and Karimi [30] proposed heuristics and meta-heuristics for cyclic scheduling with a machine eligibility constraint to minimize cycle time. To optimize makespan, Yaurima et al. [31] developed a GA for the problem with sequence-dependent setup times and availability constraints. Li and Pan [7] suggested a novel hybrid algorithm integrating artificial bee colony and tabu search. Xuan et al. [5] designed a hybrid heuristic algorithm based on GA and tabu search.

In the past decades, many attempts on the HFSP-SDST with identical parallel machines have been made. For example, Wang et al. [32] studied a two-stage problem with a batch processing constraint and utilized a heuristic algorithm to minimize the total weighted completion time. Marichelvam et al. [33] presented a discrete firefly algorithm for minimizing total tardiness. For makespan minimization, Moccellin et al. [34] proposed a heuristic algorithm based on the SPT and LPT rules to tackle the blocking problem; Behnamian [35] introduced a diversified PSO method combined with multiple local search algorithms; and Ramezani et al. [36] considered a no-wait problem and developed a novel meta-heuristic integrating invasive weed optimization, VNS, and simulated annealing. Khare and Agrawal [37] studied time window constraints and proposed three approaches, namely the squirrel search algorithm, the whale optimization algorithm, and grey wolf optimization, to optimize total weighted earliness and tardiness. Aqil and Allali [38] designed two meta-heuristics based on migratory bird and water wave optimization for the blocking problem to minimize total tardiness and earliness. Tian et al. [39] utilized an adaptive multi-objective VNS to minimize the total weighted tardiness and total setup time. Naderi and Yazdani [40] proposed an imperialist competition algorithm to minimize total tardiness. To minimize the bi-criteria of makespan and total tardiness, Ebrahimi et al. [41] suggested two algorithms, a non-dominated sorting GA and a multi-objective GA, for the problem with uncertain due dates, and Abyaneh and Zandieh [24] applied an improved GA to the problem with finite buffers.

In recent years, studies on the more complicated HFSP-SDST with unrelated parallel machines have appeared. For the makespan criterion, Sbihi and Chemangui [42] extracted from steel continuous casting production a three-stage HFSP with identical parallel machines at the first two stages and unrelated machines at the third stage, and developed a regeneration GA; Li et al. [43] proposed an improved artificial bee colony algorithm for the distributed heterogeneous shop problem; Garavito-Hernández et al. [44] presented a meta-heuristic based on GA and the imperialist competition algorithm; Eskandari and Hosseinzadeh [21] used heuristic algorithms for the reentrant problem; and Yaurima et al. [31] suggested a GA for the problem with limited buffers. Aqil and Allali [45] utilized an iterative greedy meta-heuristic and iterated local search to minimize the total tardiness of all jobs. Zhou and Liu [46] applied an energy-efficient differential evolution algorithm to the problem with fuzzy processing times for minimizing total weighted delivery penalty and total energy consumption. Yu et al. [47] considered machine eligibility and developed an evolutionary algorithm using four decoding methods to optimize the bi-objective of total tardiness and total setup time. Mollaei et al. [48] designed a bi-criteria mixed integer programming model for minimizing makespan and machine cost under blocking. Liu and Yang [49] investigated the problem with worker assignments, in which an ε-constraint method was used to minimize makespan and total flowtime. Xuan et al. [50] addressed the problem with step-deteriorating jobs and introduced an improved artificial bee colony algorithm, combined with GA, VNS, and GH, to minimize the total weighted completion time.

To the best of our knowledge, considerable attention has been paid to developing efficient methods for the HFSP with a reentrant constraint, finite buffers, or sequence-dependent setup times, respectively. However, to date, there is no study on the HFSP considering the reentrant constraint, finite buffers, and sequence-dependent setup times at the same time. Hence, it is of great significance to investigate the HFSP integrating the above three constraints. Table 1 compares this study with the previous literature. From this table, we observe that most of the literature studied the HFSP with two of the attributes and made conditional assumptions pertaining to the RHFSP. Little work has been conducted on the minimization of total weighted completion time in the HFSP, and the studies with this criterion only consider the HFSP with unlimited buffer capacities. Given the above, this paper proposes a mathematical model of the RHFSP-SDST&LB with the objective of total weighted completion time and designs a novel cooperative adaptive genetic algorithm (CAGA) to deal with this complex problem.

Table 1 Comparison of this study with previous literature

Problem statement and formulation

Problem description

The RHFSP-SDST&LB can be defined as follows: N jobs go through G processing stages in the same order. Each stage g contains UMg unrelated parallel machines (UMg ≥ 2 for at least one stage). The problem involves a reentrant shop in which all jobs visit the same sequence of stages H (H > 1) times. The buffer size between two contiguous stages, represented as Vg, cannot exceed a predetermined upper bound. Additionally, the setup times depend on the job sequence, and release times are considered. When a job reaches a production stage, an idle machine at the incumbent stage is randomly selected for processing. The layout of the reentrant hybrid flow shop with limited buffers is depicted in Fig. 1.

Fig. 1 Layout of reentrant hybrid flow shop with limited buffers

Since the intermediate buffer between stages g and g + 1 is limited, after job i is processed on machine m at stage g, one of the following four situations may occur:

  (a) If there is an idle machine at stage g + 1, job i directly enters the next stage for processing (see job 1 in Fig. 2);

  (b) If all machines at stage g + 1 are occupied but a buffer slot is available, job i enters the buffer and waits until a machine at the downstream stage becomes available (see job 4 in Fig. 2);

  (c) If all machines at stage g + 1 and the buffer zone are occupied, and a machine at the downstream stage is released first, the finished job i stays on the incumbent machine until an idle machine at stage g + 1 appears (see job 5 in Fig. 2);

  (d) If a buffer slot becomes available first, with the other conditions the same as in (c), the completed job i is blocked on the incumbent machine until that buffer slot frees up (see job 3 in Fig. 2).

Fig. 2 Gantt chart for HFSP-LB

A sequence-dependent setup time Tsijgmh occurs when job j is processed immediately before job i on machine m at stage g of layer h (see job 2 in Fig. 3). Tsijgmh = 0 implies that no setup time is required if job i is the first job handled on machine m at stage g of layer h (see job 4 in Fig. 3).

Fig. 3 Gantt chart for HFSP-SDST

The reentrant characteristic is considered as follows: job i returns to the first stage for further processing when it completes the last stage of layer h (h < H). Each job visits every processing stage H times; hence, each job is processed through H × G operations in total. As shown in Fig. 4, job 4 undergoes 6 operations in total.

Fig. 4 Gantt chart for RHFSP

According to the standard three-field notation α|β|γ [51, 52], the problem can be described as HFG (UM1, UM2, …, UMG) | Ts, recrc, block | ΣWiCiGH, where the vector (UM1, UM2, …, UMG) gives the number of non-identical parallel machines at each stage and CiGH denotes the completion time of job i at stage G of the last layer. The optimization target is to minimize the total weighted completion time.

Model assumptions

  1. All machines are available at time zero, with no breakdowns or maintenance delays.

  2. No preemption is allowed; once an operation is initiated, it must be fully completed without interruption.

  3. A machine cannot process more than one job at a time, and a job can be processed by only one machine at a time.

  4. The transfer time of jobs between two adjacent stages is negligible.

Notations

i, j: indices for jobs, i, j ∈ \(\overline{N} = \{1, \ldots, N\}\).

g: index for stages, g ∈ \(\overline{G} = \{1, \ldots, G\}\).

h: index for layers, h ∈ \(\overline{H} = \{1, \ldots, H\}\).

t: index for time, t ∈ \(\overline{T} = \{1, \ldots, T\}\).

m: index for machines, m ∈ \(\overline{UM}_{g} = \{1, \ldots, UM_{g}\}\), where UMg is the total number of parallel machines available at stage g.

BM: a very big number.

Wi: weight of job i.

Rti: release time of job i.

Vg: buffer size between stages g and g + 1; VG is the buffer capacity between stages G and 1, used when a job revisits the production system.

Ptigmh: processing time of job i on machine m at stage g of layer h.

Tsijgmh: setup time when job j is handled immediately before job i on machine m at stage g of layer h.

Cigh: completion time of job i at stage g of layer h.

Stigmh: starting time of job i on machine m at stage g of layer h.

Xigmh: binary, set to 1 if job i is allocated to machine m at stage g of layer h; 0 otherwise.

Yijgmh: binary, set to 1 if job j is handled immediately before job i on machine m at stage g of layer h; 0 otherwise.

Zight: binary, set to 1 if job i at stage g of layer h is being processed or blocked on a machine at time t; 0 otherwise.

Right: binary, set to 1 if job i waits between stages g and g + 1 of layer h at time t; 0 otherwise. Note that the job stays between stages G and 1 of layer h at time t if RiGht = 1 (1 < h < H).

Uh: binary, set to 1 if h > 1; 0 otherwise.

Model formulation

The problem characteristics of the RHFSP-SDST&LB show that the reentrant operation, sequence-dependent setup times, and limited buffers interact: a change in one of them influences the other two constraints. The model is therefore not a simple combination of the reentrant constraint, finite buffers, and sequence-dependent setup times from existing studies. For example, the completion time of a job at the previous stage is related to the ending time of the previous job on the same machine at the current stage, which in turn directly influences the occurrence of setup times at this stage. Hence, the key to constructing the mathematical model is handling these interacting factors. Based on the above problem statement and notations, the mathematical model for the studied problem is formulated as follows:

$$ \min E = \sum_{i = 1}^{N} W_{i} C_{iGH}, $$
(1)

s.t.

$$ \sum_{m = 1}^{UM_{g}} X_{igmh} = 1, \quad \forall i \in \overline{N},\ g \in \overline{G},\ h \in \overline{H}, $$
(2)
$$ \sum_{i = 1}^{N} \sum_{h = 1}^{H} Z_{ight} \le UM_{g}, \quad \forall g \in \overline{G},\ t \in \overline{T}, $$
(3)
$$ \sum_{i = 1}^{N} \sum_{j = 1}^{N} Y_{ijgmh} St_{igmh} \ge \sum_{i = 1}^{N} \sum_{j = 1}^{N} Y_{ijgmh} \left( C_{jgh} + Ts_{ijgmh} \right), \quad i \ne j,\ \forall g \in \overline{G},\ m \in \overline{UM}_{g},\ h \in \overline{H}, $$
(4)
$$ C_{igh} - C_{i,g-1,h} \ge \sum_{m' = 1}^{UM_{g-1}} X_{igm'h} Pt_{igm'h}, \quad \forall i \in \overline{N},\ g \in \overline{G},\ h \in \overline{H}, $$
(5)
$$ C_{i,0,h} = U_{h} C_{i,G,h-1}, \quad \forall i \in \overline{N},\ h \in \overline{H}, $$
(6)
$$ \sum_{i = 1}^{N} R_{ight} \le V_{g}, \quad \forall g \in \overline{G},\ h \in \overline{H},\ t \in \overline{T}, $$
(7)
$$ C_{i'gh'} - C_{igh} + BM \left( 2 - X_{i'gmh'} - X_{igmh} \right) \ge Pt_{i'gmh'}, \quad \forall i, i' \in \overline{N}\ (i < i'),\ g \in \overline{G},\ h, h' \in \overline{H},\ m \in \overline{UM}_{g}, $$
(8)
$$ \begin{aligned} C_{igh'} - C_{igh} + BM \left( 2 - X_{igmh} - X_{igmh'} \right) & \ge \sum_{g' = g+1}^{G} X_{ig'm'h} Pt_{ig'm'h} + \sum_{g' = 1}^{g} X_{ig'm'h'} Pt_{ig'm'h'} \\ & \quad + \sum_{r = h+1}^{h'-1} \sum_{g' = 1}^{G} X_{ig'm'r} Pt_{ig'm'r}, \quad \forall i \in \overline{N},\ g \in \overline{G},\ m \in \overline{UM}_{g},\ 1 \le h < h' \le H, \end{aligned} $$
(9)
$$ \begin{aligned} \sum_{t = 1}^{T} \left( Z_{ight} + R_{ight} \right) + \sum_{j = 1}^{N} Y_{jigmh} Ts_{jigmh} & = \sum_{m = 1}^{UM_{g}} St_{igmh} X_{igmh} - \sum_{m' = 1}^{UM_{g-1}} St_{i,g-1,m'h} X_{i,g-1,m'h}, \\ & \quad \forall i \in \overline{N},\ g \in \overline{G}\ (g > 1),\ h \in \overline{H}, \end{aligned} $$
(10)
$$ C_{i11} - \sum_{m = 1}^{UM_{1}} Pt_{i1m1} X_{i1m1} \ge Rt_{i}, \quad \forall i \in \overline{N}, $$
(11)
$$ X_{igmh}, Y_{ijgmh}, Z_{ight}, R_{ight}, U_{h} \in \{ 0, 1 \}, \quad \forall i, j \in \overline{N},\ g \in \overline{G},\ h \in \overline{H},\ t \in \overline{T},\ m \in \overline{UM}_{g}. $$
(12)

Constraint (1) defines the objective function, which minimizes the sum of weighted completion times of all jobs. Constraint (2) ensures that each job is processed on exactly one machine at each stage. Constraint (3) states that each machine cannot process more than one job at a time. Constraint (4) models the relationship among the starting time, completion time, and setup time of two adjacent jobs processed by the same machine at the same layer and stage; specifically, the completion time of the previous job plus the setup time must not exceed the starting time of the current job. Constraints (5) and (6) ensure that a job can start an operation only after its preceding operation has finished. Constraint (7) denotes the buffer capacity restriction, i.e., the number of jobs waiting between adjacent stages cannot exceed the buffer upper bound. Constraint (8) guarantees the priority relation of two different jobs processed in different layers on the same machine at the same stage. Constraint (9) represents the operation precedence of a job at different layers in the same stage; in other words, the beginning time of job i at layer h′ must not be earlier than the ending time of job i at layer h at the same stage. Constraint (10) represents the relationship among the blocking time, the waiting time in the buffer, and the setup time. Constraint (11) imposes that the starting time of a job at the first stage must not be earlier than its release time. Constraint (12) defines the domains of all decision variables.
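To make the model concrete, the following minimal sketch (our illustration, not the authors' implementation; the paper solves the problem with the CAGA rather than an exact solver) expresses the objective (1) and the assignment constraint (2) with the open-source PuLP library. The timing, buffer, and big-M constraints (3)–(12) would be added in the same style; all dimensions and weights below are toy values.

```python
# Minimal sketch of objective (1) and constraint (2) in PuLP.
# Toy dimensions; the full model adds constraints (3)-(12).
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

N, G, H = 3, 2, 2              # jobs, stages, layers (illustrative)
UM = {1: 2, 2: 2}              # parallel machines per stage
W = {1: 3, 2: 6, 3: 5}         # job weights (illustrative)

prob = LpProblem("RHFSP_SDST_LB", LpMinimize)

# X[i, g, m, h] = 1 if job i runs on machine m at stage g of layer h
X = {(i, g, m, h): LpVariable(f"X_{i}_{g}_{m}_{h}", cat=LpBinary)
     for i in range(1, N + 1) for g in range(1, G + 1)
     for m in range(1, UM[g] + 1) for h in range(1, H + 1)}
# C[i] = completion time of job i at stage G of the last layer (C_iGH)
C = {i: LpVariable(f"C_{i}_GH", lowBound=0) for i in range(1, N + 1)}

# Objective (1): minimize the total weighted completion time
prob += lpSum(W[i] * C[i] for i in range(1, N + 1))

# Constraint (2): each operation is assigned to exactly one machine
for i in range(1, N + 1):
    for g in range(1, G + 1):
        for h in range(1, H + 1):
            prob += lpSum(X[i, g, m, h] for m in range(1, UM[g] + 1)) == 1
```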

Cooperative adaptive genetic algorithm

The GA is an evolutionary algorithm well known for tackling many classes of complex HFSPs [13, 14], owing to its simple structure and features such as short computational time, high search ability, strong robustness, and extendibility to other problems [31]. Inspired by the strength of the GA in dealing with related problems, this paper proposes a novel cooperative adaptive genetic algorithm (CAGA) for the RHFSP-SDST&LB.

The framework of the proposed CAGA

The proposed CAGA has the following main parts: encoding, population initialization, decoding, cooperative evolutionary procedure, adaptive adjustment strategy, and reset method.

In the algorithm, a dual-chain coding scheme with a job sequence chain and a machine selection chain is first designed for encoding. It should be noted that the quality of the initial population often influences the convergence speed of the algorithm; normally, the initialization operation should generate a diverse population. Thus, two initialization methods, heuristic rules based on job sequence (HR-JS) and dynamic adjustment based on machine assignment (DA-MA), are introduced to enhance the quality of the randomly produced initial population. A staged-hierarchical decoding strategy is then designed to obtain the solutions. On the other hand, most traditional algorithms, including the GA, tend to converge to a local optimum as the problem size increases [53, 54]. How to balance exploration (global search) and exploitation (local search) is a critical issue when designing an optimization algorithm [24]. In the scheduling domain, many heuristics or meta-heuristics are used to handle this issue, such as the ACS algorithm and GH [50, 55,56,57]. In our CAGA, several methods are introduced to balance the global and local search capabilities: two crossover and mutation operators with a cooperative mechanism, an adaptive adjustment strategy using ACS and a modified GH (MGH), and a reset method with a dynamic variable step-size factor. More specifically, we adopt the two cooperative crossover and mutation operators to generate new individuals and enhance local search efficiency; we adaptively switch between the ACS and the MGH to achieve global search ability and avoid premature convergence after mutation; and if the population quality is not improved within a limited number of generations, we apply the reset strategy to non-improved solutions to guarantee population diversity. The complete framework of the proposed CAGA is outlined in Fig. 5, where IT, Imax, α, and E indicate the iteration number, maximum iteration number, step control factor, and objective value, respectively.

Fig. 5 Flowchart of the proposed CAGA

Encoding

Due to the complex nature of the RHFSP, most studies decompose it into two decision problems: job permutation and machine allocation. Figure 6 shows the machine routing for a job i that re-enters 3 production stages twice, where Mgh refers to the machine number assigned to the job at stage g of layer h.

Fig. 6 Processing machine flow

Based on the above description, a dual-chain coding scheme consisting of a job sequence chain and a machine selection chain is designed. The job sequence chain is a vector π = [π[1], π[2],…, π[i],…, π[N]], in which π[i] denotes the job arranged at the i-th position. The machine selection chain is a two-dimensional matrix cell group of randomly chosen machine vectors at all stages, ω ∈ {ω[1], ω[2],…, ω[i],…, ω[N]}, where ω[i] denotes the machine matrix of the i-th job over all stages of all layers. A solution consisting of π and ω is thus formed, namely Δ = [π, ω], defined as follows:

$$ \Delta = (\pi, \omega) = \overbrace{\left[ \pi_{[1]}, \pi_{[2]}, \ldots, \pi_{[N]} \right]}^{\text{Job sequence chain } \pi} + \overbrace{\underbrace{\left[ \begin{array}{ccc} M_{11}^{1} & \cdots & M_{1H}^{1} \\ M_{21}^{1} & \cdots & M_{2H}^{1} \\ \vdots & & \vdots \\ M_{G1}^{1} & \cdots & M_{GH}^{1} \end{array} \right]}_{\text{stages 1 to } G \text{ of job } [1]\ (\omega^{[1]})} \cdots \underbrace{\left[ \begin{array}{ccc} M_{11}^{i} & \cdots & M_{1H}^{i} \\ M_{21}^{i} & \cdots & M_{2H}^{i} \\ \vdots & & \vdots \\ M_{G1}^{i} & \cdots & M_{GH}^{i} \end{array} \right]}_{\text{stages 1 to } G \text{ of job } [i]\ (\omega^{[i]})} \cdots \underbrace{\left[ \begin{array}{ccc} M_{11}^{N} & \cdots & M_{1H}^{N} \\ M_{21}^{N} & \cdots & M_{2H}^{N} \\ \vdots & & \vdots \\ M_{G1}^{N} & \cdots & M_{GH}^{N} \end{array} \right]}_{\text{stages 1 to } G \text{ of job } [N]\ (\omega^{[N]})}}^{\text{Machine selection chain } \omega} $$

where \(M_{gh}^{i}\) denotes the machine number assigned to the i-th job at stage g of layer h.
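As an illustration of this encoding, the following Python sketch (an assumed data layout; the paper gives no code for it) builds a random dual-chain chromosome Δ = [π, ω]:

```python
# Sketch of the dual-chain encoding: a random job permutation pi and,
# for each job, a G x H matrix of machine numbers (one per stage/layer).
import random

def random_chromosome(N, G, H, UM):
    """UM[g] is the number of parallel machines at stage g (1-based)."""
    pi = random.sample(range(1, N + 1), N)          # job sequence chain
    omega = {job: [[random.randint(1, UM[g]) for _ in range(H)]
                   for g in range(1, G + 1)]        # machine selection chain
             for job in pi}
    return pi, omega                                # Delta = (pi, omega)

pi, omega = random_chromosome(N=5, G=2, H=2, UM={1: 2, 2: 3})
print(pi)            # e.g. [3, 1, 5, 4, 2]
print(omega[pi[0]])  # G x H machine matrix of the first-sequenced job
```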

Population initialization

The initial population has a great impact on the quality of the final solution and on search efficiency. A random procedure is commonly used to build the population, which is simple and easy to understand; however, it ignores the influence of large fluctuations in processing times on the results. In this paper, according to the characteristics of the chromosome, two initialization schemes, HR-JS and DA-MA, are designed to refine the two parts of the chromosome, respectively.

Initialization scheme using HR-JS

For the job sequence chain, we introduce six dispatching rules: SPT, the NEH heuristic [22], MJSN [13], FIFO, BWT, and a random rule (RR). These approaches are described below:

  (a) Shortest processing time (SPT): the job with the shortest total processing time is served first. The detailed steps are:

    Step 1: Calculate the average processing time of job i at stage g of layer h (see Eq. (13)).

    $$ \overline{Pt}_{igh} = \frac{\sum_{m = 1}^{UM_{g}} Pt_{igmh}}{UM_{g}}, \quad \forall i, g, h $$
    (13)

    Step 2: With Eq. (13), calculate the total processing time of job i over all stages of all reentrant layers, given by Eq. (14).

    $$ SPt_{i} = \sum_{h = 1}^{H} \sum_{g = 1}^{G} \overline{Pt}_{igh}, \quad \forall i $$
    (14)

    Step 3: Sort the jobs in increasing order of total processing time, whereby the job sequence part of the chromosome is obtained.

    We use the job sequence 12–13–1–3–6–15–4–7–9–2–14–10–5–8–11 to demonstrate the SPT rule. The corresponding total processing times are 71–65–82–55–42–99–33–45–62–93–51–89–79–21–68. According to the SPT rule, the processing sequence is adjusted to 8(21)–4(33)–6(42)–7(45)–14(51)–3(55)–9(62)–13(65)–11(68)–12(71)–5(79)–1(82)–10(89)–2(93)–15(99). A compact sketch of this rule is given after this list.

  (b) NEH heuristic: the NEH heuristic is well known for defining job processing sequences in shop scheduling problems. In general, it contains three major steps:

    Step 1: The total processing times of all jobs are calculated by Eqs. (13) and (14) and sorted in non-increasing order, yielding a new job sequence π′ ∈ {π′[1], π′[2],…, π′[i],…, π′[N]}.

    Step 2: Take the first two jobs π′[1] and π′[2] from the above order and arrange them in both possible ways. Pick the permutation of the two jobs with the minimum total weighted completion time as the current partial sequence.

    Step 3: Sequentially select the j-th job (j = 3, 4, …, N) from π′ and insert it into all possible positions of the partial sequence. Take the permutation with the minimum total weighted completion time as the placement of the j-th job.

  (c) Modified gh/2 Johnson's rule (MJSN): the MJSN procedure can be briefly explained as follows:

    Step 1: The w (w = G × H) tasks of job i are evenly divided into two groups to create the aggregated processing times \(P_{i1}^{1}\) and \(P_{i2}^{gh}\).

    Step 2: Record U = {i | \(P_{i1}^{1}\) > \(P_{i2}^{gh}\)} and V = {i | \(P_{i1}^{1}\) ≤ \(P_{i2}^{gh}\)}. In set U, the jobs are arranged in increasing order of \(P_{i1}^{1}\), while set V is arranged in decreasing order of \(P_{i2}^{gh}\). A new sequence is generated by appending V to the end of U.

    Step 3: Johnson's scheduling rule is applied to this sequence to obtain the job ordering.

    To calculate \(P_{i1}^{1}\) and \(P_{i2}^{gh}\), the average processing time of job i at task τ is given by Eq. (15); \(P_{i1}^{1}\) and \(P_{i2}^{gh}\) represent the sum of average processing times from task 1 to task δ (δ = w/2, see Eq. (16)) and from task w/2 + 1 to the last task (see Eq. (17)), respectively.

    $$ \overline{P}_{i\tau} = \overline{Pt}_{igh}, \quad \tau = g + (h - 1)G, \quad \forall i \in \overline{N},\ h \in \overline{H}, $$
    (15)
    $$ P_{i1}^{1} = \sum_{\tau = 1}^{\delta} \overline{P}_{i\tau}, \quad \forall i \in \overline{N}, $$
    (16)
    $$ P_{i2}^{gh} = \sum_{\tau = 1}^{\delta} \overline{P}_{i, w - \tau + 1}, \quad \forall i \in \overline{N}. $$
    (17)

  (d) First in first out (FIFO): the first-arrived job is served first; that is, the jobs are sorted in ascending order of arrival time.

  (e) Biggest weight (BWT): the job with the biggest weight is served first. More precisely, sort the jobs in descending order of weight and assign them to the available machines in turn.
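The sketch promised above: a plausible Python rendering of the SPT rule of Eqs. (13)–(14); the dictionary layout of Pt is our assumption.

```python
# Sketch of the SPT rule: average each operation's time over the parallel
# machines (Eq. (13)), sum over all stages and layers (Eq. (14)), and sort.
def spt_order(Pt, UM, G, H):
    """Pt[i][g][m][h]: processing time of job i on machine m at stage g,
    layer h (all dicts keyed from 1); returns jobs sorted by total time."""
    total = {}
    for i in Pt:
        s = 0.0
        for g in range(1, G + 1):
            for h in range(1, H + 1):
                s += sum(Pt[i][g][m][h] for m in range(1, UM[g] + 1)) / UM[g]
        total[i] = s                         # SPt_i in Eq. (14)
    return sorted(total, key=total.get)      # increasing total processing time
```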

Initialization scheme using DA-MA

The machine selection chain is updated by means of dynamic adjustment. On the one hand, with a certain probability, we determine the assigned machine number based on the processing time; on the other hand, the machine number is randomly generated with the complementary probability. Subsequently, a bottleneck elimination method is employed to relieve machines with a high scheduling frequency. The specific method is shown in Procedure 1, where PS represents the population size and the job sequence chain is generated according to "Initialization scheme using HR-JS".

Procedure 1 Initialization scheme using DA-MA
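Because Procedure 1 is given only as a figure, the following is merely one plausible reading of the DA-MA idea, choosing a machine for a single operation either by processing time or at random; the parameter p_greedy and the omitted bottleneck-elimination step are assumptions.

```python
import random

def da_ma_machine(proc_times, p_greedy=0.8):
    """Pick a machine for one operation: with probability p_greedy choose
    the machine with the smallest processing time, otherwise choose at
    random.  proc_times maps machine number -> processing time; p_greedy
    is a hypothetical parameter, not taken from the paper."""
    if random.random() < p_greedy:
        return min(proc_times, key=proc_times.get)   # time-based choice
    return random.choice(list(proc_times))           # random diversification
```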

Decoding

The chromosome determines the job processing order and machine allocation; thus, we can record the job sequence πgmh (πgmh ∈ {πg[1], πg[2],…, πg[I]}) and the total number of jobs I arranged on machine m at stage g of layer h, in which πg[2] refers to the second job processed on machine m at stage g of layer h. For simplicity, we assume that the setup time Tsijgmh depends only on the jobs and is irrelevant to the machines, and denote it Ts'ij. The value num (i, g, M) denotes the accumulated job number when job i starts processing on machine M at stage g. To obtain the scheduling solution for all jobs on each machine, a staged-hierarchical decoding approach is designed as follows.

  1. At stage 1 of layer 1, we first identify the accumulated job number u when the job π1[u] starts processing on machine m. If u = 1, or the release time is greater than the completion time of the previous job π1[u − 1] (u > 1), the starting time of the job is Rt (π1[u]); otherwise, the starting time equals the ending time of the previous job C (π1[u − 1], g, h) plus the setup time Ts'(π1[u − 1], π1[u]).

  2. At stage g of layer h (h < H or 1 < g < G), we calculate the completion time of the job πg[v] in πgmh at the previous operation and the starting time of the current operation. The completion time at the previous operation depends on the relationship among the sum of the starting and processing times at the previous stage (Tc), the idle time at the buffer stage (bt), and the ending time C (πg[v − 1], g, h) of job v − 1 (v > 1). If Tc > C (πg[v − 1], g, h), the completion of the previous operation for this job is Tc. Additionally, to calculate the completion time at the previous operation, we further compare bt with C (πg[v − 1], g, h): if bt ≥ C (πg[v − 1], g, h), the completion time at the previous operation is C (πg[v − 1], g, h); otherwise, it is Tc. At this time, the idle time bt in the buffer is updated to C (πg[v − 1], g, h).

  3. At stage G of layer H, since the job leaves the production system directly after the operation is completed, the completion time of this job is the sum of its starting time and processing time. The analysis of the completion time at stage G − 1 and the starting time at stage G for this job is the same as above.

The decoding computation for the RHFSP-SDST&LB is elaborated as follows:

Step 1:

Let i = 1, M = ω (i, g, h), bt = 0.

Step 2:

If g = 1, h = 1, go to Step 2.1; if 1 < g ≤ G, go to Step 2.2; otherwise, perform Step 2.3.

Step 2.1:

St (π(i), g, h, M) = Rt (π (i)).

Step 2.2:

St (π (i), g, h, M) = C (π (i), g − 1, h).

Step 2.3:

St (π (i), g, h, M) = C (π (i), G, h − 1).

C (π (i), g, h) = St (π (i), g, h, M) + Pt (π (i), g, h, M).

Step 3:

Update i = i + 1, j = num (i, g, M).

Step 4:

For g = 1, h = 1, if j = 1, perform Step 2.1; otherwise, go to Step 4.1.

Step 4.1:

Let λ1 = C (πg[j − 1], g, h).

St (π (i), g, h, M) = Ts' (πg[j − 1], πg[j]) + max (λ1, Rt (π (i))), λ1 ≥ Rt (π (i)).

St (π (i), g, h, M) = max (λ1, Rt (π (i))), λ1 < Rt (π (i)).

Step 5:

For 1 < g ≤ G, if j = 1, perform Step 2.2; otherwise, go to Step 5.1.

Step 5.1:

Let λ2 = C (π (i), g − 1, h).

St (π (i), g, h, M) = Ts' (πg[j − 1], πg[j]) + max(λ1, λ2), λ1 ≥ λ2.

St (π (i), g, h, M) = max (λ1, λ2), λ1 < λ2.

Step 6:

For 1 < h ≤ H, g = 1, if j = 1, perform Step 2.3; otherwise, go to Step 6.1.

Step 6.1:

Let λ3 = C (π (i), G, h − 1).

St (π (i), g, h, M) = Ts'(πg[j − 1], πg[j]) + max(λ1, λ3), λ1 ≥ λ3.

St (π (i), g, h, M) = max (λ1, λ3), λ1 < λ3.

Step 7:

Let Tc = St (π (i), g, h, M) + Pt (π (i), g, h, M), M1 = ω (i, 1, h + 1), M2 = ω (i, g + 1, h).

Step 8:

For 1 ≤ h < H, g = G, let J1 = num (i, 1, M1). If J1 = 1, go to Step 8.1; otherwise, let λ4 = C (πg[J1 − 1], 1, h + 1); if Tc ≥ λ4 or Tc < λ4 ≤ bt01, perform Step 8.2; if Tc < bt01 < λ4, perform Step 8.3.

Step 8.1:

C (π (i), g, h) = Tc.

Step 8.2:

C (π (i), g, h) = max{Tc, λ4}.

Step 8.3:

C (π (i), g, h) = bt01, bt01 = λ4.

Step 9:

For 1 ≤ g < G, let J2 = num (i, g + 1, M2). If J2 = 1, C (π (i), g, h) = Tc; otherwise, let λ5 = C (πg[J2 − 1], g + 1, h); if Tc ≥ λ5 or Tc < λ5 ≤ btg,g+1, go to Step 9.2; if Tc < btg,g+1 < λ5, go to Step 9.3.

Step 9.1:

C (π (i), g, h) = Tc.

Step 9.2:

C (π (i), g, h) = max{Tc, λ5}.

Step 9.3:

C (π (i), g, h) = btg,g+1, btg,g+1 = λ5.

Step 10:

For h = H, g = G, C(π (i), g, h) = Tc.

Step 11:

If i > N, stop and output St and C; otherwise, go back to Step 3.
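As a small illustration of the stage 1, layer 1 rule (Steps 2.1 and 4.1), the following sketch computes one starting time under the simplified machine-independent setup times Ts'; the function name and signature are ours, not the authors'.

```python
def start_time_stage1(u, Rt_i, C_prev, Ts_prev_i):
    """Starting time of the u-th job on a stage-1, layer-1 machine.
    Rt_i: release time; C_prev: completion time of the previous job on
    this machine (lambda_1); Ts_prev_i: setup time Ts'(prev, i)."""
    if u == 1 or Rt_i > C_prev:      # machine idle when the job is released
        return Rt_i                  # no setup before the first/late job
    return C_prev + Ts_prev_i        # wait for predecessor plus setup
```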

To illustrate the above decoding scheme, we use a chromosome with 5 jobs, 2 stages, and 2 layers in Fig. 7 as an example. The five jobs are routed in the sequence 3 → 1 → 5 → 4 → 2. In the initial and re-entrant layers of stage 1, job 3 is arranged on machine 1 and machine 2, respectively; at stage 2, job 3 is processed on machine 3 and machine 2 in the two layers, respectively. The machine assignments of the other four jobs are defined similarly.

Fig. 7 An encoding example with five jobs

The release times and weights of the jobs are given by Rti = [5, 3, 1, 6, 4] and Wi = [3, 6, 5, 8, 4]. The buffer capacity is 1. The setup times Ts'ij and the processing times Ptigmh are shown below.

Matrix of setup times Ts'ij
Matrix of processing times Ptigmh

The schedule Ʊ = {Oigh, Mm, Sigh–Cigh} of the above example is detailed as follows: Ʊ = {(O311, M1, 1–4), (O321, M3, 4–28), (O312, M2, 28–30), (O322, M2, 30–39), (O111, M2, 37–45), (O121, M2, 45–49), (O112, M1, 49–62), (O122, M3, 62–67), (O511, M1, 66–75), (O521, M1, 75–91), (O512, M1, 91–103), (O522, M3, 103–117), (O411, M2, 51–66), (O421, M2, 66–71), (O412, M3, 71–75), (O422, M1, 94–107), (O211, M3, 78–83), (O221, M3, 120–127), (O212, M1, 127–139), (O222, M3, 139–156)}, where Oigh denotes the operation of job i at stage g of layer h and Mm the assigned machine. The corresponding starting time Sigh and completion time Cigh are computed for each operation. The scheduling Gantt chart is displayed in Fig. 8.

Fig. 8 A Gantt chart example
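As a quick arithmetic check of the worked example, the objective (1) can be recomputed from the last-layer completion times read off the schedule Ʊ above (a sketch, with values transcribed from the schedule):

```python
# Total weighted completion time of the worked example (Eq. (1)).
W = {1: 3, 2: 6, 3: 5, 4: 8, 5: 4}             # job weights
C_GH = {1: 67, 2: 156, 3: 39, 4: 107, 5: 117}  # C_iGH read off the schedule
E = sum(W[i] * C_GH[i] for i in W)
print(E)  # 2656
```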

Cooperative evolutionary procedure

Fitness evaluation and selection

Since the objective is to minimize the total weighted completion time, the reciprocal of the objective value is used to define the fitness function. Consequently, the fitness of the n-th individual is estimated with the following formula:

$$ \mathrm{eval}\left( \Delta_{n} \right) = 1 \bigg/ \sum_{i = 1}^{N} W_{i} C_{iGH}, \quad n = 1, 2, \ldots, PS. $$
(18)

Based on Eq. (18), the selection probability pn and the cumulative probability qn of the n-th individual are calculated by Eqs. (19) and (20). Next, a random number rand is drawn from the range [0, 1]. Finally, the cumulative probabilities are compared with rand: if rand ≤ q1, the first chromosome Δ1 is chosen; if qn−1 < rand ≤ qn, the n-th chromosome Δn is selected (2 ≤ n ≤ PS).

$$ p_{n} = \mathrm{eval}\left( \Delta_{n} \right) \bigg/ \sum_{n = 1}^{PS} \mathrm{eval}\left( \Delta_{n} \right), $$
(19)
$$ q_{n} = \sum_{k = 1}^{n} p_{k}, \quad n = 1, 2, \ldots, PS. $$
(20)
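A compact sketch of the roulette-wheel selection of Eqs. (18)–(20), assuming the fitness values eval(Δn) have already been computed:

```python
import random

def roulette_select(fitness):
    """Roulette-wheel selection per Eqs. (18)-(20).  fitness is a list of
    eval(Delta_n) values; returns the index of the chosen chromosome."""
    total = sum(fitness)
    p = [f / total for f in fitness]          # selection probabilities p_n
    r, q = random.random(), 0.0
    for n, pn in enumerate(p):
        q += pn                               # cumulative probability q_n
        if r <= q:
            return n
    return len(p) - 1                         # guard against rounding error
```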

Crossover

To explore more promising areas, the crossover phase adopts two crossover approaches, collaborative single-point crossover (Co-SPX) and collaborative partial matching crossover (Co-PMX), operating on the job sequence chain and the machine selection chain. The crossover method is selected according to a random number η drawn from {0, 1}: if η = 0, Co-SPX is executed, as illustrated in Fig. 9a; otherwise, Co-PMX is performed, as shown in Fig. 9b. The detailed steps of the two approaches are described as follows, with a compact sketch of the repair-based crossover given after the step lists:

Fig. 9 Crossover approaches with collaborative mechanism

  1. Co-SPX approach

    Step 1: Two chromosomes Δ1 and Δ2 are randomly chosen from the population. Simultaneously, a random crossover point ε is generated (1 ≤ ε < N).

    Step 2: Exchange all columns from 1 to ε in chromosomes Δ1 and Δ2 to produce two new chromosomes L1 and L2.

    Step 3: Check the job number in the part of job sequence chain which emerges repeatedly for the illegal chromosomes L1 and L2.

    Step 4: Delete the repeated columns in L1 (L2) and replace them with the corresponding repeated columns from L2 (L1). Hence, two feasible offspring are formed.

  2. Co-PMX approach

    Step 1: Randomly generate two different crossover points ε and ζ (1 ≤ ε < ζ < N) in two parents Δ1 and Δ2 randomly selected from the current population.

    Step 2: Exchange the gene fragments between ε and ζ in chromosomes Δ1 and Δ2 to generate two chromosomes L1 and L2.

    Step 3: Check the duplicate job numbers in the part of the job sequence chain for L1 and L2.

    Step 4: Delete the duplicate part of job sequence chain in L1 (L2) and the corresponding machine number, and place its gene fragment on chromosome L2 (L1) to form two feasible offspring L3 and L4.
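The sketch promised above: a plausible Python rendering of the Co-SPX idea on the job sequence chain, where the repair step mirrors Step 4 of Co-SPX; in the full operator, each job's machine matrix travels together with its job column.

```python
import random

def co_spx(parent1, parent2):
    """Sketch of Co-SPX on the job sequence chain (lists of job numbers).
    The repair step replaces duplicated jobs with the jobs missing from
    the offspring, keeping each child a valid permutation."""
    n = len(parent1)
    eps = random.randint(1, n - 1)            # crossover point, 1 <= eps < n
    child1 = parent2[:eps] + parent1[eps:]    # exchange leading columns
    child2 = parent1[:eps] + parent2[eps:]

    def repair(child, donor):
        missing = [c for c in donor if c not in child]
        seen = set()
        for k, c in enumerate(child):
            if c in seen:                     # duplicated job column
                child[k] = missing.pop(0)     # replace with a missing job
            seen.add(child[k])
        return child

    return repair(child1, parent1), repair(child2, parent2)
```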

Mutation

In this study, each chromosome consists of a job sequence chain and a machine selection chain. Hence, two mutation operators are designed: single column mutation and column swap mutation. The first operator helps explore more possible machine assignments, while the second changes the job sequence. The two operators, sketched after Fig. 11, are as follows:

  1. Single column mutation. Randomly select a job position ε (0 < ε ≤ N) and re-generate its corresponding machine numbers in the initial and re-entrant layers, as shown in Fig. 10.

  2. Column swap mutation. Two randomly selected positions ε and ζ are chosen, and the corresponding jobs and machine numbers are exchanged, as shown in Fig. 11.

Fig. 10 Single column mutation

Fig. 11 Column swap mutation
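The sketches promised above: plausible Python renderings of the two mutation operators, reusing the (pi, omega) layout from the encoding sketch; UM, G, and H are as defined in the notations.

```python
import random

def column_swap_mutation(pi, omega):
    """Column swap mutation: exchange two randomly chosen positions in the
    job sequence chain; the machine matrices in omega stay attached to
    their jobs, so no repair is needed."""
    eps, zeta = random.sample(range(len(pi)), 2)
    pi[eps], pi[zeta] = pi[zeta], pi[eps]
    return pi, omega

def single_column_mutation(pi, omega, UM, G, H):
    """Single column mutation: re-draw the machine matrix of one randomly
    selected job over all stages and layers."""
    job = random.choice(pi)
    omega[job] = [[random.randint(1, UM[g]) for _ in range(H)]
                  for g in range(1, G + 1)]
    return pi, omega
```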

Adaptive adjustment strategy

To balance exploration and exploitation and avoid premature convergence of the GA, an adaptive adjustment strategy is implemented for some better solutions after mutation by hybridizing the step control factor α with two heuristic algorithms. More specifically, an array {Δn} is first constructed in non-increasing order of total weighted completion time from the solutions obtained by the above GA. Then, the first 20% of {Δn} are selected to perform the ACS or the MGH, where the ACS algorithm is adopted to search for the global optimum in the early iterative phase (0 < IT < α × Imax) and the MGH is used to perturb the population in the late iterative phase (IT ≥ α × Imax). Finally, if the population is not improved within a certain number of generations, the reset method is implemented; that is, some individuals are regenerated to guarantee population diversity.

Ant colony search algorithm

The ACS algorithm imitates the foraging behavior of ants, searching for the shortest path through changes in pheromone. It has been successfully employed to tackle many complex combinatorial optimization problems [14]. In solving the shop scheduling problem, each job is viewed as a node, and pairs of nodes define artificial paths; the sequence of jobs thus forms a directed graph. The ACS procedure is as follows.

Step 1: Solution construction.

At each generation, an ant path denoting the job sequence of a two-dimensional matrix chromosome is generated as a candidate solution.

Step 2: Parameter initialization.

Initialize all related parameters.

Step 3: Calculate transfer probability.

Pij(n) is the probability that ant n moves from node i to node j based on a stochastic mechanism, estimated by Eqs. (21) and (22). The roulette wheel rule is applied to choose the next job, and the tabu list is updated until all jobs are selected.

$$ j = \begin{cases} \arg\max_{l \notin \pi^{n}(i)} \left\{ \tau_{il} \right\}, & \text{if } r \le r_{0} \\ S, & \text{otherwise} \end{cases} $$
(21)
$$ P_{ij}\left( n \right) = \begin{cases} \dfrac{\tau_{ij}}{\sum_{l \notin \pi^{n}(i)} \tau_{il}}, & \text{if } j \notin \pi^{n}(i) \\ 0, & \text{otherwise} \end{cases} $$
(22)

where τil represents the pheromone trail of job l placed at the i-th position of the job sequence; r and r0 respectively denote a random number and a pre-specified parameter in the range [0, 1]; and jobs already selected by an ant are stored in the ant's working memory πn(i) and are not considered for selection.

Step 4: Calculate the pheromone trail level.

The pheromone deposited by ant n on the edge from node i to node j is calculated by Eq. (23).

$$ \Delta \tau_{ij}^{n} = \left\{ {\begin{array}{*{20}c} {1/\mathop \sum \limits_{s = 1}^{N} W_{s} C_{sGH} ,} & {{\text{if }}\left( {i{, }j} \right) \in {\text{best solution}}} \\ {0,} & { {\text{otherwise}}} \\ \end{array} } \right.. $$
(23)

Step 5: Repeat Steps 3 and 4, i.e., calculate the transfer probability and the pheromone trail level for all ants.

Step 6: Update pheromone.

To memorize the optimal solution, after all ants have constructed feasible individuals, the pheromone increment Δτij and the pheromone trails τij are updated according to Eqs. (24) and (25), respectively.

$$ \Delta \tau_{ij} = \sum_{n = 1}^{As} \Delta \tau_{ij}^{n}, $$
(24)
$$ \tau_{ij}\left( nc + 1 \right) = \left( 1 - \rho \right) \times \tau_{ij}\left( nc \right) + \Delta \tau_{ij}, $$
(25)

where ρ is the pheromone evaporation rate (0 < ρ < 1) and As is the number of ants.

Step 7: Let nc ← nc + 1; record and update the optimal individual in this iteration. Simultaneously, empty the tabu list.

Step 8: If the individual is not improved within a certain number of iterations or nc > NCmax, output the optimal individual; otherwise, return to Step 3.
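A hedged sketch of the ACS ingredients described in Steps 3–6, i.e., the state-transition rule of Eqs. (21)–(22) and the evaporation-plus-deposit update of Eqs. (23)–(25); the dictionary layout of tau and the default values of r0 and ρ are our assumptions.

```python
import random

def next_job(current, tau, visited, r0=0.9):
    """State-transition rule (Eqs. (21)-(22)): with probability r0 pick the
    unvisited job with the largest pheromone, otherwise sample one in
    proportion to its pheromone (roulette wheel).  tau[i][j] holds the
    trail between positions/jobs; assumes at least one unvisited job."""
    candidates = [j for j in tau[current] if j not in visited]
    if random.random() <= r0:
        return max(candidates, key=lambda j: tau[current][j])
    total = sum(tau[current][j] for j in candidates)
    r, q = random.uniform(0, total), 0.0
    for j in candidates:
        q += tau[current][j]
        if r <= q:
            return j
    return candidates[-1]

def update_pheromone(tau, best_path, best_value, rho=0.1):
    """Evaporation plus deposit on the best solution (Eqs. (23)-(25))."""
    deposit = 1.0 / best_value           # 1 / (sum_s W_s * C_sGH)
    for i in tau:
        for j in tau[i]:
            tau[i][j] *= (1.0 - rho)     # evaporation
    for i, j in zip(best_path, best_path[1:]):
        tau[i][j] += deposit             # reinforce best-solution edges
```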

Modified greedy heuristic

In the original GH, the heuristic information used by destruction and reconstruction is defined as the adjustment of the job permutation [50]. Considering the problem characteristics, we propose an MGH method to explore more promising neighborhoods. In the MGH, the heuristic information is extended to the simultaneous optimization of the job sequence chain and the machine selection chain. The MGH can be briefly described as follows.

Step 1: The optimal solution delivered by GA is taken as the current solution. The destruction operation is carried out on job sequence chain π in the current solution so as to form two subsequences π1 and π2.

Step 2: Using subsequence π1 as a reference, find the subset of the machine selection chain ω corresponding to the jobs in this subsequence. Perform destruction and reconstruction operations on them in turn and choose the optimal insertion position in the current solution. π1 is thereby updated to πnew.

Step 3: Reconstruct πnew and π2, then the new chromosome is generated.

To describe the above steps more clearly, the MGH procedure is illustrated in Fig. 12, and a sketch follows the figure. The operation process shows that the MGH conducts collaborative optimization of a specific machine assignment and its job sequence during each generation of evolution.

Fig. 12 The MGH procedure
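The sketch promised above: one plausible reading of the MGH destruction–reconstruction loop; the destruction size d and the evaluate callback are assumptions, since the paper describes the procedure only at the level of Fig. 12.

```python
import random

def mgh(pi, omega, evaluate, d=4):
    """Sketch of the modified greedy heuristic: remove d randomly chosen
    jobs (destruction), then greedily re-insert each one, together with
    its machine column, at the position giving the smallest objective
    (reconstruction).  evaluate(pi, omega) returns the total weighted
    completion time; d is a hypothetical destruction size."""
    removed = random.sample(pi, d)                    # subsequence pi_1
    partial = [j for j in pi if j not in removed]     # subsequence pi_2
    for job in removed:
        best_pos, best_val = 0, float("inf")
        for pos in range(len(partial) + 1):
            trial = partial[:pos] + [job] + partial[pos:]
            val = evaluate(trial, omega)
            if val < best_val:
                best_pos, best_val = pos, val
        partial.insert(best_pos, job)                 # greedy re-insertion
    return partial, omega
```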

Computational complexity

The main computational cost of the proposed CAGA involves the population initialization, decoding, cooperative evolutionary procedure, and adaptive adjustment strategy.

For the population initialization, the DA-MA requires more computation than the HR-JS scheme; hence, the overall computational complexity of population initialization is O(PS × N × G × H). Regarding the cooperative evolutionary procedure, O(PS × N × Imax) computations are required to decode the solutions in the selection phase. The crossover operation requires O(0.5 × PS × N × Imax) computations, and O(PS × N × Imax) computations are then required to check and repair the job sequences. The mutation phase requires O(PS × N × Imax) computations, so the overall complexity of the cooperative evolutionary procedure is O(PS × N × Imax). In the adaptive adjustment strategy, O(0.2 × PS × N × α × Imax) computations are required for the ACS algorithm and O(0.2 × PS × N × (1 − α) × Imax) for the MGH; therefore, the overall computational complexity of the adaptive adjustment strategy is O(0.2 × PS × N × max(α, 1 − α) × Imax).

The other strategies incur lower computational cost. Note that, in this article, Imax > G × H. Thus, the overall worst-case complexity of the CAGA is O(PS × N × Imax).

Experiment and result analysis

This section is devoted to testing the performance of the presented CAGA in tackling the RHFSP-SDST&LB. The test instances are first described in detail; then the orthogonal test method and Taguchi analysis are carried out to tune the parameters of the proposed algorithm; finally, a series of simulation experiments is designed and the computational results are discussed. The CAGA is coded in MATLAB 2014a and compared with the GH, a traditional GA (TGA), the ACS algorithm, and an improved GA using the two crossover and mutation strategies (IGA). All five algorithms are run on a personal computer with an Intel Core i5-3210M CPU at 2.50 GHz and 8.00 GB of RAM.

To facilitate the performance evaluation of the algorithm, we introduce two metrics: the percentage relative error (PRE) and standard deviation (SD).

  1. PRE metric. This metric computes how far the objective value of a solution is from the best objective value; a lower PRE value implies a better result. The PRE is defined as follows:

    $$ {\text{PRE}} = \frac{{E_{\Phi } - E_{{{\text{best}}}} }}{{E_{{{\text{best}}}} }} \times 100, $$
    (26)

    where EΦ is the objective value obtained by a given algorithm (Φ ∈ {GH, TGA, ACS, IGA, CAGA}) and Ebest is the best objective value found by all five algorithms.

  2. SD metric. The SD metric measures the stability of the solutions found by an algorithm; a lower SD value means higher stability. The SD is computed as:

    $$ SD_{\varPhi} = \sqrt{\frac{1}{Q} \sum_{q = 1}^{Q} \left( E_{\varPhi q} - \overline{E_{\varPhi}} \right)^{2}}, $$
    (27)

    where \(SD_{\Phi}\) and \(\overline{E_{\Phi}}\) are, respectively, the standard deviation and the average objective value obtained by algorithm Φ over Q independent runs, and \(E_{\Phi q}\) is the objective value obtained by algorithm Φ in the q-th run.
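For completeness, the two metrics translate directly into code (a straightforward sketch of Eqs. (26)–(27)):

```python
from math import sqrt

def pre(E_phi, E_best):
    """Percentage relative error of Eq. (26)."""
    return (E_phi - E_best) / E_best * 100

def sd(values):
    """Standard deviation of Eq. (27) over Q independent runs."""
    Q = len(values)
    mean = sum(values) / Q
    return sqrt(sum((v - mean) ** 2 for v in values) / Q)
```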

Test instances

Because no relevant studies on the RHFSP-SDST&LB exist, test instances are randomly generated. The detailed data are as follows:

  1. The number of jobs N is 30, 50, 80, 100, 150, or 200.

  2. The number of stages G varies at two levels: 2 and 3.

  3. The number of cycles H varies at two levels: 2 and 3.

  4. The number of machines UMg at each stage is 3 or 4.

  5. The buffer sizes Vg between consecutive stages are 0, 1, and 5.

  6. The processing times Ptigmh are generated uniformly from [1, 25].

  7. The setup times Tsijgmh are generated from a uniform distribution on the interval [1, 8].

  8. The release times Rti are uniformly distributed on the interval [1, 6].

  9. The weights Wi are generated from the discrete uniform distribution U[1, 10].

Parameter tuning

In the developed CAGA, five main parameters are involved: crossover probability CP, mutation probability MP, reset number F, step control factor α, and pheromone evaporation rate ρ. They are tuned through the Taguchi method of design of experiments (DOE) [58, 59], where each parameter is viewed as a factor varying over five levels (see Table 2). The data of the selected instance are N = 20, G = 2, H = 2, UMg = 3, and Vg = 1. We carry out 5² = 25 orthogonal experiments with different groups of parameters. The computational results are listed in Table 3, where Avg denotes the average objective value of 20 independent runs for each group (except the last row).

Table 2 Levels of the parameters for CAGA
Table 3 Orthogonal table and average statistics value

To analyze the effects of these parameters on the performance of the proposed CAGA, the response values of each parameter are shown in Table 4. It can be clearly observed from this table that MP is the most important parameter (i.e., Rank = 1), followed by CP, F, and α, with ρ the least important. The result indicates that MP has a large impact and plays the major role in the performance of the CAGA compared with the other four parameters.

Table 4 Response values of each parameter of CAGA

Figure 13 intuitively exhibits the trend of the factor levels for the proposed algorithm. According to Fig. 13, the fourth factor level of CP (i.e., 0.7) yields the lowest objective value; hence, the parameter CP is set to 0.7. The other four parameters are determined similarly from Fig. 13. Based on the above analysis, the parameter values for the proposed CAGA are given in Table 5.

Fig. 13 The trend of factor levels

Table 5 The parameter values for the proposed CAGA

For a fair comparison, the crossover probability in the TGA and IGA is also set to 0.7 and the mutation probability to 0.9. The population size of all GA-based algorithms is set to 80. For all algorithms, a maximum iteration number of 100 and a maximum computational time of 500 s are used as the termination criteria.

Effect of the main components for the CAGA

In the CAGA, the main components are the novel population initialization using HR-JS and DA-MA, the two novel crossover and mutation operators with the collaborative mechanism, the adaptive adjustment strategy, and the reset method. To analyze the effect of these four components, this section compares the CAGA with four variants: the CAGA without HR-JS and DA-MA (referred to as CAGA-1), the CAGA with a single crossover and mutation operator (referred to as CAGA-2), the CAGA without the adaptive adjustment strategy (referred to as CAGA-3), and the CAGA without the reset method (referred to as CAGA-4). The CAGA and its four variants are applied to the problems with N = {30, 50, 80, 100, 150, 200}, UMg = 3, G = 2, H = 2, Vg = 1. The results are shown in Table 6, where APRE represents the average PRE of 20 independent runs of each algorithm (except the last row). The best mean result of each problem is highlighted in boldface.

Table 6 Computation results of the CAGA and its four variants

From Table 6, we observe that the average objective values obtained by CAGA-1, CAGA-2, CAGA-3, CAGA-4 and CAGA are 237,287.2, 225,049.1, 229,469.2, 230,092.2 and 218,818.3, respectively. On average, the CAGA yields the smallest APRE (1.92%), lower than those of CAGA-1 through CAGA-4, which means that all four components enhance the performance of our CAGA. Specifically, in terms of average objective values and APRE, CAGA-2 performs best among the four variants, followed by CAGA-4 and CAGA-3, with CAGA-1 the worst. This implies that the population initialization strategy using HR-JS and DA-MA has the largest impact on the performance of CAGA compared with the other three components. Therefore, we conclude that the population initialization with HR-JS and DA-MA is the most significant contributor to the performance of CAGA during the whole search process.
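For reference, the sketch below shows one common way to compute PRE and APRE, assuming PRE is the percentage deviation of a run's objective from the best-known objective of the instance; the paper does not restate the exact formula in this section, so this definition and the numbers are illustrative.

```python
def pre(obj, best_obj):
    """Percentage relative error of one run against the best-known value."""
    return 100.0 * (obj - best_obj) / best_obj

def apre(objs, best_obj):
    """Average PRE over the independent runs of one instance."""
    return sum(pre(o, best_obj) for o in objs) / len(objs)

# Toy example with three (of 20) run objectives and an assumed
# best-known value for the instance.
runs = [222000.0, 219500.0, 221300.0]
print(f"APRE = {apre(runs, 218818.3):.2f}%")
```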

Performance comparison with other algorithms

To verify the quality of the proposed CAGA, we compare its performance with four well-known algorithms: GH, TGA, ACS and IGA. The stopping criterion of the four competing algorithms is set as the computational time used by our CAGA under the restriction of 100 iterations. Considering the importance of time, the maximum running time of all algorithms is limited to 500 s. The parameters {N, G, H, UMg, Vg} mentioned above result in a total of 6 × 2 × 2 × 2 × 3 = 144 different combinations. For each combination, ten replicates are generated and solved. Thus, there are 1440 small and medium scale instances and 1440 large scale instances to run all algorithms.
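The full test design can be enumerated directly from the parameter sets above; the short sketch below reproduces the 144-combination count and the ten replicates per combination (the labels follow the paper's notation; no instance constructor is implied).

```python
from itertools import product

N_vals  = [30, 50, 80, 100, 150, 200]  # number of jobs N
G_vals  = [2, 4]                       # number of stages G
H_vals  = [2, 3]                       # parameter H from the test design
UM_vals = [2, 3]                       # parallel machines per stage (UMg)
V_vals  = [0, 1, 5]                    # buffer sizes (Vg)

combos = list(product(N_vals, G_vals, H_vals, UM_vals, V_vals))
assert len(combos) == 6 * 2 * 2 * 2 * 3 == 144

# Ten replicate instances per combination.
instances = [(combo, rep) for combo in combos for rep in range(10)]
print(len(instances))  # 1440 generated instances
```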

Experimental test for small and medium scale problems

In this section, the above five algorithms are tested and compared on small and medium scale problems with N = {30, 50, 80}, G = {2, 4}, H = {2, 3}, UMg = {2, 3} and Vg = {0, 1, 5}. The computational results under the three buffer sizes (i.e., 0, 1 and 5) are listed in Tables 7, 8 and 9, respectively, where CPU denotes the average computational time of 20 runs for each algorithm (except the last row) and boldface marks the best result for each combination.

Table 7 Testing results of small and medium scale problems when Vg = 0
Table 8 Testing results of small and medium scale problems when Vg = 1
Table 9 Testing results of small and medium scale problems when Vg = 5

The following observations can be made from Tables 7, 8 and 9.

  1. According to Table 7, the average objective values of the five algorithms for Vg = 0 are 119,336.5, 95,333.1, 128,743.9, 94,721.6 and 92,227.1, obtained by GH, TGA, ACS, IGA and CAGA, respectively. Meanwhile, the corresponding average PRE (APRE) values are 33.00%, 4.90%, 43.54%, 3.61% and 2.34%, within an average computational time of 75.09 s. In this case, the proposed CAGA shows the best performance on 18 out of 24 test problems.

  2. For the results of the small and medium scale problems in Table 8, the average objective values obtained by the five algorithms for Vg = 1 are 73,426.0, 64,375.2, 75,904.4, 63,380.1 and 61,706.0, with APRE values of 24.06%, 5.41%, 27.89%, 4.53% and 1.47%, respectively, within an average CPU time of 84.05 s. In this case, the CAGA finds the best results on 22 out of 24 test problems.

  3. For the results in Table 9, within an average CPU time of 95.28 s, the average objective values of the five algorithms for Vg = 5 are 60,801.8, 49,671.2, 62,534.9, 48,385.9 and 46,173.4, with APRE values of 35.15%, 7.69%, 38.71%, 6.47% and 1.58%, respectively. In this case, the CAGA outperforms the other four algorithms on all but one instance (80 × 4 × 3 × 3), where its solution quality is slightly worse than that of IGA.

Experimental test for large scale problems

This section tests large scale problems with N = {100, 150, 200}, G = {2, 4}, H = {2, 3}, UMg = {2, 3} and Vg = {0, 1, 5}. Tables 10, 11 and 12 report the computational results under the three buffer sizes. The following conclusions can be drawn from these tables.

  1. Based on Table 10, the APRE values of the five algorithms for Vg = 0 are 29.16%, 6.25%, 37.97%, 2.17% and 1.05%, respectively, within an average running time of 224.88 s. The average total weighted completion time of CAGA is 676,410.6, the best solution quality among all algorithms, whereas the ACS algorithm exhibits the worst performance with an average total weighted completion time of 915,842.0. In this case, the CAGA obtains better results on 20 out of 24 test problems.

  2. Based on Table 11, within an average CPU time of 253.90 s, the average objective value of CAGA for Vg = 1 is 434,948.0, the best of the five, whereas the ACS algorithm yields the worst value of 515,210.9. The APRE values of the five algorithms are 16.09%, 6.47%, 20.90%, 2.82% and 0.89%, respectively. In this case, the CAGA consistently outperforms the other algorithms except on three combinations (i.e., 100 × 4 × 3 × 2, 200 × 4 × 3 × 2 and 200 × 4 × 2 × 3), where its solution quality is slightly worse than that of IGA.

  3. For the testing results of the large scale problems in Table 12, within an average computation time of 281.55 s, the APRE values yielded by the five algorithms are 32.09%, 11.62%, 35.68%, 5.17% and 0.33%, respectively. The proposed CAGA consistently performs better than its competitors on all instances in this case except one combination (200 × 4 × 3 × 3), where the average objective value obtained by the CAGA is slightly worse than that of IGA.

Table 10 Testing results of large scale problems when Vg = 0
Table 11 Testing results of large scale problems when Vg = 1
Table 12 Testing results of large scale problems when Vg = 5

Based on the above discussion, we can summarize that the proposed CAGA exhibits superior search performance within reasonable running time with respect to solution quality. Precisely, the CAGA attains the smallest APRE values, 1.80% for small and medium sized problems and 0.76% for large sized problems, compared with those of GH (30.74% and 25.78%), TGA (6.00% and 8.12%), ACS (36.72% and 31.52%) and IGA (5.20% and 3.39%).

Table 13 presents the average SD values collected by the five algorithms under different problem scales. From this table, we observe that the average SD values of the five algorithms are 15,598.2, 14,521.9, 16,157.4, 13,692.5 and 13,114.9, respectively. The proposed CAGA exhibits the best stability, obtaining the smallest average SD value (13,114.9) among all algorithms, whereas ACS shows the worst stability with an average standard deviation of 16,157.4. To be specific, the CAGA performs best, followed by IGA, TGA and GH, with ACS the worst. The CAGA outperforms the other four peers on 13 out of 18 test problems in terms of stability.

Table 13 Average SD values collected by five algorithms
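As a reference for how Table 13 can be read, the sketch below computes the stability metric in the usual way: the sample standard deviation of the repeated-run objectives of each instance, averaged over a group of instances. The run values are invented placeholders.

```python
from statistics import mean, stdev

# Objective values of repeated runs for two toy instances; real use
# would take the 20 run objectives of every instance in a scale group.
runs_per_instance = [
    [61230.4, 61780.2, 60990.7, 61505.3],
    [43210.0, 43555.9, 42987.4, 43390.1],
]

# Per-instance sample standard deviation, then the group average
# reported in Table 13.
avg_sd = mean(stdev(runs) for runs in runs_per_instance)
print(round(avg_sd, 1))
```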

Although the CAGA performs slightly worse than the TGA and IGA on several problem sizes in terms of solution quality and stability, the performance differences among these algorithms are very small. The CAGA exhibits better performance as the buffer size increases in all cases. Moreover, these experimental results were generated under the premise of the same running time. If the same number of iterations (100) is used as the stopping criterion for all algorithms, the objective values found by CAGA are superior to those of TGA and IGA, as can also be seen in Fig. 18 below.

To check the statistical significance of the experimental results, the analysis of variance (ANOVA) method is conducted to test the differences among the five compared algorithms. The computational results satisfy the main hypotheses of the ANOVA. Figure 14 reports the means plot of APRE and SD with 95% confidence level for all five algorithms. It can be clearly observed from the figure that the intervals achieved by the CAGA differ significantly from those of the other algorithms. In addition, the APRE and SD achieved by the proposed CAGA are smaller than those of the other four algorithms across problem scales, especially on large scale problems, which indicates that our algorithm significantly outperforms the other four peers and provides better stability.

Fig. 14 Means plot of APRE and SD with 95% confidence level for all five algorithms
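The significance check can be reproduced in outline with a one-way ANOVA. The sketch below feeds scipy.stats.f_oneway with the six per-table APRE averages reported above for each algorithm; a full analysis would use the per-instance results rather than these aggregates.

```python
from scipy.stats import f_oneway

# One sample per algorithm: the six per-table APRE averages reported
# above (small/medium and large scale, each with Vg = 0, 1, 5).
apre_gh   = [33.00, 24.06, 35.15, 29.16, 16.09, 32.09]
apre_tga  = [4.90, 5.41, 7.69, 6.25, 6.47, 11.62]
apre_acs  = [43.54, 27.89, 38.71, 37.97, 20.90, 35.68]
apre_iga  = [3.61, 4.53, 6.47, 2.17, 2.82, 5.17]
apre_caga = [2.34, 1.47, 1.58, 1.05, 0.89, 0.33]

f_stat, p_value = f_oneway(apre_gh, apre_tga, apre_acs, apre_iga, apre_caga)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")  # p < 0.05: significant at 95%
```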

Effect analysis under different buffer capacities

To analyze the impact of buffer sizes on algorithm performance, we conduct a performance comparison under different buffer capacities based on the experimental results in Tables 7, 8, 9, 10, 11, 12. The comparison results are reported in Figs. 15, 16, 17.

  1. Figure 15 depicts the trend of APRE for the five algorithms under different problem scales. It can be seen from this figure that: (a) the APRE values obtained by CAGA are significantly smaller than those of GH, TGA, ACS and IGA under all buffer sizes; this implies our algorithm is valuable for improving the efficiency and capacity of the HFSP under all three buffer capacities and is consistent with the previous analysis. (b) As the problem scale increases, the APRE values obtained by TGA show an upward trend; that is, the TGA is prone to premature convergence on large-scale problems. Meanwhile, the APRE values of IGA become smaller, meaning that the IGA performs better than the TGA on large-scale problems. It can be further concluded that the crossover and mutation strategies designed in this paper help the TGA avoid being trapped in local optima. (c) The APRE values of ACS and GH show a downward trend, indicating that these two algorithms are relatively better suited to large-scale problems. (d) The APRE of CAGA approaches zero as the number of jobs increases, which demonstrates that our CAGA is highly efficient on large-scale problems.

  2. Figure 16 shows the trend of the objective values obtained by CAGA for different problem scales under the three buffer capacities. It is clear from this figure that the objective values with larger buffer sizes are no greater than those with smaller buffer sizes, indicating that buffer capacity plays an important role in the production industry. Meanwhile, we also observe that the objective values drop markedly when the buffer capacity increases from 0 to 1, whereas further increases in buffer size yield much smaller improvements. In other words, the improvement of the objective value is not proportional to the growth of buffer capacity, a conclusion consistent with the literature [60]. Hence, it is of practical significance to analyze the trade-off between completion time and buffer cost in shop production.

  3. Figure 17 illustrates the box plots of APRE obtained by CAGA under the three buffer capacities. In (a), when the job quantity is 30, median APRE values of 3.54, 2.24 and 3.90 are obtained by CAGA under the three buffer capacities (Vg = 0, 1, 5), indicating that our algorithm achieves its best solution performance under the buffer capacity of 1. From (b), the corresponding median APRE values are 2.19, 0.99 and 0.63 when the job quantity is set to 50; in this case, the CAGA performs best among the three buffer types when Vg = 5. We can observe from (c) and (d) that the corresponding median values are 0.81, 1.17, 0.32 and 0.94, 0.90, 0.12, respectively, when the number of jobs N is set to 80 and 100; it is obvious that CAGA is very effective in solving the problems under the buffer size of 5. In (e) and (f), the median values of CAGA are 1.07, 0.60, 0.00 and 0.65, 0.89, 0.04, respectively, for the two problems with N = 150 and N = 200. We can conclude from the two plots that the CAGA with the buffer size of 5 has the smallest APRE and the best solution quality, while the CAGA shows the poorest performance under the buffer capacity of 0. To sum up, the APRE values of CAGA show an overall downward trend as the buffer capacity increases, which suggests that our CAGA has a broad and outstanding capability to solve the scheduling problem with limited buffers considered in this study.

Fig. 15 The trend of APRE for five algorithms under different scale problems

Fig. 16 The trend of objective values obtained by CAGA under three buffer capacities

Fig. 17 Box plots of APRE obtained by CAGA under three buffer capacities

Evolution of the CAGA algorithm

To study the relationship between solution quality and iteration number, we run the above five algorithms (i.e., GH, TGA, ACS, IGA and CAGA) with a limit of 100 generations. Taking six instances with N = {30, 50, 80, 100, 150, 200}, G = 2, H = 2, UMg = 3 and Vg = 1 as examples, the convergence trends of the objective values are illustrated in Fig. 18. From the figure, it can be clearly observed that our CAGA obtains an initial population of higher quality and converges to better objective values than the other algorithms.

Fig. 18 The convergence trend of objective values for different problem sizes

Further discussions

The above analysis implies that our CAGA outperforms the other four algorithms (i.e., GH, TGA, ACS and IGA) in terms of solution quality, especially for large scale problems. The main reasons for the superiority of CAGA are as follows:

  1. The HR-JS and DA-MA approaches are utilized for population initialization. These approaches significantly enhance the quality of the initial scheduling solutions and the diversity of the population, which also helps to improve the convergence speed of the CAGA to a certain extent.

  2. Multiple collaborative crossover and mutation operations are incorporated into the original GA. Therefore, our algorithm can explore more promising areas, which contributes to improving the solution diversity and search efficiency of the suggested algorithm.

  3. An adaptive adjustment strategy that switches between the execution of ACS and GH is applied in our CAGA. In addition, a reset operation with a dynamic variable step strategy is applied to the algorithm. Both strategies help to balance the exploration (global search) and exploitation (local search) abilities of the algorithm at a relatively small computing cost; a minimal sketch of this switching-and-reset logic is given after this list.
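The following minimal sketch illustrates the switching and reset ideas only; the decision rule, the step update and the helper functions (acs_search, gh_search, swap_perturb) are simplified assumptions rather than the paper's exact procedures.

```python
import random

def adaptive_improve(solution, stagnant, acs_search, gh_search):
    """Switch between ACS (global exploration) and GH (local
    exploitation) according to whether the search has stagnated."""
    return acs_search(solution) if stagnant else gh_search(solution)

def reset_with_dynamic_step(solution, perturb, alpha, no_improve, f_max):
    """Reset an unimproved solution with a perturbation whose strength
    grows with the stagnation count, capped by the reset number."""
    step = alpha * min(no_improve, f_max)
    return perturb(solution, step)

def swap_perturb(perm, step):
    """Toy perturbation: the dynamic step controls how many swaps."""
    perm = list(perm)
    for _ in range(max(1, round(step))):
        i, j = random.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

# Toy usage: stand-in search operators and a small permutation.
better = adaptive_improve([3, 1, 2, 0], stagnant=True,
                          acs_search=lambda s: sorted(s),
                          gh_search=lambda s: s)
print(better)
print(reset_with_dynamic_step(list(range(8)), swap_perturb,
                              alpha=0.5, no_improve=6, f_max=4))
```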

In sum, our CAGA is very effective in solving the RHFSP-SDST&LB. However, as the "No Free Lunch" theorem suggests, no algorithm can outperform all other approaches on every problem. The following two aspects should be further investigated to enhance the effectiveness of CAGA. First, the proposed CAGA involves many parameters that need to be calibrated; reducing unnecessary parameters is therefore highly desirable. Second, since the RHFSP-SDST&LB uses the dual-chain encoding method, the new individuals produced in each iteration of CAGA need to be checked and repaired, which complicates the algorithm. Developing a simpler encoding scheme and efficient repair mechanisms for illegal individuals would be meaningful future work.

Conclusions and future works

In this research, the reentrant hybrid flow shop problem with sequence-dependent setup time and limited buffers (RHFSP-SDST&LB) is studied. A mathematical model is first proposed for the RHFSP-SDST&LB, aiming to optimize the total weighted completion time. To solve this NP-hard problem, we develop a cooperative adaptive genetic algorithm (CAGA) based on three popular meta-heuristic algorithms: GH, GA and ACS.

In the algorithm, with a dual-chain coding scheme, an approach for constructing the initial population using HR-JS and DA-MA is introduced. Next, a staged-hierarchical decoding approach is proposed to decode each solution. Then, to enhance the exploration capability of our algorithm, the Co-SPX and Co-PMX crossovers as well as two novel mutation methods are incorporated into the genetic operators. An adaptive adjustment strategy that intelligently switches between ACS and GH is applied to balance global and local search. Moreover, a dynamic reset strategy is inserted to reconstruct inferior solutions and jump out of local optima. Finally, the comparison with four competitors demonstrates the effectiveness and efficiency of the CAGA, especially for large-scale problems.

In the future, we plan to design simpler and more effective algorithms to solve the RHFSP-SDST&LB in light of the limitations of CAGA noted in "Further discussions". We also intend to extend our CAGA to other scheduling problems, such as the co-scheduling of the cascaded locks and ship lift at the Three Gorges Dam. In addition, our study lacks real field experiments in modern manufacturing industries; that is, it does not validate the total weighted completion time performance against current production models. The main reason is that the RHFSP-SDST&LB has rarely been investigated in the existing literature. In the future, we would like to collect real-world data from production workshops and build a simulation model of the RHFSP-SDST&LB production system, so that performance comparisons with real-scene experiments can be conducted.