Introduction

The welding process plays a crucial role in modern manufacturing, spanning various industries such as shipbuilding, aerospace, construction machinery, automotive, and others [1,2,3]. The efficiency of welding operations significantly impacts overall production timelines, with welded structures constituting over 50% of machinery parts [4, 5]. Notably, the welding of large-scale structural components is a critical aspect of manufacturing construction machinery. Enhancing the efficiency of welding shops has become a pressing concern for manufacturing enterprises. However, there is a scarcity of research addressing welding shop scheduling problems (WSP) from an engineering application perspective. Furthermore, as global trade continues to expand, distributed manufacturing has become the prevailing mode in modern manufacturing [6,7,8]. Consequently, different factories exhibit diverse production environments, encompassing variations in machine numbers, types, workers’ skills, and more. Hence, the study of distributed heterogeneous WSP (DHWSP) holds significance [9].

The WSP is an extension of the permutation flow shop scheduling problem [10], taking into account that a job can be processed by multiple welders in parallel [11]. The DHWSP further extends the WSP by considering multiple heterogeneous factories to handle a substantial volume of orders [12, 13]. Mainstream approaches for DHWSP encompass decomposition-based methods [9, 12] and Pareto domination-based methods [1, 13]. However, previous approaches have several notable disadvantages: (1) Global Search: In previous methods for solving energy-aware DHWSP, random elite selection is often employed for crossover during the global search phase. This approach can result in suboptimal solutions learning from other suboptimal solutions, leading to less-than-ideal outcomes. (2) Local Search: Previous methods, such as those mentioned in references [12, 13], apply local search operators either to the offspring or the entire population. This can cause the population to converge toward local optima, limiting the efficient utilization of computational resources. (3) Solver Utilization: Due to the complex nature of the search space in DHWSP, previous algorithms struggle to sufficiently explore the objective space using a single solver. Each solver, whether it’s an evolutionary algorithm as in [12] or a reinforcement learning-based method as in [9], can only explore a portion of the space due to their inherent characteristics. As a result, cooperating with different solvers can combine their search capabilities to thoroughly explore various parts of the objective space, leading to more comprehensive and effective optimization.

The competitive swarm optimizer (CSO), as introduced by Zhang [14], is designed to accelerate the convergence of global search. Unlike traditional particle swarm optimization methods, CSO divides each generation’s population into two distinct groups: winners and losers. In this process, only the winners undergo mutation to improve their own characteristics, while the losers gain valuable knowledge from the selected winners to generate new solutions. However, it is important to note that CSO was initially developed for continuous optimization problems and is not directly applicable to the context of the DHWSP. Nonetheless, the concept of competition-based cooperation among individual solutions, as employed in CSO, can be integrated into the DHWSP algorithm to overcome the limitations of previous global search methods. This adaptation may lead to improvements in solving the DHWSP by harnessing the power of competitive swarm optimization principles.

Therefore, this study proposes a learning-driven cooperative and competitive multi-objective optimizer (LCCMO) to address these challenges for DHWSP. The optimization objectives include makespan and total energy consumption (TEC). LCCMO makes the following contributions:

  1. Cooperative Optimization Framework: We propose a learning-based cooperative and competitive multi-objective optimizer that encompasses cooperation at multiple levels, including the collaboration of strategies, populations, and solvers.

  2. Cooperation of Population: In our approach, we partition the primary population into winner and loser subpopulations within the mating pool. We utilize a competitive and cooperative strategy to expedite the convergence of the global search. Additionally, we manage two distinct solver populations that retain elite solutions, thereby contributing to the overall performance improvement of the final non-dominated solution set.

  3. Cooperation of Solvers: We leverage a cooperative approach that combines the strengths of both the reinforcement learning-based method and the evolution-based method. By doing so, we effectively explore distinct regions of the objective space, promoting improved convergence and diversity in our optimization process.

  4. Cooperation of Strategies: We incorporate three heuristic rules to collaboratively produce a high-quality population. Additionally, we employ five knowledge-driven local search operators to enhance the convergence of our algorithm. Furthermore, we combine three crossover operators to create larger steps, facilitating a more rapid exploration of the objective space.

LCCMO is systematically compared with six state-of-the-art algorithms on 20 instances from [12, 13] to assess its effectiveness. The numerical results demonstrate that LCCMO significantly outperforms the compared algorithms.

The structure of this paper is organized as follows: Section “Literature review” provides a comprehensive literature review, emphasizing the research gap in the field. In Section “Problem statement and modeling”, we intricately construct the model for DHWSP. Section “Our approach: LCCMO” delves into the specific details of the proposed method, referred to as LCCMO. We present the experimental results in detail in Section “Experimental results”. Finally, the paper concludes in Section “Conclusions”, summarizing the findings and outlining future research directions.

Literature review

Welding shop scheduling

Lu [15] was the first to investigate WSP from a real-world factory perspective, considering controllable processing time, job-dependent transportation times, and sequence-dependent setup times. Furthermore, Lu [16] delved into WSP with dynamic events in a real-world environment, factoring in issues such as jobs with poor quality, machine breakdown, and jobs with release time delay. Lu designed a gray wolf optimizer for solving WSP, aiming to simultaneously minimize machine load, makespan, and instability. Li [17] proposed a multi-objective artificial bee colony algorithm with knowledge-based local search operators for this problem. Rao [18] studied WSP with the goal of minimizing the machine interaction effect and the total tardiness simultaneously. Wang [12, 13] designed an effective whale swarm optimizer and a decomposition-based multi-objective evolutionary algorithm (MOEA/D) for DHWSP. Wang [19] constructed a welding shop inverse scheduling problem with dynamic events and developed an enhanced gray wolf optimizer for solving it. Wang [11] proposed MOEA/D with adaptive resource allocation for solving energy-efficient WSP. Wang [9] designed a cooperative memetic algorithm for energy-efficient DHWSP and achieved favorable results. Lu [1] introduced human-robot collaborative scheduling in WSP and designed a Pareto-based memetic algorithm for solving it.

Competitive swarm optimizer

$$\begin{aligned} \left\{ \begin{aligned}&v_l(t+1)=r_0v_l(t)+r_1\left( x_w(t)'-x_l(t)'\right) ,\\&x_l(t+1)=x_l(t)+v_l(t+1),\\&x_w(t)'=x_w(t)+r_0v_w(t),\\&x_l(t)'=x_l(t)+r_0v_l(t),\\ \end{aligned}\right. \end{aligned}$$
(1)

The CSO was originally proposed for continuous optimization problems, with the primary learning schema being the vector difference depicted in Eq. (1). Leveraging its rapid-learning feature, CSO is frequently employed for solving multi/many-objective optimization problems (MOPs) [20, 21]. Gu improved the initialization and learning strategies of CSO to better balance convergence and diversity [22]. Huang introduced a parameter-adaptive CSO to enhance its intelligence [23]. Large-scale MOPs constitute the primary research domain of CSO [24], leading to the proposal of various improved versions. Mohapatra designed a tri-competitive schema-based CSO to enhance exploration [25,26,27], while Ge introduced inverse modeling to update winners and accelerate the convergence of CSO [28]. Liu devised three distinct competitive schemas to improve the diversity of CSO [29], and Qi incorporated a neighborhood search strategy to enhance CSO [30]. Huang divided CSO into three phases, yielding superior results compared to state-of-the-art approaches. Additionally, CSO finds applications in constrained MOPs [31], many-objective optimization problems [32], feature selection [33], and wireless sensor networks [34].
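As a concrete illustration of Eq. (1), the continuous loser update can be sketched in a few lines. This is a hypothetical Python rendering for intuition only: `cso_loser_update` and its argument layout are illustrative names, and the discrete, crossover-based adaptation described later in this paper replaces this continuous schema for DHWSP.

```python
import random

def cso_loser_update(x_w, x_l, v_w, v_l):
    """One loser update following Eq. (1): the loser learns from a winner.

    x_w, x_l: winner/loser positions; v_w, v_l: their velocities
    (lists of floats). Continuous CSO only; not directly usable for DHWSP.
    """
    r0, r1 = random.random(), random.random()
    # Pre-mutated positions: x'(t) = x(t) + r0 * v(t)
    xw_p = [xw + r0 * vw for xw, vw in zip(x_w, v_w)]
    xl_p = [xl + r0 * vl for xl, vl in zip(x_l, v_l)]
    # Velocity update: v_l(t+1) = r0*v_l(t) + r1*(x_w'(t) - x_l'(t))
    v_new = [r0 * vl + r1 * (a - b) for vl, a, b in zip(v_l, xw_p, xl_p)]
    # Position update: x_l(t+1) = x_l(t) + v_l(t+1)
    x_new = [xl + vn for xl, vn in zip(x_l, v_new)]
    return x_new, v_new
```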

Meta-heuristics with learning strategies

Learning-based meta-heuristics for addressing shop scheduling problems have garnered significant attention. These learning strategies encompass machine learning-based solution generation [35], machine learning-based operator selection [7, 36], reinforcement learning (RL)-based parameter selection [37], RL-based operator selection [38, 39], and deep RL-based operator selection [40, 41]. Recently, the integration of RL with meta-heuristics has been thoroughly explored. This collaboration is gaining traction because RL can also serve as a solver for optimization problems [42]. Therefore, this research focuses on the synergy between RL and meta-heuristic approaches, examining novel ways of cooperation between these two paradigms.

Problem statement and modeling

Problem statement

The DHWSP mainly solves the following sub-problems: (i) dispatching each job to a heterogeneous factory; (ii) arranging the job sequence in each factory; and (iii) determining the number of welders for each operation at each stage. A DHWSP instance has n jobs and \(n_f\) factories. Each job has to be processed in sequence through \(n_s\) stages in its assigned factory. At each stage of factory f, up to \(m_{f,s}\) welders can process an operation in parallel. Job i’s original processing time at stage s in factory f is \(p_{f,i,s}\), and its real processing time is \(r_{f,i,s}=p_{f,i,s}/\mu _{f,i,s}\), where \(\mu _{f,i,s}\) is the number of welders used at stage s. However, the welding energy consumption of operation \(O_{i,s}\) does not increase linearly. A nonlinear coefficient \(\eta \), which grows logarithmically with the number of welders (cf. Eq. (7)), is multiplied by \(\mu _{f,i,s}\), so the welding energy consumption of operation \(O_{i,s}\) is \(r_{f,i,s}\times \mu _{f,i,s}\times \eta \): the more welders used, the higher the welding energy consumption. This modification differs from previous works [9, 12]. The primary goal of DHWSP is to solve these sub-problems to find a set of Pareto solutions that minimizes both the makespan and TEC. Figure 1 illustrates a Gantt chart representing a solution for DHWSP. Importantly, each job can be processed by multiple welders simultaneously.
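To make the time–energy trade-off concrete, the following sketch computes the real processing time and the welding energy of a single operation, assuming the logarithmic coefficient \(\eta =1+0.5\ln \mu \) that appears in Eq. (7). The function names are illustrative, not from the paper.

```python
import math

def real_processing_time(p, mu):
    """Real processing time r = p / mu when mu welders work in parallel."""
    return p / mu

def welding_energy(p, mu):
    """Welding energy of one operation, assuming the logarithmic nonlinear
    coefficient of Eq. (7): eta(mu) = 1 + 0.5*ln(mu), so E = r*mu*eta."""
    return real_processing_time(p, mu) * mu * (1.0 + 0.5 * math.log(mu))

# Doubling the welders halves the time but raises the energy:
# real_processing_time(40, 2) -> 20.0, real_processing_time(40, 4) -> 10.0
```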

DHWSP operates under the following assumptions: (1) Each welder is assigned to process only one job at each stage. However, it’s important to note that each job can be simultaneously processed by multiple welders. Additionally, each welder is linked to a single machine. (2) The processing times and associated energy consumption are deterministic and known in advance. (3) Each job cannot be split and processed in two different factories. It must be entirely processed within a single factory. (4) The problem does not consider transportation times or dynamic events that might affect the scheduling.

Fig. 1 The Gantt chart of a DHWSP solution

Mathematical formulation

DHWSP’s notations are described as follows:

Indices:

  • i: index of each job, \(i\in \{1,...,n\}\);

  • s: index of each stage, \(s\in \{1,...,n_s\}\);

  • f: index of each factory, \(f\in \{1,...,n_f\}\);

Parameters:

  • n: the number of jobs;

  • \(n_f\): the number of factories;

  • \(n_s\): the number of stages;

  • \(m_{f,s}\): maximum welders in each stage in f;

  • \(p_{f,i,s}\): The original processing time for job i on stage s in factory f;

  • \(s_{f,i,s}\): The setup time for job i on stage s in factory f;

  • \(W_{B}\): the basic power of all factories;

  • \(W_{I}\): machine idle power;

  • \(W_{O}\): machine welding power;

  • \(W_{S}\): machine setup power;

  • \(K_{w}\): welding duty cycle;

  • L: a sufficiently large integer;

Decision variables:

  • \(E_{B}\): the basic energy consumption;

  • \(E_{I}\): machine idle energy consumption;

  • \(E_{O}\): machine welding energy consumption;

  • \(E_{S}\): machine setup energy consumption;

  • \(\mu _{f,i,s}\): The number of welders used by job i on stage s in factory f;

  • \(C_{f,i,s}\): the completion time of job i on stage s in factory f;

  • \(C_{\max }\): the makespan of a schedule;

  • TEC: the total energy consumption of a schedule;

  • \({\textbf{X}}_{f,i,s}\): equals 1 if job i is processed on stage s in factory f, and 0 otherwise;

In the context of DHWSP, the primary objectives are makespan (the time taken to complete all jobs) and total energy consumption (TEC). An interesting trade-off arises in this problem: (1) Makespan: with more welders, the processing time of each job decreases, which in turn lowers the makespan. (2) TEC: however, a higher number of welders increases the welding energy consumption, which grows nonlinearly with the number of welders (Eq. (7)). This relationship indicates a conflict between the two objectives: optimizing one often compromises the other. The following formulation is designed to capture this conflict and to find the best compromise solutions for DHWSP.

$$\begin{aligned}{} & {} \min ~F_1=C_{\max }=\max \left\{ C_{f,i,n_s}\right\} , {\forall i, f}. \end{aligned}$$
(2)
$$\begin{aligned}{} & {} \min ~F_2=TEC=E_B+E_S+E_I+E_O, \end{aligned}$$
(3)
$$\begin{aligned}{} & {} E_B=W_B\times C_{\max }. \end{aligned}$$
(4)
$$\begin{aligned}{} & {} E_S=W_S\times \sum _{f=1}^{n_f}\sum _{i=1}^{n}\sum _{s=1}^{n_s}s_{f,i,s}. \end{aligned}$$
(5)
$$\begin{aligned}{} & {} E_I=W_{I}\times \sum _{f=1}^{n_f}\sum _{s=1}^{n_s}\sum _{i=2}^{n}\left( C_{f,i,s}-C_{f,i-1,s}\right. \nonumber \\{} & {} \qquad \quad \left. -s_{f,i,s}-\frac{p_{f,i,s}}{\mu _{f,i,s}}\right) \end{aligned}$$
(6)
$$\begin{aligned}{} & {} E_O=\left[ W_I\cdot (1-K_w)+W_O\cdot K_w\right] \nonumber \\{} & {} \qquad \quad \times \sum _{f=1}^{n_f}\sum _{s=1}^{n_s}\sum _{i=1}^{n}p_{f,i,s}\cdot \left( 1+0.5\cdot \ln (\mu _{f,i,s})\right) \nonumber \\ \end{aligned}$$
(7)

The TEC in DHWSP comprises four components: (1) \(E_B\): the basic energy consumption, covering lighting and air conditioning. (2) \(E_S\): the energy consumed by machine setup before the welding process begins. (3) \(E_I\): the energy consumed while machines idle, waiting for the previous job to be completed before starting the next one. (4) \(E_O\): the welding energy consumption. Notably, each welder welds for a fraction \(K_w\) of the time (the welding duty cycle) and cools for the remaining \(1-K_w\). Furthermore, there is a nonlinear relationship between the number of welders and \(E_O\), modeled as a logarithmic function. The mathematical formulation of DHWSP, outlining the problem’s variables, constraints, and objectives, is introduced as follows.
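As an illustration, the four components of Eqs. (3)–(7) can be assembled as in the following sketch. The default power values follow the instance settings reported later in the experiments; `total_energy` and its argument layout are illustrative assumptions, not the authors' implementation.

```python
import math

def total_energy(C_max, setup_times, idle_times, weld_terms,
                 W_B=5.0, W_S=10.0, W_I=0.36, W_O=28.0, K_w=0.8):
    """Sketch of TEC = E_B + E_S + E_I + E_O (Eqs. (3)-(7)).

    setup_times: all s_{f,i,s}; idle_times: machine idle intervals;
    weld_terms: pairs (p_{f,i,s}, mu_{f,i,s}).
    """
    E_B = W_B * C_max                    # Eq. (4): base load over the makespan
    E_S = W_S * sum(setup_times)         # Eq. (5): setup energy
    E_I = W_I * sum(idle_times)          # Eq. (6): idle energy
    # Eq. (7): welding energy with duty cycle K_w and logarithmic coefficient
    E_O = (W_I * (1 - K_w) + W_O * K_w) * sum(
        p * (1.0 + 0.5 * math.log(mu)) for p, mu in weld_terms)
    return E_B + E_S + E_I + E_O
```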

$$\begin{aligned} \left\{ \begin{aligned}&\min ~F_1=C_{\max }\\&\min ~F_2=TEC \end{aligned}\right. \end{aligned}$$
(8)

subject to:

$$\begin{aligned}{} & {} \sum _{f=1}^{n_f}\sum _{i=1}^{n}{\textbf{X}}_{f,i,s}=1,\forall s \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \sum _{f=1}^{n_f}\sum _{s=1}^{n_s}{\textbf{X}}_{f,i,s}=1,\forall i \end{aligned}$$
(10)
$$\begin{aligned}{} & {} 1\leqslant \sum _{i=1}^{n}\sum _{s=1}^{n_s}{\textbf{X}}_{f,i,s}\cdot \mu _{f,i,s}\leqslant m_{f,s},\forall f,s \end{aligned}$$
(11)
$$\begin{aligned}{} & {} C_{f,i,1}=\sum _{i=1}^{n}{\textbf{X}}_{f,i,1}\cdot \left( \frac{p_{f,i,1}}{\mu _{f,i,1}}+s_{f,i,1}\right) ,\forall f,i \end{aligned}$$
(12)
$$\begin{aligned}{} & {} C_{f,i,s+1}\geqslant C_{f,i,s}+ \sum _{i=1}^{n}{\textbf{X}}_{f,i,s}\cdot \frac{p_{f,i,s}}{\mu _{f,i,s}},\nonumber \\{} & {} \forall f,i,s=\{1,...,n_s-1\} \end{aligned}$$
(13)
$$\begin{aligned}{} & {} C_{f,i+1,s}\geqslant C_{f,i,s}+ \sum _{i=1}^{n}{\textbf{X}}_{f,i,s}\cdot \left( \frac{p_{f,i,s}}{\mu _{f,i,s}}+s_{f,i,s}\right) ,\nonumber \\{} & {} f,s,i=\{1,...,n-1\} \end{aligned}$$
(14)
$$\begin{aligned}{} & {} {\textbf{X}}_{f,i,s}\in \{0,1\}, \forall f,i,s \end{aligned}$$
(15)

where Eq. (8) represents the objective function, including both the makespan and TEC. Equations (9) and (10) specify that each job cannot be dispatched to two factories simultaneously. Equation (11) ensures that the number of welders used for each job does not exceed the capacity limit. Equation (12) defines the completion time at the first stage in each factory. Equation (13) enforces that each operation can only start once the preceding operation is completed. Equation (14) guarantees that each machine can process only one job at a time after the setup phase. Finally, Eq. (15) defines the binary decision variables.

Fig. 2 LCCMO’s framework for solving DHWSP

Our approach: LCCMO

Framework of LCCMO

Figure 2 provides an overview of the LCCMO framework. The algorithm is structured around two populations, each serving a distinct role in the optimization process. The primary population, denoted as \(P_1\), utilizes a competitive and cooperative multi-objective optimizer to explore the decision space. The process begins with the initialization of \(P_1\) using multiple heuristic rules. Subsequently, \(P_1\) is split into winner and loser subpopulations by an implicit competitive strategy. The loser population learns from the winners, while the winners continue to evolve. Selection of the next generation from parents and offspring follows the NSGA-II strategy [43]. Additionally, an auxiliary population, labeled as \(P_2\), employs a Q-learning-based multi-operator cooperative search to explore a limited range, complementing \(P_1\)’s broader but less focused search. Both \(P_1\) and \(P_2\) contribute their elite solutions to the archive \(\Omega \), where local search techniques are applied. Finally, the Pareto solutions are extracted from \(\Omega \) as the final output.

Encoding and decoding

A DHWSP solution is represented by two vectors and a welding scheduling matrix. Figure 3 shows DHWSP’s encoding schema. JS is the job sequence, FA is the factory assignment, and WN is the welding scheduling matrix, which records the number of welders utilized by each job at each stage.

Decoding schema: (1) Dispatch each job to the selected factory according to FA. (2) Determine the job sequence in each factory according to JS. (3) Calculate each job’s real processing time at each stage according to WN. (4) Calculate each job’s start and finish times under the constraints. (5) Obtain the idle time between two adjacent jobs. (6) The makespan is the maximum completion time of all jobs in the final stage, and the TEC is calculated according to Eq. (3).
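The decoding steps above can be sketched as follows. This is a simplified, illustrative Python rendering that returns only the makespan; the actual decoder would also accumulate the idle times needed for TEC.

```python
def decode(JS, FA, WN, p, s, n_s):
    """Minimal decoding sketch for the JS/FA/WN encoding (illustrative).

    JS: job sequence; FA[i]: factory of job i; WN[i][k]: welders for job i
    at stage k; p[f][i][k]: processing time; s[f][i][k]: setup time.
    Returns the makespan (max completion time over all jobs, last stage).
    """
    C = {}                                 # C[(i, k)]: completion of job i, stage k
    for f in set(FA):
        seq = [i for i in JS if FA[i] == f]   # job order inside factory f
        stage_free = [0.0] * n_s              # when stage k becomes free
        for i in seq:
            prev_done = 0.0
            for k in range(n_s):
                r = p[f][i][k] / WN[i][k]     # real processing time p/mu
                start = max(prev_done, stage_free[k])
                C[(i, k)] = start + s[f][i][k] + r
                stage_free[k] = C[(i, k)]
                prev_done = C[(i, k)]
    return max(C[(i, n_s - 1)] for i in range(len(FA)))
```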

Fig. 3 An example of the encoding schema for DHWSP

Multi-rule cooperative initialization

High-quality initialization plays a crucial role in generating elite solutions with strong convergence, ultimately enhancing the search efficiency [37, 44, 45]. To achieve this, three heuristic rules have been designed to construct solutions with minimized makespan and TEC.

Rule 1: Selecting more welders for each job reduces processing times and thus decreases \(C_{\max }\). Hence, JS and FA are initialized randomly, and each job at each stage selects the maximum number of welders.

Rule 2: Choosing fewer welders for each job reduces welding energy consumption. Hence, JS and FA are initialized randomly, and each job at each stage selects the minimum number of welders.

Rule 3: Balancing each factory’s workload reduces the makespan: the smaller the workload gap among factories, the lower the makespan. Hence, JS and WN are initialized randomly. Each job’s average processing time in each factory is calculated and transformed into a selection probability P, where a longer average processing time yields a lower P. Finally, each job selects the factory with the minimum workload in the current schedule; if several factories have the same workload, a factory is selected with probability P by the roulette strategy.

The initial population is partitioned into four subpopulations, each consisting of ps/4 individuals. The first three subpopulations are initialized using the proposed heuristic rules to ensure high-quality solutions, while the fourth is initialized randomly to introduce diversity. In LCCMO, both populations are initialized with this cooperative initialization; note that the two populations have different sizes, tailored to their respective roles in the algorithm.
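A minimal sketch of the cooperative initialization, assuming Rules 1 and 2 plus the random rule (the workload-balancing Rule 3 is omitted for brevity, and the minimum welder count is taken as 1 here; `init_population` and the `max_welders` layout are illustrative):

```python
import random

def init_population(ps, n, n_s, n_f, max_welders):
    """Multi-rule cooperative initialization sketch.

    Each individual is (JS, FA, WN); max_welders[f][k]: welder limit at
    stage k of factory f. Quarters: Rule 1, Rule 2, random, random.
    """
    pop = []
    for idx in range(ps):
        JS = random.sample(range(n), n)                 # random job sequence
        FA = [random.randrange(n_f) for _ in range(n)]  # random factories
        if idx < ps // 4:      # Rule 1: max welders -> low makespan
            WN = [[max_welders[FA[i]][k] for k in range(n_s)] for i in range(n)]
        elif idx < ps // 2:    # Rule 2: min welders -> low welding energy
            WN = [[1] * n_s for _ in range(n)]
        else:                  # random welders keep diversity
            WN = [[random.randint(1, max_welders[FA[i]][k])
                   for k in range(n_s)] for i in range(n)]
        pop.append((JS, FA, WN))
    return pop
```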

Fig. 4 Precedence operation crossover for JS

Fig. 5 Universal crossover for FA

Fig. 6 Universal crossover for WN

Cooperative and competitive global search

In this section, we introduce a cooperative and competitive multi-objective optimizer improved from CSO. The method is tailored to the DHWSP and involves three key enhancements over the original CSO [14]: (1) Improved Crossover Strategy: the original CSO, designed for continuous optimization problems, uses a vector-difference crossover that is not suitable for DHWSP. To adapt CSO for efficient global search on DHWSP, two alternative crossover strategies are adopted: Precedence Operation Crossover (POX) [46] and Universal Crossover (UX). (2) Implicit Many-to-Many Competition: the original CSO employs explicit one-to-one competition, which can lead to suboptimal solutions learning from other suboptimal solutions. To overcome this limitation, an implicit many-to-many competition strategy is proposed. (3) Self-Evolution for Winners: in the original CSO, the winning solutions exclusively undergo mutation. In the improved CSO, a self-evolution strategy is introduced for the winner population, which accelerates the convergence of global search. The following sections detail these improvements and their adaptation to the DHWSP.

Algorithm 1 Implicit Competition

Implicit Competition: Algorithm 1 details the implicit competition. First, for each solution, the number of solutions dominating it, \(R_i\), is calculated and regarded as a convergence metric. Second, each solution’s distance to every other solution is computed, and the reciprocal of the closest distance is used as the diversity metric \(D_i\). Finally, the comprehensive metric \(SF_i\) is obtained by adding \(R_i\) and \(D_i\); the smaller \(SF_i\), the better the comprehensive performance. The population is sorted by ascending SF: the first half forms the winner swarm, and the other half forms the loser swarm.
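The implicit competition can be sketched as follows (illustrative Python; ties and duplicated objective vectors are handled crudely here, and `implicit_competition` is an assumed name):

```python
def implicit_competition(pop, objs):
    """Split the population into winners and losers (sketch of Algorithm 1).

    objs[i]: objective vector (makespan, TEC) of pop[i].
    R_i: number of solutions dominating i (convergence metric);
    D_i: reciprocal of the distance to the nearest neighbour (diversity);
    SF_i = R_i + D_i, smaller is better.
    """
    n = len(pop)

    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b

    SF = []
    for i in range(n):
        R = sum(dominates(objs[j], objs[i]) for j in range(n) if j != i)
        d_min = min(sum((x - y) ** 2 for x, y in zip(objs[i], objs[j])) ** 0.5
                    for j in range(n) if j != i)
        D = 1.0 / d_min if d_min > 0 else float("inf")
        SF.append(R + D)
    order = sorted(range(n), key=lambda i: SF[i])     # ascending SF
    return [pop[i] for i in order[: n // 2]], [pop[i] for i in order[n // 2:]]
```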

Co-evolution: To promote the losers’ convergence, each loser learns from the winners: a winner is selected at random, and POX and UX are applied to generate two offspring. The crossover procedure for each vector is shown in Figs. 4, 5, and 6. Moreover, before the offspring’s fitness is calculated, a repair operator fixes infeasible solutions produced by the crossover and mutation operators, since the number of welders per stage differs across factories; out-of-range values in WN are reset to the maximum number of welders. The co-evolution strategy is depicted in Fig. 7.
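One possible rendering of POX for the JS vector is given below (an illustrative sketch of the common POX scheme: a random subset of jobs keeps its positions from one parent, the rest are filled in the other parent's order; the `kept` parameter is an assumption for testability):

```python
import random

def pox(parent1, parent2, kept=None):
    """Precedence operation crossover (POX) sketch for job-sequence vectors.

    Jobs in `kept` retain their positions from one parent; the remaining
    jobs are filled in the order they appear in the other parent.
    Returns two offspring.
    """
    jobs = set(parent1)
    if kept is None:
        kept = set(random.sample(sorted(jobs), len(jobs) // 2))

    def fill(keep_from, order_from):
        rest = iter(j for j in order_from if j not in kept)
        return [j if j in kept else next(rest) for j in keep_from]

    return fill(parent1, parent2), fill(parent2, parent1)
```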

Fig. 7 Co-evolution and self-evolution

Self-evolution: As shown in Fig. 7, each winner randomly selects another winner and adopts crossover operators to generate two offspring. Moreover, each offspring employs three mutation strategies with probability \(P_m\). (1) JS mutation: randomly choose two jobs and exchange their positions in JS. (2) FA mutation: randomly select a job and assign it to another factory. If infeasible solutions occur, generate a FA by random rule. (3) WN mutation: randomly select a job’s stage and change the number of utilized welders.
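The three mutation strategies and the subsequent repair can be sketched as follows (illustrative Python; the `max_welders` layout and in-place mutation are assumptions, not the authors' code):

```python
import random

def mutate(JS, FA, WN, n_f, max_welders, P_m=0.1):
    """Apply the three mutation strategies in place, each with probability P_m.

    max_welders[f][k]: welder limit at stage k of factory f.
    """
    if random.random() < P_m:                 # JS mutation: swap two jobs
        a, b = random.sample(range(len(JS)), 2)
        JS[a], JS[b] = JS[b], JS[a]
    if random.random() < P_m:                 # FA mutation: move one job
        i = random.randrange(len(FA))
        FA[i] = random.choice([f for f in range(n_f) if f != FA[i]])
    if random.random() < P_m:                 # WN mutation: change welders
        i = random.randrange(len(WN))
        k = random.randrange(len(WN[i]))
        WN[i][k] = random.randint(1, max_welders[FA[i]][k])
    # Repair: clip out-of-range welder counts (cf. the repair operator)
    for i in range(len(WN)):
        for k in range(len(WN[i])):
            WN[i][k] = min(WN[i][k], max_welders[FA[i]][k])
    return JS, FA, WN
```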

Environmental Selection: The improved CSO adopts NSGA-II’s environmental selection for population updating. First, the offspring are combined with the parents. Then, fast non-dominated sorting is employed, and the crowding distance strategy is adopted to maintain diversity. Finally, the first \(ps_1\) solutions are kept for the next generation. In this work, \(ps_1=100\).
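For reference, the crowding-distance part of NSGA-II's environmental selection can be written as below (the standard formulation, not the authors' code):

```python
def crowding_distance(objs):
    """Crowding distance within one non-dominated front (NSGA-II).

    objs: list of objective tuples. Boundary solutions get infinite
    distance so the extremes of the front are always preserved.
    """
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = objs[order[-1]][k] - objs[order[0]][k]
        if span == 0:
            continue
        for j in range(1, n - 1):          # interior points: neighbour gap
            dist[order[j]] += (objs[order[j + 1]][k] -
                               objs[order[j - 1]][k]) / span
    return dist
```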

Learning-driven cooperative local search

An agent-based population \(P_2\) is employed to help the improved CSO converge. \(P_2\) focuses on searching for elite solutions within a small region of the objective space. \(P_2\) has a size of 20 and employs the Q-learning algorithm to select the optimal local search operator. Five knowledge-driven operators are designed to search cooperatively. The operators are introduced as follows:

\({\mathcal {N}}_1\): Randomly choose two jobs and exchange their position to increase the diversity of population \(P_2\).

\({\mathcal {N}}_2\): Find the critical path in the critical factory, referring to [9]. Randomly select two critical jobs and exchange their positions to reduce makespan.

\({\mathcal {N}}_3\): Choose two critical jobs randomly and insert the latter job into the front of the former job.

\({\mathcal {N}}_4\): Randomly choose a stage of a critical job and increase its number of used welders, since more welders shorten the processing time and may thus reduce the makespan.

\({\mathcal {N}}_5\): In order to reduce makespan, choose a critical job randomly and dispatch it to another factory.

Each solution employs a search operator to generate offspring. Moreover, Q-learning is adopted to improve the autonomy of the cooperative search; the design details are as follows:

Q-table Design: The agent’s states in the Q-learning framework are used to represent the success or failure of the current search operator. Consequently, each operator has two states, indicating whether it succeeded or failed to update the current solution. The agent itself has ten states, reflecting the different outcomes of the search operations. The available actions for the agent correspond to five local search operators. The structure of the Q-table, which guides the learning process, is presented in Fig. 8. For instance, when the agent is in its first state, denoted as \(S_1\), and selects action \(A_2\), if action \(A_2\) fails to update the current solution, the agent transitions to the next state, \(S_4\). In this new state, the agent chooses the next action, and the process continues. This cycling of states within the agent’s decision-making process ensures a continuous exploration of the solution space while adapting its actions based on past successes and failures.

Fig. 8 Detail of the Q-table

Training Agent: The reward is 10 when the action successfully updates the current solution and 0 when it fails. The solution feeds this reward back to the agent, and the Q-table is updated by the following function:

$$\begin{aligned}{} & {} \begin{aligned}&Q(S_t,A_t)=Q(S_t,A_t)+\alpha \left[ R_{t}+\gamma \max _{a}Q\left( S_{t+1},a\right) \right. \\&\quad \left. -Q\left( S_t,A_t\right) \right] \\ \end{aligned} \end{aligned}$$
(16)

The environmental selection process in our approach aligns with NSGA-II [43]. The learning strategy comes into play when the subpopulation, with a size of 20, engages in cooperative search to explore the objective space. As depicted in Fig. 2, the learning mechanism is applied in the action selection phase, just before executing local search. The steps involved in invoking the learning strategy are: (1) Obtain the agent’s current state. (2) The agent selects the action with the highest Q-value in the Q-table row corresponding to the current state. (3) The outcome of this action, success or failure, determines the agent’s next state. (4) A reward is assigned based on that outcome. (5) The agent’s Q-table is updated using the Q-learning equation. The additional computational cost of the learning strategy lies mainly in action selection and Q-table updates; its complexity is O(N), notably smaller than the overhead of the evolutionary process.
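The Q-table machinery can be sketched in a few lines, using the parameter values selected in the experiments (\(\alpha =0.3\), \(\gamma =0.8\), \(\epsilon =0.8\)). The next-state index in the example call is hypothetical (it mirrors the S1/A2-fails-to-S4 example above), and the function names are illustrative:

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.3, gamma=0.8):
    """Q-table update of Eq. (16):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def select_action(Q, s, epsilon=0.8):
    """Epsilon-greedy choice: exploit the best action with probability epsilon."""
    if random.random() < epsilon:
        return max(range(len(Q[s])), key=lambda a: Q[s][a])
    return random.randrange(len(Q[s]))

# 10 states (5 operators x success/failure), 5 actions (the operators)
Q = [[0.0] * 5 for _ in range(10)]
q_update(Q, s=0, a=1, r=10, s_next=3)   # e.g. operator N2 succeeded, reward 10
```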

Elite strategy

After environmental selection, the two populations contribute their elite solutions to the elite archive \(\Omega \). To find more Pareto solutions, each elite solution executes a randomly chosen local search operator \({\mathcal {N}}_i, i\in \left[ 1,5\right] \). The elite archive deletes duplicate solutions and updates itself by retaining only the Pareto solutions. The final results of LCCMO are output from the elite archive.
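A minimal sketch of the archive update (illustrative; here duplicates are detected on objective vectors, and `update_archive` is an assumed name):

```python
def update_archive(archive, candidates):
    """Merge two solution sets, drop duplicates, and keep only the
    non-dominated entries. Each entry is (solution, objective_tuple)."""
    merged = {obj: sol for sol, obj in archive + candidates}  # dedup by objs

    def dominated(o):
        # o is dominated if some other vector is <= component-wise and differs
        return any(all(x <= y for x, y in zip(p, o)) and p != o
                   for p in merged)

    return [(sol, obj) for obj, sol in merged.items() if not dominated(obj)]
```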

Fig. 9 Main effects plot of the HV metric

Experimental results

Section “Our approach: LCCMO” has described LCCMO in detail. Numerical experiments are employed to validate its effectiveness. Every algorithm is coded in MATLAB and tested on an Intel(R) Xeon(R) Gold 6246R CPU at 3.4 GHz with 384 GB RAM.

Instances and metrics

Wang [12] initially introduced the DHWSP and prepared a set of 20 instances for assessing algorithm performance. However, the benchmark dataset from [12, 13] is currently unavailable. Therefore, this study has created an additional set of 20 instances using the same methodology and parameters as described in [12, 13], ensuring consistency and providing a reliable foundation for evaluating algorithms. The job number is selected from \(n\in \{20,40,60,80,100\}\), the number of factories from \(n_f\in \{2,3\}\), and the stage number from \(n_s\in \{2,5\}\). The processing time \(p_{f,i,s}\) ranges over [10, 50], the setup time \(s_{f,i,s}\) over [1, 10], and the number of welders in each stage \(\mu _{f,i,s}\) over [2, 5]. The processing power is \(W_{O}=28\) kW, the setup power is \(W_{S}=10\) kW, the idle power is \(W_{I}=0.36\) kW, the basic power is \(W_{B}=5\) kW, and the welding duty cycle is \(K_w=0.8\).

Hypervolume (HV) [47], generational distance (GD) [43], and Spread [48] are utilized to measure the algorithms’ comprehensive performance, convergence, and diversity, respectively. The higher the HV, the better the comprehensive performance; the lower the GD and Spread, the better the convergence and diversity.
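For two minimization objectives, HV can be computed with a simple sweep over the sorted front (the standard 2-D formulation, not tied to the authors' implementation; `hypervolume_2d` is an assumed name):

```python
def hypervolume_2d(front, ref):
    """2-D hypervolume for minimisation: the area dominated by the front
    and bounded by the reference point ref. Higher is better."""
    # Keep only points that dominate the reference point, sorted by f1
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                       # skip dominated points
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv
```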

Experiment of parameters

LCCMO’s parameter settings affect its performance on DHWSP. LCCMO contains six parameters: the ICSO population size \(ps_1\), the agent-based population size \(ps_2\), the mutation rate \(P_m\), the learning rate \(\alpha \), the discount factor \(\gamma \), and the greedy factor \(\epsilon \). A Taguchi design-of-experiment (DOE) approach [49] is adopted, and an orthogonal array \(L_{27}(3^6)\) is generated. The parameter levels are as follows: \(ps_1=\{100,150,200\}\); \(P_m=\{0.1,0.15,0.2\}\); \(\alpha =\{0.1,0.2,0.3\}\); \(\gamma =\{0.7,0.8,0.9\}\); \(\epsilon =\{0.7,0.8,0.9\}\); \(ps_2=\{10,20,30\}\). Every variant with a different parameter setting runs 10 times independently under the same stopping criterion, MaxNFEs \(=400*n\). The metrics’ average values over the 10 runs are collected for Taguchi analysis. Figure 9 shows the main effects plot of the parameters on the HV metric. Based on the figure, the best configuration is \(ps_1=100\), \(ps_2=20\), \(\gamma =0.8\), and \(\epsilon =0.8\). However, \(P_m\) and \(\alpha \) show no clear best level, so their settings are examined further with the control variable method. The first group varies \(P_m=\{0.1,0.15,0.2\}\) with \(\alpha =0.2\) and the other parameters at the best configuration from Fig. 9; the second group varies \(\alpha =\{0.1,0.2,0.3\}\) with \(P_m=0.2\) and the other parameters at the best setting. The HV results over ten runs are shown in Fig. 10. Within both groups, the settings show no significant difference; thus, \(\alpha \) and \(P_m\) are not sensitive, and \(\alpha \) is set to 0.3 and \(P_m\) to 0.1.

Fig. 10
figure 10

Box plot for LCCMO with different settings of \(P_m\) and \(\alpha \)

Table 1 Results of Friedman rank-and-sum test for all variant algorithms

Ablation experiment

Several variant algorithms are generated to validate the effectiveness of LCCMO’s improvements: (1) CSO: the baseline without any of the proposed improvements; (2) CSO+I1: CSO with initial rule 1 and the random rule; (3) CSO+I2: CSO with initial rule 2 and the random rule; (4) CSO+I3: CSO with initial rule 3 and the random rule; (5) CSO+HI: CSO with the multi-rule cooperative initialization strategy; (6) CSO+L1: CSO with \({\mathcal {N}}_1\); (7) CSO+L2: CSO with \({\mathcal {N}}_2\); (8) CSO+L3: CSO with \({\mathcal {N}}_3\); (9) CSO+L4: CSO with \({\mathcal {N}}_4\); (10) CSO+L5: CSO with \({\mathcal {N}}_5\); (11) CSO+LS: CSO with the multi-operator cooperative local search; (12) ICSO: CSO with both the initialization and local search strategies; (13) LCCMO: ICSO with the agent-based population. All algorithms run twenty times independently with the stopping criterion MaxNFEs=\(400*n\geqslant 2*10^4\), and all are coded in MATLAB.

Tables S-I, S-II, and S-III show the statistical results of all variant algorithms on all metrics, where bold values mark the best and the \(p\)-values indicate the significance between the best-ranked algorithm and the current variant. Based on these results, LCCMO outperforms all variant algorithms on the HV metric, although it is worse than CSO+I3 and CSO+L4 on the GD metric and worse than CSO+I1 on the Spread metric. Furthermore, Table 1 lists the Friedman rank-and-sum test results, from which the following conclusions can be drawn: (1) LCCMO has the best HV rank, indicating the best comprehensive performance; \(p\)-values smaller than 0.05 mean that LCCMO is significantly better than the other algorithms. (2) Comparing CSO, CSO+I1, CSO+I2, CSO+I3, and CSO+HI on all metrics indicates that the proposed initial rules and their cooperation improve comprehensive performance. Compared with CSO+HI, the variants CSO+I1, CSO+I2, and CSO+I3 obtain better GD and Spread because a single rule makes the population uniformly cover one part of the Pareto front. (3) Comparing CSO, CSO+L1 through CSO+L5, CSO+LS, and ICSO demonstrates that each single local search operator can improve the HV, GD, and Spread metrics, whereas the random cooperation of the operators alone degrades all metrics. However, combining the multi-operator cooperative local search with the multi-rule cooperative initialization achieves a better HV metric; the key components are therefore the cooperation of initialization rules and local search operators. (4) Comparing LCCMO and ICSO indicates that solver cooperation improves all metrics, because different solvers search different areas of the objective space and the final non-dominated solution set combines each solver’s results to enhance convergence and diversity.
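The average Friedman ranks reported in Table 1 can be recomputed from per-instance metric values. The sketch below is our own helper, not the paper's code; it assumes a higher-is-better metric such as HV and assigns tied entries the mean of their tied ranks:

```python
def friedman_ranks(scores):
    """Average Friedman rank per algorithm.

    `scores[i][a]` is the metric of algorithm `a` on instance `i`
    (higher is better, as for HV); rank 1 is best."""
    n_inst, n_alg = len(scores), len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda a: row[a], reverse=True)
        r = 0
        while r < n_alg:
            s = r
            # group consecutive ties together
            while r + 1 < n_alg and row[order[r + 1]] == row[order[s]]:
                r += 1
            avg_rank = (s + r) / 2 + 1  # mean rank of the tied group
            for k in range(s, r + 1):
                totals[order[k]] += avg_rank
            r += 1
    return [t / n_inst for t in totals]
```

For a lower-is-better metric such as GD or Spread, the sort direction is simply reversed.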

Comparison and discussions

Table 2 Results of Friedman rank-and-sum test for all comparison algorithms
Fig. 11
figure 11

Pareto Front comparison results on 20J2F2S

Fig. 12
figure 12

Pareto Front comparison results on 100J3F5S

To further evaluate the effectiveness of our approach, LCCMO is compared with state-of-the-art algorithms, including MOEA/D [50], NSGA-II [43], SPEA2 [51], IMOEA/D [12], MOWSA [13], and CMA [9]. Their parameters follow the best configurations reported in their references: crossover rate \(P_c=1.0\), mutation rate \(P_m=0.1\), and population size \(ps=100\) for all algorithms, with a neighborhood updating range of \(T=10\) for MOEA/D, IMOEA/D, and CMA. The parameter setting of LCCMO is the same as in Sect. 5.2. For a fair comparison, all MOEAs run 20 times and share the same stopping criterion (MaxNFEs=\(400*n\geqslant \)20,000).

Tables S-IV, S-V, and S-VI record the statistical results of all comparison algorithms on the three metrics. The symbols "\(-/=/+\)" mean significantly inferior, equal, or superior to LCCMO, respectively; the best value is marked in bold, and the last row of each table summarizes the counts of "\(-/=/+\)". As Tables S-IV and S-V show, LCCMO is significantly superior to the compared algorithms on the HV and GD metrics, indicating the best comprehensive performance and convergence for solving DHWSP. As for the Spread metric, Table S-VI indicates no significant difference between LCCMO and the other algorithms. Because of DHWSP's large and complex search space, the local search step size on the WN matrix is small; hence the Pareto solutions of all algorithms lie close to each other, which explains the Spread results. Table 2 reports the Friedman rank-and-sum test for the comparison experiment: LCCMO ranks best on the HV and GD metrics, again indicating the best comprehensive performance and convergence for solving DHWSP.

The effectiveness of LCCMO is contingent on its design. First, the proposed multi-rule cooperative initialization method yields a population with exceptional convergence and diversity by constructing solutions closely aligned with the lower bound. Second, the multi-operator cooperative local search enables solutions to explore different directions, thus preventing them from getting trapped in local optima. Furthermore, the collaboration of various solvers effectively amalgamates the search results from each solver, enhancing both the diversity and convergence of the final non-dominated solution set. In addition, Figs. 11 and 12 present a comparison of the Pareto front results for all algorithms, with the PF results selected based on the optimal HV value from twenty runs. As depicted in Figs. 11 and 12, LCCMO consistently yields Pareto solutions with superior convergence and comprehensive performance, underscoring its ability to effectively address DHWSP.
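Extracting the fronts plotted in Figs. 11 and 12 from a run's final archive amounts to non-dominated filtering. A minimal sketch for minimization objectives follows (the helper name is illustrative, not from the paper):

```python
def non_dominated(points):
    """Return the non-dominated subset of a list of minimization
    objective vectors (tuples). A point is kept unless some other
    point is at least as good in every objective and not identical."""
    front = []
    for p in points:
        dominated = any(
            all(q[k] <= p[k] for k in range(len(p))) and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

Applying this filter to the union of each solver's results is also how the cooperating solvers' outputs are merged into one final solution set.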

Model evaluation

The model is evaluated using the CPLEX solver on ten generated small instances, with the results shown in Table 3. CPLEX solves instances 1–7 within 60 s; the remaining instances cannot be solved within that limit because of the large number of variables and constraints. The proposed LCCMO obtains better results than the mixed-integer linear programming model because of the cooperation of different solvers, multiple operators, and multiple rules. The CPLEX code can be downloaded from https://github.com/CUGLiRui/CPLEX_DHWSP.

Table 3 Comparison results by CPLEX solver and LCCMO

Conclusions

This study introduced a competitive swarm optimizer driven by cooperative learning and knowledge for addressing energy-aware distributed heterogeneous welding shop scheduling problems. From the experimental results, several key conclusions have been drawn: (1) The implementation of multiple strategies and their cooperative interaction can significantly enhance the performance of algorithms. In the proposed algorithm, cooperative initialization, global search, and local search processes effectively contribute to population exploration. (2) Leveraging the cooperation of different solvers, including reinforcement learning and evolutionary algorithms, can effectively enhance the overall performance of the algorithms. (3) Building upon the principle of cooperation, the proposed algorithm outperforms the compared algorithms, showcasing its superior performance. In summary, this research demonstrates that a collaborative approach, involving multiple strategies and diverse solvers, can lead to more effective solutions for energy-aware distributed heterogeneous welding shop scheduling problems.

Several topics merit further study: (i) adapting LCCMO to other distributed heterogeneous shop scheduling problems; (ii) considering dynamic events in DHWSP; and (iii) studying end-to-end models for DHWSP.