An improved fruit fly optimization algorithm with Q-learning for solving distributed permutation flow shop scheduling problems

Zhao, Cai; Wu, Lianghong; Zuo, Cili; Zhang, Hongqiang

doi:10.1007/s40747-024-01482-4

Original Article
Open access
Published: 25 May 2024

Volume 10, pages 5965–5988, (2024)

Complex & Intelligent Systems


Cai Zhao¹,
Lianghong Wu ORCID: orcid.org/0009-0000-9773-2280²,
Cili Zuo² &
…
Hongqiang Zhang²


Abstract

The distributed permutation flow shop scheduling problem (DPFSP) is one of the hottest issues in the context of economic globalization. In this paper, a Q-learning enhanced fruit fly optimization algorithm (QFOA) is proposed to solve the DPFSP with the goal of minimizing the makespan. First, a hybrid strategy is used to cooperatively initialize the position of the fruit fly in the solution space and the boundary properties are used to improve the operation efficiency of QFOA. Second, the neighborhood structure based on problem knowledge is designed in the smell stage to generate neighborhood solutions, and the Q-learning method is conducive to the selection of high-quality neighborhood structures. Moreover, a local search algorithm based on key factories is designed to improve the solution accuracy by processing sequences of subjobs from key factories. Finally, the proposed QFOA is compared with the state-of-the-art algorithms for solving 720 well-known large-scale benchmark instances. The experimental results demonstrate the most outstanding performance of QFOA.


Introduction

Due to the rapid development of the global economy and increasing market demand, the traditional single-factory production model has been challenged and can no longer cope with current changes. Distributed manufacturing addresses these problems well and has become a major trend [46, 51]. Its advantages lie in strengthening the connections between enterprises, enabling cooperation among multiple factories, allocating resources effectively, and greatly reducing production time [6, 30]. As a result, distributed manufacturing is becoming increasingly common and is gradually taking a dominant position, while the traditional single-factory production model is gradually being phased out. In this context, research on distributed scheduling is particularly necessary, as it directly affects the future development of the entire enterprise. In recent years, research on distributed scheduling has become increasingly diversified, covering distributed hybrid flow shop scheduling [25], distributed blocking flow shop scheduling [53], distributed no-wait flow shop scheduling [31], distributed permutation flow shop scheduling [1], etc. At the same time, the objectives are also becoming more complex, moving from single to multiple objectives. Multi-factory production is more complex than single-factory production because at least two sub-problems must be considered: the assignment of jobs to factories and the job sequence within each factory. This brings great challenges to traditional algorithm design. The permutation constraint is one of the most important constraints in flow shop scheduling. Because the job sequence is determined only on the first machine and is then kept unchanged on all subsequent machines, it is suitable for sustained large-scale production and has received widespread attention. The single-factory flow shop problem has been proven to be NP-hard [2, 42, 55, 56]. As an extension of this problem, the distributed permutation flow shop scheduling problem (DPFSP) is inherently more complex and more in line with the current manufacturing environment. Therefore, research on the DPFSP has strong engineering value [19, 36].

The fruit fly optimisation algorithm (FOA) is a highly competitive algorithm proposed by Pan [23]. Wang et al. [41] compared FOA with over ten heuristic algorithms on benchmark problems. Their findings validated that FOA exhibits strong optimization capabilities, requires fewer parameters, features simple operations and is easy to implement. FOA has been applied successfully in various domains, such as power load forecasting, parameter identification and scheduling. Ibrahim et al. [13] designed a wind-driven variant of FOA for parameter identification in photovoltaic cell models. Saminathan et al. [29] integrated the whale optimization algorithm with FOA to address energy-saving problems with delays. Zhang et al. [50] designed a problem-specific initialization method, four neighborhood structures and a local search structure in the smell phase, and proposed a discrete FOA for the scheduling problem of distributed manufacturing systems. The experimental results showed the benefit of the multi-neighborhood strategy for expanding the global search space. Huang et al. [12] proposed an FOA based on an elimination mechanism, which maintains population diversity by eliminating poorly performing fruit flies and generating new ones, and applied it to the traveling salesman problem. Shao et al. [32] designed a new FOA to solve blocking flow shop problems. Zheng et al. [57] proposed an enhanced FOA to solve unrelated parallel machine scheduling problems.

The rapid rise of artificial intelligence has brought great convenience to all walks of life. Q-learning, the most typical reinforcement learning (RL) algorithm, originates from dynamic programming. It makes the best decision at every step so that the whole process becomes optimal. Due to its simple operation and high efficiency, it is increasingly favored by researchers [3]. Some related literature on the use of Q-learning in various fields is summarized below. Chen et al. [4] applied the Q-learning algorithm to the parameter tuning of swarm intelligence optimization algorithms to better solve the assembly workshop problem. Similarly, Wang et al. [38] designed a dual Q-learning method for the assembly flow shop problem to improve its adaptability to the environment. Hsieh and Su [11] combined Q-learning with intelligent optimization algorithms to solve the economic dispatch problem with the goal of reducing energy consumption. Zhang et al. [49] designed a new Q-learning model and proposed a method to solve scheduling problems in power systems. Shen et al. [33] also solved this type of problem well by designing a memetic algorithm based on Q-learning for dynamic software project scheduling. From the above literature, it can be seen that Q-learning generally uses the ε-greedy strategy in the problem-solving process, which selects the action with the largest reward value in each selection but does not take a global perspective. In addition, the Q-values are set to equal or random values during initialisation, i.e., the agent learns in the environment without a priori knowledge, which leads to uncertainty in the learning effect.

Q-learning is a reinforcement learning method that selects the optimal action from the action set according to the current state and the feedback received from the environment. Therefore, Q-learning is embedded into FOA to jointly solve the actual manufacturing decision-making problem, and the neighborhood structures are designed according to the properties of the DPFSP. Q-learning dynamically selects the most suitable neighborhood structure during the population evolution process based on the current state and historical information feedback. Furthermore, the “Sigmoid” function is introduced as a dynamic selection strategy instead of the greedy strategy of Q-learning, which improves the evolutionary ability of the population.

In view of the engineering value and practicality of the DPFSP, this paper addresses the DPFSP with the objective of minimizing the makespan. To this end, QFOA is proposed by combining the advantages of FOA and Q-learning with the properties of the DPFSP. The main contributions are as follows.

1. Firstly, a hybrid strategy is used to cooperatively initialize the fruit fly population, and the boundary properties are used to improve the operational efficiency of QFOA [10].
2. Multiple neighbourhood structures are designed based on the problem attributes in the smell phase, and the mechanism of Q-learning is assembled to select the most appropriate neighbourhood structure. Notably, the "Sigmoid" function is introduced as a dynamic selection strategy, replacing the greedy strategy of Q-learning to further improve the algorithm’s ability to find the optimal.
3. A simulated annealing acceptance criterion is embedded in the visual phase to avoid premature convergence.

The remainder of the paper is structured as follows. Section “Literature review” reviews the solution methods for the DPFSP. The mathematical model of the DPFSP is described in Sect. “Problem description”. Section “QFOA for DPFSP with minimizing makespan” gives the details of QFOA for solving the DPFSP. Section “Experimental analysis” introduces parameter calibration and performance evaluation for QFOA. Section “Conclusion” concludes the paper and outlines further research.

Literature review

Distributed manufacturing systems are widely used as society grows fast and cooperation between companies is enhanced. As a generalisation of flow shop scheduling, the DPFSP has considerable engineering and academic value. In the decade or so since the concept of the DPFSP was proposed, it has received much attention: a variety of solution methods have been developed, and the objectives have shifted from single to multiple. Below we review and summarize the relevant literature.

Research gap

As an extension of the PFSP, the DPFSP is more in line with the current manufacturing environment than the PFSP itself. Meanwhile, the DPFSP is one of the most common production modes used by companies in the manufacturing process and has been widely applied in the welding, medical and automotive fields. Therefore, studying the DPFSP has significant engineering value.

The review of DPFSP-related problems and solution methods in Sect. “Related scheduling problems and solutions” shows that various methods have been used to solve the DPFSP and that many aspects are mature. However, the schemes given are generally acceptable for small and medium-sized instances but not ideal for large-scale instances. They are time-consuming and inefficient, and the results obtained often differ significantly from the best known solutions. According to [44], it takes about 70 days to solve a large-scale instance of size 500 × 20, which is generally not acceptable. A possible reason is that the properties of the problem are not fully considered when designing the solution method, so the method is not very effective on large-scale instances. Therefore, the purpose of this paper is to exploit the characteristics of the problem to develop an effective algorithm that can find a satisfactory solution to large-scale problems in a short time.

In recent years, Q-learning has received more and more attention from scholars due to its simplicity and high efficiency, and it has been successfully applied in many fields, especially in solving complex large-scale problems. Some studies combine Q-learning with meta-heuristics to solve distributed flow shop problems, but they use it only to adjust the algorithm parameters, and research on other uses is still relatively scarce. This motivates us to combine the advantages of meta-heuristic algorithms with Q-learning and to consider the properties of the problem when designing effective operators to solve the DPFSP.

To sum up, the above analyses provide a preliminary basis for the combination of Q-learning and meta-heuristic algorithms. To this end, an enhanced FOA (QFOA) based on the Q-learning mechanism is proposed to solve the DPFSP with the objective of minimizing the makespan. According to the properties of the problem, four neighbourhood perturbation operators are designed. The mechanism of Q-learning is assembled to select the most appropriate neighbourhood structure. Notably, the “Sigmoid” function is introduced as a dynamic selection strategy, replacing the greedy strategy of Q-learning to further improve the algorithm’s ability to find the optimal.

Problem description

According to [54], the DPFSP can be described as follows: n jobs $ \{j_1,j_2,\ldots ,j_n \}$ need to be processed in f factories $ F=\{F_1,F_2,\ldots ,F_f \}$. All factories have exactly the same flow line, which contains m processing machines $ M=\{M_1,M_2, \ldots ,M_m \}$. In each factory, $P_{j,i}$ represents the processing time of job j on machine i, and the processing time of a job does not differ among factories. In every factory, once the job sequence is fixed on the first machine, the same sequence is followed on all subsequent machines, i.e., the line has the permutation flow shop property. It is worth noting that a job can only be processed in one factory, and its processing cannot be interrupted. The symbols used in this article are defined below.

The classical DPFSP minimizes the makespan as the scheduling objective and is usually represented as $DP|prmu|C_{max}$ where distributed factories are represented as DP; Permutation features are represented as prmu; the makespan is expressed as $C_{max}$.

Symbol meaning:

i: denotes the index of the machine, where $i=1,2,\ldots ,m$.

j: denotes the index of the job, where $j=1,2,\ldots ,n$.

F: denotes the number of factories.

f: denotes the index of the factory, where $f=1,2,\ldots ,F$.

n: denotes the number of jobs.

m: denotes the number of machines.

$C_{max}$: denotes the makespan, i.e., the completion time of all jobs.

$P_{j,i}$: denotes the processing time of job j on machine i.

$I_{i,k,f}$: denotes the idle time.

$W_{i,k,f}$: denotes the completion time of the kth job in factory f on machine i.

Psize: population size.

$\alpha $: learning rate.

$\gamma $: discount factor.

$T_P$: annealing factor.

Decision variable

$X_{j,k,f}$: the decision variable, with a value of 1 if job j is assigned to the kth position in factory f and 0 otherwise.

Objective:

$$\min C_{\max} \tag{1}$$

Constraints:

$$\sum_{k=1}^{n} \sum_{f=1}^{F} X_{j,k,f} = 1, \quad \forall j \tag{2}$$

$$\sum_{j=1}^{n} \sum_{f=1}^{F} X_{j,k,f} = 1, \quad \forall k \tag{3}$$

$$I_{i,k,f} + \sum_{j=1}^{n} P_{j,i} + W_{i,k+1,f} - W_{i,k,f} - \sum_{j=1}^{n} X_{j,k,f} \cdot P_{j,i+1} - I_{i+1,k,f} = 0, \quad \forall k < n,\ i < m,\ f \tag{4}$$

$$C_f = \sum_{i=1}^{m-1} \sum_{j=1}^{n} X_{j,1,f} \cdot P_{j,i} + \sum_{k=1}^{n-1} I_{m,k,f} + \sum_{j=1}^{n} \sum_{k=1}^{n} X_{j,k,f} \cdot P_{j,m}, \quad \forall f \tag{5}$$

$$C_{\max} \ge C_f, \quad \forall f \tag{6}$$

$$I_{i,k,f} \ge 0, \quad \forall i,\ k < n,\ f \tag{7}$$

$$W_{i,k,f} \ge 0, \quad \forall i,\ k < m,\ f \tag{8}$$

$$X_{j,k,f} \in \{0,1\}, \quad \forall j,\ k,\ f \tag{9}$$

where Eq. (1) is the objective of minimizing the makespan, and Eqs. (2)–(9) form the constraint set. Eqs. (2) and (3) ensure that each job is assigned to exactly one position in exactly one factory and that each position is filled by exactly one job. Equation (4) represents the idle time constraint. Equation (5) gives the completion time of each factory. Equation (6) defines the completion time of the entire problem as no smaller than the completion time of any factory. Equations (7) and (8) require the machine idle times and the completion-time variables to be greater than or equal to 0. Equation (9) defines the range of values of the binary decision variables.

In order to understand this scheduling process more clearly, an example of 5 jobs, 2 machines, and 2 factories is given below. Table 1 shows the processing time of each job. Figure 1 shows the gantt chart of the example.

Table 1 Processing time of the job


As can be seen from Fig. 1, the job sequence 2, 5 in Factory 1 has a completion time of 6, and the job sequence 3, 1, 4 in Factory 2 has a completion time of 9. The makespan $C_{max}$ of the example is therefore 9.
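The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `factory_makespan` and `dpfsp_makespan` and the processing times in `P` are hypothetical (they are not the values of Table 1), and each factory is evaluated with the standard permutation flow shop recursion.

```python
from typing import Dict, List, Sequence


def factory_makespan(seq: Sequence[int], p: Dict[int, List[int]]) -> int:
    """Completion time of the last job on the last machine of one factory."""
    if not seq:
        return 0
    m = len(p[seq[0]])
    done = [0] * m  # done[i]: completion time of the latest job on machine i
    for job in seq:
        for i in range(m):
            # A job starts on machine i once the machine is free and the
            # job has left machine i - 1.
            ready = done[i - 1] if i > 0 else 0
            done[i] = max(done[i], ready) + p[job][i]
    return done[-1]


def dpfsp_makespan(solution: Sequence[Sequence[int]], p: Dict[int, List[int]]) -> int:
    """Makespan of a distributed schedule: the largest factory completion time."""
    return max(factory_makespan(seq, p) for seq in solution)


# Hypothetical processing times: job -> [time on machine 1, time on machine 2].
P = {1: [2, 3], 2: [1, 2], 3: [3, 1], 4: [2, 2], 5: [2, 3]}
solution = [[2, 5], [3, 1, 4]]  # factory 1 and factory 2
print([factory_makespan(seq, P) for seq in solution], dpfsp_makespan(solution, P))
```

With these assumed processing times the two factories finish at 6 and 10, so the makespan of the schedule is 10.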

QFOA for DPFSP with minimizing makespan

Due to the excellent performance of FOA, it has been combined with other methods by researchers to solve discrete problems. We combine FOA and Q-learning to propose a new hybrid algorithm, QFOA, which aims to minimize the makespan by selecting high-quality neighborhood structures during the iterative process. It is worth noting that this paper adjusts the Q-learning algorithm and introduces the "Sigmoid" function as a dynamic selection strategy instead of the greedy strategy to improve the probability of the algorithm escaping from local optima. QFOA consists of the population initialization, the smell and visual phases, and the local search. The detailed QFOA process is shown in Fig. 2.

Representation of solutions

For this study, we adopted a representation method based on job sequences to represent a complete solution in the form of two-dimensional vectors. Thus, a solution can be represented as ${\pi = \{ {\pi _1},{\pi _2},...,{\pi _f}\} }$, ${\pi _l} = \{ {\pi _{l,1}},{\pi _{l,2}},...,{\pi _{l,{n_l}}}\} $, $l = 1,2,...,f$, where $n_l$ indicates the number of jobs assigned to factory $F_l$. Based on the example in Fig. 1 above, the solution is represented as $\pi = \{ {\pi _1},{\pi _2}\}$, ${\pi _1} = \{ 2,5\}$, ${\pi _2} = \{3,1,4\}$, $n_1 = 2$, $n_2 = 3$.

Initialization

A good initial population helps to improve the convergence rate and accuracy of the algorithm. NEH [21] can quickly obtain high-quality solutions and was originally proposed to solve the PFSP. The NEH procedure works as follows: first, the jobs are sorted in descending order of total processing time to obtain a seed sequence; second, the first two jobs of the seed sequence are taken and the better of their two possible orders is kept; finally, the remaining jobs are inserted one by one into the position of the partial sequence that yields the minimum makespan. Here the NEH algorithm is extended to multiple factories for the initialisation of the population. Inspired by [32], this paper designs a new NEH heuristic method (NEH_R). Considering population diversity, the NEH_R method is used to initialize 10% of the individuals, and the remaining individuals are generated randomly. The detailed steps of NEH_R are as follows.

The jobs are sorted in non-increasing order of total processing time to generate the seed sequence $\sigma = [{\sigma _1},{\sigma _2},...,{\sigma _n}]$, where each ${\sigma _j}$, $j = 1,2,...,n$, is a job. Then, the first f jobs (${\sigma _1},{\sigma _2},...,{\sigma _f}$) are assigned to the f factories, one job per factory. Next, each of the remaining jobs ${\sigma _{f + 1}},{\sigma _{f + 2}},...,{\sigma _n}$ is tested in all possible positions of all factories, one by one, and is finally inserted into the best position.

It is worth noting that the boundary property from [10] is used in the insertion process to save time and improve efficiency. The pseudo-code of NEH_R is given in Algorithm 1. To illustrate the NEH_R process more clearly, an example with 5 jobs, 2 machines, and 2 factories is given, and Fig. 3 shows the exact scheduling process.
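The NEH_R steps described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than a reproduction of Algorithm 1: the name `neh_r` and the processing times are hypothetical, the "best position" is assumed to be the one that minimizes the completion time of the receiving factory, and the boundary-property speed-up of [10] is omitted, so every candidate insertion is evaluated from scratch.

```python
from typing import Dict, List, Sequence


def factory_makespan(seq: Sequence[int], p: Dict[int, List[int]]) -> int:
    """Completion time of one factory under the permutation constraint."""
    if not seq:
        return 0
    done = [0] * len(p[seq[0]])
    for job in seq:
        for i in range(len(done)):
            ready = done[i - 1] if i > 0 else 0
            done[i] = max(done[i], ready) + p[job][i]
    return done[-1]


def neh_r(p: Dict[int, List[int]], n_factories: int) -> List[List[int]]:
    """Build one initial solution (a list of job sequences, one per factory)."""
    # Seed sequence: jobs in non-increasing order of total processing time.
    seed = sorted(p, key=lambda job: -sum(p[job]))
    # The first f jobs are assigned to the f factories, one job each.
    solution: List[List[int]] = [[job] for job in seed[:n_factories]]
    solution += [[] for _ in range(n_factories - len(solution))]
    # Every remaining job is tested in all positions of all factories.
    for job in seed[n_factories:]:
        best = None  # (completion time of receiving factory, factory, position)
        for f, seq in enumerate(solution):
            for pos in range(len(seq) + 1):
                c = factory_makespan(seq[:pos] + [job] + seq[pos:], p)
                if best is None or c < best[0]:
                    best = (c, f, pos)
        _, f, pos = best
        solution[f].insert(pos, job)
    return solution


# Hypothetical processing times: job -> [machine 1, machine 2].
P = {1: [2, 3], 2: [1, 2], 3: [3, 1], 4: [2, 2], 5: [2, 3]}
initial = neh_r(P, n_factories=2)
print(initial, max(factory_makespan(seq, P) for seq in initial))
```

Because each insertion re-evaluates the whole factory sequence, this sketch is slower than an implementation that exploits the boundary property; it is meant only to show the order of the steps.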

Smell phase

During the smell phase, the individual fruit flies within the population are updated mainly by neighborhood search. Fruit flies search for food sources, and once the best food source is found, all fruit flies concentrate and fly towards it. The method used to generate neighborhood solutions is therefore crucial for the algorithm. According to the existing literature, no single neighborhood structure is suitable for solving all problems. Considering the problem structure, we propose four types of neighborhood operators to generate neighborhood solutions. Combined with the mechanism of Q-learning, the most suitable operator is selected for each individual, according to its fitness, to generate better individuals.

Neighborhood structure

Given the specificity of minimizing $C_{max}$ in a distributed environment, the neighborhood design assumes that a solution cannot be improved unless the schedule of the factory with the largest completion time (denoted as the key factory $F_c$) is changed. Therefore, all neighborhood structures operate on the key factory. In QFOA, four neighborhood operators are introduced to generate neighborhood solutions and improve the algorithm’s ability to search for solutions. The first two are based on factory assignment: ${N_1}$, insertion between the key factory and other factories ($Fab\_insert$), and ${N_2}$, swap between the key factory and other factories ($Fab\_swap$). The other two are based on the job sequence: ${N_3}$, insertion within the key factory ($Jab\_insert$), and ${N_4}$, swap within the key factory ($Jab\_swap$). Figure 4 shows a detailed example to illustrate the process.

$Fab\_swap$: Randomly select a job n from the key factory ${F_c}$, and another job $n'$ from other factories. Swap jobs n and $n'$ to generate $f-1$ neighborhood solutions and select the best one.

$Fab\_insert$: A job n is randomly selected from the key factory ${F_c}$, and another job $n'$ is randomly selected from other factories. The job n is inserted before the location of $n'$ to generate $f-1$ neighborhood solutions and select the best one.

$Jab\_insert$: Randomly select l jobs from the key factory without duplication, and insert one of them before the position of each of the other $l-1$ jobs in turn. This generates $l-1$ neighborhood solutions, from which the best one is selected.

$Jab\_swap$: Randomly select l jobs from the key factory without duplication, and swap one of them with each of the other $l-1$ jobs in turn. This generates $l-1$ neighborhood solutions, from which the best one is selected.
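Two of the four operators described above, $Fab\_swap$ and $Jab\_insert$, can be sketched as follows. The code is an illustrative reading of the textual description, not the authors' implementation: the function names, the default number of selected jobs `l=3`, and the processing times are assumptions, and the key factory is taken to be the factory with the largest completion time.

```python
import random
from typing import Dict, List, Sequence

Solution = List[List[int]]


def factory_makespan(seq: Sequence[int], p: Dict[int, List[int]]) -> int:
    if not seq:
        return 0
    done = [0] * len(p[seq[0]])
    for job in seq:
        for i in range(len(done)):
            done[i] = max(done[i], done[i - 1] if i > 0 else 0) + p[job][i]
    return done[-1]


def makespan(sol: Solution, p: Dict[int, List[int]]) -> int:
    return max(factory_makespan(seq, p) for seq in sol)


def key_factory(sol: Solution, p: Dict[int, List[int]]) -> int:
    """Index of the factory with the largest completion time."""
    return max(range(len(sol)), key=lambda f: factory_makespan(sol[f], p))


def fab_swap(sol: Solution, p: Dict[int, List[int]], rng: random.Random) -> Solution:
    """Swap one random key-factory job with a random job of every other factory."""
    fc = key_factory(sol, p)
    if not sol[fc]:
        return sol
    a = rng.randrange(len(sol[fc]))
    candidates = []
    for f in range(len(sol)):
        if f == fc or not sol[f]:
            continue
        b = rng.randrange(len(sol[f]))
        cand = [list(seq) for seq in sol]  # copy, the input is left untouched
        cand[fc][a], cand[f][b] = cand[f][b], cand[fc][a]
        candidates.append(cand)
    return min(candidates, key=lambda s: makespan(s, p), default=sol)


def jab_insert(sol: Solution, p: Dict[int, List[int]], rng: random.Random, l: int = 3) -> Solution:
    """Insert one selected key-factory job before each of the other l - 1 selected jobs."""
    fc = key_factory(sol, p)
    l = min(l, len(sol[fc]))
    if l < 2:
        return sol
    moved, *targets = rng.sample(sol[fc], l)
    candidates = []
    for target in targets:
        seq = [job for job in sol[fc] if job != moved]
        seq.insert(seq.index(target), moved)
        cand = [list(s) for s in sol]
        cand[fc] = seq
        candidates.append(cand)
    return min(candidates, key=lambda s: makespan(s, p))


# Hypothetical data: job -> [machine 1, machine 2].
P = {1: [2, 3], 2: [1, 2], 3: [3, 1], 4: [2, 2], 5: [2, 3]}
rng = random.Random(0)
start = [[2, 1, 3], [4, 5]]
print(fab_swap(start, P, rng), jab_insert(start, P, rng))
```

$Fab\_insert$ and $Jab\_swap$ follow the same pattern, with the swap replaced by an insertion before the selected job and vice versa.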

Combining with Q-learning

Reinforcement learning (RL) is a decision-making method in which agents interact with environments through states, actions, and rewards to achieve optimal outcomes [3]. The basic idea is that, at time t, the agent observes the current state $S_t$ and selects an appropriate action $a_t$ from the action set to execute. The environment then provides a reward R (positive or negative) for executing that action. Subsequently, the agent performs a new action $a_{t+1}$, chosen according to a certain strategy, based on the new state $S_{t+1}$ and the reward R fed back by the environment. A detailed diagram of this principle is shown in Fig. 5.

Q-Learning is a value-based algorithm in RL that uses a Q-value to represent the expected gain of taking an action in a given state [14, 43]. The Q-value, denoted as $Q(S_t,a_t)$, is based on the agent’s action ${a_t}({a_t} \in A)$ at state ${S_t}({S_t} \in S)$ and the reward R provided by the environment. The algorithm constructs a Q-table to store the Q-value for each State and Action pair and selects the action with the maximum gain based on the Q-value. The Q-value is updated using equation (10).

$$Q(S_t, a_t) = (1 - \alpha) Q(S_t, a_t) + \alpha \left( R + \gamma \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) \right) \tag{10}$$

where $a_t$ and $S_t$ are the action and state at time t, $a_{t+1}$ and $S_{t+1}$ are the action and state at time $t+1$, $\alpha $ is the learning rate and $\gamma $ is the discount factor.
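As a small illustration of Eq. (10), the update can be written as a single function on a tabular Q-table. The table layout (a list of rows, one row per state and one column per action), the function name `q_update`, and the values chosen for $\alpha$ and $\gamma$ are assumptions made for this sketch only.

```python
from typing import List


def q_update(q: List[List[float]], s: int, a: int, reward: float,
             s_next: int, alpha: float, gamma: float) -> float:
    """Apply Eq. (10) in place to q[s][a] and return the new value."""
    best_next = max(q[s_next])  # max over a_{t+1} of Q(S_{t+1}, a_{t+1})
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best_next)
    return q[s][a]


# Two states and four actions, all Q-values initialized to 0.
q_table = [[0.0] * 4 for _ in range(2)]
first = q_update(q_table, s=0, a=2, reward=1.0, s_next=1, alpha=0.1, gamma=0.9)
second = q_update(q_table, s=1, a=0, reward=-1.0, s_next=0, alpha=0.1, gamma=0.9)
print(first, second)
```

In the first call all Q-values are still 0, so the new value is simply $\alpha R = 0.1$. The second call already uses that value through the maximum over the next state, giving $0.1 \times (-1 + 0.9 \times 0.1) = -0.091$.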

Q-learning algorithms generally use the ε-greedy strategy for action selection in the problem-solving process. Since the Q-values are set to equal or random values during initialization, the learning process is carried out in an environment with no prior knowledge. At the same time, the learning rate is not dynamically adjusted after initialization, so the decision space of the learning process is large, the learning speed is slow, and the learning effect is uncertain. Moreover, the action with the largest Q-value is selected every time, which may lead to a failure to find the optimal solution. In this paper, we introduce the “Sigmoid” function as a dynamic selection strategy, so that the improved Q-learning algorithm selects actions randomly in the early stage and dynamically selects the optimal actions in the later stage. This alleviates two weaknesses of the greedy strategy: it may fall into a local optimum in the early stage, and it may still select the second-best action in the later stage of the learning process.

This paper introduces “Sigmoid” as an action selection strategy, and its expression is as follows:

$$P(S_i, a_t) = \frac{\exp (Q(S_i, a_t))}{1 + \exp (Q(S_i, a_t))} \tag{11}$$

According to Eq. (11), when the Q-values are initialized to 0, every action has the same selection probability ($P = 0.5$) in the early stage, so the selection is more random. Once learning has progressed to a certain degree, the reward value of each action begins to stand out, and the probability P increases with the updated Q-value. As the system evolves, the selection gradually tilts towards the actions with high reward values.

In this paper, three kinds of selection strategies are discussed, which are “completely random” strategy, “semi-random” strategy and greedy strategy.

A parameter $\psi \in [0, 1]$ is introduced, whose value is calibrated later. There are the following choices.

1. When $\psi = 0$, $P > \psi $ always holds, so the greedy strategy is implemented.
2. When $\psi = 1$, $P < \psi $ always holds, so a “completely random” strategy is implemented, i.e., the actions are chosen randomly throughout the selection process.
3. When $\psi \in (0,1)$, a “semi-random” strategy is implemented, i.e., the “Sigmoid” function is used as a dynamic selection strategy: when $P \ge \psi $, the algorithm performs the greedy strategy and selects the action with the largest current reward value; when $P < \psi $, the algorithm adopts the “completely random” strategy and selects an action randomly.
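The three cases above can be sketched as one selection function. This is a hedged illustration: the text does not state which action's Q-value enters Eq. (11) before the comparison with $\psi$, so the sketch assumes that P is computed from the Q-value of the greedy (largest-Q) action in the current state. The function names and the sample Q-values are hypothetical.

```python
import math
import random
from typing import Sequence


def sigmoid_probability(q_value: float) -> float:
    """Eq. (11): exp(Q) / (1 + exp(Q)), written in the equivalent form 1 / (1 + exp(-Q))."""
    return 1.0 / (1.0 + math.exp(-q_value))


def select_action(q_row: Sequence[float], psi: float, rng: random.Random) -> int:
    """Pick an action index from one row of the Q-table (one state)."""
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    # Assumption: P is evaluated for the greedy action of the current state.
    p = sigmoid_probability(q_row[greedy])
    if p >= psi:
        return greedy  # greedy branch
    return rng.randrange(len(q_row))  # "completely random" branch


rng = random.Random(0)
q_row = [0.0, 0.5, 0.2, 0.1]  # hypothetical Q-values of operators N1..N4
print(select_action(q_row, psi=0.0, rng=rng))  # psi = 0: always greedy
print(select_action(q_row, psi=1.0, rng=rng))  # psi = 1: always random
print(sigmoid_probability(0.0))                # initial Q = 0 gives P = 0.5
```

With all Q-values at 0 and a threshold $\psi > 0.5$, every selection falls into the random branch, which matches the random early-stage behaviour described in the text.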

In the initialization phase of QFOA, the Q-values are set to 0, and the “Sigmoid” function is used as a dynamic selection strategy. As a result, the improved algorithm selects actions randomly in the early stage and later switches dynamically between random selection and selection of the action with the highest reward value, which improves the evolutionary ability of the population. In this paper, the four neighbourhood perturbation operators form the action set, and the state set is derived from the completion time of the jobs. The Q-learning algorithm selects an appropriate action according to the current state, receives feedback on the result of its execution, and updates the Q-table according to the current search state so that the next perturbation operator is selected in a reasonable way. The specific design mainly consists of four parts: state set, action set, reward function and selection strategy.

(1) State set

In QFOA, the definition of the state space is the key to the selection of actions. The state differs at different times, which has a great influence on the selection of subsequent actions. In this paper, the state space S = $\{ 0,1\} $ is binary and is defined according to whether the quality of the population is improved after executing the action: the state is s = 1 if the quality of the solution is improved, and s = 0 otherwise.

(2) Action set

In scheduling, the agent chooses an action according to the current state. In QFOA, the four neighbourhood perturbation operators $({N_1-N_4})$ are defined as the action set.

(3) Reward function

After taking an action, each fruit fly receives a reward R. In QFOA, the reward R is calculated based on the makespan, i.e.

$$R = \begin{cases} 1, & f_{new} > f_{old} \\ 0, & f_{new} = f_{old} \\ -1, & f_{new} < f_{old} \end{cases} \tag{12}$$

where $f_{new}$ and $f_{old}$ represent the new and old individual fitness values, respectively.

(4) Selection strategy

A dynamic selection strategy is proposed to improve the Q-learning algorithm: in line with the objective of this paper, the “Sigmoid” function is used as the probability function for action selection.

(5) Update the Q-value in the Q-table according to Eq. (10). At the completion of each iteration, the individuals and states are updated as well. Table 2 shows the Q-table.

Table 2 Q-table


Visual stage

In the traditional FOA, the fruit fly population flies to a new position if it is better than the original central position; otherwise, it stays at the current central position. However, if the central position is not updated for a long time, the search process will stagnate. In fact, the abandoned locations already carry some search information close to the food source, and further search around them may yield better results. Therefore, we adopt a new location update strategy in QFOA. Specifically, if the new location is better than the original central location of the current fruit fly population, all individual fruit flies fly towards that location. Otherwise, an acceptance criterion similar to that of simulated annealing [28] is used to decide whether to accept the new position as the central position. First, a random number $rand \in [0,1]$ is generated from a uniform distribution; if $rand$ is smaller than the acceptance probability $\eta $, the position is accepted as the central position. The acceptance probability $\eta $ is calculated as follows.

$$\eta = \exp \left( - \frac{C'_{\max} - C_{best}}{Temp} \right) \tag{13}$$

where ${C'}_{\max }$ is the current maximum completion time of the fruit fly and Temp is the temperature coefficient, calculated as follows.

$$Temp = T_p \times \sum_{j=1}^{n} \sum_{i=1}^{m} \frac{P_{j,i}}{10 \times m \times n} \tag{14}$$

where $T_p$ is the temperature constant; following [32], $T_p$ is set to 0.6.

It can be seen from Eq. (13) that $\eta $ is close to 1 when ${C'}_{\max }$ is close to ${C}_{best}$; conversely, when ${C'}_{\max }$ is much larger than ${C}_{best}$, $\eta $ is close to 0 and the new position is likely to be discarded. This acceptance strategy keeps the new location within the current search area while also preventing the algorithm from falling into local optima by accepting some poorer new locations.
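Equations (13) and (14) can be sketched as follows. The function names and the processing-time matrix are hypothetical, and the sketch assumes that a new position that is strictly better than the current best is always accepted, as described for the visual stage, so that the exponential is only evaluated for non-improving positions.

```python
import math
import random
from typing import List


def temperature(p: List[List[float]], t_p: float = 0.6) -> float:
    """Eq. (14): T_p times the sum of all processing times divided by 10 * m * n."""
    n, m = len(p), len(p[0])  # p[j][i]: processing time of job j on machine i
    return t_p * sum(sum(row) for row in p) / (10 * m * n)


def acceptance_probability(c_new: float, c_best: float, temp: float) -> float:
    """Eq. (13), intended for c_new >= c_best."""
    return math.exp(-(c_new - c_best) / temp)


def accept_as_center(c_new: float, c_best: float, temp: float, rng: random.Random) -> bool:
    """Decide whether the new position becomes the central position."""
    if c_new < c_best:
        return True  # a better position is always accepted
    return rng.random() < acceptance_probability(c_new, c_best, temp)


# Hypothetical instance with 5 jobs and 2 machines.
p = [[2, 3], [1, 2], [3, 1], [2, 2], [2, 3]]
temp = temperature(p)
rng = random.Random(0)
print(temp, acceptance_probability(11, 10, temp), accept_as_center(9, 10, temp, rng))
```

For this small instance the temperature is 0.6 × 21 / 100 = 0.126, so a position that is worse by a single time unit is already accepted only with a very small probability; on instances with longer processing times the temperature, and hence the tolerance, is larger.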

Local search

For DPFSP, the most commonly used local search operations are insertion and exchange [1, 58], and both are usually carried out by moving a single job. However, for our problem, experiments show that the solutions generated by local search methods that move a single job often show no significant improvement in the later stages of the algorithm. The reason is that moving a single job often destroys the constructed job blocks, which carry more sequence features than a single job, and Zhang et al. [48] have shown that making full use of this information can significantly enhance the quality of the solutions.

Therefore, we design a local search algorithm based on job blocks (LsBlock). It operates on the key (critical) factory $F_c$. We remove a job block $\delta $ consisting of c ($c \in [1,3]$) consecutive jobs from the key factory and then reinsert the job block into the optimal position. Figure 6 provides an example. Given a solution ${\pi = \{ {\pi _1},{\pi _2},...,{\pi _f}\} }$, we first remove from the key factory a job block $\delta $ of length c starting at position s. In Fig. 6, $n_{fc}$ represents the number of jobs in factory $F_c$; here $n_{fc}$ = 4, the starting job sequence is $\{8,5,2,4\}$, the removed job block is $\delta $ = $\{ 5,2\}$, and the remaining job sequence is $\{8,4\}$. After all insertion positions are tested, inserting the job block $\delta $ = $\{5,2\}$ before the job sequence $\{8,4\}$ gives the best result, thus forming the new job sequence $\{5,2,8,4\}$. By moving job blocks rather than single jobs, the sequence information of job blocks can be fully utilized, which makes the method more suitable for solving DPFSP. The pseudo-code of LsBlock is shown in Algorithm 2.
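The block removal and reinsertion described above can be sketched as follows. This is not Algorithm 2 itself: it assumes that the block is reinserted only within the key factory, that the search restarts after every improvement, and that `p[job][machine]` holds processing times; all names are hypothetical.

```python
def makespan(seq, p):
    """Makespan of one factory's permutation flow shop; p[job][machine] is a processing time."""
    if not seq:
        return 0
    m = len(p[seq[0]])
    done = [0] * m                       # completion time of the latest job on each machine
    for job in seq:
        done[0] += p[job][0]
        for i in range(1, m):
            done[i] = max(done[i], done[i - 1]) + p[job][i]
    return done[-1]


def best_block_move(seq, p, start, length):
    """Remove seq[start:start+length] and reinsert it at the position with the smallest makespan."""
    block = seq[start:start + length]
    rest = seq[:start] + seq[start + length:]
    candidates = (rest[:k] + block + rest[k:] for k in range(len(rest) + 1))
    return min(candidates, key=lambda s: makespan(s, p))


def ls_block(seq, p, max_len=3):
    """Try blocks of 1..max_len consecutive jobs; restart whenever the makespan improves."""
    best, best_c = list(seq), makespan(seq, p)
    improved = True
    while improved:
        improved = False
        for c in range(1, min(max_len, len(best)) + 1):
            for s in range(len(best) - c + 1):
                cand = best_block_move(best, p, s, c)
                cand_c = makespan(cand, p)
                if cand_c < best_c:
                    best, best_c, improved = cand, cand_c, True
                    break
            if improved:
                break
    return best, best_c


# Hypothetical 2-job, 2-machine factory.
times = {0: [5, 1], 1: [1, 5]}
result = ls_block([0, 1], times)
```

In this toy case the sequence [0, 1] has makespan 11, and moving job 0 behind job 1 reduces it to 7.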

The QFOA algorithm consists of the population initialization, the smell and visual phases, and the local search. The pseudo-code of the QFOA algorithm is shown in Algorithm 3.

Experimental analysis

The experiment runs on Windows 10 64-bit OS, Intel Core i7-4790 CPU at 3.6 GHz, 8.0 GB RAM, and MATLAB R2019b.

To verify the effectiveness of QFOA, the 720 large-scale DPFSP test instances proposed by Naderi and Ruiz [20] are used in this paper. These 720 instances are composed of the 120 PFSP test cases proposed by Taillard [35] and 6 distributed scenarios ($f \in \{2,3,4,5,6,7\} $). Each scenario considers the 120 test instances from Taillard. Each scale contains 10 different instances, so each scenario consists of 12 groups of instances of different scales ($n \times m:\{ 20,50,100\} \times \{ 5,10,20\},\{ 200\} \times \{ 10,20\},\{ 500\} \times \{ 20\} $).

To better validate the performance of QFOA, the algorithm is analysed in terms of stability, accuracy, convergence, variability, and composition. The relative percentage deviation (RPD) [22, 54] is taken as the evaluation metric, and its equation is shown below:
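The composition of the benchmark can be checked with a short enumeration. The tuple layout `(n, m, f, k)`, with k as the index of the instance within its size group, is an assumption made for illustration.

```python
from itertools import product

# The 12 Taillard size groups (n jobs x m machines) listed above.
sizes = [(n, m) for n in (20, 50, 100) for m in (5, 10, 20)]
sizes += [(200, 10), (200, 20), (500, 20)]

factories = range(2, 8)      # f = 2, ..., 7
per_group = range(10)        # 10 instances per size group

instances = [(n, m, f, k) for (n, m), f, k in product(sizes, factories, per_group)]
```

The count is 12 size groups times 10 instances times 6 factory scenarios, which gives the 720 instances used in the experiments.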

$$\begin{aligned} RPD = \frac{{{C_i} - {C_{best}}}}{{{C_{best}}}} \times 100 \end{aligned}$$

(15)

where $C_i$ denotes the $C_{max}$ obtained by a particular algorithm in the ith run, and $C_{best}$ denotes the optimal solution of the instance.

We set the termination criterion for all algorithms to the maximum CPU runtime, i.e., $T = C \times m \times n\times f(ms)$, where C is set to three levels: 5, 15 and 30. Each algorithm is run independently 10 times.
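The evaluation protocol above, Eq. (15) and the runtime budget T, can be written as a small sketch. The function names and the sample makespans are hypothetical.

```python
def rpd(c_i, c_best):
    """Eq. (15): relative percentage deviation of one run from the best value."""
    return (c_i - c_best) / c_best * 100


def average_rpd(makespans, c_best):
    """Mean RPD over the independent runs of one algorithm on one instance."""
    return sum(rpd(c, c_best) for c in makespans) / len(makespans)


def time_limit_ms(c, m, n, f):
    """Termination criterion T = C * m * n * f, in milliseconds."""
    return c * m * n * f


# Hypothetical makespans from three runs, with a best value of 1000.
example_rpd = average_rpd([1000, 1010, 1020], 1000)
budget = time_limit_ms(c=5, m=20, n=100, f=6)
```

For example, an instance with n = 100, m = 20 and f = 6 receives a budget of 60000 ms at C = 5.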

Parameter analysis

The parameters of the QFOA algorithm are the learning rate $ \alpha $, the discount factor $\gamma $ and the population size Psize. Each parameter is tested at four levels, giving a total of $4 \times 4 \times 4 = 64$ different configurations. Because calibrating an algorithm on the same instances that are later used for testing can lead to overfitting, a new set of test instances is generated for calibration using the same generation method as the Taillard instances [35]. It contains 20 instances of different sizes, i.e. $\{ 40,80,120,60,200\} \times \{ 5,10,15,20\} $. The processing time of each job on each machine is randomly generated in the interval [1, 99] according to a uniform distribution. Three different distributed scenarios are considered in the parameter calibration phase, i.e., $f \in \{2,4,6\} $. Therefore, we use 60 test instances for the parameter configuration of QFOA. Each configuration is executed 5 times on each instance. The termination criterion is the maximum CPU runtime, i.e., $T = C \times m \times n\times f(ms)$, where C is 10. RPD is the response value, and ANOVA is used to analyze the experimental results. Table 3 shows the ANOVA results, and Fig. 7 shows the main effect plots of all parameters.

Table 3 Results of ANOVA on QFOA parameters

Table 3 shows that all three parameters have an impact on QFOA’s performance, as their P-values are less than the 0.05 significance level. A larger F-ratio indicates a greater impact. From Table 3, the F-ratios of the three parameters in descending order are $ \alpha $, Psize and $ \gamma $; in other words, their influence on the algorithm is ranked in this order. The main effect plots in Fig. 7 are then analysed to select the most appropriate parameter values. When the learning rate is $ \alpha $ = 0.6, the performance of the algorithm is the worst, because the learning rate is too low. When $ \alpha $ = 0.9, the performance of the algorithm is the best; however, a learning rate that is too high in the later stage of the algorithm reduces diversity and leads to local optima, so we choose $ \alpha $ = 0.7. For the population size Psize, the RPD value first decreases and then increases as the number of individuals increases. The increase occurs because, when the population is large, most of the time is spent on selecting neighborhood structures; therefore Psize = 15 is selected. Finally, for the discount factor $\gamma $, we apply the same method to Fig. 7 and select the point with the lowest rate of change, that is, $\gamma $ = 0.3. In summary, the selected parameters are $ \alpha $ = 0.7, $\gamma $ = 0.3 and Psize = 15.

Algorithm analysis and comparison

To verify the performance of QFOA, it is compared with 7 advanced intelligent algorithms on the 720 large-scale test instances described above. The comparison algorithms are HDDE [40], MDDE [54], DDE [48], CDE [47], IG [27], DABC [45] and DFFO [10]. The termination condition T and the evaluation metric RPD are the same as above.

Experimental comparative analysis when C = 5

Table 4 records the average RPD values of each algorithm under $T = 5 \times m \times n \times f (ms)$, and the results are grouped by three factors: factory scale (f), job scale (n), and machine scale (m). From Table 4, it can be seen that regardless of the number of factories f and machines m, the results of QFOA are better than those of the other algorithms. It is worth noting that in Table 4, the average RPD values of QFOA for f = 2, 3, 4, 5, 6 and 7 are 0.149, 0.172, 0.176, 0.205, 0.292 and 0.240, respectively. The second-ranked algorithm is MDDE, which obtained 0.210 (f = 2), 0.190 (f = 3), 0.210 (f = 4), 0.290 (f = 5), 0.340 (f = 6) and 0.310 (f = 7), respectively. In general, QFOA has obvious advantages over the other algorithms.

Table 4 Average RPD value when $C=5$

To show the results more clearly, the data in Table 4 are statistically analyzed; Fig. 8 shows the average scatter plot of the algorithms and their interaction plots with f, n, and m. As shown in Fig. 8a, QFOA performs the best with the smallest RPD value of 0.206, followed by MDDE with an RPD value of 0.258; the remaining algorithms rank as DFFO, DABC, IG, DDE, CDE and HDDE, with HDDE performing the worst. Figure 8b is the interaction plot of the algorithms with f. The size of f has a greater impact on the performance of HDDE, CDE, DDE and IG, but little impact on the other algorithms. Figure 8c is the interaction plot of the algorithms with n. For n = 50 and n = 200, IG and MDDE are slightly better than QFOA. HDDE, DDE, CDE and IG are greatly affected by the number of jobs, whereas the remaining algorithms, such as QFOA and MDDE, are affected relatively little. Figure 8d shows that, across the different values of m, QFOA is statistically superior to the other algorithms. Except for QFOA and MDDE, all other algorithms are affected by the value of m.

Experimental comparative analysis when C = 15

Table 5 records the average RPD values of all algorithms under $T = 15 \times m \times n \times f (ms)$, and the results are grouped by three factors: factory scale (f), job scale (n), and machine scale (m). From Table 5, it can be seen that regardless of the values of f and m, the results of QFOA are better than those of the other algorithms. It is worth noting that in Table 5, the average RPD values of QFOA for f = 2, 3, 4, 5, 6 and 7 are 0.146, 0.153, 0.187, 0.182, 0.248 and 0.245, respectively. The second-ranked algorithm is MDDE, which obtained 0.300 (f = 2), 0.180 (f = 3), 0.290 (f = 4), 0.250 (f = 5), 0.290 (f = 6) and 0.390 (f = 7), respectively. In general, QFOA has obvious advantages over the other algorithms.

Table 5 Average RPD value when $C=15$

Similarly, the data in Table 5 are statistically analyzed, and Fig. 9 shows the average scatter plot of the algorithms and their interaction plots with m, n and f. As shown in Fig. 9a, QFOA performs the best with the smallest RPD value of 0.194, followed by MDDE with an RPD value of 0.283; the remaining algorithms rank as DFFO, DABC, IG, DDE, CDE and HDDE, with CDE and HDDE again the two worst performers. Figure 9b–d are the interaction plots of the algorithms with m, n and f, respectively. Their trends are similar to those for C = 5.

Experimental comparative analysis when C = 30

Table 6 records the average RPD values of all algorithms under $T = 30 \times m \times n \times f (ms)$, where the statistical perspective is the same as for C = 5 and C = 15, based on three factors: factory scale (f), job scale (n), and machine scale (m). From Table 6, it can be seen that regardless of the values of m and f, QFOA is the best. In Table 6, the average RPD values of QFOA for f = 2, 3, 4, 5, 6 and 7 are 0.140, 0.157, 0.161, 0.215, 0.195 and 0.203, respectively. The second-ranked algorithm is MDDE, which obtained 0.130 (f = 2), 0.220 (f = 3), 0.240 (f = 4), 0.300 (f = 5), 0.250 (f = 6) and 0.240 (f = 7), respectively. In general, QFOA has obvious advantages over the other algorithms.

Table 6 Average RPD value when $C=30$

Similarly, the data in Table 6 are statistically analyzed in the same way as Tables 4 and 5, and Fig. 10 shows the average scatter plot of the algorithms and their interaction plots with m, n, and f. As shown in Fig. 10a, QFOA performs the best with the smallest RPD value of 0.179, followed by MDDE with an RPD value of 0.247; the remaining algorithms rank as DFFO, DABC, IG, DDE, CDE and HDDE. Figure 10b–d are the interaction plots of the algorithms with m, n, and f, respectively. Their trends are similar to those for C = 5 and C = 15.

The effect of the different stopping-time levels C on the algorithms is shown in Fig. 11. It can be seen from Fig. 11 that QFOA outperforms the other compared algorithms for all values of C. MDDE and DFFO also perform well, followed by DABC, IG, DDE, CDE and HDDE. The above analysis shows that QFOA has the best performance.

To further analyze the stability of QFOA and the compared algorithms, we use box plots, following [54]. The narrower the box, the better the stability of the algorithm; the closer the box is to the bottom, the better the performance. Therefore, we examine the RPD values obtained by QFOA, MDDE and the other algorithms on the large-scale instances from the perspective of box plots, thereby verifying the stability of QFOA.

Figures 12, 13 and 14 are box plots of the RPD values of the algorithms for the different C values. The box of QFOA is the narrowest for every C value, followed by MDDE, DFFO, DABC, IG, CDE, DDE and HDDE. HDDE performs the worst when C is 5, 15 and 30, whereas QFOA has the narrowest box in all three cases, indicating that QFOA is the most stable. In addition, the boxes of both MDDE and QFOA are closest to the bottom, but QFOA’s median line is lower, indicating that its performance is the best. This verifies the stability of QFOA.
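The two properties read from the box plots, the width of the box and its vertical position, correspond to the interquartile range and the median of the RPD values. The sketch below computes both with Python's standard library; the function name and the sample values are hypothetical, and the quartile convention ("inclusive") may differ from the one used to draw Figs. 12, 13 and 14.

```python
import statistics


def box_summary(rpds):
    """Median (position of the box) and interquartile range (width of the box)."""
    q1, med, q3 = statistics.quantiles(rpds, n=4, method="inclusive")
    return {"median": med, "iqr": q3 - q1}


# Hypothetical RPD samples for two algorithms.
stable = box_summary([0.1, 0.2, 0.3, 0.4, 0.5])
spread = box_summary([0.1, 0.5, 0.9, 1.3, 1.7])
```

Under this reading, an algorithm is more stable when its `iqr` is smaller and performs better when its `median` is lower.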

Differential analysis

To illustrate the difference between QFOA and the other advanced comparison algorithms, we take the RPD values obtained by each algorithm on the 720 large-scale test instances and apply two commonly used tests, Mann–Whitney U and Friedman, to test the differences between QFOA and algorithms such as MDDE and DABC.

Mann–Whitney U test

The Mann–Whitney U test is used to detect the difference between two samples. The null hypothesis is that the difference between the two samples is not significant; if the P-value is less than 0.05, the null hypothesis is rejected, indicating a significant difference between the samples. The results of our comparison using this method are shown in Table 7, from which it can be seen that all P-values are less than 0.05 regardless of whether C is 5, 15 or 30, indicating that there are significant differences between QFOA and the other algorithms.
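The test procedure can be sketched in plain Python as follows. The paper does not say which statistical software produced Table 7, so this is only an illustration: it uses the normal approximation with a tie correction and no continuity correction, and the function name and sample data are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (tie-corrected).

    Returns (U, p). Not suitable when all pooled values are identical (zero variance).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2               # mean of the ranks i+1 .. j
        t = j - i                                # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided p-value
    return u, p


# Hypothetical RPD samples of two algorithms.
u_sep, p_sep = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_same, p_same = mann_whitney_u([1, 2, 3], [1, 2, 3])
```

Two fully separated samples give U = 0 and a small P-value, while two identical samples give a P-value of 1, so the null hypothesis is not rejected.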

Table 7 Mann–Whitney U test results

Friedman test

The Friedman test is also often used to test the differences between samples. The null hypothesis is that the samples being tested are equivalent. If the P-value < 0.05, the null hypothesis is rejected, indicating that there is a difference. Based on this, we conducted a difference test on QFOA and algorithms such as MDDE and DFFO to see whether the null hypothesis is rejected. The test results are shown in Table 8.

Table 8 Friedman test results

In Table 8, the reported statistics are the total number (N), mean, standard deviation (Std), average rank, maximum (Max) and minimum (Min). When C = 5, QFOA has mean = 0.205, Std = 0.135, Max = 0.586 and Min = 0.000, followed by DFFO, MDDE, DABC, IG, DDE, CDE and HDDE. QFOA has the smallest values on these four statistics, which shows that QFOA is better than comparison algorithms such as MDDE. Since the P-value = 0.000 < 0.05, the null hypothesis is rejected, indicating a significant difference between QFOA and the other algorithms. The same holds for C = 15 and C = 30.

Furthermore, for C = 5, 15 and 30, the Bonferroni–Dunn test was performed with QFOA as the control algorithm, comparing it with algorithms such as MDDE and DFFO. The smaller the average rank, the higher the ranking. The results are shown in Figs. 15, 16 and 17, respectively. When C = 5, it can be seen from Fig. 15 that QFOA (average rank = 1.85) ranks first and DFFO (average rank = 2.92) ranks second. This shows the superiority of the QFOA algorithm. When C = 15, it can be seen from Fig. 16 that QFOA (average rank = 1.92) again ranks first and MDDE (average rank = 3.01) ranks second. When C = 30, it can be seen from Fig. 17 that QFOA (average rank = 2.11) ranks first and MDDE (average rank = 2.58) ranks second. In summary, not only are there significant differences between QFOA and algorithms such as MDDE and DFFO, but QFOA’s performance is also the most stable and the best.
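The average ranks and the Friedman statistic behind Table 8 can be sketched as follows. The layout `results[i][j]` (RPD of algorithm j on instance i), the function names and the sample values are assumptions; the statistic is the usual chi-square form without a tie correction, and the P-value would be obtained from a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(results):
    """Average rank of each algorithm; lower RPD gets a lower rank, ties share the mean rank."""
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j < k and row[order[j]] == row[order[i]]:
                j += 1
            for idx in order[i:j]:
                totals[idx] += (i + 1 + j) / 2   # mean of the ranks i+1 .. j
            i = j
    return [t / len(results) for t in totals]


def friedman_statistic(results):
    """Friedman chi-square: 12N / (k(k+1)) * (sum of squared average ranks - k(k+1)^2 / 4)."""
    n, k = len(results), len(results[0])
    ranks = average_ranks(results)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Hypothetical RPD values: 3 instances (rows) x 3 algorithms (columns).
sample = [[0.1, 0.2, 0.3],
          [0.1, 0.3, 0.2],
          [0.2, 0.1, 0.3]]
ranks = average_ranks(sample)
chi2 = friedman_statistic(sample)
```

In the toy data the first algorithm has the lowest average rank, which is how the ranking in Table 8 is read.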
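For a Bonferroni–Dunn comparison against a control, two algorithms are usually declared different when their average ranks differ by more than a critical difference $CD = q_\alpha \sqrt{k(k+1)/(6N)}$. The sketch below evaluates this formula. The values k = 8 algorithms, N = 720 instances and $q_\alpha$ = 2.690 (a value commonly tabulated for k = 8 at the 0.05 level) are assumptions that are not stated in this section and should be checked against Table 8 and a statistics table.

```python
import math


def critical_difference(q_alpha, k, n):
    """Critical difference for rank-based post-hoc tests: q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Assumed values: 8 algorithms, 720 instances, tabulated q for alpha = 0.05.
cd = critical_difference(q_alpha=2.690, k=8, n=720)

# Gap between the second-ranked algorithm and QFOA, taken from the average ranks quoted above.
gaps = {5: 2.92 - 1.85, 15: 3.01 - 1.92, 30: 2.58 - 2.11}
```

Under these assumptions the critical difference is roughly 0.35, which is smaller than each of the rank gaps between QFOA and the second-ranked algorithm.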

Convergence analysis

This section discusses the convergence of QFOA. The evolution curves of two typical cases are shown in Figs. 18 and 19. The sizes of the two typical cases are $100 \times 20 \times 6$ and $200 \times 10 \times 7$, respectively. Only the six algorithms with the best performance, QFOA, MDDE, DFFO, DABC, IG and FOA, are used for the convergence analysis experiments. The algorithm termination criterion is $T=5\times m\times n \times f(ms)$.

From Fig. 18, it can be observed that for the instance $100 \times 20 \times 6$, the disparities in the initial solutions of QFOA, FOA, DFFO, MDDE, IG and DABC are minimal. With increasing runtime, all algorithms exhibit a noticeable downward trend. In particular, the convergence curve of QFOA remains significantly lower than those of DFFO, MDDE, IG and DABC, achieving rapid convergence by 3000 ms. Conversely, FOA quickly falls into local optima and needs a longer time to converge. Similarly, DFFO, MDDE, IG and DABC also exhibit extended convergence times. In summary, QFOA outperforms DFFO, MDDE and the other comparative algorithms in both convergence speed and accuracy. Fig. 19 shows the convergence curves for the instance $200 \times 10 \times 7$; the analysis is similar to that of Fig. 18 and leads to the same results. From these two instances of different scales, it can be seen that QFOA is also the best in terms of convergence.

Component analysis

In this section, we discuss the contribution of the proposed strategies to the QFOA algorithm. QFOA is mainly composed of the initialization method, the smell phase combined with Q-learning, and the visual phase with the addition of an annealing mechanism. This paper verifies the contribution of these strategies through experiments on the following seven variants.

1. QFOA/R: Random initialization of the fruit fly population center position.
2. QFOA/IN: The central position of the fruit fly population is initialized using NEH_R.
3. QFOA/RL: The Q-learning selection strategy is removed and a random selection strategy is used instead.
4. QFOA/LC: The operations based on job blocks in the local search are removed and replaced by operations on a single job.
5. QFOA/SA: The simulated annealing criterion is removed.
6. QFOA/CR: With the Q-learning algorithm, the strategy selection decision is completely random.
7. QFOA/EG: With the Q-learning algorithm, the greedy strategy is used for selection.

Each algorithm is run 10 times independently on each instance with a stopping criterion of $T=5\times m\times n \times f(ms)$. The test instances and evaluation metrics are the same as in Sect. “Algorithm analysis and comparison”. The results of QFOA and its variants are given in Table 9.

Table 9 Average RPD value of QFOA and its variants

From Table 9, the results of QFOA/IN outperform those of QFOA/R, illustrating that the improved NEH initialization rule helps to improve the performance of QFOA. The results of QFOA are superior to those of QFOA/LC, indicating that the designed local search method based on job blocks performs better than one based on single jobs. QFOA outperforms QFOA/RL, illustrating that the inclusion of the Q-learning algorithm facilitates the selection of a high-quality neighborhood structure and ensures the feasibility of the solution. Comparing QFOA/CR, QFOA/EG and QFOA, the result of QFOA/EG is relatively poor: by adopting the greedy strategy, QFOA/EG falls into a local optimum and does not escape from it. QFOA/CR performs better than QFOA/EG because, with its “completely random” strategy, every selection is random, and over many iterations it can obtain relatively better results. QFOA is the most effective: its “semi-random” strategy changes with the iterations and can escape from local optima, so relatively good results are obtained quickly. Similarly, the performance of QFOA is also better than that of QFOA/SA, indicating that adding an annealing mechanism has a high probability of avoiding premature convergence in QFOA.

To verify that the above findings are statistically significant, this section uses ANOVA to test the variance of the experimental results for the seven algorithms in Table 9. Figure 20 shows the 95% confidence intervals for each variant. From Fig. 20, it can be seen that the intervals of QFOA and its variants do not overlap, indicating that QFOA differs significantly from the variants and performs the best. In more detail, the effectiveness of the improved initialisation method can be seen from the comparison of QFOA/R and QFOA/IN in Fig. 20a, and the comparison of QFOA/IN with QFOA indirectly reflects the importance of initialisation diversity. Therefore, we use the NEH_R method to initialize 10% of the individuals. From Fig. 20b, we can see that the results of QFOA are better than those of QFOA/LC, which demonstrates that local search based on job blocks is better than local search based on single jobs. The reason is that moving a single job tends to destroy constructed job blocks, which carry more sequence features than a single job, especially in the later stages of the algorithm when the jobs are essentially sorted. From the comparison between QFOA/RL and QFOA in Fig. 20c, the effectiveness of introducing the Q-learning algorithm is verified. From the comparison between QFOA/SA and QFOA in Fig. 20d, the effectiveness of introducing the annealing algorithm is verified. Figure 20e and Fig. 20f show that QFOA is superior to QFOA/EG and QFOA/CR, because the “Sigmoid” function is introduced as a dynamic selection strategy, enabling the algorithm to select actions randomly in the early stage and to select the optimal actions in the later stage. Escaping from local optima in this way yields relatively good results quickly. From this, it can be seen that the strategies adopted by QFOA at each stage ensure its ability to solve the problem.
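The contrast between the "completely random" (QFOA/CR), greedy (QFOA/EG) and Sigmoid-based selection can be illustrated with a sketch. The exact Sigmoid schedule used by QFOA is defined elsewhere in the paper and is not reproduced here; the logistic form, the `steepness` constant and the use of the elapsed fraction of the run as `progress` are hypothetical choices made only to show the idea.

```python
import math
import random


def greedy_probability(progress, steepness=10.0):
    """Hypothetical sigmoid schedule: close to 0 early (random choice), close to 1 late (greedy choice)."""
    return 1.0 / (1.0 + math.exp(-steepness * (progress - 0.5)))


def select_action(q_row, progress, rng=random):
    """Pick a neighborhood structure: greedy on Q-values with the scheduled probability, else random."""
    if rng.random() < greedy_probability(progress):
        return max(range(len(q_row)), key=lambda a: q_row[a])
    return rng.randrange(len(q_row))


# Hypothetical Q-values of four neighborhood structures in the current state.
q_row = [0.1, 0.9, 0.3, 0.2]
early = select_action(q_row, progress=0.0, rng=random.Random(1))
late = select_action(q_row, progress=1.0, rng=random.Random(1))
```

In this sketch, fixing the probability at 0 would correspond to the purely random variant and fixing it at 1 to the purely greedy variant, while the schedule moves from one to the other over the run.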

Experimental summary

This section summarises the above experiments. QFOA is tested against state-of-the-art algorithms such as MDDE, DABC and DFFO in terms of accuracy and stability (Sect. “Algorithm analysis and comparison”), variability (Sect. “Differential analysis”), convergence (Sect. “Convergence analysis”) and composition (Sect. “Component analysis”) in order to illustrate the effectiveness of QFOA.

Firstly, it can be seen from Sect. “Algorithm analysis and comparison” that QFOA outperforms algorithms such as MDDE, DFFO and DABC in solving large-scale instances when C = 5, 15 and 30.

Secondly, in Sect. “Differential analysis”, two tests, Mann–Whitney U and Friedman, are employed to demonstrate that QFOA is significantly different from algorithms such as MDDE, DFFO and DABC. Then, in Sect. “Convergence analysis”, the algorithms are tested for convergence, and QFOA is shown to be fast and efficient in solving large-scale instances. The reasons for this are as follows: (1) the NEH_R initialization produces a better initial solution for subsequent iterations; (2) the combination with Q-learning allows the algorithm to choose the best way to generate candidate solutions in each iteration; (3) a new local search method, LsBlock, is designed, which inserts job blocks to retain more sequence information.

Finally, in Sect. “Component analysis”, seven QFOA variants are designed to investigate the contribution of the proposed strategies: QFOA/R, QFOA/IN, QFOA/RL, QFOA/LC, QFOA/SA, QFOA/CR and QFOA/EG. From the comparison between QFOA/R and QFOA/IN, the effectiveness of the improved initialization method can be seen. From the comparison between QFOA/IN and QFOA, the importance of initialization diversity is indirectly reflected. Therefore, the NEH_R method is used to initialise 10% of the individuals. The effectiveness of the local search approach based on job blocks is illustrated by comparing QFOA/LC and QFOA. The effectiveness of introducing Q-learning is verified by comparing QFOA/RL with QFOA. From the comparison between QFOA/SA and QFOA, the effectiveness of introducing the annealing algorithm is verified. Through the comparison of QFOA/EG, QFOA/CR and QFOA, the effectiveness of introducing the “Sigmoid” function as a dynamic selection strategy is verified.

In summary, QFOA compares favourably with the seven algorithms MDDE, DFFO, HDDE, DDE, CDE, IG and DABC in terms of convergence speed, accuracy and stability. In particular, with all algorithms run under the same stopping criterion, the experimental results show that QFOA’s performance is at least 28.1% better than that of the other algorithms. Thus, the effectiveness of the QFOA algorithm is verified.

Conclusion

In this paper, a QFOA algorithm is proposed to solve the DPFSP. In QFOA, NEH_R is used for population initialization. In the smell phase, four neighborhoods are designed and combined with the Q-learning algorithm to quickly select high-quality neighborhoods to update the solution and enhance the exploitation ability of the algorithm. It is noteworthy that the “Sigmoid” function is introduced as a dynamic selection strategy to replace the greedy strategy of the Q-learning algorithm and improve the evolutionary ability of the population. In addition, a local search method is designed for the characteristics of the DPFSP. In the visual phase, an annealing mechanism is introduced to avoid falling into local optima. Finally, the proposed QFOA is compared with state-of-the-art algorithms on 720 well-known large-scale benchmark instances. Experimental results show that QFOA improves performance by at least 28.1% in solving large-scale instances compared to the other algorithms.

Data availability

The dataset used and/or analyzed during the current research period can be obtained from the corresponding author on reasonable request.

References

Ali A, Gajpal Y, Elmekkawy TY (2020) Distributed permutation flowshop scheduling problem with total completion time objective. Opsearch 58:425–447
MathSciNet Google Scholar
Alireza G, Ali A, Mostafa H (2023) Efficient multi-objective meta-heuristic algorithms for energy-aware non-permutation flow shop scheduling problem. Expert Syst Appl 213:119077
Google Scholar
Babu KS, Vemuru S (2019) Spectrum signals handoff in lte cognitive radio networks using reinforcement learning. Traitement du Signal 36:119–125
Google Scholar
Chen RH, Yang B, Li S (2020) A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput Ind Eng 149:106778
Google Scholar
Dong H, Wang ZB (2022) Distributed assembly replacement flow shop scheduling based on iiga. Manuf Technol Mach Tools 11:169–176
Google Scholar
Du SL, Zhou WJ, Wu DK (2023) An effective discrete monarch butterfly optimization algorithm for distributed blocking flow shop scheduling with an assembly machine. Expert Syst Appl 225:120113
Google Scholar
Fernandez-Viagas V, Framinan J (2015) A bounded-search iterated greedy algorithm for the distributed permutation flowshop scheduling problem. Int J Prod Res 53:1111–1123
Google Scholar
Fernandez-Viagas V, Perez-Gonzalez P, Framinan J (2018) The distributed permutation flow shop to minimise the total flowtime. Comput Ind Eng 118:464–477
Google Scholar
Fu Y, Zhou M, Guo X (2021) Stochastic multi-objective integrated disassembly-reprocessing-reassembly scheduling via fruit fly optimization algorithm. J Clean Prod 278:123364
Google Scholar
Guo HW, Sang HY, Zhang XJ (2023) An effective fruit fly optimization algorithm for the distributed permutation flowshop scheduling problem with total flowtime. Eng Appl Artif Intell 123:106347
Google Scholar
Hsieh YZ, Su MC (2016) A q-learning-based swarm optimization algorithm for economic dispatch problem. Neural Comput Appl 27:2333–2350
Google Scholar
Huang L, Wang G, Bai T (2017) An improved fruit fly optimization algorithm for solving traveling salesman problem. Front Inform Technol Electron Eng 18:1525–1533
Google Scholar
Ibrahim IA (2022) A hybrid wind driven-based fruit fly optimization algorithm for identifying the parameters of a double-diode photovoltaic cell model considering degradation effects. Sustain Energy Technol Assess 50:101685
Google Scholar
Jiang L, Huang H, Ding Z (2020) Path planning for intelligent robots based on deep q-learning with experience replay and heuristic knowledge. IEEE/CAA J Autom Sin 7:1179–1189
Google Scholar
Jing X, Pan Q, Gao L (2020) An effective iterated greedy algorithm for the distributed permutation flowshop scheduling with due windows. Appl Soft Comput 96:106629
Google Scholar
Komaki M, Malakooti B (2017) General variable neighborhood search algorithm to minimize makespan of the distributed no-wait flow shop scheduling problem. Prod Eng Res Dev 11:315–329
Google Scholar
Li JQ, Bai SC, Duan PY (2019) An improved artificial bee colony algorithm for addressing distributed flow shop with distance coefficient in a prefabricated system. Int J Prod Res 57:6922–6942
Google Scholar
Lin SW, Ying KC (2016) Minimizing makespan for solving the distributed no-wait flowshop scheduling problem. Comput Ind Eng 99:202–209
Google Scholar
Meng T, Pan QK, Wang L (2019) A distributed permutation flow shop scheduling problem with the customer order constraint. Knowl-Based Syst 184:104894
Google Scholar
Naderi B, Ruiz R (2010) The distributed permutation flowshop scheduling problem. Comput Oper Res 37:754–768
MathSciNet Google Scholar
Nawaz M, Enscore E, Ham I (1983) A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 11:91–95
Google Scholar
Pan Q, Gao L, Wang L (2019) Effective heuristics and metaheuristics to minimize total flowtime for the distributed permutation flowshop problem. Expert Syst Appl 124:309–324
Google Scholar
Pan WT (2012) A new fruit fly optimization algorithm: taking the financial distress model as an example. Knowl-Based Syst 26:69–74
Google Scholar
Qian B, She MZ, Hu R (2021) The hyperheuristic cross-entropy algorithm solves the fuzzy distributed pipeline green scheduling problem. Control Decis Mak 36:1387–1396
Google Scholar
Qin H, Li T, Teng Y (2021) Integrated production and distribution scheduling in distributed hybrid fow shops. Memetic Comput 13:185–202
Google Scholar
Ren JF, Ye CM, Li Y (2021) A new solution to distributed permutation flow shop scheduling problem based on nash q-learning. Adv Prod Eng Manag 13:136–146
Google Scholar
Ruiz R, Pan Q, Naderi B (2019) Iterated greedy methods for the distributed permutation flowshop scheduling problem. Omega 83:213–222
Google Scholar
Ruiz R, Stutzle T (2007) A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem. Eur J Oper Res 177:2023–2049
Google Scholar
Saminathan K, Thangavel R (2022) Energy efficient and delay aware clustering in mobile adhoc network: a hybrid fruit fly optimization algorithm and whale optimization algorithm approach. Concurr Comput Pract Exp 11:6867
Google Scholar
Sang HY, Pan QK, Li JQ (2019) Effective invasive weed optimization algorithms for distributed assembly permutation flow shop problem with total flowtime criterion. Swarm Evol Comput 44:64–73
Google Scholar
Shao W, Pi D, Shao Z (2017) Optimization of makespan for the distributed no-wait flow shop scheduling problem with iterated greedy algorithms. Knowl-Based Syst 137:163–181
Google Scholar
Shao Z, Pi D, Shao W (2019) Hybrid enhanced discrete fruit fly optimization algorithm for scheduling blocking flow-shop in distributed environment. Expert Syst Appl 145:113147
Shen XN, Minku LL, Marturi N (2018) A q-learning-based memetic algorithm for multi-objective dynamic software project scheduling. Inf Sci 428:1–29
Song HB, Yang YH, Lin J (2023) An effective hyper heuristic-based memetic algorithm for the distributed assembly permutation flow-shop scheduling problem. Appl Soft Comput 135:110022
Taillard E (1993) Benchmarks for basic scheduling problems. Eur J Oper Res 64:278–285
Tasgetiren MF, Pan QK, Kizilay D (2019) A variable block insertion heuristic for permutation flow shops with makespan criterion. In: 2017 IEEE congress on evolutionary computation, p 12
Wang G, Gao L, Li X (2020) Energy-efficient distributed permutation flow shop scheduling problem using a multi-objective whale swarm algorithm. Swarm Evol Comput 57:100716
Wang HX, Sarker BR, Li J (2021) Adaptive scheduling for assembly job shop with uncertain assembly times based on dual q-learning. Int J Prod Res 59:5867–5883
Wang JJ, Wang L (2018) A knowledge-based cooperative algorithm for energy efficient scheduling of distributed flow-shop. IEEE Trans Syst Man Cybern Syst 50:1–15
Wang L, Pan QK, Suganthan PN (2010) A novel hybrid discrete differential evolution algorithm for blocking flow shop scheduling problems. Comput Oper Res 37:509–520
Wang L, Xiong YN, Li SW (2019) New fruit fly optimization algorithm with joint search strategies for function optimization problems. Knowl-Based Syst 176:77–96
Wang Z, Deng QW, Zhang LK (2023) Joint optimization of integrated mixed maintenance and distributed two-stage hybrid flow-shop production for multi-site maintenance requirements. Expert Syst Appl 215:119422
Watkins C, Dayan P (1992) Q-learning. Mach Learn 8:279–292
Yang XL, Hu R, Qian B (2019) Enhanced distribution estimation algorithm solves for distributed displacement flow workshop scheduling. Control Theory Appl 36:803–815
Yang Y, Zhang F, Huang J (2023) Acceleration-based artificial bee colony optimizer for a distributed permutation flowshop scheduling problem with sequence-dependent setup times. Appl Soft Comput 135:110029
Yang YH, Li X (2022) A knowledge-driven constructive heuristic algorithm for the distributed assembly blocking flow shop scheduling problem. Expert Syst Appl 202:117269
Zhang G, Xing K (2019) Differential evolution metaheuristics for distributed limited-buffer flow shop scheduling with makespan criterion. Comput Oper Res 108:33–43
Zhang GH, Xing KY, Cao F (2018) Discrete differential evolution algorithm for distributed blocking flow shop scheduling with makespan criterion. Eng Appl Artif Intell 76:96–107
Zhang QC, Lin M, Yang LT (2019) Energy-efficient scheduling for realtime systems based on deep q-learning model. IEEE Trans Sustain Comput 4:132–141
Zhang X, Liu X, Tang S (2019) Solving scheduling problem in a distributed manufacturing system using a discrete fruit fly optimization algorithm. Energies 12:3260
Zhang XH, Liu XH, Cichon A (2022) Scheduling of energy-efficient distributed blocking flowshop using pareto-based estimation of distribution algorithm. Expert Syst Appl 200:116910
Zhao F, Ma R, Wang L (2021) A self-learning discrete jaya algorithm for multi objective energy-efficient distributed no-idle flow-shop scheduling problem in heterogeneous factory system. IEEE Trans Cybern 52:20
Zhao F, Zhao L, Wang L (2020) An ensemble discrete differential evolution for the distributed blocking flowshop scheduling with minimizing makespan criterion. Expert Syst Appl 160:113678
Zhao FQ, Hu XT, Wang L (2022) A memetic discrete differential evolution algorithm for the distributed permutation flow shop scheduling problem. Complex Intell Syst 8:141–161
Zhao FQ, Qin S, Zhang Y (2019) A hybrid bio-geography based optimization with variable neighborhood search mechanism for no-wait flow shop scheduling problem. Expert Syst Appl 126:321–339
Zhao FQ, Xue FL, Zhang Y (2019) A discrete gravitational search algorithm for the blocking flow shop problem with total flow time minimization. Appl Intell 49:3362–3382
Zheng X, Wang L (2016) A two-stage adaptive fruit fly optimization algorithm for unrelated parallel machine scheduling problem with additional resource constraints. Expert Syst Appl 65:28–39
Zhu NN, Zhao FQ, Wang L (2022) A discrete learning fruit fly algorithm based on knowledge for the distributed no-wait flow shop scheduling with due windows. Expert Syst Appl 198:116921

Acknowledgements

This research was supported by the National Natural Science Foundation of China (grant no. 62373146), the Natural Science Foundation of Hunan Province (grant no. 2022JJ30265), the Young Talent of Lifting Engineering for Science and Technology in Hunan Province (2022TJ-Q03), the Outstanding Youth Project of the Education Department of Hunan Province (22B0476), and the Key Project of the Education Department of Hunan Province of China (23A0382).

Author information

Authors and Affiliations

School of Mechanical Engineering, Hunan University of Science and Technology, Xiangtan, 411100, Hunan, China
Cai Zhao
School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, 411100, Hunan, China
Lianghong Wu, Cili Zuo & Hongqiang Zhang

Contributions

Cai Zhao: Conceptualization, Methodology, Software. Lianghong Wu: Data curation, Writing-Original Draft, Supervision. Cili Zuo: Visualization. Hongqiang Zhang: Supervision.

Corresponding author

Correspondence to Lianghong Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhao, C., Wu, L., Zuo, C. et al. An improved fruit fly optimization algorithm with Q-learning for solving distributed permutation flow shop scheduling problems. Complex Intell. Syst. 10, 5965–5988 (2024). https://doi.org/10.1007/s40747-024-01482-4

Received: 12 January 2024
Accepted: 01 May 2024
Published: 25 May 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s40747-024-01482-4
