1 Introduction

Researchers use optimization in nearly every area of study. Optimization remains a fundamental challenge in science and engineering, primarily because of the difficulty of real-world problems and the limitations of traditional methods. Randomized Search Algorithms (RSAs) are among the most flexible and efficient methodologies for solving complicated problems. These algorithms typically run in polynomial time and have significantly lower computational complexity than exact methods. Metaheuristics, one of the most commonly used classes of RSAs, are algorithms inspired by natural phenomena. They explore the feasible region effectively and evade local optima through well-designed movement processes.

Healthcare is one of the top research areas in which metaheuristics have been widely applied. Using these algorithms, scientists can substantially optimize healthcare systems with respect to several objectives, including minimizing cost, waiting time, service time, and delivery time, and maximizing reliability and customer satisfaction. In December 2019, a novel strain of coronavirus called SARS-CoV-2 was discovered in China. The virus causes COVID-19, a severe respiratory disease. Despite the initial measures applied by the government of China, the disease spread quickly to many countries, leading to 26,795,847 infected cases and 878,963 deaths. Currently, there are no proven effective medications or vaccines for the disease, although the effectiveness of some treatment options is being studied in clinical trials (Health Canada 2020; WHO 2020). While most people with mild COVID-19 symptoms recover on their own, patients with severe and critical symptoms need hospitalization in wards and Intensive Care Units (Public Health Agency of Canada 2020). However, due to the limited capacity of the healthcare system, it is impossible to admit all patients to hospitals.

Efficient modeling and prediction of the COVID-19 pandemic will meaningfully aid officials and healthcare experts in making decisions to stop the spread of the disease. Moreover, by forecasting the upcoming trend of the epidemic, we can optimally allocate resources to hospitals, avoiding equipment shortages and saving patients' lives. Predicting the trend of COVID-19 is challenging because of its uncertain nature and complexity. Recently, Giordano et al. (2020) proposed a novel model, called SIDARTHE, to simulate the COVID-19 pandemic in a study published in Nature Medicine. The researchers highlighted the efficiency of the proposed formulation in modeling pandemic growth. Nevertheless, they emphasized that solving the resulting set of differential equations is difficult because of the exceptional characteristics of the model.

In the current research, we offer a novel search methodology that solves many complex optimization problems efficiently in a short time. Our algorithm simultaneously benefits from the advantages of machine learning (ML) and evolutionary computation (EC). We propose a hybrid algorithm that uses a reinforcement learning (RL) technique as the main engine and several EC methods as updating operators. The learning achieved by the RL method accelerates the algorithm and enables us to solve complicated large-scale problems. The procedure utilizes several operators to enhance the exploration and exploitation capabilities of the algorithm. These features help the proposed method sidestep local optima while exploiting the solution space intelligently. We implement the algorithm on several well-known, recently developed benchmarks to show its efficiency in solving such complex problems. Moreover, we highlight significant differences between the performance of our method and that of other methodologies using robust statistical tests.

Furthermore, we use the proposed algorithm to model and predict the COVID-19 pandemic in Quebec, Canada, the province with the most COVID-19 cases in the country. Our results accurately predict the peak in the number of infected cases in the province. The outcomes also determine the peak in the number of cases that develop life-threatening symptoms requiring hospitalization. We also perform extensive sensitivity analyses to portray future scenarios, which enables us to provide detailed information to policymakers and healthcare professionals. In addition, we measure the effectiveness of measures such as social distancing and partial lockdown on pandemic growth.

The rest of this study is organized as follows: In Sect. 2, we deliver a comprehensive review of existing research on the topic. In Sect. 3, we develop a novel algorithm that combines RL and EC to solve complex problems. In Sect. 4, we apply our algorithm to several well-known benchmark functions, assess the performance of the suggested approach, and compare its efficiency to state-of-the-art methods. In Sect. 5, we implement our method to model and predict the COVID-19 pandemic in Quebec, Canada. In Sect. 6, we perform sensitivity analyses and provide valuable managerial insights for fighting the COVID-19 pandemic. Section 7 concludes the paper.

2 Survey of related research

The core idea behind most EC algorithms is swarm intelligence inspired by animal behavior and natural phenomena (Khalilpourazari and Pasandideh 2019). Mirjalili and Lewis (2016) categorized metaheuristics into four main groups: evolutionary algorithms (EAs), physics-based algorithms (PAs), swarm algorithms (SAs), and machine learning-based algorithms (MLAs). EAs imitate the process of evolution in nature to solve complex problems. PAs utilize the laws of physics, which enables this family of algorithms to handle complicated problems. SAs simulate the swarm behavior of many individuals in a group. Finally, MLAs use artificial intelligence and machine learning to enhance the performance of the previous families of algorithms in terms of exploration and exploitation. Table 1 lists some of the most advanced metaheuristic algorithms developed in recent years.

Table 1 Classification of the metaheuristics

Many researchers have used these algorithms to solve complex optimization problems in different fields (Hoursan et al. 2020; Tirkolaee et al. 2020a, b; Sangaiah et al. 2020; Lotfi et al. 2018, 2019, 2020; Zare Mehrjerdi and Lotfi 2019; Khalilpourazari et al. 2020c; Hashemi Doulabi et al. 2020a, b). According to the no free lunch (NFL) theorem, no single method performs optimally on all problems (Adam et al. 2019; Wolpert and Macready 1997): any metaheuristic may perform well on some benchmarks but poorly on others. This theorem makes the field highly interesting for researchers searching for algorithms that perform well on many benchmarks. In our algorithm, we consider several EC methods and operators and let an RL method decide which one to use to relocate each particle. Learning during the optimization process significantly reduces the computational burden and improves the quality of the results: over the iterations, the algorithm adapts its operators to perform best on each problem, which accelerates the search.

Moreover, we design a framework that maintains a proper equilibrium between exploration and exploitation of the feasible region over the generations of the algorithm to avoid local optima. We show the efficiency of our algorithm on the most complex benchmarks in the literature. In addition, we apply the suggested algorithm to predict the COVID-19 pandemic in Quebec, Canada. Our results show that the designed algorithm robustly predicts future trends of the pandemic.

3 Algorithm development

Metaheuristics are based on randomization, meaning that they use random step sizes while updating each particle's position in the solution space. Each metaheuristic uses unique operators and strategies to update the position of each particle (solution), and the efficiency of these operators significantly depends on the solution space of the problem. For instance, some algorithms follow a direct updating procedure, such as the water cycle algorithm (WCA), while others encircle the best solution to update the position of a given particle, such as the grey wolf optimizer (GWO). Each of these operators and moving strategies has unique advantages that enable the corresponding algorithm to perform well on specific problems. Therefore, developing new algorithms that can efficiently solve a larger number of optimization problems is essential.

In this study, we use several operators (moving strategies) from various algorithms to update the particles' positions in the solution space. For each particle update, we must choose an operator from the given set. Determining the best strategy and the most efficient operator for a given optimization problem is computationally challenging. Therefore, we use a reinforcement learning method that learns and optimizes the choice of operators during the optimization process to achieve optimal performance. In the following, we first describe all the features of the proposed algorithm, and then we present the algorithm as a unified structure.

3.1 Q-learning

Reinforcement learning (RL), which is closely related to approximate dynamic programming and neuro-dynamic programming, is a type of machine learning that determines the best actions in a specific environment to maximize a reward (Bertsekas 2019). One of the main features of reinforcement learning is that the agent receives a reward or punishment after executing an action. The agent continues to interact with the environment to reach the optimal policy via trial and error.

Q-learning is one of the most efficient reinforcement learning algorithms; it determines an optimal policy by evaluating the actions taken in the environment. Q-learning optimizes decisions in a Markov decision process (MDP) (Akhtar and Farrukh 2017). The algorithm aims to maximize the expected reward by determining the optimal state-action pairs. It uses a table \( Q\left( {s,a} \right) \), where \( s_{t} \) is the state and \( a_{t} \) is the action at time step \( t \), and \( Q \) is the cumulative reward matrix. The algorithm updates the components of the Q-table iteratively using Eq. (1).

$$ Q_{t + 1} \left( {s_{t} ,a_{t} } \right) = Q_{t} \left( {s_{t} ,a_{t} } \right) + \epsilon_{t} \left( {r_{t} + \gamma \mathop {\max }\limits_{a} Q_{t} \left( {s_{t + 1} ,a} \right) - Q_{t} \left( {s_{t} ,a_{t} } \right)} \right), $$
(1)

In Eq. (1), \( \epsilon_{t} \) denotes the learning rate and \( r_{t} \) is the reward or punishment obtained from the current action. The parameter \( \gamma \) is the discount factor that scales the value of future rewards. One of the main challenges in designing an efficient Q-learning algorithm is determining the importance of the information gained through interactions with the environment. For instance, assigning a value close to 1 to the learning rate means we assign higher importance to recently gained information. To tune the learning rate over the iterations, we use an adaptive methodology based on Eq. (2), which balances exploring and exploiting the search region (Zamli et al. 2018).

$$ \epsilon_{t} = 1 - 0.9\frac{t}{MT}, $$
(2)

In Eq. (2), \( t \) is the iteration index, and \( MT \) is the maximum number of generations. In this research, we assign a reward of 1 if the current action improves the solution quality of the particle; otherwise, we assign a punishment of − 1. Based on this description, we present the pseudo-code of Q-learning in Algorithm 1.

Algorithm 1: Pseudo-code for the Q-learning algorithm

1: input states, actions, gamma, and initial Q(s, a) table;
2: randomly select an initial state;
3: while stopping criteria not met do
4:    select the best action from the Q-table;
5:    execute the action and get a reward/punishment;
6:    determine the max Q-value for the next state;
7:    update a(t);
8:    update the Q-table;
9:    update the current state, s(t) = s(t + 1);
10: end
11: return the Q-table;

In this research, we consider several efficient operators from different algorithms and let the Q-learning algorithm determine the best action throughout the optimization process to modify the location of each particle in the feasible space. In the following subsections, we present the operators in detail.
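To make the learning mechanics concrete before turning to the operators, the following minimal Python sketch implements the Q-table update of Eqs. (1) and (2), complementing Algorithm 1. The 6 × 6 table size (one state per previously applied operator), the discount factor of 0.8, and the function names are illustrative assumptions rather than fixed choices of the algorithm.

import numpy as np

def adaptive_learning_rate(t, MT):
    # Eq. (2): anneal the learning rate linearly from 1.0 down to 0.1
    return 1.0 - 0.9 * t / MT

def q_update(Q, s, a, reward, s_next, t, MT, gamma=0.8):
    # Eq. (1): one Q-learning update of the entry Q[s, a]
    eps = adaptive_learning_rate(t, MT)
    td_target = reward + gamma * np.max(Q[s_next, :])
    Q[s, a] += eps * (td_target - Q[s, a])
    return Q

# illustrative usage: six operators serve as actions, and the state is
# identified with the operator applied last, giving a 6 x 6 Q-table
Q = np.zeros((6, 6))
Q = q_update(Q, s=0, a=2, reward=1, s_next=2, t=10, MT=1000)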

3.2 Grey wolf optimizer

Similar to other swarm intelligence-based methods, GWO initially generates a set of initial solutions. Then, it sorts the solutions by their fitness and considers the three best solutions the dominant wolves (alpha, beta, and delta). The following equations imitate the encircling behavior of grey wolves around prey:

$$ \vec{D} = \left| {\vec{C}\vec{X}_{p} \left( t \right) - \vec{X}\left( t \right)} \right|, $$
(3)
$$ \vec{X}\left( {t + 1} \right) = \vec{X}_{p} \left( t \right) - \vec{A}\vec{D}, $$
(4)

In Eqs. (3) and (4), \( t \) represents the iteration index, and \( \vec{A} \) and \( \vec{C} \) are coefficient vectors. \( \vec{X}_{p} \left( t \right) \) and \( \vec{X}\left( t \right) \) are the positions of the prey and the grey wolf, respectively. The coefficient vectors are calculated as follows:

$$ \vec{A} = 2a_{1} \vec{r}_{1} - a_{1} $$
(5)
$$ \vec{C} = 2\vec{r}_{2} , $$
(6)

In Eqs. (5) and (6), \( a_{1} \) decreases linearly from 2 to 0 over the iterations, and \( \vec{r}_{1} \) and \( \vec{r}_{2} \) are random vectors with components in [0, 1]. After encircling the prey, the wolves start the hunting process. To express the movements of the grey wolves mathematically, we assume that the alpha, beta, and delta wolves have superior knowledge of the probable position of the prey (the possible optimal solution of the problem). In this framework, the following formulas mimic the hunting process (Khalilpourazari et al. 2020a):

$$ \vec{D}_{\alpha } = \left| {\vec{C}_{1} \vec{X}_{\alpha } - \vec{X}} \right|, \vec{D}_{\beta } = \left| {\vec{C}_{2} \vec{X}_{\beta } - \vec{X}} \right|, \vec{D}_{\delta } = \left| {\vec{C}_{3} \vec{X}_{\delta } - \vec{X}} \right|, $$
(7)
$$ \vec{X}_{1} = \vec{X}_{\alpha } - \vec{A}_{1} \vec{D}_{\alpha } ,\quad \vec{X}_{2} = \vec{X}_{\beta } - \vec{A}_{2} \vec{D}_{\beta } ,\quad \vec{X}_{3} = \vec{X}_{\delta } - \vec{A}_{3} \vec{D}_{\delta } , $$
(8)
$$ \vec{X}\left( {t + 1} \right) = \frac{{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3} }}{3} $$
(9)
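As a concrete reference, the following Python sketch implements the hunting update of Eqs. (7)-(9). Treating positions as NumPy arrays and redrawing the random vectors independently for each leader are implementation assumptions of the sketch.

import numpy as np

def gwo_update(x, alpha, beta, delta, a1, rng):
    # Eqs. (7)-(9): move particle x toward the alpha, beta, and delta wolves;
    # a1 should decrease linearly from 2 to 0 over the iterations (Eq. 5)
    candidates = []
    for leader in (alpha, beta, delta):
        A = 2.0 * a1 * rng.random(x.shape) - a1   # Eq. (5)
        C = 2.0 * rng.random(x.shape)             # Eq. (6)
        D = np.abs(C * leader - x)                # Eq. (7)
        candidates.append(leader - A * D)         # Eq. (8)
    return np.mean(candidates, axis=0)            # Eq. (9)

rng = np.random.default_rng(0)
x_new = gwo_update(np.zeros(5), np.ones(5), np.ones(5), np.ones(5), a1=1.5, rng=rng)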

Figure 1 depicts a graphical interpretation of the hunting action in 2D space.

Fig. 1 Hunting behavior in GWO

3.3 Sine–cosine algorithm

The sine-cosine algorithm (SCA) is a recently developed search procedure that mimics sine- and cosine-like movements in the feasible space to update the particles using Eq. (10):

$$ X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {X_{i}^{t} + r_{1} \times \sin \left( {r_{2} } \right) \times \left| {r_{3} P_{i}^{t} - X_{i}^{t} } \right|,} & {r_{4} < 0.5} \\ {X_{i}^{t} + r_{1} \times \cos \left( {r_{2} } \right) \times \left| {r_{3} P_{i}^{t} - X_{i}^{t} } \right|,} & {r_{4} \ge 0.5} \\ \end{array} } \right. $$
(10)

In Eq. (10), \( X_{i}^{t} \) is the current position, \( P_{i}^{t} \) is the position of the best particle, and \( r_{1} ,r_{2} ,r_{3} ,r_{4} \) are random numbers. Throughout the iterations, \( r_{1} \) controls the movement direction, \( r_{2} \) controls the moving distance, \( r_{3} \) balances the emphasis placed on the destination, and \( r_{4} \) selects the sine or cosine movement for the update. Figure 2 shows the movement behavior of the sine and cosine actions.

Fig. 2 Updating procedure in SCA

SCA uses the sine and cosine movements intelligently to evade local optima. In addition to adjusting the particles' movements during the solution process, SCA reduces the value of the \( r_{1} \) parameter using Eq. (11) to sustain a proper equilibrium between exploration and exploitation:

$$ r_{1} = a_{2} - t\frac{{a_{2} }}{T} $$
(11)

In Eq. (11), \( t \) is the current iteration, \( a_{2} \) is a constant, and \( T \) is the maximum number of generations.
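A minimal sketch of the SCA update follows. The ranges of \( r_{2} \) and \( r_{3} \) (here [0, 2π] and [0, 2], following the standard SCA formulation) and the default \( a_{2} = 2 \) are assumptions of the sketch.

import numpy as np

def sca_update(x, best, t, T, a2=2.0, rng=None):
    # Eq. (10) with the linearly decaying r1 of Eq. (11)
    if rng is None:
        rng = np.random.default_rng()
    r1 = a2 - t * a2 / T                      # Eq. (11)
    r2 = 2.0 * np.pi * rng.random(x.shape)    # moving distance
    r3 = 2.0 * rng.random(x.shape)            # weight on the destination
    r4 = rng.random()                         # sine/cosine switch
    step = r1 * np.abs(r3 * best - x)
    return x + step * (np.sin(r2) if r4 < 0.5 else np.cos(r2))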

3.4 Moth-flame optimization

Like most optimization paradigms, MFO initiates the optimization by creating random solutions. Then, it mimics the spiral flying action of moths around light sources using a logarithmic spiral function:

$$ IP_{i}^{X + 1} = \left| {F_{i} - IP_{i}^{X} } \right|e^{bt} \cos \left( {2\pi t} \right) + F_{i} , $$
(12)

In Eq. (12), \( t \) is a random number in [− 1, 1], and \( b \) is a constant that determines the shape of the logarithmic spiral. \( F_{i} \) is the flame (the best solution), \( IP_{i}^{X} \) is the moth, and \( \left| {F_{i} - IP_{i}^{X} } \right| \) is the distance between the moth and the flame. Figure 3 shows the spiral movement around the flame.

Fig. 3 The spiral flight path of the moths around the flame

To maintain a suitable balance between exploration and exploitation, the MFO algorithm reduces the search radius using the following equations (Khalilpourazari et al. 2019; Mohammadi and Khalilpourazari 2017):

$$ a_{3} = - 1 + t\left( {\frac{ - 1}{MT}} \right) $$
(13)
$$ tt = \left( {a_{3} - 1} \right) \times rand + 1 $$
(14)

In Eqs. (13) and (14), \( t \) is the current iteration, and \( MT \) is the maximum number of generations.
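The spiral update of Eqs. (12)-(14) can be sketched as follows; the spiral constant b = 1 is an illustrative assumption.

import numpy as np

def mfo_update(moth, flame, it, MT, b=1.0, rng=None):
    # Eq. (12) with the shrinking convergence range of Eqs. (13)-(14)
    if rng is None:
        rng = np.random.default_rng()
    a3 = -1.0 + it * (-1.0 / MT)                     # Eq. (13): from -1 toward -2
    t = (a3 - 1.0) * rng.random(moth.shape) + 1.0    # Eq. (14): random in (a3, 1]
    distance = np.abs(flame - moth)
    return distance * np.exp(b * t) * np.cos(2.0 * np.pi * t) + flame  # Eq. (12)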

3.5 Particle swarm optimization

Particle swarm optimization (PSO) is one of the most efficient optimization procedures and performs well on many complex problems. PSO is a population-based algorithm that uses Eq. (15) to update the location of a given particle in the solution space:

$$ x_{i} \left( {t + 1} \right) = x_{i} \left( t \right) + v_{i} \left( {t + 1} \right) $$
(15)

In Eq. (15), \( x_{i} \left( t \right) \) is the current location of the particle and \( v_{i} \left( {t + 1} \right) \) is its velocity. The velocity vector \( v_{i} \) is the main component of the updating operator and is calculated as follows:

$$ v_{i} \left( {t + 1} \right) = \omega v_{i} \left( t \right) + C_{1} r_{1} \left( {p_{i} \left( t \right) - x_{i} \left( t \right)} \right) + C_{2} r_{2} \left( {G\left( t \right) - x_{i} \left( t \right)} \right) $$
(16)

In Eq. (16), \( r_{1} ,r_{2} \) are random numbers in (0, 1], \( C_{1} ,C_{2} \) are acceleration coefficients, \( \omega \) is the inertia weight, \( p_{i} \left( t \right) \) is the best solution found by the particle itself so far, and \( G\left( t \right) \) is the best solution attained by the whole swarm so far.
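A minimal sketch of Eqs. (15)-(16) follows; the inertia weight and acceleration coefficients shown are common defaults, not values prescribed here.

import numpy as np

def pso_update(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, rng=None):
    # Eqs. (15)-(16): velocity and position update of one PSO particle
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (16)
    return x + v_new, v_new                                          # Eq. (15)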

3.6 Water cycle algorithm

The water cycle algorithm (WCA) is one of the best-performing algorithms for complex problems. WCA is a population-based, nature-inspired metaheuristic that mimics the flow of streams to rivers and of rivers to the sea to perform optimization. Its updating operator is given in Eq. (17):

$$ x_{current}^{i + 1} = x_{current}^{i} + C\left( {x_{best\_sol}^{i} - x_{current}^{i} } \right) $$
(17)

In Eq. (17), \( C \) is a random value. We use the updating operator of the WCA as one of the operators to update the location of a given particle in the feasible space. Figure 4 represents the updating procedure in WCA.

Fig. 4 Updating procedure in WCA
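A minimal sketch of Eq. (17) follows; drawing \( C \) uniformly in [0, 2) is an assumption of the sketch, since the equation only specifies that \( C \) is random.

import numpy as np

def wca_update(x, best, rng=None):
    # Eq. (17): move a stream/river toward the best solution (the sea)
    if rng is None:
        rng = np.random.default_rng()
    C = 2.0 * rng.random(x.shape)   # random coefficient; the range is an assumption
    return x + C * (best - x)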

3.7 Gaussian walks and Lévy flight

In this subsection, we use the leading operators of the stochastic fractal search (SFS) proposed by Salimi (2015). SFS utilizes an important mathematical property called the "fractal". Fractals are complicated geometric shapes that generally have a "fractional dimension", resulting in self-similarity. SFS follows diffusion limited aggregation (DLA), which is an efficient technique for creating fractals. To simulate the DLA, we use the Gaussian walk and the Lévy flight. Figure 5 presents a fractal shape produced through the DLA scheme.

Fig. 5 A fractal produced through the DLA method

We use the following equation to simulate the diffusion process in the DLA method.

$$ x_{i}^{q} = x_{i} + \beta \times Gaussian\left( {\left| {BP} \right|,\sigma } \right) - \left( {\varepsilon \times BP - \varepsilon^{\prime} \times x_{i} } \right), $$
(18)

In Eq. (18), \( q \) indexes the new solutions created through the diffusion of each particle, \( \sigma \) is the standard deviation of the Gaussian walk, and \( BP \) is the best solution. Also, \( x_{i}^{q} \) denotes a new particle produced via the diffusion process and \( x_{i} \) is the \( i \)th solution. Besides, \( \varepsilon \) and \( \varepsilon^{\prime} \) are random numbers in (0, 1]. The standard deviation \( \sigma \) is defined using Eq. (19) (Khalilpourazari et al. 2020b):

$$ \sigma = \left| {\frac{\log (g)}{g} \times \left( {P_{i} - BP} \right)} \right|, $$
(19)

where the term \( \frac{\log (g)}{g} \) shrinks the length of the Gaussian jumps over the iterations. Here, \( g \) is the iteration number and \( P_{i} \) is the current position of the particle.

To enhance exploration and randomness in the population, we also apply a Lévy flight-based updating procedure to the particle under consideration:

$$ X_{new}^{i} = X_{c}^{i} + X_{c}^{i} \otimes Levy\left( D \right)\,\,\,\,\,for\,\,\, i = 1, \ldots ,m $$
(20)

The expressions \( X_{new}^{i} \) and \( X_{c}^{i} \) are the new and current locations of the particle, respectively. We calculate the Lévy flight using Eq. (21).

$$ Levy\left( x \right) = \frac{{0.01 \times \sigma \times r_{1} }}{{\left| {r_{2} } \right|^{{\frac{1}{\beta }}} }}, $$
(21)

where \( r_{1} \) and \( r_{2} \) are random numbers. In Eq. (21), \( \sigma \) is obtained as follows:

$$ \sigma = \left( {\frac{{\varGamma \left( {1 + \beta } \right)\sin \left( {\frac{\pi \beta }{2}} \right)}}{{\varGamma \left( {\frac{1 + \beta }{2}} \right)\beta 2^{{\left( {\frac{\beta - 1}{2}} \right)}} }}} \right)^{{\frac{1}{\beta }}} . $$
(22)
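The two operators can be sketched in Python as follows. Setting the diffusion multiplier \( \beta \) of Eq. (18) to 1, using Mantegna's scheme to draw the Lévy step of Eq. (21), and the small constant that guards against a zero standard deviation are implementation assumptions.

import numpy as np
from scipy.special import gamma as Gamma

def gaussian_diffusion(x, best, g, rng=None):
    # Eqs. (18)-(19): one Gaussian-walk diffusion step of SFS (beta = 1)
    if rng is None:
        rng = np.random.default_rng()
    sigma = np.abs((np.log(g) / g) * (x - best))   # Eq. (19)
    eps, eps_p = rng.random(), rng.random()
    walk = rng.normal(np.abs(best), sigma + 1e-12) # Gaussian(|BP|, sigma)
    return walk - (eps * best - eps_p * x)         # Eq. (18)

def levy_flight(x, beta=1.5, rng=None):
    # Eqs. (20)-(22): Levy-flight perturbation of the current position
    if rng is None:
        rng = np.random.default_rng()
    num = Gamma(1 + beta) * np.sin(np.pi * beta / 2.0)
    den = Gamma((1 + beta) / 2.0) * beta * 2.0 ** ((beta - 1) / 2.0)
    sigma = (num / den) ** (1.0 / beta)            # Eq. (22)
    r1 = rng.normal(0.0, sigma, x.shape)           # r1 ~ N(0, sigma^2)
    r2 = rng.normal(0.0, 1.0, x.shape)
    step = 0.01 * r1 / np.abs(r2) ** (1.0 / beta)  # Eq. (21)
    return x + x * step                            # Eq. (20)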

3.8 The developed hybrid Q-learning based algorithm

In this section, we propose a novel hybrid Q-learning based algorithm (HQLA) to solve complicated optimization problems. The idea behind this algorithm is to design a solution methodology capable of handling complicated problems by adapting its operators to any solution space. We design an optimization procedure that benefits from the advantages of several algorithms: the process can use any of the movement strategies defined by the updating operators above, while a reinforcement learning algorithm (Q-learning) selects the best action in each iteration for each particle. When optimization begins, the algorithm performs several random actions to evaluate the efficiency of each operator. As the iterations continue, Q-learning learns how to employ the different actions to achieve the best possible solution. For each action, we assign a reward of 1 if the chosen operator improves the solution quality; otherwise, the algorithm assigns a punishment of − 1 to the action. The pseudo-code of the algorithm is given in Algorithm 2.

Algorithm 2: Pseudo-code for the HQLA

1.  input parameters of the algorithm;
2.  create a set of randomly generated solutions;
3.  while stopping criteria not met
4.     check for infeasibility of the particles;
5.     bring infeasible particles back into the feasible solution space;
6.     sort the solutions based on their fitness value;
7.     for each particle
8.        if it is the first iteration
9.           select a random action;
10.       else
11.          select the action using the Q-table;
12.       end
13.       if the action is GWO
14.          use GWO operators to update the position of the particle;
15.       else if the action is SFS
16.          use SFS operators to update the position of the particle;
17.       else if the action is WCA
18.          use WCA operators to update the position of the particle;
19.       else if the action is PSO
20.          use PSO operators to update the position of the particle;
21.       else if the action is MFO
22.          use MFO operators to update the position of the particle;
23.       else if the action is SCA
24.          use SCA operators to update the position of the particle;
25.       end
26.       check for infeasibility of the particle;
27.       bring the infeasible particle back into the feasible solution space;
28.       calculate the objective function value of the particle;
29.       determine the reward/punishment value;
30.       determine the max Q-value for the next state;
31.       update a(t);
32.       update the Q-table;
33.       update the current state, s(t) = s(t + 1);
34.    end
35. end
36. return the best solution
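To complement Algorithm 2, the following condensed Python sketch shows the main loop. It makes simplifying assumptions for brevity: the state of a particle is identified with the last action applied to it, infeasible particles are repaired by clipping to the box bounds, and the dispatcher apply_operator contains only two of the six operators (the remaining four plug in the same way).

import numpy as np

def apply_operator(a, x, best, t, MT, rng):
    # placeholder dispatcher: two of the six operators of Sects. 3.2-3.7
    if a % 2 == 0:                                   # GWO-style encircling, Eqs. (3)-(4)
        a1 = 2.0 * (1.0 - t / MT)                    # decreases from 2 to 0
        A = 2.0 * a1 * rng.random(x.shape) - a1      # Eq. (5)
        C = 2.0 * rng.random(x.shape)                # Eq. (6)
        return best - A * np.abs(C * best - x)
    r1 = 2.0 - 2.0 * t / MT                          # SCA, Eq. (11)
    r2 = 2.0 * np.pi * rng.random(x.shape)
    return x + r1 * np.sin(r2) * np.abs(2.0 * rng.random(x.shape) * best - x)

def hqla(obj, lb, ub, n_particles=30, max_iter=500, gamma=0.8, seed=0):
    # condensed sketch of Algorithm 2
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n_actions = 6
    X = rng.uniform(lb, ub, (n_particles, lb.size))
    fit = np.array([obj(x) for x in X])
    Q = np.zeros((n_actions, n_actions))             # states = last action taken
    state = np.zeros(n_particles, dtype=int)
    for t in range(max_iter):
        eps = 1.0 - 0.9 * t / max_iter               # Eq. (2)
        best = X[np.argmin(fit)].copy()
        for i in range(n_particles):
            a = (int(rng.integers(n_actions)) if t == 0
                 else int(np.argmax(Q[state[i]])))   # steps 8-12 of Algorithm 2
            x_new = np.clip(apply_operator(a, X[i], best, t, max_iter, rng), lb, ub)
            f_new = obj(x_new)
            reward = 1.0 if f_new < fit[i] else -1.0
            Q[state[i], a] += eps * (reward + gamma * Q[a].max() - Q[state[i], a])  # Eq. (1)
            state[i] = a
            if f_new < fit[i]:                       # greedy acceptance
                X[i], fit[i] = x_new, f_new
    return X[np.argmin(fit)], float(fit.min())

# example: minimize the 10-dimensional sphere function
x_star, f_star = hqla(lambda x: float(np.sum(x ** 2)), [-100] * 10, [100] * 10)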

 

4 Results and discussion

Metaheuristics are approximation algorithms based on randomized movements, so their performance may differ from one problem to another. Thus, to show the efficiency of a metaheuristic algorithm, we should apply it to many benchmark functions. For this purpose, we use 29 benchmarks, including unimodal, multimodal, fixed-dimension multimodal, and hybrid composite functions, which are among the most complex benchmarks in the literature. For more details regarding these benchmarks, see "Appendix 1".

We compare our algorithm to state-of-the-art methods, including the Crow Search Algorithm (CSA), Artificial Bee Colony (ABC), Cuckoo Search (CS), Genetic Algorithm (GA), Moth-Flame Optimization (MFO), Gravitational Search Algorithm (GSA), and Dragonfly Algorithm (DA). To solve the benchmark functions, we allow 15,000 function evaluations (NFEs) for each algorithm to enable a fair assessment. Besides, to draw reliable conclusions, we run each algorithm on each benchmark 30 times and report the mean, standard deviation, worst, and best results. Table 2 provides the values of the main parameters of the algorithms.

Table 2 The values of the parameters of the algorithms

In the first step, we assess the performance of the HQLA on unimodal benchmarks (F1–F7). Figure 6 depicts a 2D representation of these benchmark functions.

Fig. 6 2D representation of F1–F7

Unimodal benchmarks have a single optimum and no local optima and are typically used to assess the exploitation ability of metaheuristic algorithms. This test suite is difficult because an algorithm must first locate the region of the global optimum and then exploit it to approximate the optimum's location precisely. Table 3 provides detailed information on the performance of the algorithms on the unimodal benchmarks. Based on these results, we observe that HQLA performs best on most of the benchmarks. On F1–F5 and F7, HQLA outperforms all other algorithms by obtaining the best solution for each benchmark; on F6, HQLA ranks third. Besides, HQLA provides the lowest standard deviation, showing very low variability in its performance. Moreover, the boxplots of the results of Table 3 presented in Fig. 7a–c make it apparent that HQLA has the lowest and narrowest boxplots among the algorithms.

Table 3 Computational outcomes of the algorithms in solving unimodal benchmark functions
Fig. 7 a Boxplot of the results in F1–F9 benchmarks. b Boxplot of the results in F10–F18 benchmarks. c Boxplot of the results in F19–F23 benchmarks

Furthermore, Fig. 8a–c, which present the convergence plots of the methods, show that HQLA maintains an excellent balance between exploration and exploitation of the solution space. In Fig. 8a–c, we observe that HQLA continually improves the best solution attained in each iteration by choosing the best operator to explore the solution space. The learning process in HQLA enables the algorithm to determine the best operator for relocating the particles in the solution region by evaluating the efficiency of each updating mechanism. Based on these observations, we conclude that HQLA is a reliable technique for this family of benchmark functions.

Fig. 8 a–c Convergence plots of the algorithms

The second and third families of benchmarks are the multimodal and fixed-dimension multimodal benchmarks. These benchmarks contain several local optima, which makes the solution process complicated. To perform well on these benchmark functions, the algorithms must maintain an excellent balance between exploration and exploitation, which helps them avoid local optima. Figures 9 and 10 show a schematic view of these benchmark functions in 2D.

Fig. 9 2D representation of F8–F13

Fig. 10 2D representation of F14–F18

Tables 4 and 5 provide detailed information on the performance of the algorithms in solving the multimodal and fixed-dimension multimodal benchmarks.

Table 4 Computational outcomes of the algorithms in solving multimodal benchmark functions
Table 5 Computational outcomes of the algorithms in solving fixed-dimension multimodal benchmarks

Based on the results, we observe that on F8–F12, F14, and F16–F23 (14 out of 16 benchmark functions), HQLA outperforms the other algorithms in terms of the average, standard deviation, best, and worst values over 30 repetitions. On some of these benchmarks, such as F9 and F11, the standard deviation of the results provided by HQLA is zero; on these functions, HQLA reaches the global optimum in every repetition. These results indicate that HQLA is a robust and reliable algorithm for solving complex optimization problems. HQLA can choose among several operators, which enables it to explore and exploit the solution region intelligently. The learning process helps the algorithm evaluate the efficiency of the operators and discover the most efficient operator (action) for any problem. Besides, Fig. 8a–c depict the excellent balance between exploration and exploitation in the performance of HQLA throughout the iterations. Moreover, on F13 and F15, HQLA ranks second among all algorithms, which shows its high capability in solving optimization problems. Furthermore, Fig. 7a–c show that the boxplots of the HQLA results are narrower and lower than those of any other algorithm, highlighting the superiority of the proposed methodology over existing approaches.

The last family of benchmark functions comprises the hybrid composite benchmarks, which are the most challenging (Liang et al. 2005). Figure 11 presents a graphical representation of these benchmark functions in 2D.

Fig. 11 Two-dimensional view of the hybrid composite functions

Fig. 12 Convergence plots for algorithms in solving composite problems

Computational results for this family of benchmark functions are given in Table 6. We observe that HQLA provides the best results and outperforms the other algorithms in solving the hybrid benchmarks. Table 6 shows that HQLA obtains the lowest average, standard deviation, and best values for these benchmark functions while maintaining the lowest worst value. Besides, Fig. 13 confirms this by showing that the boxplot of HQLA is narrower and lower than those of the other algorithms. Moreover, Fig. 12 shows that HQLA adjusts exploration and exploitation by intelligently choosing the best operators in each iteration and improves the best solution consistently throughout the iterations.

Table 6 Computational outcomes of the algorithms in solving hybrid composite benchmarks
Fig. 13 Boxplot of the results in composite benchmarks

Although we have shown that HQLA outperforms other state-of-the-art methods in terms of solution quality, we also apply Friedman's test, a powerful statistical test, to establish the statistical superiority of our algorithm. Table 7 presents the results. The average rank of HQLA is 3.036207, far smaller than that of the other algorithms; therefore, from a statistical point of view, HQLA performs significantly better than all other algorithms. We note that we perform Friedman's test at a 95% confidence level.

Table 7 Results of the Friedman’s test
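For reference, Friedman's test can be reproduced with SciPy as sketched below; the error values are placeholders for illustration, not the numbers behind Table 7.

from scipy.stats import friedmanchisquare

# hypothetical mean errors of three algorithms on five benchmarks
hqla = [0.0, 1.2e-8, 3.4e-5, 0.0, 2.1e-3]
gwo = [4.1e-2, 7.7e-3, 9.0e-1, 5.5e-4, 8.2e-1]
pso = [1.3e-1, 2.5e-2, 6.6e-1, 9.1e-3, 4.4e-1]

stat, p_value = friedmanchisquare(hqla, gwo, pso)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
# at the 95% confidence level, p < 0.05 rejects the hypothesis
# that all algorithms perform equally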

5 Multi-criteria parameter estimation and curve fitting

Quebec is one of the Canadian provinces dealing with the COVID-19 epidemic triggered by the SARS-CoV-2 virus. The province has reported the most COVID-19 cases in the country, accounting for more than 63,713 confirmed cases and 5770 deaths. On April 29, 2020, Quebec hospitals announced that the healthcare system capacity could not respond to the influx of COVID-19 patients (Weeks and Tu Thanh 2020; The Globe and Mail n.d.). On June 28, 2020, Montreal's Emergency Rooms (ERs) reported near-capacity status due to limited resources (Fahmy and Ross 2020; CTV News n.d.). These events highlight the need for a methodology that can accurately predict the future trend of the pandemic in the province and enable policymakers to optimize resource allocation to avoid loss of life, as many scientists have declared that resource shortages, such as ventilator shortages, are the difference between life and death for patients (Kliff et al. 2020; The New York Times n.d.). Developing new methodologies to predict pandemic growth is essential for optimizing resource allocation and determining the optimal time to implement lockdown measures. Moreover, such predictions can be used to optimize resource allocation for life-threatening cases admitted to ICUs and reduce the disease's fatality rate.

In this section, we use the most recent and accurate model, SIDARTHE, presented by Giordano et al. (2020) in Nature Medicine. The authors showed that the model can predict the future trend of the pandemic accurately; however, solving it is a cumbersome task. For more information on the SIDARTHE model, see "Appendix 2". In this research, we apply the SIDARTHE model to real data from Quebec, Canada, and use HQLA to solve it. Figure 14 shows the convergence plot of HQLA throughout the iterations. We divided the planning horizon (from January 25, 2020 (day 1) to July 19, 2020 (day 176)) into the following six stages, in which the Quebec government applied specific restrictions to control the pandemic:

  • Stage 1 (from January 25, 2020, to March 15, 2020)

  • Stage 2 (from March 15 to March 24)

  • Stage 3 (from March 24 to March 28)

  • Stage 4 (from March 28 to April 2)

  • Stage 5 (from April 2 to April 13)

  • Stage 6 (after April 13)

Fig. 14 Performance of HQLA in solving the SIDARTHE problem

We note that we use the sum of mean squared errors as the objective function of our model; its optimal value for our case study is 6.29E−06. Based on this outcome, we observe that HQLA solves the problem very efficiently. Table 8 shows detailed statistics of the optimized parameters of the model; these results are the output of solving the SIDARTHE model for the Quebec data using HQLA.

Table 8 Results of fitting the model to real-data for Quebec
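For completeness, a minimal sketch of this curve-fitting objective follows: it integrates the model for a candidate parameter vector and returns the sum of mean squared errors against the observed series. The helper rhs (the SIDARTHE right-hand side of "Appendix 2"), the observation matrix y_obs, and the penalty for failed integrations are assumptions of the sketch.

import numpy as np
from scipy.integrate import solve_ivp

def fit_objective(params, t_obs, y_obs, y0, rhs):
    # sum of mean squared errors between the integrated model and the data;
    # rhs(t, y, params) implements the SIDARTHE right-hand side, and y_obs
    # holds one row per time point and one column per fitted compartment
    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), y0,
                    t_eval=t_obs, args=(params,), method="LSODA")
    if not sol.success:          # penalize failed integrations heavily
        return 1e12
    residuals = sol.y.T - y_obs
    return float(np.sum(np.mean(residuals ** 2, axis=0)))

# HQLA then minimizes fit_objective over the stage-wise parameter vector,
# re-estimating the transmission parameters at each of the six stages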

Next, we compare our predictions with actual data from Quebec to validate our results. Figure 15 shows the predicted and actual data. As the figure shows, we accurately predict the number of infected cases, recovered cases, and cumulative diagnosed cases. Based on these outcomes, we conclude that HQLA is an efficient solution methodology for the problem.

Fig. 15 Prediction versus data using HQLA. Non-diagnosed asymptomatic (ND AS), diagnosed asymptomatic (D AS), non-diagnosed symptomatic (ND S), diagnosed symptomatic (DS), and diagnosed with life-threatening symptoms (D IC)

As mentioned earlier, we separated the planning horizon into six stages in which the Quebec government announced specific limitations to control the epidemic. The first case of COVID-19 in Quebec was detected on February 27, 2020 (Lapierre 2020). In the first stage, the transmission rates were low, and we estimate the reproduction number as \( R_{0} = 1.0998 \). The province first declared a state of emergency on March 12, 2020. We consider March 15, 2020, to March 24, 2020, as the second stage of the pandemic due to a drastic increase in the number of cases. On March 15, the Quebec government ordered the closure of all recreational and entertainment facilities (Gouvernement du Québec 2020).

Following the rapid growth in the number of infected cases, on March 27, Montreal declared a local state of emergency, and the Quebec government ordered the closure of all universities and schools. Besides, on March 20, the province banned indoor gatherings. For the second stage of the pandemic, our study estimates a reproduction number of \( R_{0} = 3.8028 \) for the province. From March 24, 2020, to March 28, 2020 (the third stage), Quebec received more COVID-19 test kits, enabling the healthcare authorities to perform more tests and detect more infected cases; for this stage, we estimate \( R_{0} = 4.6096 \).

From March 28, 2020, to April 2, 2020, the strict actions taken by the government decreased the reproduction number to \( R_{0} = 1.1193 \). The reproduction number dropped further to \( R_{0} = 1.0248 \) from April 2, 2020, to April 13, 2020. After April 13, 2020, the reproduction number is estimated to be \( R_{0} = 0.7782 \). To present the progress of the pandemic over the next few months, we extended our prediction to the next 365 days, as shown in Fig. 16a, b.

Fig. 16 a, b Prediction of future cases using SIDARTHE and HQLA. Non-diagnosed asymptomatic (ND AS), diagnosed asymptomatic (D AS), non-diagnosed symptomatic (ND S), diagnosed symptomatic (DS), and diagnosed with life-threatening symptoms (D IC)

Based on our results, under the current social distancing rules and limitations, Quebec will experience a significant decrease in the number of cases over the next few months, but only if strict measures such as the partial lockdown remain in place.

6 Sensitivity analyses and managerial insights

In the previous section, we projected the pandemic growth over the next few months under the current partial lockdown and closure measures. However, in May 2020, the Quebec government ordered a gradual reopening of businesses. Consequently, it is vital to discover how the reopening will impact future conditions. In this section, we examine the effect of variations in the transmission rates on the progress of the pandemic. We increase the parameters \( \alpha ,\beta ,\gamma ,\delta ,\,{\text{and}}\,\varepsilon \) and explore their impact on the number of infected, recovered, cumulative diagnosed, and death cases. We portray the outcomes in Figs. 17, 18, 19 and 20. The outcomes show that the parameter \( \alpha \) has a substantial effect on the pandemic growth: increasing it increases the number of infected, recovered, cumulative diagnosed, and death cases. Increasing \( \beta ,\gamma , \) and \( \delta \) has the same effect. In contrast, increasing the parameter \( \varepsilon \), the detection rate of asymptomatic cases, remarkably reduces the number of infected, recovered, cumulative diagnosed, and death cases. Therefore, to slow the spread of the virus and stop the pandemic, we need to reduce the transmission rate of the infection by applying significant social distancing and behavioral measures while increasing the detection rate of asymptomatic cases.
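These scenario analyses can be reproduced by re-integrating the fitted model with scaled parameters, as in the following sketch; params_hat, y0, t_grid, and rhs are assumed to come from the fitting step of Sect. 5, and the index of \( \alpha \) within the parameter vector is illustrative.

import numpy as np
from scipy.integrate import solve_ivp

def scenario(params_hat, idx, factor, y0, t_grid, rhs):
    # scale one fitted transmission parameter and re-simulate the model
    params = np.array(params_hat, dtype=float)
    params[idx] *= factor        # e.g. a 10% increase uses factor = 1.1
    sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), y0,
                    t_eval=t_grid, args=(params,), method="LSODA")
    return sol.t, sol.y

# example sweep over alpha (its index in params_hat is an assumption):
# curves = [scenario(params_hat, 0, f, y0, t_grid, rhs) for f in (1.1, 1.2, 1.3)]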

Fig. 17 Future scenarios of the infected cases in Quebec

Fig. 18 Future scenarios of the cumulative diagnosed cases in Quebec

Fig. 19 Future scenarios of the recovered cases in Quebec

Fig. 20 Future scenarios of the death cases in Quebec

Based on our results, if the lockdown measures are relaxed, the number of infected cases in Quebec will begin to rise again around day 250 of the pandemic, that is, from October 1, 2020. Quebec would therefore experience a significant increase in the number of infected cases starting in October 2020.

7 Conclusion

COVID-19 is a severe health threat, and the situation evolves every day. Therefore, presenting new procedures to model and predict the COVID-19 pandemic is vital. Predicting the pandemic enables policymakers to optimize the use of healthcare system capacity and resource allocation to minimize the fatality rate. In this research, we designed a new hybrid reinforcement learning-based algorithm capable of solving complex optimization problems. We applied our algorithm to several well-known benchmarks and showed that the presented methodology provides high-quality solutions for the most complex problems in the literature. Besides, we showed the superiority of the proposed method over state-of-the-art methods through several measures.

Moreover, to demonstrate the efficiency of the proposed methodology on real-world problems, we applied our approach to the most recent data from Quebec, Canada, to predict the COVID-19 outbreak. Our algorithm, combined with the most recent mathematical model for COVID-19 pandemic prediction, accurately reflected the future trend of the pandemic. Furthermore, we analyzed several scenarios to deepen our insight into pandemic growth. We determined the essential factors and delivered various managerial insights to help policymakers make decisions regarding future social measures.

Our results showed that the transmission rate caused by the interaction of a susceptible case with an asymptomatic case has the most significant effect on future trends: increasing this parameter significantly increases the number of infected, recovered, cumulative diagnosed, and death cases. Increasing the transmission rates due to contact of a susceptible case with diagnosed, ailing, or recognized cases has the same effect. In contrast, increasing the parameter \( \varepsilon \), the detection rate of asymptomatic cases, remarkably reduces the number of infected, recovered, cumulative diagnosed, and death cases. Therefore, to slow the spread of the virus and stop the pandemic, we need to reduce the transmission rate of the infection through significant social distancing and behavioral measures while increasing the detection rate of asymptomatic cases.

As future research, from a mathematical modeling viewpoint, it would be worthwhile to consider stochasticity in the proposed model (Belen et al. 2011; Özmen et al. 2011; Weber et al. 2011). Besides, it would be interesting to use the proposed model to plan resource management during the pandemic, decreasing the fatality rate by increasing healthcare system capacity. Based on the predictions, healthcare managers can plan the allocation of testing kits to test centers. It would also be worthwhile to incorporate the age, medical condition, and gender of the infected cases into the model. Moreover, the proposed model could be used to plan the management of healthcare resources such as personal protective equipment (PPE) and ventilators during the pandemic.