Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec

Khalilpourazari, Soheyl; Hashemi Doulabi, Hossein

doi:10.1007/s10479-020-03871-7

Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec

Original Research
Published: 03 January 2021

Volume 312, pages 1261–1305, (2022)
Cite this article

Download PDF

Annals of Operations Research Aims and scope Submit manuscript

Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec

Download PDF

5201 Accesses
40 Citations
Explore all metrics

Abstract

World Health Organization (WHO) stated COVID-19 as a pandemic in March 2020. Since then, 26,795,847 cases have been reported worldwide, and 878,963 lost their lives due to the illness by September 3, 2020. Prediction of the COVID-19 pandemic will enable policymakers to optimize the use of healthcare system capacity and resource allocation to minimize the fatality rate. In this research, we design a novel hybrid reinforcement learning-based algorithm capable of solving complex optimization problems. We apply our algorithm to several well-known benchmarks and show that the proposed methodology provides quality solutions for most complex benchmarks. Besides, we show the dominance of the offered method over state-of-the-art methods through several measures. Moreover, to demonstrate the suggested method’s efficiency in optimizing real-world problems, we implement our approach to the most recent data from Quebec, Canada, to predict the COVID-19 outbreak. Our algorithm, combined with the most recent mathematical model for COVID-19 pandemic prediction, accurately reflected the future trend of the pandemic with a mean square error of 6.29E−06. Furthermore, we generate several scenarios for deepening our insight into pandemic growth. We determine essential factors and deliver various managerial insights to help policymakers making decisions regarding future social measures.

Optimization of Mitigation Strategies During Epidemics Using Offline Reinforcement Learning

Exploring optimal control of epidemic spread using reinforcement learning

Article Open access 16 December 2020

A general framework for optimising cost-effectiveness of pandemic response under partial intervention measures

Article Open access 14 November 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Researchers use optimization in nearly every study area. Optimization remains a fundamental challenge in science and engineering, primarily because of the difficulty of real-world problems and the limitations of traditional methods. Randomization Search Algorithms (RSAs) are among the most flexible and most efficient methodologies to solve complicated problems. These algorithms are mostly polynomial-time algorithms and have significantly lower computational complexity. As one of the most commonly used RSAs, metaheuristics are algorithms that are inspired by natural phenomena to perform optimization. Metaheuristics perform very well in exploring the feasible region and evade local optimum using effective movement processes.

Healthcare science is one of the top research areas in which metaheuristics have been widely applied to. Using these algorithms, scientists can optimize healthcare systems significantly in terms of several objectives, including minimizing cost, waiting time, service time, delivery time, and maximizing reliability or customer satisfaction. In December 2019, a novel strain of coronavirus called SARS-Cov-2 discovered in China. The virus causes COVID-19, a severe respiratory disease. Regardless of primary measures applied by the government of China, the disease spread quickly to many countries leading to 26,795,847 infected cases and 878,963 deaths. Currently, there are no effective medications and vaccines for the disease. However, the effectiveness of some treatment options is under study via clinical trials (Health Canada 2020; WHO 2020). Although most people with mild COVID-19 symptoms recover independently, some other people with severe and critical symptoms need hospitalization in wards and Intensive Care Units (Public Health Authority of Canada 2020). However, due to the limited capacity of the healthcare system, it is impossible to admit all the patients in hospitals.

Efficient modeling and prediction of the COVID-19 pandemic will meaningfully aid the officials and healthcare experts in making decisions to stop the spread of the disease. Besides, by forecasting the upcoming trend of the epidemic, we can also optimally allocate resources to hospitals that will avoid equipment shortages and save patients’ lives. Prediction of the COVID-19’s trend is challenging because of its uncertain nature and complication. Recently, scientists have provided a novel model to simulate the COVID-19 pandemic called SIDARTHE and was initially offered in research by Giordano et al. (2020) published in Nature Medicine. The researchers highlighted the efficiency of the proposed formulation in modeling the pandemic growth. Nevertheless, they emphasized that solving the presented set of differential equations is difficult because of the exceptional characteristics of the model.

In the current research, we offer a novel search methodology that will solve many complex optimization problems very efficiently in a short time. Our algorithm simultaneously benefits from the advantages of machine learning (ML) and evolutionary computation (EC). In our research, we propose a hybrid algorithm that involves a reinforcement learning (RL) technique as the main engine and several ECs as updating operators. The learning process achieved by the RL method accelerates the algorithm and enables us to resolve complicated large-scale problems. The procedure utilizes several operators to enhance the exploration and exploitation capabilities of the algorithm. These features help the proposed process to sidestep local optimum while exploiting the solution space intelligently. We implement the algorithm on several well-known recently developed benchmarks to show its efficiency in solving such complex problems. Moreover, we highlighted significant differences in the performance of the method comparing to other methodologies using robust statistical tests.

Furthermore, we use the proposed algorithm to model and predict the COVID-19 pandemic in Quebec, Canada, that has the most cases of COVID-19 in the country. Our results accurately predict the peak in the number of infected cases of COVID-19 in the province. The outcomes also determine the peak of the number of cases that develop life-threatening symptoms that will require hospitalization. We also perform complex sensitivity analyses to portray future scenarios that enable us to provide detailed information to policymakers and healthcare professionals. In our study, we also measure the effectiveness of implementing measures such as social distancing and partial lockdown on pandemic growth.

The rest of the current study is prepared as follows: In Sect. 2, we deliver a comprehensive review of existing research on the topic. In Sect. 3, we offer a novel algorithm to resolve many complex problems using RL and EC. In Sect. 4, we apply our algorithm to several well-known benchmark functions. We assess the performance of the suggested approach and compare its efficiency to state-of-the-art methods. In Sect. 5, we implement our method to model and predict the COVID-19 pandemic in Quebec, Canada. In Sect. 6, we perform sensitivity analyses and provide valuable managerial insights to fight the COVID-19 pandemic. In Sect. 7, we conclude the paper.

2 Survey on research conducted

The core idea behind most EC algorithms is to follow a swarm intelligence that is inspired by animal behavior and natural phenomenon (Khalilpourazari and Pasandideh 2019). Mirjalili and Lewis (2016) categorized metaheuristics into four main groups: evolutionary algorithms (EAs), physics-based algorithms (PAs), swarm algorithms (SAs), and machine learning-based algorithms (MLAs). EAs imitate the evolution procedure in nature to solve complex problems. PAs utilize laws of physics that enable this family of algorithms to handle complicated problems. On the other hand, SAs simulate the swarm behavior of many individuals in a group. Also, MLAs use artificial intelligence and machine learning to enhance the performance of the previous families of algorithms in terms of exploration and exploitation. Table 1 provides some of the most advanced algorithms on metaheuristics developed in recent years.

Table 1 Classification of the metaheuristics

Full size table

Many researchers used these algorithms to solve complex optimization problems in different fields (Hoursan et al. 2020; Tirkolaee et al. 2020a, b; Sangaiah et al. 2020; Lotfi et al. 2018, 2019, 2020; Zare Mehrjerdi and Lotfi 2019; Khalilpourazari et al. 2020c; Hashemi Doulabi et al. 2020a, b). Defined by the no free lunch (NFL) theorem, we can logically prove that no single method performs optimally in resolving all problems (Adam et al. 2019; Wolpert and Macready 1997). Any metaheuristic algorithm may perform well in some benchmarks but weak in others. This theorem makes this field of study highly interesting for researchers searching for an algorithm that performs promising in many benchmarks. In our algorithm, we consider several ECs and operators and let an RL method decide which algorithm to use to relocate each element. Besides, learning during the optimization process will significantly reduce the computational burden and improve the quality of the results. The learning process accelerates the algorithm due to the fact that using the learning process over iterations, the algorithm adapts its operators to perform the best for each problem.

Moreover, we consider a proper framework to maintain a decent equilibrium between exploration and exploitation of the feasible region over generations of the algorithm to avoid local optima. We show the efficiency of our algorithm on the most complex benchmarks in the literature. Besides, we apply the suggested algorithm to predict the COVID-19 pandemic in Quebec, Canada. Our outcomes express that the designed algorithm robustly predicts the future trends of the pandemic.

3 Algorithm development

Metaheuristics work based on randomization. By randomization, we mean that these algorithms use random stepsizes while updating each particle’s position in the solution space. Metaheuristics use unique operators and strategies to update the position of each particle (solution). The efficiency of these operators and algorithms significantly depends on the solution space of the problem. For instance, some algorithms follow a direct updating procedure, such as water cycle algorithm (WCA), while some other encircle the best solution to update the position of a given particle, such as Grey–Wolf optimizer (GWO). Each of these operators and moving strategies has unique advantages that enable the algorithms to perform well in optimizing specific problems. Therefore, developing new algorithms that could efficiently solve a higher number of optimization problems is essential.

In this study, we use several operators (moving strategies) from various algorithms to update the particles’ position in the solution space. For updating each particle, we have to choose an operator from the given set of operators. Determining the best strategy and the most efficient operator for any given optimization problem is computationally challenging. Therefore, we use a reinforcement learning method that learns and optimizes the choice of operators during the optimization process to achieve optimal performance. In the following, we first describe all the features of the offered algorithm, and then we define the algorithm in a unique structure.

3.1 Q-learning

Reinforcement learning (RL) that approximates dynamic programming, and neuro-dynamic programming, is a type of machine learning which determines the best actions in a specific environment to maximize a reward (Bertsekas 2019). One of the main features of reinforcement learning is that the agent receives a reward or punishment after executing an action. The RL continues to interact with the environment to achieve the optimal policy via trial and error.

The Q-Learning is one of the most efficient reinforcement learning algorithms that determine an optimal policy by evaluating taken actions using the environment. Q-learning is a way to optimize solutions in a Markov decision process (MDP) problem (Akhtar and Farrukh 2017). The Q-learning algorithm aims to maximize the anticipated reward by determining the optimal state-action pairs. The algorithm uses a $ Q\left( {s,a} \right) $ table where $ s_{t} $ is the state and $ a_{t} $ is the action at time step t, and $ Q $ is the cumulative reward matrix. The algorithm updates the components of the Q-table $ Q\left( {s,a} \right) $ iteratively using Eq. (1).

$$ Q_{{\left( {t + 1} \right)}} \left( {s_{t} ,a_{t} } \right) = Q_{t} \left( {s_{t} ,a_{t} } \right) + \epsilon_{t} \left( {r_{t} + \gamma max\left( {Q_{t} (s_{t + 1} ,a_{t + 1} )} \right) - Q_{t} \left( {s_{t} ,a_{t} } \right)} \right), $$

(1)

In Eq. (1), $ \epsilon_{t} $ denotes the learning rate parameter and $ r_{t} $ is the obtained reward/punishment from the current action. Besides, the expression $ \gamma $ is the scaling factor. One of the main challenges in designing an efficient Q-learning algorithm is in determining the importance of the information gained throughout interactions with the environment. For instance, assigning a value close to 1 to the learning rate parameter means that we consider higher importance for the recent information gained. To optimize the value of the learning rate parameter throughout iterations, we use an adaptive methodology that uses Eq. (2) to intelligently tune the learning rate parameter to explore and exploit the search region (Zamli et al. 2018).

$$ \epsilon_{t} = 1 - 0.9\frac{t}{MT}, $$

(2)

In Eq. (2), t is the iteration index, and MT is the maximum generation. In this research, we consider a reward value of 1 if the current action improves the solution quality of the particle; otherwise, we consider a punishment value of − 1. Based on the given illustrations, we present the Pseudo-code of the Q-learning in Algorithm 1.

Algorithm 1: Pseudo Code for the Q-Learning Algorithm
1: input states, actions, gamma, and initial Q(s, a) table;
2: randomly select an initial state;
3: while stopping criteria not met do
4: select the best action from the Q-table;
5: execute the action and get a reward/punishment;
6: determine the max Q-value for the next state;
7: update a(t);
8: update the Q-table;
9: update the current state, s(t) = s(t + 1);
10: end
11: return the Q-table;

In this research, we consider several efficient operators from different algorithms and let the Q-learning algorithm determine the best action throughout the optimization process to modify the location of each particle in the feasible space. In the following subsections, we present the operators in detail.

3.2 Grey–Wolf optimizer

Similar to other swarm intelligence-based systems, GWO initially generates a set of primary solutions. Then, it sorts the solutions regarding their fitness and considers the three best solutions as the dominant wolves (alpha, beta, and delta). The following equations imitate the encircling behavior of the grey wolves around prey:

$$ \vec{D} = \left| {\vec{C}\vec{X}_{p} \left( t \right) - \vec{X}\left( t \right)} \right|, $$

(3)

$$ \bar{X}\left( {t + 1} \right) = \vec{X}_{p} \left( t \right) - \vec{A}\vec{D}, $$

(4)

In Eqs. (3) and (4), t represents the iteration index and $ \vec{A} $ and $ \vec{C} $ characterize location vectors of target and other grey wolves. $ \vec{X}_{p} \left( t \right) $ and $ \vec{X}\left( t \right) $ are the position of the prey and grey wolf, respectively. These coefficients are calculated as follows:

$$ \vec{A} = 2a_{1} \vec{r}_{1} - a_{1} $$

(5)

$$ \vec{C} = 2\vec{r}_{2} , $$

(6)

In Eqs. (5) and (6), $ a_{1} $ decreases over iterations from 2 to 0 and $ \vec{r}_{1} $ and $ \vec{r}_{2} $ are random numbers. After encircling the prey, the wolves start the hunting process. To mathematically express the movements of grey wolves in the hunting process, we consider that the alpha, beta, and delta have superior knowledge of the probable position of the target (possible optimal solution of the problem). In this framework, the following formulas are recommended to mimic the hunting process (Khalilpourazari et al. 2020a):

$$ \vec{D}_{\alpha } = \left| {\vec{C}_{1} \vec{X}_{\alpha } - \vec{X}} \right|, \vec{D}_{\beta } = \left| {\vec{C}_{2} \vec{X}_{\beta } - \vec{X}} \right|, \vec{D}_{\delta } = \left| {\vec{C}_{3} \vec{X}_{\delta } - \vec{X}} \right|, $$

(7)

$$ \bar{X}_{1} = \vec{X}_{\alpha } - \vec{A}_{1} \vec{D}_{\alpha } , \,\,\, \bar{X}_{2} = \vec{X}_{\beta } - \vec{A}_{2} \vec{D}_{\beta } , \,\,\,\,\bar{X}_{3} = \vec{X}_{\delta } - \vec{A}_{3} \vec{D}_{\delta } , $$

(8)

$$ \bar{X}\left( {t + 1} \right) = \frac{{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3} }}{3} $$

(9)

Figure 1 depicts a graphical interpretation of the hunting action in 2D space.

3.3 Sine–cosine algorithm

SCA is a newly developed search procedure that mimics the sine and cosine like movements in the feasible space to modify the elements using Eq. (10):

$$ \begin{aligned} X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}c} {X_{i}^{t} + r_{1} \times \sin (r_{2} ) \times \left| {r_{3} P_{i}^{t} - X_{i}^{t} } \right|, r_{4} < 0.5} \\ {X_{i}^{t} + r_{1} \times \cos (r_{2} ) \times \left| {r_{3} P_{i}^{t} - X_{i}^{t} } \right|, r_{4} \ge 0.5} \\ \end{array} } \right. \hfill \\ \hfill \\ \end{aligned} $$

(10)

In Eq. (10), $ X_{i}^{t} $ is the present location $ P_{i}^{t} $is the location of the best particle and $ r_{1} ,r_{2} ,r_{3} $ are random numbers in (0,1]. Throughout iterations, $ r_{1} $ displays the movement path, $ r_{2} $ controls the moving distance, $ r_{3} $ guarantees a suitable equilibrium among underline or deemphasize the desalination, and $ r_{4} $ selects a sine or cosine measure for updating procedure. Figure 2 shows the movement behavior of the sine and cosine actions.

SCA uses sine and cosine movements intelligently to evade local optima. In addition to adjusting the particles’ movements during the solution process, SCA reduces the value of $ r_{1} $ parameter using the below formula to sustain a proper equilibrium between exploration and exploitation as follows:

$$ r_{1} = a_{2} - t\frac{{a_{2} }}{T} $$

(11)

In Eq. (11), $ t $ displays present repetition,$ a_{2} $ is a constant, and $ T $is the maximum generation.

3.4 Moth-flame optimization

Like most of the optimization paradigms, MFO initiates the optimization by creating random solutions. Then, it mimics the spiral flying actions of the moths around light sources using a logarithmic spiral function as follows:

$$ IP_{i}^{X + 1} = \left| {F_{i} - IP_{i}^{X} } \right|e^{bt} \cos \left( {2\pi t} \right) + F_{i} , $$

(12)

In Eq. (12), t is a constant in [− 1, 1], and $ b $ is a constant for determining the form of the logarithmic spiral. $ F_{i} $ is the flame (the best solution), $ IP_{i}^{X} $ is the moth, and $ \left| {F_{i} - IP_{i}^{X} } \right| $ calculates the distance between the moth and flame. Figure 3 shows the spiral movement around the flame.

In order to maintain a suitable balance among its exploration and exploitation, the MFO algorithm reduces the search radius using the following equations (Khalilpourazari et al. 2019; Mohammadi and Khalilpourazari 2017):

$$ a_{3} = - 1 + t\left( {\frac{ - 1}{MT}} \right) $$

(13)

$$ tt = \left( {a_{3} - 1} \right) \times rand + 1 $$

(14)

In Eqs. (13) and (14), $ t $ displays present repetition, and $ MT $ is the maximum generation.

3.5 Particle swarm optimization

Particle swarm optimization (PSO) is one of the most efficient procedures for optimization, and it performs promising in solving many complex problems. PSO is a population-based algorithm that uses Eq. (15) to update the location of a given particle in the solution space:

$$ x_{i} \left( {t + 1} \right) = x_{i} \left( t \right) + v_{i} \left( {t + 1} \right) $$

(15)

In Eq. (15), $ x_{i} \left( t \right) $ presents the current location of the particle and $ v_{i} \left( {t + 1} \right) $ determines the velocity of the particle. $ v_{i} $ vector is the main component of the updating operator that is calculated as follows:

$$ v_{i} \left( {t + 1} \right) = \omega v_{i} \left( t \right) + C_{1} r_{1} \left( {p_{i} \left( t \right) - x_{i} \left( t \right)} \right) + C_{2} r_{2} \left( {G\left( t \right) - x_{i} \left( t \right)} \right) $$

(16)

In Eq. (16), $ r_{1} , r_{2} $ are random numbers in (0, 1], $ C_{1} , C_{2} $ are coefficients, $ p_{i} \left( t \right) $ is the best solution found by the same solution so far, and $ G\left( t \right) $ is the best solution attained so far.

3.6 Water cycle algorithm

Water cycle algorithm (WCA) is one of the best algorithms for solving complex problems. WCA is a population-based nature-inspired metaheuristic that mimics the flow of streams to rivers and sea to perform optimization.

$$ x_{current}^{i + 1} = x_{current}^{i} + C\left( {x_{best\_sol}^{i} - x_{current}^{i} } \right) $$

(17)

In Eq. (17), $ C $ is a random value. We used the updating operator on the WCA as one of the means to update the location of a given particle in the feasible space. Figure 4 represents the updating procedure in WCA.

3.7 Gaussian walks and Lévy flight

In this subsection, we use the leading operators of the stochastic fractal search (SFS) offered by Salimi (2015). SFS utilizes an important scientific property called “fractal”. Fractals are complicated geometric shapes that generally have a “fractional dimension,” resulting in self-similarity. SFS follows diffusion limited aggregation (DLA), which is an efficient technique to create fractals. To simulate the DLA, we use the Gaussian walk and Lévy flight. Figure 5 presents a fractal shape produced through the DLA scheme.

We use the following equation to simulate the diffusion process in the DLA method.

$$ x_{i}^{q} = x_{i} + \beta \times Gaussian\left( {\left| {BP} \right|,\sigma } \right) - \left( {\varepsilon \times BP - \varepsilon^{\prime} \times x_{i} } \right), $$

(18)

In Eq. (18), $ q $ describes the number of new solutions created through the diffusion of each particle, $ \sigma $ is the standard deviation of the Gaussian walk, and $ BP $ is the best solution. Also, $ x_{i}^{q} $ denotes new particles produced via diffusion process and $ x_{i} $ is the $ i $th solution. Besides, $ \varepsilon^{\prime} $ and $ \varepsilon $ are randomly generated numbers in (0, 1]. Element $ \sigma $ is defined using the below formula (Khalilpourazari et al. 2020b):

$$ \sigma = \left| {\frac{\log (g)}{g} \times \left( {P_{i} - BP} \right)} \right|, $$

(19)

where $ \frac{\log (g)}{g} $ decreases the length of Gaussian jumps over iterations. Element $ g $ is the iteration number and $ P_{i} $ is the current position of the particle.

In order to enhance exploration and randomness in the population, we also apply a Lévy flight based updating procedure to the particle under consideration as follows:

$$ X_{new}^{i} = X_{c}^{i} + X_{c}^{i} \otimes Levy\left( D \right)\,\,\,\,\,for\,\,\, i = 1, \ldots ,m $$

(20)

The expression $ X_{new}^{i} $ is the new location and $ X_{c}^{i} $ is the current location of the particle, respectively. We calculate the Lévy flight using Eq. (13).

$$ Levy\left( x \right) = \frac{{0.01 \times \sigma \times r_{1} }}{{\left| {r_{2} } \right|^{{\frac{1}{\beta }}} }}, $$

(21)

where $ r_{1} $and $ r_{2} $ are random numbers. In Eq. (13) $ \sigma $ is obtained as follows:

$$ \sigma = \left( {\frac{{\varGamma \left( {1 + \beta } \right)\sin \left( {\frac{\pi \beta }{2}} \right)}}{{\varGamma \left( {\frac{1 + \beta }{2}} \right)\beta 2^{{\left( {\frac{\beta - 1}{2}} \right)}} }}} \right)^{{\frac{1}{\beta }}} . $$

(22)

3.8 The developed hybrid Q-learning based algorithm

In this section, we propose a novel Hybrid Q-learning based algorithm (HQLA) to solve complicated optimization problems. The idea behind this algorithm is to design a solution methodology that is capable of solving complicated problems by adapting its operators to any solution space. We designed an optimization procedure that can benefit from the advantages of several algorithms. The process can use any movement strategy based on each updating operator. However, a reinforcement learning based algorithm (Q-learning) selects the best action in each iteration for each particle. When optimization begins, the algorithm performs several random actions to evaluate the efficiency of each type of operator. As the iterations continue, Q-learning learns how to employ different actions to achieve the best possible solution. For each action, we consider a reward equal to 1 if the current operators improve the solution quality; Otherwise, the algorithm assigns a punishment value of − 1 to the action. The Pseudo-code of the algorithm is available in Algorithm 2.

Algorithm 2: Pseudo Code for the HQLA
1. input parameters of the algorithm; 2. create a set of randomly generated solutions; 3. while stopping criteria not met 4. check for infeasibility of the particles; 5. bring infeasible particles to the feasible solution space; 6. sort the solutions based on their fitness value; 7. for each particle 8. if it is the first iteration 9. select a random action; 10. else 11. select the action using the Q-table; 12. end 13. if the action is GWO 14. use GWO operators to update the position of the particle; 15. else if the action is SFS 16. use SFS operators to update the position of the particle; 17. else if the action is WCA 18. use WCA operators to update the position of the particle; 19. else if the action is PSO 20. use PSO operators to update the position of the particle; 21. else if the action is MFO 22. use MFO operators to update the position of the particle; 23. else if the action is SCA 24. use SCA operators to update the position of the particle; 25. end 26. check for infeasibility of the particle; 27. bring infeasible particle to the feasible solution space; 28. calculate the objective function value of the particle; 29. determine the reward/punishment value; 30. determine the max Q-value for the next state; 31. update a(t); 32. update the Q-table; 33. update the current state, s(t) = s(t + 1); 34. end 35. end 36. return the best solution

4 Results and discussion

Metaheuristics are approximation algorithms that are based on randomized movements. Therefore, their performance may differ from one problem to another. Thus, to show the efficiency of a metaheuristic algorithm, we should apply them to many benchmark functions. For this purpose, we use 29 benchmarks, including Unimodal, multimodal, fixed dimensional multimodal, and hybrid composite functions that are among the most complex benchmarks in the literature. For more details regarding these benchmarks, see “Appendix 1”.

We compare our algorithm to state-of-the-art methods, including Crow Search Algorithm (CSA), Artificial Bee Colony (ABC), Cuckoo Search (CS), Genetic Algorithm (GA), Moth-Flame optimization (MFO), Gravitational Search Algorithm (GSA), and Dragonfly Algorithm (DA). In order to solve the benchmark functions, we considered 15,000 Number of Function Evaluations (NFEs) to perform a reasonable assessment. Besides, to draw a reliable conclusion, we apply each algorithm on each benchmark 30 times, and report mean, standard deviation, worst and best results. Table 2 provides more details about the values of the main parameters of the algorithms.

Table 2 The values of the parameters of the algorithms

Full size table

In the first step, we assess the performance of the HQLA on unimodal benchmarks (F1–F7). Figure 6 depicts a 2D representation of these benchmark functions.

Unimodal benchmarks do not have several local optima and are considered to assess the exploration ability of metaheuristic algorithms. This set of test suites are difficult to solve since the algorithms should first locate the global optima approximately and then perform exploitation to provide the best approximation of the location of the global optima. Table 3 provides detailed information on the performance of the algorithms in unimodal benchmarks. Based on the results of Table 3, we observe that the HQLA performs the best in most of the benchmarks. In F1–F5 and F7, the HQLA outperforms all other algorithms by obtaining the best possible solution for all benchmarks. In F6, HQLA ranks third in providing the best solution for this benchmark. Besides, HQLA provides the lowest standard deviation that shows very low variability in the performance of this algorithm. Moreover, considering the boxplot of the results of Table 3 present in Fig. 7a–c, it becomes apparent that HQLA has the lowest and narrowest boxplot among the algorithms.

Table 3 Computational outcomes of the algorithms in solving unimodal benchmark functions

Full size table

Furthermore, Fig. 8a–c, that present the convergence plots of the methods, disclose that the HQLA can maintain a perfect balance among exploration and exploitation of the solution space. In Fig. 8a–c, we observe that the HQLA can continually improve the best solution attained in each iteration by choosing the best operator to explore the solution space. The learning process in HQLA enables the algorithm to determine the best operator to change the location of the particles in the solution region by evaluating the efficiency of each updating mechanism. Based on these observations, we conclude that HQLA is a reliable technique for this family of benchmark functions.

The second and third family of the benchmarks are multimodal and fixed-dimension multimodal benchmarks. These benchmarks contain several local optima that make the solution process a complicated task. To perform well in solving these benchmark functions, the algorithms should maintain an excellent balance among exploration and exploitation. This will help the algorithms avoid local optimum. Figures 9 and 10 shows a schematic view of these benchmark functions in 2D.

We provide detailed information on the performance of the algorithms on solving multimodal and fixed-dimension multimodal benchmarks in Tables 4 and 5.

Table 4 Computational outcomes of the algorithms in solving multimodal benchmark functions

Full size table

Table 5 Computational outcomes of the algorithms in solving multimodal benchmarks

Full size table

Based on the results, we observe that in F8–F12, F14, and F16–F23 (14 out of 16) benchmark functions, the HQLA outperforms other algorithms considering average, standard deviation, best and worst values over 30 repetitions. In some of these benchmarks, such as F9 and F11, the standard deviation of the results provided by HQLA is zero. In these benchmark functions, the HQLA achieves global optima in all the repetitions. These results indicate that HQLA is a robust and reliable algorithm in solving complex optimization problems. HQLA is able to choose between several operators that enable the algorithms to explore and exploit the resolution region intelligently. The learning process in the HQLA helps the algorithm evaluate the efficiency of the operators and discover the most efficient operator (action) for any problem. Besides, Fig. 8a–c depict the perfect balance among exploration and exploitation in the performance of the HQLA throughout iterations. Moreover, in F13 and F15, the HQLA ranked second among all algorithms, which shows its high capability in solving optimization problems. Furthermore, Fig. 7a–c, disclose that the boxplot of the outcomes of the HQLA is narrower and lower than any other algorithm that highlights the superiority of the proposed methodology over existing approaches.

The last family of the benchmark functions is hybrid composite benchmarks that are the most challenging (Liang et al. 2005). Figure 11 presents a graphical representation of these benchmark functions in 2D.

Computational results of solving this family of benchmark functions using each algorithm are available in Table 6. Considering the results, we observe that the HQLA provides the best results and outperforms other algorithms in solving hybrid benchmarks. Table 6 shows that the HQLA obtained the lowest average, standard deviation, and best values for these benchmark functions while maintaining the lowest worst value. Besides, Fig. 13 approves this statement by showing that the boxplot of the HQLA is narrower and lower than any other algorithms. Moreover, Fig. 12 shows that the HQLA adjusts exploration and exploitation by intelligently choosing the best operators in each iteration. Based on Fig. 12, we observe that the HQLA can improve the best solution consistently throughout iterations.

Table 6 Computational outcomes of the algorithms in solving hybrid composite benchmarks

Full size table

Although we showed that the HQLA outperforms other state-of-the-art methods in terms of solution performance, we apply Friedman’s test, which is a powerful statistical test, to show the statistical superiority of our algorithm to other methods. Table 7 presents the results of Friedman’s test. Based on the outcomes, the average rank of the HQLA is 3.036207, which is far smaller than that of other algorithms. Therefore, from a statistical point of view, HQLA performs significantly better than all other algorithms. We note that we perform the Friedman’s test at a 95 percent confidence level.

Table 7 Results of the Friedman’s test

Full size table

5 Multi-criteria parameter estimation and curve fitting

Quebec is one of Canada’s provinces that is dealing with the COVID-19 epidemic triggered by the SARS-CoV-2 virus. The province has been reported the most COVI-19 cases that account for more than 63,713 confirmed cases of COVID-19 and 5770 death cases by the disease. On April 29, 2020, Quebec hospitals announced that the healthcare system capacity could not respond to the influx of the COVID-19 patients to the hospitals (Weeks and Tu Thanh 2020; The Globe and Mail n.d.). On June 28, 2020, Montreal’s Emergency Rooms (ERs) reported near capacity status due to limited resources (Fahmy and Ross 2020; CTV News. n.d.). These highlight the need for a methodology that could accurately predict the future trend of the pandemic in the province that enables the policymakers to optimize the resource allocation to avoid loss of lives, as many scientists declared that resource shortages such as ventilator shortages are the difference between life and death for patients (Kliff et al. 020; The New York Time. n.d.). Developing new methodologies to predict pandemic growth is essential to optimize resource allocation and determine the optimal time to implement lockdown measures. On the other hand, we could optimize resource allocation for life-threatening cases admitted to ICUs and reduce the disease’s fatality rate.

In this section, we use the most recent and accurate model called SIDARTHE, presented by Giordano et al. (2020) published in Nature Medicine. The scientists showed that using the model, we could predict the future trend of the pandemic accurately. However, the solution to the problem is a cumbersome task. For more information on the SIDARTHE model, see “Appendix 2”. In this research, we used the SIDARTHE model and applied it to real data from Quebec, Canada. In order to solve the model, we utilize HQLA. Figure 14 shows the convergence plot of HQLA throughout iterations. We divided the period (from January 25, 2020 (day 1) to July 19, 2020 (day 176)) into six stages as follows in which the Quebec government applied specific restrictions to control the pandemic:

Stage 1 (from January 25, 2020, to March 15, 2020)
Stage 2 (from March 15 to March 24)
Stage 3 (from March 24 to March 28)
Stage 4 (from March 28 to April 2)
Stage 5 (from April 2 to April 13)
Stage 6 (after April 13)

We note that we considered the sum of mean square errors as the objective function value for our model, and its optimal value for our case study is 6.29E−06. Based on the outcomes, we observe that the HQLA can solve the problem very efficiently. Table 8 shows detailed statistics about the optimized factors of the model. Now that the results of Table 8 are the output of the optimization process and solving the SIDARTHE model for Quebec data using HQLA.

Table 8 Results of fitting the model to real-data for Quebec

Full size table

In the following, we compared our results with actual data from Quebec to validate our results. Figure 15 shows the predicted and actual data from Quebec. In Fig. 15, we accurately predict the number of infected cases, recovered cases, and cumulative diagnosed cases. Based on the outcomes, we conclude that the HQLA is an efficient solution methodology for the problem.

As mentioned earlier, we separated the planning horizon into six phases in which the Quebec government announced detailed limitations to control the epidemic. The first case of COVID-19 was detected in Quebec, Canada, on February 27, 2020 (Lapierre 2020). In the first phase, the transmission rate was considered low. Based on our results, we observe that the transmission rates were low at the first stage, and the reproduction rate was $ R_{0} = 1.0998 $. Quebec province first announced a state of emergency on March 12, 2020. We consider March 15, 2020, to March 24, 2020, as the second phase of the pandemic due to a drastic increase in the number of cases. On March 15, the Quebec government ordered the closure of all recreational and entertainment facilities (Gouvernement Du Québec 2020).

Following the quick progress in the number of infected cases, on March 27, Montreal declared a local state of emergency, and the Quebec government ordered the closure of all universities and schools. Besides, On March 20, the province banned indoor gatherings. In the second phase of the pandemic, our study estimates a reproduction rate of $ R_{0} = 3. 80 2 8 $ for Quebec province. From March 24, 2020, to March 28, 2020, Quebec received more COVID-19 test kits enabling the healthcare authorities to perform more tests and determine the infected cases. In this phase, we approximate the reproduction rate equal to $ R_{0} = 4.6096 $.

From March 28, 2020, to April 2, 2020, strict actions were taken by the government decreased the reproduction rate to $ R_{0} = 1.1193 $. The reproduction rate dropped over the next period to $ R_{0} = 1.0248 $ from April 2, 2020, to April 13, 2020. After April 13, 2020, the reproduction rate was considered to be $ R_{0} = 0.7782 $. In order to present the progress of the pandemic in the next few months, we extended our prediction to the next 365 days, as is shown in Fig. 16a, b.

Based on our results, considering the current social distancing and limitations, we will experience a significant decrease in the number of cases in the next few months if and only if strict measures such as partial lockdown remain in place for the next few months.

6 Sensitivity analyses and managerial insights

In the previous section, we reflected the pandemic growth over the next few months, considering the current partial lockdown and closure measures. However, in May 2020, the Quebec government ordered a gradual reopening of the businesses. Consequently, it is vital to discover how the reopening of the businesses will impact forthcoming circumstances. In this section, we examine the consequence of variation in transmission rates on the progress of the pandemic. Therefore, we augmented the parameters $ \alpha , \beta , \gamma , \delta , \,\,{\text{and}}\,\, \varepsilon $ and explore their impact on the number of infected, recovered, cumulative diagnosed, and death cases. We portray the outcomes in Figs. 17, 18, 19 and 20. The outcomes prove that the parameter $ \alpha $ has a substantial effect on the pandemic growth, and increasing this parameter increases the number of infected, recovered, cumulative diagnosed, and death cases. Besides, increasing $ \beta , \gamma , $ and $ \delta $ increases the number of infected, recovered, cumulative diagnosed, and death cases. Moreover, increasing the parameter $ \varepsilon $ remarkably reduces the number of infected, recovered, cumulative diagnosed, and death cases. Therefore, to decrease the spread of the virus and stop the pandemic, we need to reduce the transmission rate of the infection by applying significant social distancing and behavioral measures while increasing the detection rate of asymptomatic cases.

Based on our results, we observe that from day 250, we will see an increase in the number of infected cases from October 1, 2020, in Quebec by limiting the lockdown measures. Therefore, starting October 1, 2020, Quebec will experience a significant increase in the number of infected cases.

7 Conclusion

COVID-19 is a severe health threat, and the condition is growing every day. Therefore, presenting new procedures to model and predict the COVID-19 pandemic is vital. Prediction of the COVID-19 pandemic will enable policymakers to optimize the use of healthcare system capacity and resource allocation to minimize the fatality rate. In this research, we design a new hybrid reinforcement learning-based algorithm capable of solving complex optimization problems. We applied our algorithm to several well-known benchmarks and show that the presented methodology provides high-quality solutions for the most complex problems in the literature. Besides, we showed the superiority of the offered method to state-of-the-art methods through several measures.

Moreover, to demonstrate the efficiency of the proposed methodology in optimizing real-world problems, we implemented our approach to the most recent data from Quebec, Canada, to predict the COVID-19 outbreak. Our algorithm, combined with the most recent mathematical model for COVID-19 pandemic prediction, accurately reflected the future trend of the pandemic. Furthermore, we analyzed several scenarios for deepening our insight into pandemic growth. We determined essential factors and delivered various managerial insights to help policymakers making decisions regarding future social measures.

Our results showed that the transmission rate caused by the interaction of a susceptible case with an asymptomatic case has the most significant effect on future trends. Increasing this parameter can significantly increase the number of infected, recovered, cumulative diagnosed, and death cases. Besides, increasing the transmission rate due to contact of a susceptible case with a diagnosed, ailing, and recognized case increases the number of infected, recovered, cumulative diagnosed, and death cases. Moreover, increasing the parameter $ \varepsilon $ remarkably reduces the number of infected, recovered, cumulative diagnosed, and death cases. Therefore, to decrease the spread of the virus and stop the pandemic, we need to reduce the transmission rate of the infection by applying significant social distancing and behavioral measures while increasing the detection rate of asymptomatic cases.

As future research, from a mathematical modeling viewpoint, it would be worthwhile to consider stochasticity in the proposed model (Belen et al. 2011; Özmen et al. 2011; Weber et al. 2011). Besides, it would be interesting to use the proposed model to plan for resource management during the pandemic to decrease the fatality rate by increasing the healthcare system capacity. Based on the predictions, healthcare managers can plan for testing kit allocation to test centers. Also, it would be worthwhile to consider the age, medical condition, and gender of the infected cases in the model from a modeling perspective. Moreover, it would also be interesting to use the proposed model to plan for managing healthcare resources such as personal protective equipment (PPE) and ventilators during the pandemic.

References

Adam, S. P., Alexandropoulos, S. A. N., Pardalos, P. M., & Vrahatis, M. N. (2019). No free lunch theorem: A review. In Approximation and optimization (pp. 57–82). Springer, Cham.
Ahmadianfar, I., Bozorg-Haddad, O., & Chu, X. (2020). Gradient-based optimizer: A new metaheuristic optimization algorithm. Information Sciences, 540, 131–159.
Google Scholar
Akhtar, E., & Farrukh, S. M. (2017). Practical Reinforcement Learning: Develop self-evolving, intelligent agents with OpenAI Gym, Python and Java. Packt Publishing.
Askarzadeh, A. (2016). A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Computers & Structures, 169, 1–12.
Google Scholar
Belen, S., Kropat, E., & Weber, G. W. (2011). On the classical Maki–Thompson rumour model in continuous time. Central European Journal of Operations Research, 19(1), 1–17.
Google Scholar
Bertsekas, D. P. (2019). Reinforcement learning and optimal control. Belmont, MA: Athena Scientific.
Google Scholar
CARLY WEEKS & TU THANH HA. (n.d.). Quebec hospitals struggling with influx of COVID-19 patients even as province moves to reopen—The Globe and Mail. Retrieved November 1, 2020, from https://www.theglobeandmail.com/canada/article-quebec-hospitals-struggling-with-influx-of-covid-19-patients-even-as/.
Cerby, V. (1985). Thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45, 41–51.
Google Scholar
Coronavirus Disease (COVID-19) in Québec | Gouvernement du Québec. (n.d.). Retrieved October 11, 2020, from https://www.quebec.ca/en/health/health-issues/a-z/2019-coronavirus/.
Cui, L., Hu, H., Yu, S., Yan, Q., Ming, Z., Wen, Z., et al. (2018). DDSE: A novel evolutionary algorithm based on degree-descending search strategy for influence maximization in social networks. Journal of Network and Computer Applications, 103, 119–130.
Google Scholar
Du, H., Wu, X., & Zhuang, J. (2006). Small-world optimization algorithm for function optimization. In International conference on natural computation (pp. 264–273). Berlin: Springer.
Eberhart, R., & Kennedy, J. (1995, November). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (Vol. 4, pp. 1942-1948). Citeseer.
Eskandar, H., Sadollah, A., Bahreininejad, A., & Hamdi, M. (2012). Water cycle algorithm—A novel metaheuristic optimization method for solving constrained engineering optimization problems. Computers & Structures, 110, 151–166.
Google Scholar
Fahmy, G., & Ross, S. (2020). CTV News Montreal Digital Reporter Montreal ERs are near capacity with delayed health problems, a worrying sign as second wave looms. Montreal. https://montreal.ctvnews.ca/montreal-ers-are-near-capacity-with-delayed-health-problems-a-worrying-sign-as-second-wave-looms-1.5003710?cache=%3FclipId%3D68597.
Fogel, L. J., Owens, A. J., & Walsh, M. J. (1966). Artificial intelligence through simulated evolution. London: Wiley.
Google Scholar
Formato, R. A. (2007). Central force optimization. Progress in Electromagnetics Research, 77, 425–491.
Google Scholar
Giordano, G., Blanchini, F., Bruno, R., Colaneri, P., Di Filippo, A., Di Matteo, A., et al. (2020). Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nature Medicine, 1(26), 855–860.
Google Scholar
Hashemi Doulabi, H., Jaillet, P., Pesant, G., & Rousseau, L. M. (2020a). Exploiting the structure of two-stage robust optimization models with exponential scenarios. INFORMS Journal on Computing. https://doi.org/10.1287/ijoc.2019.0928.
Article Google Scholar
Hashemi Doulabi, H., Pesant, G., & Rousseau, L. M. (2020b). Vehicle routing problems with synchronized visits and stochastic travel and service times: Applications in healthcare. Transportation Science, 54(4), 1053–1072.
Google Scholar
Hatamlou, A. (2013). Black hole: A new heuristic optimization approach for data clustering. Information Sciences, 222, 175–184.
Google Scholar
Health Canada. (2020). Drugs and vaccines for COVID-19: Authorized clinical trials [Decisions]. Aem. https://www.canada.ca/en/health-canada/services/drugs-health-products/covid19-industry/drugs-vaccines-treatments/list-authorized-trials.html.
Holland, J. H. (1992). Genetic algorithms. Scientific American, 267(1), 66–73.
Google Scholar
Hoursan, H., Farahmand, F., & Ahmadian, M. T. (2020). A three-dimensional statistical volume element for histology informed micromechanical modeling of brain white matter. Annals of Biomedical Engineering, 48(4), 1337–1353.
Google Scholar
Karaboga, D., & Basturk, B. (2007). A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39(3), 459–471.
Google Scholar
Kaveh, A., & Dadras, A. (2017). A novel meta-heuristic optimization algorithm: Thermal exchange optimization. Advances in Engineering Software, 110, 69–84.
Google Scholar
Kaveh, A., Seddighian, M. R., & Ghanadpour, E. (2020). Black Hole Mechanics Optimization: A novel meta-heuristic algorithm. Asian Journal of Civil Engineering, 21(7), 1129–1149.
Google Scholar
Kaveh, A., & Talatahari, S. (2010). A novel heuristic optimization method: charged system search. Acta Mechanica, 213(3–4), 267–289.
Google Scholar
Khalilpourazari, S., & Khalilpourazary, S. (2019). An efficient hybrid algorithm based on Water Cycle and Moth-Flame Optimization algorithms for solving numerical and constrained engineering optimization problems. Soft Computing, 23(5), 1699–1722.
Google Scholar
Khalilpourazari, S., Khalilpourazary, S., Çiftçioğlu, A. Ö., & Weber, G. W. (2020a). Designing energy-efficient high-precision multi-pass turning processes via robust optimization and artificial intelligence. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-020-01648-0.
Article Google Scholar
Khalilpourazari, S., Naderi, B., & Khalilpourazary, S. (2020b). Multi-objective stochastic fractal search: A powerful algorithm for solving complex multi-objective optimization problems. Soft Computing, 24(4), 3037–3066.
Google Scholar
Khalilpourazari, S., & Pasandideh, S. H. R. (2019). Sine–cosine crow search algorithm: Theory and applications. Neural Computing and Applications, 32, 7725–7742.
Google Scholar
Khalilpourazari, S., Pasandideh, S. H. R., & Niaki, S. T. A. (2019). Optimizing a multi-item economic order quantity problem with imperfect items, inspection errors, and backorders. Soft Computing, 23(22), 11671–11698.
Google Scholar
Khalilpourazari, S., Soltanzadeh, S., Weber, G. W., & Roy, S. K. (2020c). Designing an efficient blood supply chain network in crisis: Neural learning, optimization and case study. Annals of Operations Research, 289(1), 123–152.
Google Scholar
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. science, 220(4598), 671–680.
Google Scholar
Kliff, S., Satariano, A., Silver-Greenberg, J., & Kulish, N. (2020). There aren’t enough ventilators to cope with the coronavirus—The New York Times. (n.d.). Retrieved November 1, 2020, from https://www.nytimes.com/2020/03/18/business/coronavirus-ventilator-shortage.html.
Koza, J. R., & Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection (Vol. 1). London: MIT Press.
Lapierre, M. (2020). Quebec’s first case of COVID-19 has been confirmed. Montreal. https://montreal.ctvnews.ca/quebec-s-first-case-of-covid-19-has-been-confirmed-1.4831088.
Li, M. D., Zhao, H., Weng, X. W., & Han, T. (2016). A novel nature-inspired algorithm for optimization: Virus colony search. Advances in Engineering Software, 92, 65–88.
Google Scholar
Liang, J. J., Suganthan, P. N., & Deb, K. (2005). Novel composition test functions for numerical global optimization. In Proceedings 2005 IEEE swarm intelligence symposium, 2005. SIS 2005. (pp. 68–75). IEEE.
Lotfi, R., Mehrjerdi, Y. Z., Pishvaee, M. S., Sadeghieh, A., & Weber, G. W. (2019). A robust optimization model for sustainable and resilient closed-loop supply chain network design considering conditional value at risk. Numerical Algebra, Control and Optimization. https://doi.org/10.3934/naco.2020023.
Article Google Scholar
Lotfi, R., Mostafaeipour, A., Mardani, N., & Mardani, S. (2018). Investigation of wind farm location planning by considering budget constraints. International Journal of Sustainable Energy, 37(8), 799–817.
Google Scholar
Lotfi, R., Weber, G. W., Sajadifar, S. M., & Mardani, N. (2020). Interdependent demand in the two-period newsvendor problem. Journal of Industrial and Management Optimization, 16(1), 117.
Google Scholar
Martínez-Álvarez, F., Asencio-Cortés, G., Torres, J. F., Gutiérrez-Avilés, D., Melgar-García, L., Pérez-Chacón, R.,… & Troncoso, A. (2020). Coronavirus Optimization Algorithm: A bioinspired metaheuristic based on the COVID-19 propagation model. arXiv preprint arXiv:2003.13633.
Mirjalili, S. (2015a). Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge-Based Systems, 89, 228–249.
Google Scholar
Mirjalili, S. (2015b). The ant lion optimizer. Advances in Engineering Software, 83, 80–98.
Google Scholar
Mirjalili, S. (2016a). Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Computing and Applications, 27(4), 1053–1073.
Google Scholar
Mirjalili, S. (2016b). SCA: A sine cosine algorithm for solving optimization problems. Knowledge-Based Systems, 96, 120–133.
Google Scholar
Mirjalili, S., & Lewis, A. (2016). The whale optimization algorithm. Advances in Engineering Software, 95, 51–67.
Google Scholar
Mirjalili, S., Gandomi, A. H., Mirjalili, S. Z., Saremi, S., Faris, H., & Mirjalili, S. M. (2017). Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Advances in Engineering Software, 114, 163–191.
Google Scholar
Mirjalili, S., Mirjalili, S. M., & Hatamlou, A. (2016a). Multi-verse optimizer: A nature-inspired algorithm for global optimization. Neural Computing and Applications, 27(2), 495–513.
Google Scholar
Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey Wolf optimizer. Advances in Engineering Software, 69, 46–61.
Google Scholar
Mirjalili, S., Saremi, S., Mirjalili, S. M., & Coelho, L. D. S. (2016b). Multi-objective grey wolf optimizer: A novel algorithm for multi-criterion optimization. Expert Systems with Applications, 47, 106–119.
Google Scholar
Moghaddam, F. F., Moghaddam, R. F., & Cheriet, M. (2012). Curved space optimization: A random search based on general relativity theory. arXiv preprint arXiv:1208.2214.
Mohammadi, M., & Khalilpourazari, S. (2017). Minimizing makespan in a single machine scheduling problem with deteriorating jobs and learning effects. In Proceedings of the 6th international conference on software and computer applications (pp. 310–315).
Özmen, A., Weber, G. W., Batmaz, İ., & Kropat, E. (2011). RCMARS: Robustification of CMARS with different scenarios under polyhedral uncertainty set. Communications in Nonlinear Science and Numerical Simulation, 16(12), 4780–4787.
Google Scholar
Price, K. V. (2013). Differential evolution. In Handbook of optimization (pp. 187–214). Berlin: Springer.
Public Health Authority of Canada. (2020). Coronavirus disease (COVID-19): Symptoms and treatment [Education and awareness]. Aem. https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/symptoms.html.
Rashedi, E., Nezamabadi-Pour, H., & Saryazdi, S. (2009). GSA: A gravitational search algorithm. Information Sciences, 179(13), 2232–2248.
Google Scholar
Rechenberg, I. (1978). Evolutions strategien. In Simulations methoden in der Medizin und Biologie (pp. 83–114). Berlin: Springer.
Salimi, H. (2015). Stochastic fractal search: A powerful metaheuristic algorithm. Knowledge-Based Systems, 75, 1–18.
Google Scholar
Sangaiah, A. K., Goli, A., Tirkolaee, E. B., Ranjbar-Bourani, M., Pandey, H. M., & Zhang, W. (2020). Big data-driven cognitive computing system for optimization of social media analytics. IEEE Access, 8, 82215–82226.
Google Scholar
Saremi, S., Mirjalili, S., & Lewis, A. (2017). Grasshopper optimisation algorithm: Theory and application. Advances in Engineering Software, 105, 30–47.
Google Scholar
Shareef, H., Ibrahim, A. A., & Mutlag, A. H. (2015). Lightning search algorithm. Applied Soft Computing, 36, 315–333.
Google Scholar
Simon, D. (2008). Biogeography-based optimization. IEEE Transactions on Evolutionary Computation, 12(6), 702–713.
Google Scholar
Tirkolaee, E. B., Goli, A., Faridnia, A., Soltani, M., & Weber, G. W. (2020a). Multi-objective optimization for the reliable pollution-routing problem with cross-dock selection using Pareto-based algorithms. Journal of Cleaner Production, 276, 122927.
Google Scholar
Tirkolaee, E. B., Goli, A., & Weber, G. W. (2020b). Fuzzy mathematical programming and self-adaptive artificial fish swarm algorithm for just-in-time energy-aware flow shop scheduling problem with outsourcing option. IEEE Transactions on Fuzzy Systems, 28(11), 2772–2783.
Google Scholar
Topal, A. O., & Altun, O. (2016). A novel meta-heuristic algorithm: Dynamic virtual bats algorithm. Information Sciences, 354, 222–235.
Google Scholar
Wang, S. Y., Wang, L., Liu, M., & Xu, Y. (2013). An effective estimation of distribution algorithm for solving the distributed permutation flow-shop scheduling problem. International Journal of Production Economics, 145(1), 387–396.
Google Scholar
Weber, G. W., Defterli, O., Gök, S. Z. A., & Kropat, E. (2011). Modeling, inference and optimization of regulatory networks based on time series data. European Journal of Operational Research, 211(1), 1–14.
Google Scholar
WHO. (2020). Q&A on coronaviruses. Retrieved October 11, 2020 from https://www.who.int/news-room/q-a-detail/q-a-coronaviruses.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1, 67–82.
Google Scholar
Yang, X. S., & Deb, S. (2009). Cuckoo search via Lévy flights. In 2009 World congress on nature & biologically inspired computing (NaBIC) (pp. 210–214). IEEE.
Zamli, K. Z., Din, F., Ahmed, B. S., & Bures, M. (2018). A hybrid Q-learning sine–cosine-based strategy for addressing the combinatorial test suite minimization problem. PLoS ONE, 13(5), e0195675.
Google Scholar
Zare Mehrjerdi, Y., & Lotfi, R. (2019). Development of a mathematical model for sustainable closed-loop supply chain with efficiency and resilience systematic framework. International Journal of Supply and Operations Management, 6(4), 360–388.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mechanical, Industrial and Aerospace Engineering, Concordia University, Montreal, Canada
Soheyl Khalilpourazari & Hossein Hashemi Doulabi
Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation (CIRRELT), Montreal, Canada
Soheyl Khalilpourazari & Hossein Hashemi Doulabi

Authors

Soheyl Khalilpourazari
View author publications
You can also search for this author in PubMed Google Scholar
Hossein Hashemi Doulabi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Soheyl Khalilpourazari.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

See Table A1.

Table A1 The unimodal benchmark functions

Full size table

Appendix 2

In this appendix, we describe the SIDARTHE model proposed by Giordano et al. (2020). The model considers eight states for people, including Susceptible, Infected, Diagnosed, Ailing, Recognized, Threatened, Healed, Extinct. The model reflects cases in different situations, including symptomatic, asymptomatic, detected, undetected, and life-threatening symptoms that required ICU admission. The suggested model includes eight ordinary differential equations to show the progress of the pandemic over time. We use the following notations, as in Table A2, to present the model.

Table A2 Sets, parameters, and state variables of the model

Full size table

Based on the notations mentioned above, Giordano et al. (2020) presented the following model.

$$ \dot{S}\left( t \right) = - S\left( t \right)\left( {\alpha I\left( t \right) + \beta D\left( t \right) + \gamma A\left( t \right) + \delta R\left( t \right)} \right) $$

(23)

$$ \dot{I}\left( t \right) = S\left( t \right)\left( {\alpha I\left( t \right) + \beta D\left( t \right) + \gamma A\left( t \right) + \delta R\left( t \right)} \right) - \left( {\varepsilon + \zeta + \lambda } \right)I\left( t \right) $$

(24)

$$ \dot{D}\left( t \right) = \varepsilon I\left( t \right) - \left( {\eta + \rho } \right)D\left( t \right) $$

(25)

$$ \dot{A}\left( t \right) = \zeta I\left( t \right) - \left( {\theta + \mu + \kappa } \right)A\left( t \right) $$

(26)

$$ \dot{R}\left( t \right) = \eta D\left( t \right) + \theta A\left( t \right) - \left( {\nu + \xi } \right)R\left( t \right) $$

(27)

$$ \dot{T}\left( t \right) = \mu A\left( t \right) + \nu R\left( t \right) - \left( {\sigma + \tau } \right)T\left( t \right) $$

(28)

$$ \dot{H}\left( t \right) = \lambda I\left( t \right) + \rho D\left( t \right) + \kappa A\left( t \right) + \xi R\left( t \right) + \sigma T\left( t \right) $$

(29)

$$ \dot{E}\left( t \right) = \tau T\left( t \right) $$

(30)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Khalilpourazari, S., Hashemi Doulabi, H. Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec. Ann Oper Res 312, 1261–1305 (2022). https://doi.org/10.1007/s10479-020-03871-7

Download citation

Accepted: 07 November 2020
Published: 03 January 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10479-020-03871-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec

Abstract

Similar content being viewed by others

Optimization of Mitigation Strategies During Epidemics Using Offline Reinforcement Learning

Exploring optimal control of epidemic spread using reinforcement learning

A general framework for optimising cost-effectiveness of pandemic response under partial intervention measures

1 Introduction

2 Survey on research conducted

3 Algorithm development

3.1 Q-learning

3.2 Grey–Wolf optimizer

3.3 Sine–cosine algorithm

3.4 Moth-flame optimization

3.5 Particle swarm optimization

3.6 Water cycle algorithm

3.7 Gaussian walks and Lévy flight

3.8 The developed hybrid Q-learning based algorithm

4 Results and discussion

5 Multi-criteria parameter estimation and curve fitting

6 Sensitivity analyses and managerial insights

7 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendix 1

Appendix 2

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Designing a hybrid reinforcement learning based algorithm with application in prediction of the COVID-19 pandemic in Quebec

Abstract

Similar content being viewed by others

Optimization of Mitigation Strategies During Epidemics Using Offline Reinforcement Learning

Exploring optimal control of epidemic spread using reinforcement learning

A general framework for optimising cost-effectiveness of pandemic response under partial intervention measures

1 Introduction

2 Survey on research conducted

3 Algorithm development

3.1 Q-learning

3.2 Grey–Wolf optimizer

3.3 Sine–cosine algorithm

3.4 Moth-flame optimization

3.5 Particle swarm optimization

3.6 Water cycle algorithm

3.7 Gaussian walks and Lévy flight

3.8 The developed hybrid Q-learning based algorithm

4 Results and discussion

5 Multi-criteria parameter estimation and curve fitting

6 Sensitivity analyses and managerial insights

7 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendix 1

Appendix 2

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation