COVIDOA: a novel evolutionary optimization algorithm based on coronavirus disease replication lifecycle

This paper presents a novel bio-inspired optimization algorithm called Coronavirus Optimization Algorithm (COVIDOA). COVIDOA is an evolutionary search strategy that mimics the mechanism of coronavirus when hijacking human cells. COVIDOA is inspired by the frameshifting technique used by the coronavirus for replication. The proposed algorithm is tested using 20 standard benchmark optimization functions with different parameter values. Besides, we utilized five IEEE Congress of Evolutionary Computation (CEC) benchmark test functions (CECC06, 2019 Competition) and five CEC 2011 real-world problems to prove the proposed algorithm's efficiency. The proposed algorithm is compared to eight of the most popular and recent metaheuristic algorithms from the state-of-the-art in terms of best cost, average cost (AVG), corresponding standard deviation (STD), and convergence speed. The results demonstrate that COVIDOA is superior to most existing metaheuristics.

Generally speaking, optimization algorithms are classified into three categories: swarm-based, physics-based, and evolutionary algorithms. Swarm-based algorithms, such as ABC, PSO, CSO, and CS, mimic how a group of agents behave with each other and their environment [1]. Physics-based algorithms are built on a mathematical idea or a physical process, such as Newton's gravitational law; examples include CFO and GSA [3]. On the other hand, evolutionary algorithms are search methods inspired by biological evolution mechanisms, such as reproduction and mutation [77]. The most popular evolutionary algorithm is GA, inspired by Darwin's theory of biological evolution. As mentioned in [23], evolutionary algorithms have some advantages over other types of optimization algorithms:
(1) They are conceptually simple: all evolutionary algorithms share the same necessary steps, namely initialization, fitness evaluation, selection, crossover, and mutation.
(2) The individuals with the highest fitness are selected for reproduction, which leads to the production of new individuals closer to the optimum solution.
(3) Broad applicability: researchers can apply evolutionary algorithms to any problem formulated in the form of an optimization function.
A list of the most popular nature-inspired algorithms is shown in Fig. 1.
Since 2020, the world has suffered from the pandemic of coronavirus disease 2019 (COVID-19). Researchers worldwide are doing their best to understand this novel virus's mechanism and find an effective therapy for this disease [40,73]. Several researchers have examined the mechanism of the novel coronavirus from different perspectives in the optimization field. The authors in [47] proposed a bio-inspired metaheuristic algorithm based on the propagation model of coronavirus, and the experimental results showed remarkable performance of the algorithm. Al-Betar et al. [4] proposed an optimization algorithm based on herd immunity's effect in tackling the COVID pandemic. The comparative analysis showed that the proposed algorithm yields very competitive results compared to other well-established methods. Another algorithm [27] models the coronavirus distribution process as an optimization problem to minimize the number of COVID-19 infected countries and slow the epidemic.
Once the virus is inside the human body, the most severe problem is replication and transcription, in which new copies of the virus are created and target new healthy cells [49,71]. This paper presents a novel evolutionary optimization algorithm named Coronavirus Disease Optimization Algorithm (COVIDOA).
COVIDOA mimics the attacking behavior of coronavirus inside human cells. It is worth mentioning that almost all kinds of viruses have the same general steps for replication: entry, uncoating, replication, assembly, and virion release. However, replication between viruses greatly varies depending on the genes involved [20].
In addition to the advantages of evolutionary algorithms, COVIDOA has several advantages when compared with other similar mechanisms:
1. Because the virus is novel and research on many of its aspects is lacking, the reported numerical data about the coronavirus lifecycle may be inaccurate. Therefore, the parameters of the proposed algorithm, such as the number of virus particles in each generation and the number of viral proteins generated by each particle, are not set at fixed values. This gives researchers the flexibility to choose the values of the controlling parameters that best fit their problem.
2. As mentioned in [8], the mutation rate of coronavirus is $1 \times 10^{-6}$, which is very low; however, the mutation rate in the proposed algorithm is set at a larger value in the range [0.001, 0.1], which helps in exploring new promising regions and avoiding getting stuck in a local minimum.
3. This study simulates a different virus replication technique known as the frameshifting technique [12,43]. The virus uses frameshifting to create more copies of itself, leading to large-scale changes to polypeptide length and chemical composition. It is considered the most harmful to the molecular evolution of human cell proteins, as it results in a non-functional protein that often disrupts the biochemical processes of a cell [59]. Applying the frameshifting technique in the proposed algorithm helps update solutions so that the solutions in each generation do not become too similar, which helps the algorithm converge to the global minimum.
The rest of the paper is structured as follows. Section 2 describes the inspiration and mathematical model of the proposed algorithm (COVIDOA). Experiments using test benchmark functions and the obtained results are discussed in Sect. 3. Finally, this study's conclusion and future work are presented in Sect. 4.

Proposed algorithm
In this section, the inspiration and mathematical model of COVIDOA are presented.
The coronavirus consists of a set of genetic instructions inside an oily membrane. These instructions are encoded in 30,000 letters of Ribonucleic Acid (RNA), namely a, c, g, and u, which are read by the infected cell and translated into many types of virus proteins [8]. Like other coronaviruses, SARS-CoV-2 (COVID-19) has four structural proteins: the spike (S), envelope (E), and membrane (M) proteins, which constitute the viral coat, and the nucleocapsid (N) protein, which encapsulates the viral RNA [12]. Human-to-human transmission of SARS-CoV-2 occurs primarily via respiratory droplets from coughs and sneezes. Complications may include acute respiratory distress syndrome (ARDS), multi-organ failure, septic shock, and death [43].
The most serious problem of the virus is rapid replication, where it creates millions of copies of itself and sends them out to damage as many healthy human cells as possible. The replication mechanism of coronavirus inspires the proposed algorithm. To replicate, the virus passes through several stages as follows:

Virus entry and uncoating
For replication, coronavirus needs to use the human cell's protein-making machinery. So, it first needs to gain entry into the cell. The virus contains a set of spike (S) proteins; it uses its spike proteins as a key to getting inside a human cell [9,72]. One spike of the virus binds to a protein called angiotensin-converting enzyme 2 (ACE-2) [67] on the surface of some human cells, as shown in Fig. 3. Coronavirus has a sort of membrane that hides its genetic material from the outside world; human cells have the same membrane that hides their material from the outside world. So, when those two things come together, the virus must find a way to get inside the host cell [65]. Once inside, all structural proteins are removed, and the virus contents, the genomic RNA, will be released into the host cell cytoplasm. This process is called virus uncoating [74], as shown in Fig. 4.

Virus replication
Suppose the virus has fused with the cell membrane. In the next step, its small genetic material must hijack the large cellular machinery, which would be difficult with only a few proteins. The virus genome therefore seeks out a structure in the host cell called a ribosome [79]. The ribosome turns the virus RNA into many virus proteins through the ribosomal frameshifting technique [36], as shown in Fig. 5.

Ribosomal frameshifting during genome translation
Ribosomal frameshifting, also known as translational frameshifting, is a biological phenomenon that occurs during translation [36,54]. This phenomenon creates multiple unique proteins from a single messenger RNA (mRNA) molecule [14]. To understand this, we need to understand translation and frameshifting separately. Translation is the process in which the mRNA molecule provides information to ribosomes, leading to the formation of protein molecules [36,37]. Frameshifting occurs when a specific reading frame of the RNA molecule shifts to another reading frame to provide a new protein sequence [67,72,74]. The frameshifting technique is presented in Fig. 6. As shown in the figure, in the replication process, the virus's mRNA is translated into viral proteins by reading tri-nucleotides (e.g., ACG). Each tri-nucleotide is translated into a single amino acid [52]. Thus, shifting the reading frame of the nucleotide sequence (backward or forward) by any number of positions not divisible by 3 will create different sequences that will be translated into different viral proteins [68].
Each group of the newly created viral proteins is merged to form a new virion. According to this technique, the virus can create millions of new particles that will damage millions of human cells.
In a translating ribosome, a frameshift can result in either a nonsense mutation [68,72] or a new protein after the frameshift. The most common types of frameshifting are -1 frameshifting and +1 frameshifting [58].
A. -1 Frameshifting: In -1 frameshifting, the ribosome slips back one nucleotide.
B. +1 Frameshifting: The ribosome starts translation from the +1 frame when 0 is the initial position, as shown in Fig. 7b. Because of shifting, the sequences are read differently and translated into different proteins.
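The effect of a frame shift on the tri-nucleotides that get read can be illustrated with a few lines of Python. This is only a sketch: the function name `codons` and the short RNA string are made up for illustration, and the -1 frame is modeled simply as the offset that gives the same codon boundaries modulo 3, which ignores the biology of where the ribosome actually slips.

```python
def codons(seq: str, frame: int = 0) -> list[str]:
    """Split an RNA string into complete tri-nucleotides for a reading frame.

    `frame` is the shift relative to the original frame 0. Only its value
    modulo 3 matters for where the codon boundaries fall, which is why a
    shift divisible by 3 reads the same tri-nucleotides again.
    """
    start = frame % 3  # Python's % maps -1 to offset 2
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


mrna = "AUGGCUACGUUAGC"  # hypothetical sequence, not taken from the paper

for shift in (0, +1, -1, 3):
    print(f"frame {shift:+d}: {codons(mrna, shift)}")
```

Each frame that is not a multiple of 3 groups the same letters into different tri-nucleotides, so the downstream amino-acid sequence changes as a whole rather than at a single position.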

Synthesis of both genomic and subgenomic RNA species
The ribosomal frameshifting technique results in two types of RNA: genomic RNA and subgenomic RNAs. Genomic RNA is produced through the replication process and becomes the genome of the new virus particle, while subgenomic RNAs are translated into many structural proteins (S: spike protein, E: envelope protein, M: membrane protein, and N: nucleocapsid protein). The genomic RNA and subgenomic RNAs are combined to form a viral particle [45,58]. Finally, the new virion is released and tries to hijack new healthy cells, as shown in Fig. 8.

Virus mutation
As coronaviruses spread from person to person, they randomly accumulate more mutations to escape from the immune system [45]. Mutations involve changing one or more letters that represent the virus genome. As mentioned in [8], coronavirus has lower mutation rates ($\approx 10^{-6}$ per site per cycle) in comparison with influenza ($\approx 3 \times 10^{-5}$ per site per cycle). The replication stages of coronavirus are summarized in Fig. 9.

Mathematical model of COVIDOA
In this section, the mathematical model of COVIDOA is provided. COVIDOA is summarized in the following steps:
1. Initialization: a population of solutions is randomly initialized, and the cost is evaluated for each solution. The solutions are then sorted in ascending order according to the fitness function, and the first solution is considered the best solution.
2. Virus replication phase through the frameshifting technique: for each solution in the population, a parent is selected using roulette wheel selection [46]; then,
a. The frameshifting technique is applied to produce several proteins from the selected parent as follows:
b. For each protein:

i. If the +1 frameshifting technique is used, the parent solution's values are shifted to the right by 1, and the value in the first position is set to a random value in the range [minVal, maxVal] as follows:
$$S_k(1) = \mathrm{rand}(minVal, maxVal), \qquad S_k(2{:}D) = P(1{:}D-1)$$
where minVal and maxVal are the minimum and maximum values for the variables in each solution.
ii. If the -1 frameshifting technique is used, the parent solution's values are shifted backward by 1, and the value in the last position is set to a random value in the range [minVal, maxVal]:
$$S_k(D) = \mathrm{rand}(minVal, maxVal), \qquad S_k(1{:}D-1) = P(2{:}D)$$
The symbol $S_k$ refers to the kth generated protein, $P$ is the parent solution, and $D$ is the problem dimension (number of variables in each solution). The result of frameshifting represents a new protein sequence.
c. New virion formation: a uniform crossover is applied to the generated sub-proteins to produce a new virion (new solution).
3. Mutation: a mutation operator is applied to the solution created in the previous step to generate a new mutated solution as follows:
$$Z(i) = \begin{cases} r, & \text{if } \mathrm{rand}(0,1) < MR \\ X(i), & \text{otherwise} \end{cases}$$
where $X$ is the solution before mutation, $Z$ is the mutated solution, $X(i)$ and $Z(i)$ are the ith elements of the old and new solutions, respectively, $i = 1, \ldots, D$, $r$ is a random value in the range [minVal, maxVal], and MR is the mutation rate.
4. The objective function is evaluated for the new solution, and the population is updated for the next generation (the solutions with the highest fitness remain, and the others are removed).
5. Repeat steps 2-4 for the new population until the termination criterion is met, for example, when the maximum number of iterations is reached.
6. Output the best solution found so far.
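The three operators used in steps 2 and 3 can be sketched in Python as below. This is an illustrative sketch rather than the authors' implementation: the function names (`frameshift`, `uniform_crossover`, `mutate`) and the optional `rng` argument are assumptions, and the crossover is assumed to pick each position independently from one of the generated proteins.

```python
import random


def frameshift(parent, shifting_no, min_val, max_val, rng=random):
    """Return one protein S_k derived from the parent solution P."""
    if shifting_no == 1:
        # +1 frameshift: shift right, draw a fresh value for the first position.
        return [rng.uniform(min_val, max_val)] + list(parent[:-1])
    if shifting_no == -1:
        # -1 frameshift: shift backward, draw a fresh value for the last position.
        return list(parent[1:]) + [rng.uniform(min_val, max_val)]
    raise ValueError("shifting_no must be +1 or -1")


def uniform_crossover(proteins, rng=random):
    """Form a virion by taking each position from a randomly chosen protein."""
    dim = len(proteins[0])
    return [rng.choice(proteins)[i] for i in range(dim)]


def mutate(x, mutation_rate, min_val, max_val, rng=random):
    """Z(i) = r with probability MR, otherwise Z(i) = X(i)."""
    return [
        rng.uniform(min_val, max_val) if rng.random() < mutation_rate else xi
        for xi in x
    ]


# Usage: two proteins from one parent, merged and then mutated.
parent = [1.0, 2.0, 3.0, 4.0]
proteins = [frameshift(parent, 1, -5.0, 5.0) for _ in range(2)]
virion = mutate(uniform_crossover(proteins), 0.1, -5.0, 5.0)
```

Note that with the +1 shift every protein shares positions 2 to D with the shifted parent, so the crossover mainly affects the freshly drawn first position; the mutation step supplies the remaining variation.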
The flowchart of the proposed algorithm is shown in Fig. 10.

Parameters of the proposed algorithm
The parameters of the proposed algorithm are suggested as follows:
• Max_Iter: maximum number of iterations.
• shiftingNo: a number that represents the type of frameshifting used. For example, shiftingNo = 1 means that the +1 frameshifting technique is used. We noticed that the +1 frameshifting technique yields the best results.
• numOfProtiens: number of proteins generated during virus replication. In the proposed algorithm, numOfProtiens is 2.
[Pseudocode of the proposed algorithm]
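A compact end-to-end sketch of the steps and parameters described above is given below for orientation. It is not the authors' code. Several details are assumptions made only for this example: the function name `covidoa`, the default values, the roulette weights (worst cost minus own cost, so that lower cost is more likely to be selected in a minimization problem), and survivor selection that keeps the best PopNo solutions from parents and offspring combined.

```python
import random


def covidoa(cost, dim, min_val, max_val, pop_no=30, max_iter=100,
            mr=0.1, shifting_no=1, num_of_proteins=2, seed=0):
    """Minimize `cost` over [min_val, max_val]^dim; returns (best, best_cost)."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(min_val, max_val)

    # Step 1: random initial population, sorted by ascending cost.
    pop = sorted(([rand() for _ in range(dim)] for _ in range(pop_no)), key=cost)

    for _ in range(max_iter):
        costs = [cost(s) for s in pop]
        worst = max(costs)
        # Assumed roulette weights for minimization: lower cost, larger weight.
        weights = [worst - c + 1e-12 for c in costs]
        offspring = []
        for _ in range(pop_no):
            parent = rng.choices(pop, weights=weights, k=1)[0]
            # Step 2a-b: generate proteins by frameshifting the parent.
            proteins = []
            for _ in range(num_of_proteins):
                if shifting_no == 1:
                    proteins.append([rand()] + parent[:-1])
                else:
                    proteins.append(parent[1:] + [rand()])
            # Step 2c: uniform crossover of the proteins into a new virion.
            virion = [rng.choice(proteins)[i] for i in range(dim)]
            # Step 3: mutation with rate MR.
            virion = [rand() if rng.random() < mr else v for v in virion]
            offspring.append(virion)
        # Step 4: keep the pop_no fittest solutions (assumed elitist update).
        pop = sorted(pop + offspring, key=cost)[:pop_no]

    # Step 6: best solution found.
    return pop[0], cost(pop[0])


# Usage on a simple sphere function (illustrative settings only).
best, best_cost = covidoa(lambda x: sum(v * v for v in x), dim=5,
                          min_val=-10.0, max_val=10.0)
```

Because the update keeps the best solutions from the combined pool, the best cost never increases from one iteration to the next in this sketch.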

Experimental results and discussion
Twenty standard optimization functions from the literature are discussed and used to test the proposed algorithm's efficiency. These functions are classified into four groups: unimodal, multimodal, fixed-dimension, and n-dimensional functions [35,39]. In fixed-dimension problems, the number of design variables (problem dimension) is fixed, while n-dimensional problems can use any number of design variables. A multimodal function has multiple (at least locally optimum) solutions, in contrast to a unimodal function, which has a single optimum solution [35]. As shown in Table 11 in the Appendix, the chosen optimization functions are described in terms of the function name, formula, problem dimension (D), range of possible values, the global optimum, and the group of benchmark functions to which it belongs.
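To make the unimodal/multimodal distinction concrete, the sketch below defines two textbook n-dimensional test functions. They are chosen here only as familiar examples; this section does not state whether they are among the twenty functions listed in Table 11.

```python
import math


def sphere(x):
    """Unimodal: a single optimum at the origin with value 0."""
    return sum(v * v for v in x)


def rastrigin(x):
    """Multimodal: global optimum 0 at the origin plus many local optima."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


# Both accept any number of design variables (n-dimensional).
for point in ([0.0, 0.0], [1.0, 1.0], [0.5, 0.5]):
    print(point, sphere(point), rastrigin(point))
```

On the Rastrigin function the point (1, 1) lies near a local optimum: its value is much lower than at the neighboring point (0.5, 0.5) but still above the global optimum at the origin, which is the kind of trap a search algorithm must escape.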

II. IEEE CEC 2019 benchmark problems
In addition to the classical benchmark functions, five CEC benchmark functions are utilized for evaluation. These are a group of modern test functions known as "The 100-Digit Challenge" intended to be used in single objective numerical optimization IEEE competitions [2]. As shown in Table 12 in the Appendix, these functions are described in terms of problem dimension, range of possible values, and the global optimum (https://www.mathworks.com/) [32].
For further evaluation, COVIDOA was applied to five real-world optimization problems. These are bound-constrained real-world optimization problems selected from the CEC 2011 Competition on Testing Evolutionary Algorithms on Real-World Optimization. These problems are as follows [16]
Because optimization algorithms rely on random processes, the obtained results change from run to run. The commonly used number of runs is 30, which gives acceptable statistical precision. So, the proposed algorithm and the state-of-the-art algorithms are each run 30 times.
The proposed and state-of-the-art algorithms use Max_Iter = 500 and PopNo = 1000 for the classical benchmark functions. The comparison is made regarding optimum cost, average cost, standard deviation (STD), and convergence speed. The authors downloaded the source code of the state-of-the-art optimization algorithms from the MATLAB website.
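The three cost-based criteria can be computed from the final costs of the independent runs as sketched below. The function name `summarize_runs` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize_runs(final_costs):
    """Summarize the final cost reached in each independent run."""
    return {
        "best": min(final_costs),                # best cost over all runs
        "avg": statistics.fmean(final_costs),    # average cost (AVG)
        "std": statistics.stdev(final_costs),    # sample standard deviation (STD)
    }


# Usage with made-up final costs from five runs.
print(summarize_runs([0.12, 0.10, 0.15, 0.11, 0.12]))
```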
Tables 1, 2, 3 and 4 show the results of the best cost, average cost, standard deviation, and convergence speed, respectively, for the 20 classical benchmark functions. The best-obtained results in all the following tables are highlighted in bold. Table 1 shows that the proposed algorithm reaches the global optimum cost in 18 of 20 problems and gets very close to the global optimum in the two remaining problems. Table 2 demonstrates the COVIDOA algorithm's efficiency in terms of the average cost: it reaches the minimum average cost in 17 of 20 problems and the second minimum average cost in three. The third criterion is STD, which shows how far the cost values are from the average cost. Low STD values mean the cost values over the iterations are clustered closely around the average cost. Table 3 shows that the COVIDOA algorithm reaches the minimum STD values in 17 of 20 problems, the second minimum in two, and the third minimum in two, which means that the results of COVIDOA are more reliable than those of the other algorithms with higher STD values. Compared with the recently proposed algorithm CHIO, which simulates herd immunity's effect in tackling the COVID pandemic, COVIDOA performs better. As shown in Tables 1, 2, 3 and 4 and Figs. 11, 12, 13 and 14, CHIO reaches the minimum optimum cost in seven benchmark functions. Compared with PSO, GWO, and WOA, COVIDOA is superior in best cost, average cost, and STD values for most of the test problems. It also has a higher convergence speed, as it reaches the global minimum after the first few iterations, as in functions F3, F7, F8, F15, and F16.
Fig. 11 Comparison of convergence curves of COVIDOA and state-of-the-art algorithms for group 1 of the test problems
The curves in Figs. 11 and 12 represent the relationship between the iterations and the corresponding best cost for the classical test functions. The obtained results using the selected test problems are divided into two groups and displayed in Figs. 11 and 12. Figure 11 represents the test problems for which the COVIDOA algorithm outperforms the other algorithms. In contrast, Fig. 12 shows the results of test problems in which the COVIDOA algorithm has a performance very close to the others.
Additionally, to establish the statistical significance of the results, the test results of the 20 classical benchmark functions are compared using the Wilcoxon rank-sum test at the 5% significance level [18]. The null hypothesis assumes no significant difference between the two methods' average values. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis [70]. Table 5 presents the p values computed by the Wilcoxon rank-sum test that compares COVIDOA with eight well-known metaheuristic algorithms for the 20 classical benchmark functions. We observed from Table 5 that all p values are below the 5% significance level for all comparative algorithms, which is strong evidence against the null hypothesis. Therefore, we conclude that COVIDOA is better than all other comparative algorithms.
Fig. 13 Comparison of convergence curves of COVIDOA and state-of-the-art algorithms for CEC benchmark functions
For the CEC benchmark functions, COVIDOA and the state-of-the-art algorithms search for the optimum cost for 250 iterations with 1000 solutions in each generation. The results of the best cost, average cost, and STD values are reported in Table 6, and the convergence curves are shown in Fig. 13. COVIDOA is superior to the other algorithms in CEC01, CEC06, and CEC01. For the CEC03 problem, it reaches the minimum best cost and the second minimum average cost and STD value. In the case of CEC07, although it is not the best, it achieves excellent results compared to the GA, FPA, GWO, WOA, SOA, and CHIO algorithms.
All test results for the CEC benchmark functions were compared using the Wilcoxon rank-sum test to assess their statistical significance. Table 7 shows the p values computed by the Wilcoxon rank-sum test that compares COVIDOA with other well-known algorithms for the CEC benchmark functions. It is evident from Table 7 that all p values are less than 0.05 (the 5% significance level), which supports the statistical significance of COVIDOA's results.
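For readers unfamiliar with the test, the sketch below computes a two-sided Wilcoxon rank-sum p value using the large-sample normal approximation. It is a simplified illustration under stated assumptions: tied values receive average ranks, no tie or continuity correction is applied, and the function name `rank_sum_test` and the sample data are made up. It is not the routine the authors used.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation): returns (z, p)."""
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Usage: hypothetical final costs of two algorithms over several runs.
z, p = rank_sum_test([0.10, 0.12, 0.11, 0.09, 0.13],
                     [0.30, 0.28, 0.35, 0.31, 0.29])
print(z, p, "reject H0" if p < 0.05 else "fail to reject H0")
```

A negative z means the first sample tends to have lower ranks (lower costs) than the second; the p value alone only indicates that the two samples differ, so the direction has to be read from z or from the reported averages.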
To test the impact of changing parameter values on the performance of COVIDOA, we used nine different scenarios by changing the values of the parameters MR (Mutation Rate) and numOfProtiens. We utilized the values 0.1, 0.01, and 0.001 for MR and 2, 4, and 6 for numOfProtiens, which produces nine scenarios, as shown in Table 8. The results of each scenario on the selected five IEEE CEC benchmark problems are presented in Table 9. We noticed that scenario 1 (MR = 0.1 and numOfProtiens = 2) has the best results, followed by scenario 4. What these two scenarios have in common is MR = 0.1, which represents a higher mutation rate. This comparison shows that higher MR values are better for improving the performance of the proposed algorithm.
For testing COVIDOA on the CEC real-world problems, we obtain our results over 500 iterations. The proposed and state-of-the-art algorithms were run 25 independent times, as suggested by the IEEE-CEC 2011 Competition [16]. Table 10 and Fig. 14 show the results of the selected CEC real-world problems. The proposed algorithm achieves the optimum best cost, average cost, and STD values for all five selected problems.
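Returning to the parameter study, the nine scenarios are simply the Cartesian product of the three MR values and the three numOfProtiens values. The sketch below enumerates them; the numbering order (numOfProtiens varying slowest) is an assumption chosen so that scenarios 1 and 4 both have MR = 0.1, as stated in the text, and may not match Table 8 in every row.

```python
from itertools import product

mr_values = [0.1, 0.01, 0.001]
protein_counts = [2, 4, 6]

# Assumed ordering: numOfProtiens is the outer loop, MR the inner loop.
scenarios = [
    {"scenario": i, "MR": mr, "numOfProtiens": n}
    for i, (n, mr) in enumerate(product(protein_counts, mr_values), start=1)
]

for s in scenarios:
    print(s)
```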
Although the general steps of COVIDOA and other evolutionary algorithms, such as GA and DE, are very similar, COVIDOA is superior to them, as shown in Tables 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. This improvement is due to the additional step proposed in the replication phase of COVIDOA, the frameshifting technique. Adding the frameshifting technique to the replication process helps keep the solutions in each generation from becoming too similar.
Table 10 The best, average, and STD results of COVIDOA and the state-of-the-art algorithms for CEC 2011 real-world problems
Columns: Problem, Metric, GA [26], DE [63], PSO [44], FPA [75], GWO [51], WOA [50], CHIO [4], SOA [19], COVIDOA. The first listed problem is the Lennard-Jones Potential Problem.

Explorations and exploitation capabilities of COVIDOA
It is essential to test the efficiency of the proposed algorithm. In other words, it is necessary to test its exploration and exploitation capabilities. In exploration, the algorithm searches for new solutions in new regions, while exploitation means using existing solutions and improving their fitness [21].

Convergence of COVIDOA
In Table 4, the convergence speed of COVIDOA and the other algorithms for the classical benchmark functions is classified into three groups: fast, moderate, and slow. Algorithms that reach the minimum cost in the first 100 iterations are classified as fast convergence algorithms, those that reach the minimum cost between iterations 100 and 300 are moderate convergence algorithms, and the others are classified as slow convergence algorithms. As shown in Table 4 and Figs. 11 and 12, the proposed algorithm has fast convergence in the majority (16 of 20) of the test problems and moderate convergence in the others. In contrast, other state-of-the-art algorithms may converge more slowly on the test problems.
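The three-way classification can be expressed as a small helper that looks for the first iteration at which a convergence curve reaches its minimum cost. This is a sketch only: the function name `convergence_class`, the tolerance argument, and the treatment of the boundaries (iteration 100 counted as fast, iteration 300 as moderate) are assumptions, since the text does not specify them.

```python
def convergence_class(best_cost_per_iter, tol=0.0):
    """Classify a convergence curve as 'Fast', 'Moderate', or 'Slow'.

    `best_cost_per_iter[k]` is the best cost after iteration k + 1.
    """
    final = min(best_cost_per_iter)
    # First (1-based) iteration at which the minimum cost is reached.
    first_hit = next(
        i for i, c in enumerate(best_cost_per_iter, start=1) if c <= final + tol
    )
    if first_hit <= 100:
        return "Fast"
    if first_hit <= 300:
        return "Moderate"
    return "Slow"


# Usage with a made-up 500-iteration curve that settles at iteration 51.
curve = [10.0] * 50 + [0.0] * 450
print(convergence_class(curve))
```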
Overall results reveal that COVIDOA reaches the minimum best cost, average cost, and standard deviation in most test problems. It also has high exploration and exploitation capabilities and a high convergence speed during iterations.

Conclusion
A novel evolutionary optimization algorithm (COVIDOA) inspired by the replication lifecycle of SARS-CoV-2 is presented. The proposed COVIDOA was tested by solving 20 classical benchmark problems, five CEC benchmark test functions, and five CEC 2011 real-world problems. The proposed COVIDOA is compared with the state-of-the-art nature-inspired optimization algorithms in terms of best cost, average cost, standard deviation, and convergence speed. The proposed algorithm is implemented using MATLAB R2016a software, and the source code of the state-of-the-art algorithms and the benchmark problems are downloaded from the mathworks.com website. The experimental results proved that the proposed algorithm outperforms the state-of-the-art optimization algorithms in most test problems and has very close results to other algorithms in the rest of the test problems. COVIDOA has high exploitation and exploration capabilities and convergence speed compared to other metaheuristics.
Future work may include the implementation of COVIDOA in solving large-scale problems in different fields.
Data availability The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants performed by any authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.