Introduction

COVID-19 was first identified in Wuhan, China, in December 2019. The World Health Organization (WHO) later declared the emerging disease a pandemic (Huang et al. 2020). Recent studies reported that COVID-19 is transmitted among humans by droplet infection or direct contact (Lai et al. 2020a).

According to the WHO, human-to-human transmission mechanisms vary but can be generalized as direct contact with an infected person (for example, shaking hands), exposure to droplets released during coughing or sneezing, and travel to an affected area where the virus is acquired in one way or another. The clinical presentation of COVID-19 varies widely: infected individuals may be asymptomatic or may experience mild to very severe respiratory illness. High fever, cough, sore throat, and muscular pain were the primary symptoms in most symptomatic cases. Severe cases, however, suffer from pneumonia, micro-coagulopathies, and septic shock, and rapid clinical deterioration can lead to death (Qiu et al. 2020; Wu et al. 2020). Older people and those with pre-existing medical conditions, e.g., diabetes mellitus, chronic respiratory disease, or cancer, are more likely to experience the manifestations and consequences of COVID-19 infection (World Health Organization 2020).

As of February 18, 2021, a total of 110,533,973 confirmed cases of COVID-19 had been reported and the death toll had reached 2,443,091 (Worldometer 2020). However, knowledge about COVID-19 is still accumulating and its nature and characteristics are still being discovered, especially its ability to change rapidly and give rise to new variants through accelerated genetic mutation. Therefore, thorough observational studies are being performed to establish facts about COVID-19 and to find a treatment or vaccine that may help end the pandemic (Yang et al. 2020).

Many research studies on COVID-19 have been published and others are in the pipeline, and large volumes of data are constantly accumulating, yet no strong prediction about the transmission and end of the pandemic has been reached (Yang et al. 2020). In the current study, we deployed machine learning approaches to predict the spread of the virus in several selected countries. The same approach can be applied to predict the spread of COVID-19 infection in any other country, since the nature of the virus is nearly the same everywhere.

This study has the following major contributions:

  • It presents a machine learning model for predicting the transmission of the COVID-19 pandemic in an easily understandable way, using statistical visualization graphs, e.g., the normal distribution.

  • It determines the predictive value of the technique given the quality and density of the data collected from the WHO.

  • It provides governments and health authorities with information that helps in planning and decision-making. The resulting predictions may reduce the population’s anxiety and prepare people to accept and deal with the next phases of the pandemic.

The rest of the paper is organized as follows. Related work is presented in the “Related work” section. The methodology and proposed approaches are presented in the “Methodology” section. The experimental observations and the discussion are presented in the “Experimental evaluation” and “Discussion” sections, respectively. The paper ends with conclusions and future work in the “Conclusion” section.

Related work

Machine learning and artificial intelligence (AI) models are widely used to improve the accuracy of diagnosis and screening of non-infectious diseases. Machine learning approaches are also widely used in the analysis and prediction of COVID-19 survival rates and patient discharge times based on clinical data. Lai et al. (2020a, b, c) considered the epidemic nature of COVID-19 with regard to the daily cumulative case count, the death rate, and the combined status of countries’ healthcare and economy. Punn et al. (2020) proposed the use of machine learning and deep learning models to understand the COVID-19 pandemic based on data taken from the Johns Hopkins dashboard.

Dandekar and Barbastathis (2020) proposed a hybrid model that combines first-principles epidemiological equations with a data-driven neural network to estimate when the transmission of COVID-19 would halt. They used the neural network model to make predictions for four locations, namely Wuhan, Italy, South Korea, and the USA. For the USA, they predicted the growth curve of currently infected cases and a halting of infection by April 20, 2020.

The WHO guidelines for the prevention of COVID-19 indicate that the virus enters the human body through the eyes, nose, or mouth. Accordingly, the WHO recommended several precautions in all settings to avoid infection, for example, not touching the face with unwashed hands, washing hands with soap and water for at least 20 s, or cleaning hands thoroughly with gels or tissues. It likewise suggested that physical distancing of at least one and a half meters, or even working from home, can reduce the risk of infection (World Health Organization 2020).

Previous studies developed methods aiming at accurate and time-efficient predictions of the transmission of COVID-19. However, these studies report relatively low predictive accuracy and lack the features needed to predict the number of confirmed COVID-19 cases with the highest possible accuracy.

Methodology

In the following subsections, we describe the measures used to slow the spread of the disease in the “Measures used to slow the spread of COVID-19” subsection and the datasets used to validate the proposed method in the “Dataset description” subsection. Our proposed method is described in detail in “The predictive machine learning model” subsection.

Measures used to slow the spread of COVID-19

The COVID-19 pandemic is spreading rapidly all over the world, while there is still no clear picture of how and why the virus spreads among people and reaches more countries. The number of infected cases is doubling, and healthcare systems are suffering even in developed countries, not only in developing ones. To date, about 85–90% of cases pass without the need for hospitalization; however, about 10% of COVID-19 patients require hospitalization and intensive care services. Many countries are trying to avoid worsening the situation by “flattening the curve” of the pandemic, that is, preventing and delaying the spread of the virus so that a large portion of the population is not infected at the same time. Therefore, many countries imposed lockdown measures to contain the spread of COVID-19. The WHO called on all countries to implement comprehensive precautions and preventive measures aimed at slowing transmission and flattening the curve, in order to save lives and buy time until effective vaccines and/or specific treatments are developed.

Dataset description

The data used in this study were collected from official data repositories such as those of Johns Hopkins University, the WHO, and the Worldometer website. These data show the daily and total confirmed COVID-19 cases, the daily and total deaths, and the daily and total recoveries. Table 1 shows a sample of the countries with the highest and lowest case counts, arranged in descending order by the number of confirmed cases. The table gives the time-series summary of confirmed cases, recoveries, and deaths for twelve countries, namely the United States of America (USA), India, Brazil, Russia, the United Kingdom (UK), France, Italy, Turkey, Spain, Germany, Saudi Arabia, and Vanuatu.

According to the WHO, the novel coronavirus first detected in the Chinese city of Wuhan in December 2019 has infected more than 110,533,973 people in at least 210 countries and territories, of whom more than 2,443,091 have died. China, the first country affected, has reported more than 89,806 infections. The Chinese government completely locked down major cities, restricted the movements of millions, and suspended business operations for a period of time in order to contain the pandemic. At the time of preparing this study, the situation is worsening and the disease is spreading rapidly around the world, with countries such as Spain, Italy, France, Germany, and the UK each reporting more than 2,071,615 cases. Other countries such as Saudi Arabia and South Africa have also seen a recent spike beyond 365,325 cases, while some countries (e.g., Vanuatu) have fewer than 100 confirmed cases, as shown in Table 1.

Table 1 A sample of the highest and lowest countries arranged in descending order

The predictive machine learning model

This study is built mainly on a decision tree algorithm applied to global real-time COVID-19 data. The core idea is to use supervised machine learning algorithms for time-series forecasting. The algorithms used in this work, namely the decision tree and linear regression, are effective models for sequence and time-series prediction problems.
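The text does not state how the time series is turned into training examples for a supervised learner. One common framing, shown here only as a minimal sketch under that assumption, uses the previous `n_lags` observations as features and the next observation as the target. The function name `make_supervised`, the parameter `n_lags`, and the sample values are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def make_supervised(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (features, target) pairs.

    Each feature row holds the n_lags values that precede the target value.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # the n_lags observations before time t
        y.append(series[t])             # the observation at time t
    return X, y


# Hypothetical cumulative counts, for illustration only (not data from the study).
cumulative_cases = [1.0, 2.0, 5.0, 9.0, 15.0, 24.0]
X, y = make_supervised(cumulative_cases, n_lags=3)
for row, target in zip(X, y):
    print(row, "->", target)
```

The resulting `X` and `y` could then be passed to any supervised regressor, such as a decision tree or a linear regression model.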

Experimental evaluation

In this section, we present and discuss the experimental results of the proposed method. The results are presented in figures and tables. Moreover, a comparison with previous epidemics is discussed.

Experimental data

Currently, it is not feasible to predict precisely how long the COVID-19 outbreak will last and how the epidemic will unfold, because of the new features exhibited by COVID-19 and the many uncertainties that remain. Some domain experts remain optimistic that transmission will gradually decrease during the northern hemisphere summer, as they expect COVID-19 to behave like seasonal influenza epidemics. With the help of machine learning, we developed a predictive model using the COVID-19 data available in well-known data repositories.

The WHO uses empirical values to show the change in confirmed cases, the mortality rate, the recovery rate, and the growth rate. Equation (1) computes the daily change in confirmed cases (CC), and the mortality rate is computed with Eq. (2). The recovery rate is computed with Eq. (3), and Eq. (4) gives the growth rate (GR) of the pandemic.

$$ \mathrm{CC}\left[\mathrm{i}\right]=\mathrm{Confirmed}\left[\mathrm{i}\right]-\mathrm{Confirmed}\ \left[\mathrm{i}-1\right] $$
(1)
$$ \mathrm{Mortality}\ \mathrm{Rate}\left[\mathrm{i}\right]=\mathrm{Deaths}\left[\mathrm{i}\right]/\mathrm{Confirmed}\left[\mathrm{i}\right] $$
(2)
$$ \mathrm{Recovery}\ \mathrm{Rate}\left[\mathrm{i}\right]=\mathrm{Recoveries}\left[\mathrm{i}\right]/\mathrm{Confirmed}\left[\mathrm{i}\right] $$
(3)
$$ \mathrm{GR}\left[\mathrm{i}\right]=\mathrm{Growth}\ \mathrm{Rate}\left[\mathrm{i}\right]=\mathrm{CC}\left[\mathrm{i}\right]/\mathrm{Confirmed}\left[\mathrm{i}\right] $$
(4)
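Equations (1) to (4) can be applied directly to the cumulative series. The following is a minimal sketch: the function name `daily_rates` and the second-day values are hypothetical, while the first-day values (540 confirmed, 17 deaths, 28 recoveries) are the global figures for January 22, 2020 quoted in this paper. Because Eq. (1) needs the previous day, CC and GR are left undefined for the first record.

```python
from typing import Dict, List, Optional


def daily_rates(confirmed: List[int], deaths: List[int],
                recoveries: List[int]) -> List[Dict[str, Optional[float]]]:
    """Apply Eqs. (1)-(4) to cumulative series; one dict per day i."""
    rows = []
    for i in range(len(confirmed)):
        c = confirmed[i]
        # Eq. (1): CC[i] = Confirmed[i] - Confirmed[i-1]; undefined for i = 0.
        cc = confirmed[i] - confirmed[i - 1] if i > 0 else None
        rows.append({
            "cc": cc,
            # Eq. (2): Deaths[i] / Confirmed[i]
            "mortality_rate": deaths[i] / c if c else None,
            # Eq. (3): Recoveries[i] / Confirmed[i]
            "recovery_rate": recoveries[i] / c if c else None,
            # Eq. (4): CC[i] / Confirmed[i]
            "growth_rate": cc / c if (cc is not None and c) else None,
        })
    return rows


# Day 0: global figures for January 22, 2020 as quoted in the text.
# Day 1: hypothetical values, for illustration only.
confirmed = [540, 600]
deaths = [17, 18]
recoveries = [28, 30]

rows = daily_rates(confirmed, deaths, recoveries)
for i, row in enumerate(rows):
    print(i, row)
```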

At present, the USA has the largest number of confirmed cases, with over 28 million infections as of February 18, 2021. Table 2 presents sample data recorded for the USA from January 22, 2020 to January 19, 2021; only a single patient had been detected on the first date, and the count rose alarmingly to 24,246,830 by January 19, 2021. The table also reports the recoveries, deaths, confirmed changes, mortality rate, recovery rate, and growth rate for the USA.

Table 2 A sample of confirmed, recoveries, death, confirmed changes, mortality rates, recovery rate, and growth rate in the USA

Table 3 presents sample data recorded globally from January 22, 2020 (2020-01-22) to January 19, 2021 (2021-01-19). It shows that COVID-19 spread alarmingly, from 540 confirmed cases on January 22, 2020 to 95,390,046 on January 19, 2021. The number of recovered people rose from 28 on January 22, 2020 to 52,370,571 on January 19, 2021. The number of deaths from COVID-19 rose from 17 on January 22, 2020 to 2,037,575 on January 19, 2021.

Table 3 A sample of confirmed, recoveries, death, confirmed changes, mortality rates, recovery rate, and growth rate worldwide

Almost every country and territory has declared a lock-down period to contain the outbreak of COVID-19. Figure 1 shows the lock-down day for mainland China and the USA. As the figure shows, China had an effective lock-down following the outbreak: it placed Wuhan, the center of the outbreak, under lock-down on January 23, 2020. The growth rate of the pandemic was 0.361 before the lock-down and decreased to 0.020 afterward; hence China is considered a model for lock-down, as the spread of the virus flattened over time. Although the lock-down was not as effective as in mainland China, the US growth rate also declined, from 0.277 before the lock-down to 0.176 after it. Figure 2 shows the global rates for confirmed cases, recoveries, and deaths.
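The text does not say how the before and after growth rates were aggregated. One plausible reading, sketched here purely as an assumption, averages the daily growth rate GR[i] of Eq. (4) over the days before and from the lock-down day onward. The function name `mean_growth_before_after`, the parameter `lockdown_day`, and the sample series are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def mean_growth_before_after(confirmed: List[int], lockdown_day: int) -> Tuple[float, float]:
    """Average GR[i] = (Confirmed[i] - Confirmed[i-1]) / Confirmed[i]
    over days before lockdown_day and over days from lockdown_day onward.

    Assumes every Confirmed[i] is positive. lockdown_day is an index into
    `confirmed`; it must leave at least one growth-rate value on each side.
    """
    if not 2 <= lockdown_day < len(confirmed):
        raise ValueError("lockdown_day must satisfy 2 <= lockdown_day < len(confirmed)")
    gr = {i: (confirmed[i] - confirmed[i - 1]) / confirmed[i]
          for i in range(1, len(confirmed))}
    before = [g for i, g in gr.items() if i < lockdown_day]
    after = [g for i, g in gr.items() if i >= lockdown_day]
    return mean(before), mean(after)


# Hypothetical cumulative counts, for illustration only (not data from the study).
confirmed = [100, 150, 225, 250, 270, 280]
before, after = mean_growth_before_after(confirmed, lockdown_day=3)
print(f"before: {before:.3f}, after: {after:.3f}")
```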

Fig. 1 Lock-down days for the USA and mainland China

Fig. 2 Worldwide rates

Experimental results

The proposed method forecasted the confirmed cases for the upcoming 7 days for the USA. The results showed that the confirmed cases increase exponentially, from a few hundred thousand to nearly two and a half million. We observe that the prediction at this point is not optimal because only a small number of records were used in our deep learning model, and training deep learning models on small datasets is challenging. Figure 3 compares the training, test, and predicted data for confirmed cases globally, and Fig. 4 shows the forecasted confirmed cases, indicating that the predicted values are close to the test values. Figures 5 and 6 present the corresponding forecasts of deaths for the globe. To validate the performance of the proposed method, we used the root mean square error on each of the three attributes, namely confirmed cases, recoveries, and deaths. Table 4 shows the final predictions of the proposed model for all attributes. Table 5 shows the root mean square error of the proposed model for these attributes; it also shows that R² is 0.99 for confirmed cases, deaths, and recoveries alike.
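For reference, the two validation metrics mentioned above follow their standard definitions: the root mean square error is the square root of the mean squared residual, and the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses hypothetical test and predicted values, not results from this study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test and predicted values, for illustration only.
y_test = [100.0, 200.0, 300.0, 400.0]
y_hat = [110.0, 190.0, 310.0, 390.0]

error = rmse(y_test, y_hat)
score = r2_score(y_test, y_hat)
print(f"RMSE = {error:.2f}, R^2 = {score:.3f}")
```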

Fig. 3 Comparison between train, test and predicted for the confirmed cases globally

Fig. 4 The forecasted data of confirmed cases for the globe

Fig. 5 Comparison between train, test and predicted data for global deaths

Fig. 6 Comparison between test and predicted data for global deaths

Table 4 The experimental results for the expected values of the different attributes
Table 5 The performance of the decision tree model for the different attributes

Comparison with state-of-the-art methods

As shown in Table 6, we performed a comparative study against recent methods. The proposed model is compared with several state-of-the-art models (random forest, ARIMA, and deep learning), and the accuracy of the models on the training dataset is evaluated using the root mean square error (RMSE) and the mean absolute error (MAE). The proposed decision tree model compares favorably with the leading state-of-the-art approaches. Given its good performance, the decision tree model may be extended and used to forecast other countries.
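A comparison of this kind can be sketched as follows: compute the MAE of each candidate model against the same observed values and pick the model with the lowest error. The model names `model_a` and `model_b` and all numbers are hypothetical placeholders, not the models or results reported in Table 6.

```python
from typing import Dict, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observations and per-model predictions, for illustration only.
observed = [100.0, 200.0, 300.0]
predictions: Dict[str, Sequence[float]] = {
    "model_a": [110.0, 190.0, 310.0],
    "model_b": [120.0, 220.0, 280.0],
}

# Score every model on the same observations, then pick the lowest MAE.
scores = {name: mae(observed, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {value:.1f}")
print("lowest MAE:", best)
```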

Table 6 Comparison between the proposed model and latest state-of-the-art techniques

Comparison with other epidemics

Table 7 presents the best-known viral outbreaks of the past 20 years: severe acute respiratory syndrome (SARS) in 2002–2003 (Hu et al. 2017), H1N1 influenza in 2009–2010 (Lathouwers et al. 2017), Middle East respiratory syndrome (MERS) coronavirus in 2012–2017 (Chu et al. 2019), Ebola in 2013–2016 (Ebola 2020), and COVID-19 in 2019–2020 (Lai et al. 2020a, b, c). Unlike the other diseases, COVID-19 is still spreading worldwide. The rate of spread of COVID-19 is still lower than that of the best-known pandemics: the virus has infected more people than recent outbreaks such as SARS or Ebola, but it has not reached the scale of the most massive modern pandemics such as H1N1 or the seasonal flu. Every year, the seasonal flu infects millions of people, and it is not life-threatening for most of those who become infected. In contrast, the total number of reported Ebola cases is less than 30,000, but it was treated as a crisis because a large proportion of those infected died. Currently, COVID-19 is deadlier than the seasonal flu, but its mortality rate of 6.87% is lower than those of other outbreaks such as MERS and Ebola, which recorded 34.40% and 39.53%, respectively.

Table 7 Comparison with other epidemics (CIDRAP 2020; Healthline 2020; Kelly-Cirino et al. 2019; Helmy et al. 2020; Organization WH 2020; Sohrabi et al. 2020; Yosra et al. 2020)

Many key symptoms, such as cough, fever, and shortness of breath, are common to all of these pandemics. Moreover, people of all ages are prone to COVID-19 infection, and, like the other pandemics, it is deadliest among older patients with weaker immune systems. The mortality rate rose rapidly with patient age, reaching a high percentage among patients over 65. Compared with the SARS, Ebola, and MERS outbreaks of the past 20 years, COVID-19 is likely more transmissible but not as deadly, researchers noted. SARS had a mortality rate of 9.6%, MERS has a rate of 34.4%, and Ebola has a rate of 39.53%. Unlike SARS and MERS, hospital-based outbreaks do not seem to be a hallmark of COVID-19 at this time.

Discussion

The COVID-19 pandemic has become the biggest threat to humanity in many respects, including health, financial markets, and the economy. Major financial institutions and banks have stopped forecasting the global economy, with the Organisation for Economic Co-operation and Development (OECD) being one of the latest to do so. Fear of COVID-19 has negatively affected the global economy; markets worldwide have been hit badly, with stock and bond prices falling steeply. Although the global economy was expected to grow by 2.9% in 2020, recent economic predictions forecast only 2.4%. For instance, the manufacturing sector in China has been hit hard by the pandemic. The slowdown in Chinese manufacturing activity has hurt countries with close economic links to China, many of them in Asia-Pacific economic corridors, such as Vietnam, Singapore, and South Korea. The good news from China is that factories have resumed operations. In summary, due to the outbreak of the COVID-19 pandemic, close to 1.6 billion children worldwide are absent from school, and many business sectors in the USA, such as restaurants and aviation, have lost their customers. The next section, “Estimation of slowdown COVID-19”, therefore discusses when COVID-19 may be over.

Estimation of slowdown COVID-19

Many countries around the world implemented shutdowns in order to contain the fast-spreading COVID-19. Restrictions on daily life for millions of people, such as school closures, large-scale social distancing, and bans on public gatherings, have been put in place. Because it is not easy to know exactly when a vaccine for COVID-19 will become available, these protective measures are likely to be extended for the next few months. Health experts remain cautious: lifting lock-down restrictions in order to alleviate the economic and social harm of a long-term lock-down could open the door to future waves of the COVID-19 pandemic.

In our assessment, we have developed a predictive model that can forecast the period in which COVID-19 can be suppressed. The proposed method estimates the possible end of the pandemic using the normal distribution. Specifically, it gives a statistical estimate of the slow-down period of the pandemic based on the normal distribution. The following equation computes the area under the curve between μ + 2σ and μ + 3σ, which is the interval we selected as the period in which the virus can stop.

$$ {\displaystyle \begin{array}{c}p\left(\mu +2\sigma <X<\mu +3\sigma \right)=p\left(\frac{\mu +2\sigma -\mu }{\sigma }<z<\frac{\mu +3\sigma -\mu }{\sigma}\right)\\ {}=p\left(\frac{2\sigma }{\sigma }<z<\frac{3\sigma }{\sigma}\right)=p\left(2<z<3\right)=2.1\%\end{array}} $$
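The 2.1% figure can be checked numerically from the standard normal cumulative distribution function, Φ(z) = ½(1 + erf(z/√2)). The short sketch below evaluates Φ(3) − Φ(2) with Python's standard library; the helper name `std_normal_cdf` is introduced here for illustration only.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area under the standard normal curve between z = 2 and z = 3.
p = std_normal_cdf(3.0) - std_normal_cdf(2.0)
print(f"P(2 < Z < 3) = {p:.4f}")  # about 0.0214, i.e. roughly 2.1%
```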

Table 8 presents the possible period in which the virus can slow down in the top countries. For India, the results show that the predicted number of confirmed cases will be 548,318 on August 05, 2021, and about three months later, on November 15, 2021, the number of confirmed cases will be 156. As can be seen from Fig. 7, the predictive model indicates that the USA will have 2,379,799 confirmed cases on August 17, 2021, and about three months later, on November 30, 2021, the expected number of positive cases is 1147.
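The paper does not spell out how μ and σ are estimated for each country. One possible reading, shown here strictly as an assumption, treats the daily case curve as a distribution over day indices, takes its case-weighted mean and standard deviation, and maps μ + 2σ and μ + 3σ back to calendar dates. The function name `slowdown_window`, the start date, and the daily counts are hypothetical.

```python
import math
from datetime import date, timedelta
from typing import List, Tuple


def slowdown_window(start: date, daily_cases: List[float]) -> Tuple[date, date]:
    """Return the dates at mu + 2*sigma and mu + 3*sigma, where mu and sigma
    are the case-weighted mean and standard deviation of the day index."""
    total = sum(daily_cases)
    if total <= 0:
        raise ValueError("daily_cases must contain at least one positive value")
    mu = sum(i * c for i, c in enumerate(daily_cases)) / total
    variance = sum(c * (i - mu) ** 2 for i, c in enumerate(daily_cases)) / total
    sigma = math.sqrt(variance)
    # Round to whole days before converting back to calendar dates.
    first = start + timedelta(days=round(mu + 2 * sigma))
    last = start + timedelta(days=round(mu + 3 * sigma))
    return first, last


# Hypothetical symmetric daily curve, for illustration only (mu = 2, sigma = 1).
window = slowdown_window(date(2021, 1, 1), [1, 4, 6, 4, 1])
print(window[0].isoformat(), "to", window[1].isoformat())
```

With real data the daily curve is rarely symmetric, so a window obtained this way should be read as a rough indication rather than a forecast.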

Table 8 Expected deadline for the selected countries
Fig. 7 Expected deadline for US COVID-19

Conclusion

A machine learning model has been developed to predict the spread of COVID-19 infection in many countries and the expected period after which the virus can be stopped. Globally, our results forecast that COVID-19 infections will decline greatly during the first week of September 2021 and that the pandemic will come to an end shortly afterward. Our proposed model can also be applied to other countries affected by COVID-19. Additionally, the model could be used to evaluate the effect of the public health guidelines, infection control measures, and lock-down decisions taken to stop the pandemic. Future work could focus on applying a deep learning model trained on big data.