Introduction

In December 2019, an outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), appeared in Wuhan (Li et al. 2020a; Zhou et al. 2020; Zhu et al. 2020). The epidemic spread quickly: as of November 11, 2020, the WHO had reported 51,251,907 cases and 1,270,930 deaths globally, with cases in more than 216 countries or regions. Common symptoms are fever, cough, and myalgia or fatigue (Huang et al. 2020). In the USA, the proportion of black patients with COVID-19 is much higher than that of white patients (Price-Haywood et al. 2020). Human-to-human transmission plays a role in the spread of the disease (Chan et al. 2020). Environmental monitoring of two hospitals in Wuhan revealed airborne hot spots of SARS-CoV-2 RNA (Liu et al. 2020), and summer does not stop the spread of SARS-CoV-2 (Baker et al. 2020).

Although researchers believe that bats are the natural host of the novel coronavirus (SARS-CoV-2), the final origin of the virus is not yet fully established (Zhou et al. 2020). The intermediate host of SARS-CoV-2 may be pangolins (Lam et al. 2020), cats (Halfmann et al. 2020), ferrets (Shi et al. 2020), dogs (Sit et al. 2020), or hamsters (Sia et al. 2020).

It is important to understand the epidemic potential of SARS-CoV-2 by assessing the infectivity of unrecorded infections. An estimated 86% of the infections that occurred before the travel restrictions imposed in China on January 23, 2020 were not recorded. Seventy-nine percent of recorded infections originated from unrecorded infections, which led to the rapid spread of SARS-CoV-2 and made it difficult to contain (Li et al. 2020b). Real-time mobility data and case data from Wuhan show that early population mobility explains the spatial distribution of cases well. After prevention and control measures were taken, the growth rate in most regions became negative, and the control measures greatly slowed the spread of the disease (Kraemer et al. 2020). A global metapopulation disease transmission model has been used to project the effect of travel restrictions on the domestic and international spread of COVID-19. The measures to close Wuhan slowed the outbreak in China and gave the country 3–5 days to prepare. Internationally, the effect was larger: by mid-February, the restrictions had reduced imported cases by nearly 80% (Chinazzi et al. 2020).

Most empirical models rely on hypothetical parameters, so they fit COVID-19 data poorly and may not predict future cases accurately. Great uncertainty remains about the number of new confirmed cases in the future, and the public urgently wants to know when the epidemic will end. Artificial intelligence (AI) has proven capable of capturing nonlinear relationships and has been widely used in many fields (Cassimon et al. 2020; Guo et al. 2020; He et al. 2014); it can also diagnose COVID-19 quickly (Zhang et al. 2020). To overcome the limitations of epidemiological models, we develop an AI model for real-time prediction of new confirmed cases of COVID-19 worldwide. Our goal is to provide better estimation methods that help medical and government agencies respond effectively and adjust in time during the epidemic. We hope our results will inform the world about the future trend of the epidemic.

Materials and methods

Data collection

Data on daily new confirmed cases and new confirmed deaths of COVID-19 worldwide from January 21, 2020 to November 11, 2020 were obtained from the World Health Organization (WHO). The data are divided into training, testing, and forecasting sequences: training data run from January 21 to May 22, 2020, testing data from May 23 to June 4, 2020, and forecasting data from June 5 to November 11, 2020.
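The three periods can be expressed as inclusive date ranges. The following is a minimal sketch, assuming the daily series is held as a mapping from calendar date to count; the names `PERIODS`, `days_in`, and `split_by_period` are illustrative and do not come from the study.

```python
from datetime import date, timedelta

# Period boundaries as stated in the text (inclusive on both ends).
PERIODS = {
    "training": (date(2020, 1, 21), date(2020, 5, 22)),
    "testing": (date(2020, 5, 23), date(2020, 6, 4)),
    "forecasting": (date(2020, 6, 5), date(2020, 11, 11)),
}


def days_in(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]


def split_by_period(series):
    """Split a {date: value} mapping into the three named periods."""
    return {
        name: [series[d] for d in days_in(start, end) if d in series]
        for name, (start, end) in PERIODS.items()
    }


# Number of calendar days covered by each period.
period_lengths = {name: len(days_in(*bounds)) for name, bounds in PERIODS.items()}
```

Under these boundaries the training, testing, and forecasting periods cover 123, 13, and 160 days, so the training and testing periods stand in a ratio of roughly 90% to 10%.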

Artificial intelligence

Artificial intelligence (AI) is used to predict the number of new confirmed cases and new confirmed deaths of COVID-19. An artificial neural network (ANN) with one hidden layer, trained with the backpropagation algorithm, is constructed (Rumelhart et al. 1986). The ANN architecture contains three layers (input layer, hidden layer, and output layer), and each layer contains several nodes. C(1), …, C(n) are the daily new confirmed cases used as input variables, and C(n + 1) is the number of new confirmed cases predicted for + 1 day (Fig. 1). For node j, \( C_i \) is the i-th input, z is the node output, \( W_{ji} \) is the weight on input i, D is the node excitation threshold, and v and f are the basis and activation functions, respectively. A node computes the weighted sum of its inputs as

$$ v=\sum_i{W}_{ji}{C}_i+D $$
(1)

Fig. 1 The architecture of ANN predicting the COVID-19 epidemic

The activation function appraises output by

$$ z=f(v) $$
(2)
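As an illustration of Eqs. (1) and (2), the sketch below evaluates a single node: it forms the weighted sum of the inputs plus the threshold and then applies an activation function. The function name `node_output` and all numeric values are hypothetical; they are not parameters of the fitted model.

```python
import math


def node_output(inputs, weights, threshold, activation=math.tanh):
    """Compute z = f(v) with v = sum_i W_ji * C_i + D for one node."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    v = sum(w * c for w, c in zip(weights, inputs)) + threshold
    return activation(v)


# Hypothetical values for illustration only (not taken from the fitted model).
example_inputs = [0.2, 0.4, 0.6]
example_weights = [0.5, -0.25, 1.0]
example_threshold = 0.1
z = node_output(example_inputs, example_weights, example_threshold)
```

Here the weighted sum is v = 0.1 - 0.1 + 0.6 + 0.1 = 0.7, so z is tanh(0.7).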

To relate the cases of previous days to those of the next day, the ANN model is used to estimate the global number of patients with COVID-19. The inputs are the global patient counts of the past days, and the output is the predicted global count for + 1 day. The hyperbolic tangent sigmoid function (tansig) is used as the activation function in the hidden layer, and the rectified linear unit (ReLU, or poslin) is used in the output layer. To detect overfitting and validate the reliability of the developed model, we use 90% of the cases for training and 10% of the cases for testing.
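The mapping from past days to the next day can be sketched as a sliding window over the daily series, followed by a 90%/10% split. This is a minimal sketch under two assumptions: the split is chronological (earlier pairs train, later pairs test), and `n_inputs=7` is only an example window length. The helper names and the toy series are hypothetical.

```python
def make_windows(series, n_inputs):
    """Build (inputs, target) pairs: n_inputs past days -> the next day."""
    pairs = []
    for start in range(len(series) - n_inputs):
        window = list(series[start:start + n_inputs])
        target = series[start + n_inputs]
        pairs.append((window, target))
    return pairs


def chronological_split(pairs, train_fraction=0.9):
    """Keep time order: the first 90% of pairs train, the rest test."""
    cut = int(round(len(pairs) * train_fraction))
    return pairs[:cut], pairs[cut:]


# Hypothetical toy series standing in for daily new confirmed cases.
toy_series = list(range(1, 28))          # 27 days: 1, 2, ..., 27
pairs = make_windows(toy_series, n_inputs=7)
train_pairs, test_pairs = chronological_split(pairs)
```

Keeping the split in time order means the test pairs always lie after the training pairs, which mirrors how the model is later used for forecasting.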

Performance criteria

The performance of the ANN is assessed with three metrics: root mean square error (RMSE), correlation coefficient (R), and mean absolute error (MAE). R measures how closely the predictions track the actual cases, while RMSE and MAE measure the residuals between predictions and actual cases:

$$ \mathrm{RMSE}=\sqrt{\frac{\sum {\left({A}_j-{G}_j\right)}^2}{L}} $$
(3)
$$ R=\frac{\sum \left({A}_j-\overline{A}\right)\left({G}_j-\overline{G}\right)}{\sqrt{\sum {\left({A}_j-\overline{A}\right)}^2\sum {\left({G}_j-\overline{G}\right)}^2}} $$
(4)
$$ \mathrm{MAE}=\frac{1}{L}\sum \left|{A}_j-{G}_j\right| $$
(5)

\( A_j \) denotes the actual number of new confirmed cases of COVID-19 on day j, \( G_j \) denotes the predicted number of new confirmed cases, L is the number of days evaluated, \( \overline{A} \) is the mean of the actual cases, and \( \overline{G} \) is the mean of the predicted cases.
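The three criteria can be computed directly from paired lists of actual and predicted values. The sketch below is one possible implementation, with R taken as the standard Pearson correlation coefficient, in which the two sums of squared deviations are multiplied under the square root. The function names and the toy values are hypothetical and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - g) for a, g in zip(actual, predicted)) / n


def correlation(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_g = sum(predicted) / n
    cov = sum((a - mean_a) * (g - mean_g) for a, g in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_g = sum((g - mean_g) ** 2 for g in predicted)
    return cov / math.sqrt(var_a * var_g)


# Hypothetical toy values, not data from the study.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 150.0, 155.0]
```

RMSE and MAE are in the same units as the case counts, whereas R is dimensionless, so R alone can stay high even when the absolute errors are large.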

Results

The cycles of the global COVID-19 epidemic

The cycles of new confirmed cases of COVID-19 are calculated using wavelet analysis (Fig. 2). The wavelet variance diagram reflects how the wave energy of a time series is distributed across time scales, so it can be used to determine the main periods of new confirmed cases. The diagram has five obvious peaks, at time scales of 7, 32, 71, 96, and 103 days (Fig. 2); these are the cycles of new confirmed cases of COVID-19. The maximum peak is at the 7-day scale, which means that the oscillation with a period of about 7 days is the strongest and is the first main cycle. The second peak, at 32 days, is the second main cycle, and the peaks at 71, 96, and 103 days are the third, fourth, and fifth main cycles, respectively. The fluctuations of these five periods control the variation of new confirmed cases over the whole time domain.
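A wavelet variance curve of this kind can be sketched as the average squared magnitude of the wavelet coefficients at each scale. The text does not name the mother wavelet, so the complex Morlet wavelet with `omega0 = 6` used below is an assumption, as are the function names and the synthetic test series. For this wavelet the Fourier period is close to, but not exactly equal to, the scale.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet mother wavelet (assumed here; the text names no wavelet)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def wavelet_variance(series, scales, omega0=6.0):
    """Return {scale: mean over time shifts b of |W(a, b)|^2}."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = {}
    for a in scales:
        total = 0.0
        for b in range(n):
            # Discrete wavelet coefficient W(a, b) with 1/sqrt(a) normalisation.
            coeff = sum(
                x * morlet((t - b) / a, omega0).conjugate()
                for t, x in enumerate(centered)
            ) / math.sqrt(a)
            total += abs(coeff) ** 2
        variance[a] = total / n
    return variance


# Synthetic check: a pure 7-day oscillation (hypothetical data, not WHO counts).
synthetic = [math.sin(2 * math.pi * t / 7) for t in range(140)]
candidate_scales = [3, 5, 7, 10, 14, 20]
variance = wavelet_variance(synthetic, candidate_scales)
dominant_scale = max(variance, key=variance.get)
```

On the synthetic weekly oscillation, the variance is largest at the 7-day scale among the candidates, which is the kind of peak the wavelet variance diagram is read for.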

Fig. 2 The cycles of new infected confirmed cases of COVID-19

Model development using the ANN

Many studies have shown theoretically that a three-layered ANN can describe any nonlinear mapping relation with precision (He et al. 2014). A typical three-layer neural network was therefore applied to forecast the global COVID-19 epidemic. The numbers of neurons in the input and hidden layers are decided by trial and error. Figure 3 shows the optimization of network topologies, and Tables 1 and 2 compare the performance for different numbers of input days and hidden nodes, listing the variables and R values of the proposed ANN model together with the simulated number of new confirmed cases during the training and test periods. Interestingly, using only the most recent 7 days gives the best simulation of global patients with COVID-19; using more than 7 days degrades the model and produces less reliable estimates. The final model therefore takes the data of the 7 past days as its seven input variables, and the hidden layer has 6 nodes. The resulting 7-6-1 topology performs better than the others.
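To make the 7-6-1 topology concrete, the sketch below builds a network with seven inputs, six hidden nodes, and one output, and runs a single forward pass. It assumes a tanh-type activation in the hidden layer and a positive linear (ReLU) output, and it uses random placeholder weights and scaled inputs; none of the numbers are the fitted parameters, and the function names are hypothetical.

```python
import math
import random


def init_network(n_inputs=7, n_hidden=6, seed=0):
    """Create random placeholder weights and thresholds for a 7-6-1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "D1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "D2": rng.uniform(-1, 1),
    }


def forward(net, inputs):
    """Propagate one window of past-day values through the network."""
    # Hidden layer: weighted sum plus threshold, then tanh.
    hidden = [
        math.tanh(sum(w * c for w, c in zip(row, inputs)) + d)
        for row, d in zip(net["W1"], net["D1"])
    ]
    # Output node: weighted sum plus threshold, clipped at zero (ReLU).
    return max(0.0, sum(w * h for w, h in zip(net["W2"], hidden)) + net["D2"])


def count_parameters(net):
    """Number of trainable weights and thresholds."""
    return sum(len(row) for row in net["W1"]) + len(net["D1"]) + len(net["W2"]) + 1


net = init_network()
n_params = count_parameters(net)                    # 7*6 + 6 + 6 + 1
prediction = forward(net, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
```

A 7-6-1 network of this form has 55 adjustable parameters, which is small relative to deep architectures and may be one reason it can be trained on a few months of daily data.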

Fig. 3 Optimization of network topologies for predicting the COVID-19 epidemic. a RMSE for different nodes in the input layer. b RMSE for different nodes in the hidden layer

Table 1 Variables selected using the proposed ANN model
Table 2 Comparison between different nodes in hidden layer

Training algorithms for the ANN model are also chosen by trial and error. Figure 4 shows the optimization of training algorithms for predicting the COVID-19 epidemic. With trainbr (Bayesian regularization backpropagation), the simulated values are close to the actual values in both the training and testing periods. Table 3 lists the performance of the training algorithms and shows that trainbr performs best in simulating global patients with COVID-19. For the ANN model using trainbr, the RMSE is 3859.4 for the training dataset and 3102.9 for the test dataset, the R values are 0.9948 and 0.9683, and the MAE values are 2303.7 and 2090.6, respectively. Because the training and test metrics are similar, the model shows no sign of overfitting.

Fig. 4 Optimization of training algorithms for predicting the COVID-19 epidemic

Table 3 Simulation performance and optimization of training algorithms for the ANN model.

Transfer functions for the ANN model are also chosen by trial and error. Table 4 shows that the tansig-poslin combination performs better than the others during the training, testing, and predicting periods. Purelin and poslin are the linear and positive linear transfer functions, respectively; tansig and logsig are the hyperbolic tangent sigmoid and logarithmic sigmoid transfer functions, respectively.
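For reference, the four transfer functions named above have simple closed forms. The sketch below writes them as plain Python functions using their standard definitions; the dictionary `TRANSFER_FUNCTIONS` is an illustrative convenience and not part of the study's code.

```python
import math


def purelin(x):
    """Linear transfer function: returns the input unchanged."""
    return x


def poslin(x):
    """Positive linear transfer function (ReLU): max(0, x)."""
    return max(0.0, x)


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Logarithmic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Lookup by name, so that candidate pairs such as ("tansig", "poslin") can be tried.
TRANSFER_FUNCTIONS = {
    "purelin": purelin,
    "poslin": poslin,
    "tansig": tansig,
    "logsig": logsig,
}
```

Because poslin never returns a negative value, using it at the output keeps predicted case counts non-negative, while tansig and logsig are bounded and therefore suit the hidden layer.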

Table 4 Comparison between various transfer functions

Prediction of the number of infected cases and deaths of COVID-19

In the forecast period, global infected cases and deaths for the next day are predicted iteratively, using the predictions of the previous days as inputs. Table 5 shows the predicting performance using trainbr for the developed ANN model. For new confirmed cases of COVID-19 during the predicting period, the R, RMSE, and MAE are 0.9848, 17,554, and 12,229, respectively. For total confirmed cases, the R, RMSE, and MAE are 0.9999, 453,720, and 324,000, respectively. For new deaths, the R, RMSE, and MAE are 0.8593, 631.8, and 463.7, respectively. For total deaths, the R, RMSE, and MAE are 0.9999, 4861.6, and 4213.4, respectively.
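Forecasting from previously predicted values can be sketched as a recursive loop: each one-step prediction is appended to the input window before the next step, and running totals are accumulated from the daily forecasts. In the sketch below, `window_mean` is a hypothetical stand-in for the trained ANN, and the history and starting total are toy values rather than WHO data.

```python
def recursive_forecast(history, horizon, predict_next, n_inputs=7):
    """Forecast `horizon` days by feeding each prediction back as an input."""
    window = list(history[-n_inputs:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # slide the window forward one day
    return forecasts


def cumulative_totals(last_known_total, daily_forecasts):
    """Turn daily forecasts into running totals starting from a known total."""
    totals = []
    running = last_known_total
    for value in daily_forecasts:
        running += value
        totals.append(running)
    return totals


def window_mean(window):
    """Stand-in one-step predictor (hypothetical): mean of the window."""
    return sum(window) / len(window)


# Hypothetical toy history, not WHO data.
history = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
daily = recursive_forecast(history, horizon=3, predict_next=window_mean)
totals = cumulative_totals(last_known_total=112.0, daily_forecasts=daily)
```

With this scheme, errors in early predictions propagate into later inputs, which is one reason forecast accuracy is usually reported separately from training and testing accuracy.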

Table 5 COVID-19 prediction of the amount of new infected confirmed cases, total infected confirmed cases, new infected deaths, total infected deaths in 2020

Figure 5 shows the predicted new confirmed cases of COVID-19 worldwide. The training and testing data run from January 21 to June 4, 2020, and the prediction period runs from June 5 to November 11, 2020. Over these 5 months, the predicted number of new confirmed cases averages 271,761 per day worldwide, against an actual average of 271,528. Figure 6 shows the predicted total confirmed cases of COVID-19 worldwide. Total confirmed cases continue to grow throughout the predicting period. The actual cumulative total exceeds 51 million by November 11, 2020, and the forecast total is similar to the actual total.

Fig. 5 Prediction of the amount of new infected confirmed cases of COVID-19 in 2020

Fig. 6 Prediction of the amount of total infected confirmed cases of COVID-19 in 2020

Figure 7 shows the predicted new deaths from COVID-19 worldwide. During the predicting period, the predicted number of new deaths averages 5460 per day worldwide, against an actual average of 5466. Figure 8 shows the predicted total deaths from COVID-19 worldwide. The actual total is 1,270,930 deaths by November 11, 2020, and the forecast total is 1,258,700. In summary, the predicted deaths are very close to the actual deaths.

Fig. 7 Prediction of the amount of new infected deaths of COVID-19 in 2020

Fig. 8 Prediction of the amount of total infected deaths of COVID-19 in 2020

Table 6 shows the predicting performance for new confirmed cases of COVID-19 in 10 countries. During the predicting period, the R, RMSE, and MAE are 0.9696, 5139.1, and 4074.6 for the USA and 0.9978, 311.9, and 205.3 for the Russian Federation, respectively. The Russian Federation has the highest R value and the smallest RMSE and MAE values among the 10 countries. Figure 9 (a-j) shows the predicted new confirmed cases of COVID-19 in the 10 countries. A second wave of COVID-19 occurred in these 10 countries, and the largest number of new cases in a single day was recorded in the USA.

Table 6 COVID-19 prediction of the amount of new infected confirmed cases in 10 countries in 2020
Fig. 9 Prediction of the amount of new infected confirmed cases of COVID-19 in 10 countries in 2020

Comparison with existing COVID-19 prediction models

Different models for predicting COVID-19 infections are compared. For new confirmed cases of COVID-19, our model gives R, RMSE, and MAE values of 0.9848, 17,554, and 12,229, respectively. For COVID-19 infections predicted with W-LSTM, the \( R^2 \), MSE, and MAPE are 0.93, 3.13E+05, and 39.29, respectively (Tuli et al. 2020b). An LSTM network was trained and tested on the Canadian COVID-19 infections dataset; the RMSE is 34.83 with an accuracy of 93.4% for short-term predictions in Canada, and, on the testing/validation dataset, about 45.70 with an accuracy of 92.67% for long-term predictions (Chimmula and Zhang 2020). For mainland China, an LSTM model predicted that new infections would peak on February 4, resulting in 95,000 cases by the end of April. Both the SEIR and LSTM models predicted a peak of 4000 daily infections between February 4 and 7, and the SEIR model also predicted several smaller peaks of new infections in mid to late February. The actual number of cumulative confirmed cases in mainland China on April 30, 2020 was 82,874 (Yang et al. 2020). A set of models using 9 different machine learning algorithms to predict the rise in new cases, with an average accuracy of 87.9 ± 3.9%, was developed for 10 high-population, high-density countries; the highest accuracy, 99.93%, was achieved for Ethiopia using ARMA averaged over the next 5 days (Khakharia et al. 2020). Not every machine learning algorithm gives high accuracy for every country. Machine learning models based on decision trees, random forests, logistic regression, and support vector machines show accuracies between 76.2 and 92.9% in predicting early signs of infection containment (Kasilingam et al. 2020). Although our model performs well compared with these predicting models, cumulative confirmed cases unfortunately continue to follow an increasing trend.
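The studies above report different criteria (R, \( R^2 \), MSE, RMSE, MAPE, accuracy), so the figures are not on a common scale. As a rough aid, and assuming the standard definitions of MSE and MAPE written with the symbols used for the performance criteria (the cited studies may define them differently), the metrics relate as

$$ \mathrm{MSE}={\mathrm{RMSE}}^2,\kern1em \mathrm{MAPE}=\frac{100\%}{L}\sum \left|\frac{A_j-G_j}{A_j}\right| $$

so that the reported values convert as

$$ \mathrm{RMSE}=17{,}554\Rightarrow \mathrm{MSE}\approx 3.08\times {10}^8,\kern1em \mathrm{MSE}=3.13\times {10}^5\Rightarrow \mathrm{RMSE}\approx 559 $$

If \( R^2 \) is taken as the squared correlation coefficient, which is also an assumption, R = 0.9848 corresponds to \( R^2\approx 0.970 \). Even after conversion, RMSE and MSE depend on the magnitude of the series being predicted, so values obtained on different datasets and forecasting horizons cannot be ranked directly.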

Many machine learning methods and artificial intelligence techniques have been employed to forecast the number of confirmed cases of COVID-19. However, accurate prediction with machine learning faces many challenges. It is difficult to train accurate machine learning models with small datasets. Deep learning methods succeed because of large training datasets, which are not available for predicting COVID-19 confirmed cases, and it is difficult to select suitable architectures and parameters for deep neural networks with small datasets. It is also argued that many countries are not doing enough testing, so the confirmed case counts in these countries are likely to be inaccurate. Using poor-quality datasets to train machine learning algorithms will lead to wrong conclusions (Ahmad et al. 2020).

Discussion

We predict the number of confirmed cases and deaths of COVID-19 for the next day using artificial intelligence. The simulations of confirmed cases and deaths are highly accurate, so the ANN can be used for such simulations. The results show that the trainbr algorithm performs best, and 7 input variables are selected to create the ANN model. The ANN with known parameters can also be used to predict the scale of virus epidemic outbreaks in the future.

The lowest RMSE is achieved by using the data of the previous 7 days in the training and test stages. As input parameters are added, the error of the + 1 day prediction falls until seven input parameters are reached; beyond seven, the RMSE goes up slightly as more input parameters are used.

We provide a simple AI model that helps policy makers and researchers anticipate the confirmed cases and deaths of COVID-19 in the next 3 months on the basis of global historical data. The actual confirmed case data of the ongoing epidemic match the AI predictions well, which strongly suggests that the model is suitable for simulating the epidemic caused by SARS-CoV-2. These results can help authorities control the COVID-19 epidemic.

Without any measures, if the relative risk of infection is 1.5, 2.0, or 3.0, the COVID-19 death toll in the next year will be 146,996, 293,991, or 587,982, respectively (Banerjee et al. 2020). In the baseline scenario, the basic reproduction number of COVID-19 was 2.68, and the model predicted that 75,815 people had been infected in Wuhan as of January 25, 2020. If the transmission characteristics of COVID-19 do not change significantly, the outbreaks in other major cities in China will lag that of Wuhan by 1–2 weeks (Wu et al. 2020). A susceptible-infected-recovered-dead model was calibrated with the data reported from January 11 to February 10, 2020 and used to predict the evolution of the epidemic in Hubei. It predicted at least 45,000 infections and 2700 deaths in Hubei as of February 29; in fact, about 67,000 people had been infected and 2800 had died in this period (Anastassopoulou et al. 2020). The suspension of urban public transport, the closure of places of entertainment, and the prohibition of public gatherings are associated with reductions in COVID-19 cases. Without the Wuhan travel ban and China’s emergency response, more than 70,000 people would have been infected outside Wuhan by February 19, 2020. China’s prevention and control measures seem to have succeeded in breaking the transmission chain and preventing contact between infectious and susceptible people (Tian et al. 2020). The three major non-pharmaceutical interventions used in China not only contained the epidemic in China but also won a time window for the world. Without this strong combination of non-pharmaceutical interventions, the number of COVID-19 cases in China might have exceeded 7 million (Lai et al. 2020). A spatiotemporal “risk source” model uses population mobility data to predict confirmed cases and identify high-risk areas at an early stage (Jia et al. 2020).

If the UK government does nothing, it could face more than 500,000 deaths, and without intervention the USA could face 2.2 million deaths (Adam 2020). An AI app that lets individuals report their own symptoms predicts whether they have COVID-19 with an accuracy of nearly 80% (Menni et al. 2020). If the average growth rate of 30.6% observed in the USA during the past 14 days continues, there will be 3.9 million cases by April 12, 2020 (Perc et al. 2020). After the initial epidemic, COVID-19 is expected to break out again in winter 2020. The USA may still need long-term or intermittent social distancing interventions until 2022, and SARS-CoV-2 should continue to be monitored because a new outbreak could occur as late as 2025 (Kissler et al. 2020). Three biomarkers of COVID-19 have been used to predict the mortality of COVID-19 patients at least 10 days in advance with an accuracy of over 90% (Yan et al. 2020).

An ANN model with an EEMD-based decomposition technique has been developed for predicting the COVID-19 epidemic; its \( R^2 \) values are 0.9997 for training, 0.99982 for testing, and 0.99981 for validation (Hasan 2020). Cloud computing and machine learning (ML) have also been deployed to forecast the COVID-19 epidemic, and the results show the severity of the global spread of COVID-19 (Tuli et al. 2020a).

The outbreak of COVID-19 has seriously affected the environment, ecology, economy, and society (Kluge et al. 2020), and it is a major threat to the world economy. In March 2020, the epidemic shook the US stock market violently, triggering the circuit breaker mechanism four times in one month. There is an urgent need to know how the transmission of COVID-19 is likely to develop.

Conclusions and Implication

We examined the feasibility of using AI with the cases of past days as input variables to predict global confirmed cases and deaths of COVID-19. The performance of the ANN model is assessed with three statistical criteria. A simple ANN model with the global patient counts of the 7 past days as input variables is identified, and its predicting capability is verified as acceptable. Given the global COVID-19 epidemic, research on prediction models needs to be strengthened further.

At present, the risk of a world economic recession comes mainly from the spread of the epidemic, and the recovery depends on when the epidemic is contained worldwide. Specific drugs for COVID-19 have not yet been successfully developed. Joint action to fight the epidemic is urgently needed, and China’s experience will undoubtedly provide effective help for the global fight against the epidemic.

In future work, we will predict the epidemic trend of COVID-19 in different countries using deep learning approaches, such as the recurrent neural network (RNN), the gated recurrent unit (GRU), and long short-term memory (LSTM), and compare how these models perform across diverse demographics. We plan to develop a single standard model, possibly a combination of different algorithms, that can be used for any country.

People should gather less: they should avoid crowded places, especially those with poor ventilation, and reduce unnecessary outings. When going out, people should take personal protective measures and wash their hands frequently. In densely populated public places, people should keep a social distance from others, and wearing medical surgical masks is recommended. Only by adhering to the concept of a human community, following the trend, responding to the times, rationally avoiding the pitfalls of protectionism, strengthening international joint prevention and control, coordinating macroeconomic policies, and strengthening global supply chain cooperation can the international community work together to promote the early recovery of the world economy.