Diseases and bacteria or viruses which cause them often have different names. The "human immunodeficiency virus," HIV, for example, induces the "acquired immunodeficiency disease," AIDS. The virus that triggers the current outbreak is called coronavirus 2, a serious acute respiratory syndrome shortened to SARS-CoV-2. The illness, shortened to COVID-19, is called coronavirus disease. The World Health Organization, and the International Committee on Taxonomy of Viruses (ICTV) [1], gave these names. In public speaking, the WHO also refers to the virus as "the virus accountable for COVID-19," or "the COVID-19 virus." The outbreak was first reported in Wuhan city, China. Wuhan is the capital of the Hubei Province and has a population of around 11 million. Chinese authorities reported a cluster of related pneumonia cases in the town on 29 December 2019. A novel coronavirus which was later called SARS-CoV-2 soon confirmed to cause these cases [2,3,4,5,6,7]. The first COVID-19 cases outside of China were found in Thailand on January 13 and in Japan on January 16 [8]. The Chinese Government put the city of Wuhan and other cities in the area on lockdown on January 23. COVID-19 has since spread to several more countries-cases have been recorded in all regions of the world. It grew into a global pandemic by March 2020, and was announced by the WHO as such [9,10,11]. While people sometimes refer to the virus that causes COVID-19 as "the coronavirus," several different coronaviruses do exist. The word refers to a group of viruses specific to humans: coronaviruses cause about 30 percent of all cold cases [12]. Corona is Latin for "crown"—this group of viruses is named because, under an electron microscope, its surface looks like a crown. As the outbreak of the novel SARS‐CoV‐2 is increasingly spreading in China and beyond, threatening to become a global pandemic, epidemiological data need to be interpreted in such a way that the model of statistical data analysis and visualization can increase the understanding of situation among the mass population in the coming days [13, 14].

The World Health Organization (WHO), Johns Hopkins University researchers, and other agencies all maintain dataset on the number of confirmed infected cases, deaths, and disease recoveries. All data obtained in this research work is from Johns Hopkins University and is freely accessible via the GitHub repository. The dataset covered the period from 22 January 2020 to 17 April 2020 which includes time-series and aggregated data [15].

We statistically analyzed our dataset with various methods of data analysis and visualized those data to provide a proper understanding of the COVID‐19 outbreak worldwide. Our exploit analysis was carried out by Johns Hopkins University with the 2019 coronavirus dataset (January–April 2020). Here, between 22 January 2020 and 17 April 2020, we present an effort to visualize and analyze the results. COVID‐19 has so far propagated nearly 185 Countries/Regions, 83 Cities/Provinces have been registered, and 264 separate geographical locations combined. Using time-series data, it estimated the number of individual cases, such as confirmed infected, deaths and recovered around the globe and the top 10 countries in the world. As of 17 April 2020, the United States and Spain are among the top ten countries in the world. Further to the discussion on different cases, such as confirmed illnesses, deaths and recovery in those countries as seen in the Fig. 1.

Fig. 1
figure 1

Display the number of confirmed infected, deaths, recovered and active cases of Globe, USA and Spain from 22 January 2020 to 17 April 2020

Worldwide the total confirmed infected cases are 2,152,646, and the global average rate is 0.38, with a standard deviation of 2.15. The global average rate of the Top 10 countries is 7.73, with a standard deviation of 8.49. In this circumstance, the US ranked first with a total of 667,801, the global percentage is 31.02, and with a total of 184,948, the global percentage of 8.59, Spain is second. The estimated number of deaths worldwide is 143,800, with a global average of 3.61. For this situation, the US occupied the first place with 32,916 counts, and with 22,170 counts, Italy was second in the top 10 countries around the world. The total number of cases recovered is 542,107 in the world. In this scenario, Germany ranked first, with a total of 77,000, with a total of 74,797, Spain ranked second, and the US ranked fourth, with a total of 54,703, in the top 10 countries of the world.

From a statistical data analysis, it can be understood that 5% of deaths and 8% of recoveries occurred in reported cases in the United States. In Spain, 10% of deaths and 40% of recoveries occurred in confirmed cases.

We also explore time-series data using visual data analysis to provide a clear and understandable outcome of this extreme outbreak of COVID‐19. This segment will analyze various time-series data using several visual data analysis approaches with the R programming language. We have created a graph and given awareness of how SARS‐CoV‐2 spread around the globe from 22 January 2020 to 17 April 2020; it allows individuals to grasp the epidemiological essence of COVID‐19. Figure 1 indicates that the confirmed infected cases have been crossed by 2,000,000 cases around the globe. Many cases, such as death, recovery and active, have also been shown. New cases reported on a single day do not actually represent new cases on that day, as the number of confirmed infected cases or deaths announced by any organization—including WHO, ECDC, Johns Hopkins University and others—does not reflect the total number of new cases or deaths on that day. This is due to the long chain of reporting that occurs between a new case or death and its inclusion in national or international statistics.

Regression and generalized linear models (GLM) of data from the COVID-19 time series are used to analyze confirmed infected, deaths and recovered cases. The fitted models have yielded better statistical results; the findings shown below represent all three cases in the USA. From the models results obtained, on the confirmed case, the exponential model coefficients are − 0.807 and 0.17, the GLM Poisson model coefficients are 3.469 and 0.119, and the GLM Gamma model coefficients are − 0.433 and 0.17, both of which are statistically significant, as shown in Fig. 2. In case of death, the exponential model coefficients are − 2.774 and 0.144, the GLM Poisson model coefficients are − 2.424 and 0.151, both of which are statistically reasonable. In the recovered case, the exponential model coefficients are − 2.204 and 0.137, the GLM Poisson model coefficients are − 2.864 and 0.163, both of which are statistically significant. From the findings, it can be understood that all cases, such as confirmed infected, deaths and recovered, are linearly increased, the same thing is reflected in the upper part of the graph, i.e. the output of linear and generalized linear models. Regression and generalized linear models of the COVID-19 time series were also used to analyze confirmed infected, death and recovered cases in Spain. The fitting models have provided better statistical results; the findings shown below reflect all three cases in Spain. Here, too, the count has risen linearly. In the confirmed case, the exponential model coefficients are − 2.278 and 0.185, the GLM Poisson model coefficients are 4.159 and 0.093, both of which are statistically significant. In case of death, the exponential model coefficients are − 2.919 and 0.152, the GLM Poisson model coefficients are 1.329 and 0.104, both of which are statistically fine. In the recovered case, the exponential model coefficients are − 2.876 and 0.165, the GLM Poisson model coefficients are 0.914 and 0.124, both of which are statistically appropriate.

Fig. 2
figure 2

Curve fitting and distribution of linear regression, and GLM with “Poisson” and “Gamma” function models with parameters of number of days and number of confirmed infections in the US between 22 January 2020 and 17 April 2020

In the event of an outbreak of an infectious disease, it is necessary not only to track the number of deaths, but also the rate of increase in the number of deaths. If there is a fixed number of deaths over a fixed period of time, we call that "linear" growth. But if they continue to double within a fixed time span, we call it "exponential" growth. Based on the results, looking at the rate of death growth, we have understood that it is linear growth in the US and Spain. Figure 3 indicates that changes every day occurred in confirmed cases between 22 January 2020 and 17 April 2020 from the USA and Spain. By this we will conclude that the reported cases will accelerate on 20 March 2020 and that the last day of change is 31,451 in the US. In Spain, the confirmed case rises linearly from 03 March 2020 to 15 April 2020, the last day of change is 7,304. It is clear that the real-time analysis of these data is extremely useful in documenting the epidemiological behavior of this severe disease. We believe that this method of data analysis will certainly increase understanding of the situation and inform behavior.

Fig. 3
figure 3

Showing daily changes in confirmed cases of infection in the United States and Spain from 22 January 2020 to 17 April 2020

This study examined three separate categories of data, including confirmed infected, death and recovered cases across the globe, for the period from 22 January to 17 April 2020. It will also include a comparative overview of all the cases reported in the United States and Spain. Nevertheless, we are discussing various cases internationally in order to explain the various cases identified over a particular time span. After review, 2,152,646 confirmed cases of COVID‐19 occurred worldwide on 17 April 2020. In the US, where the highest count is 667,801, the global percentage is 31.02. Death cases were 143,800 across the globe (6.68%), with the US top count being 32,916 (4.93%). The cases recovered were 542,107 around the globe (25.18%) with Germany at the top of the list with a total of 77,000 cases. The visual analysis of the growth rate of confirmed infected, deaths and recovered cases between the US and Spain is another investigation.

The goal of this article on COVID-19 is to summarize existing research, collect relevant data and make it possible for readers to make sense of the published data and early research on the coronavirus outbreak. Much of our work focuses on known problems for which we can link with well-established research and evidence on COVID-19. The research presented here is based on statistical and visual data analysis methods with the aid of a dataset provided by John Hopkins University. The research was done with R Studio 1.2.5033 and R 4.0 beta versions of the Windows 10 operating system. Each and every description of the different cases of COVID‐19 is documented here between 22 January 2020 and 17 April 2020. We are now also observing the harmful outbreak of the SARS‐CoV‐2 virus. To the world, this is extremely troubling. In this analysis, we examined the top 10 countries most affected and comprehensive reported cases of the United States and Spain.

In conclusion, the dataset COVID-19 (2019-nCoV) from the Johns Hopkins CSSE data repository (22 January 2020 to 17 April 2020) was used for our experiment. It has supported us to generate and disseminate detailed information to the scientific community and to the public, especially at the peak phase, in order to understand the growth and impact of the novel coronavirus. Nevertheless, knowledge of this novel SARS‐CoV‐2 virus remains minimal among the general population around the globe. Raw data published from different sources are not adequately capable of offering an insightful understanding of COVID‐19 as a consequence of SARS‐CoV‐2. A user-friendly data analysis platform would also be more effective in recognizing the epidemic of this severe disease. The informative graphics of the visualization platform provide an intuitive interface and a simple view of all raw data. Hopefully, in the coming days, we will continue to track the epidemiological data of this outbreak that we have used in this study and from other official sources.