Machine-learning method for analyzing and predicting the number of hospitalizations of children during the fourth wave of the COVID-19 pandemic in the Lviv region

The purpose of this paper is to develop a machine-learning model for analyzing and predicting the number of hospitalizations of children in the Lviv region during the fourth wave of the COVID-19 pandemic. This wave is characterized by the dominance of a new strain of the virus, Omicron, which spreads faster than previous strains and often affects children. The high sociability of children and the low level of vaccination in Ukraine resulted in a sharp increase in the number of hospitalizations. The research is further complicated by the geolocation of the Lviv region. This article is the first to analyze and predict the number of hospitalizations of children during the fourth wave of the COVID-19 pandemic for the Lviv region. Data were obtained from publicly available resources. Open-source software (the Python programming language and the Pandas library) was used for software implementation of the machine-learning method. The developed model consists of two components: analysis and prediction. The analysis of the number of hospitalized children was performed using the Pearson correlation coefficient. Short- and medium-term predictions were made using non-iterative SGTM neural-like structures that were trained in supervised mode and tested in online mode. The root-mean-square (RMS) and maximum errors, reduced to the range of values, did not exceed 0.48% and 0.61% for short-term predictions (up to a week) and 1.81% and 2.83% for medium-term predictions (up to 2 weeks), respectively. The developed model can also be used for predicting other COVID-19 parameters.

continued until mid-July 2021, whereas the third wave ended 10 February 2022. On 11 January 2022, the fourth wave of the coronavirus pandemic began, and it continues until now.
In Ukraine, all waves began with a slight delay compared to the rest of the world. The fourth wave of the COVID-19 pandemic began in Ukraine on 11 January 2022, while worldwide it had already begun in early December 2021. Compared to other waves of the coronavirus pandemic in Ukraine, the fourth is the most rapid. This is due to the emergence of a new strain of SARS-CoV-2 named Omicron (B.1.1.529), which is more contagious than the previous ones. The Omicron variant replicates in the lung tissue 70 times faster than the Delta variant of the coronavirus [3,4]. However, due to its lower ability to penetrate deep into the lung tissue, the Omicron variant generally causes less severe disease and fewer deaths than earlier variants of the coronavirus.
Another specific feature of the fourth wave of the pandemic is that the Omicron variant began to affect not only the elderly but children as well [5,6]. This wave of the pandemic can be considered the "children's" wave, as hospitalization rates for children and adolescents have increased compared to previous waves of the coronavirus pandemic. The analysis of the literature revealed the features of the clinical picture of COVID-19 in hospitalized children and the consequences of the disease. Methods of statistical analysis [7] and regression analysis [8] were used to identify the factors by which COVID-19 affects children. This research identified factors that increase the risk of hospitalization: pneumonia, gastrointestinal symptoms, multisystemic inflammatory syndrome in children, rash, history of diabetes, obesity, and comorbidities. Of the children who had come through COVID-19, 41.5% had a persistent cough and asthma-like symptoms. Children with allergic rhinitis and children whose family members already had asthma were more prone to these symptoms. Moreover, the common consequences included mental disorders, anxiety and fear disorders, depressive disorders, and obesity.
The fact that most children were unvaccinated also contributed to the significant number of hospitalizations of children. Ukraine has allowed the vaccination of children aged over 12 years only since 14 October 2021. As of 18 October 2022, 2922 children aged from 12 to 15 years had been vaccinated against COVID-19, as had 7592 children aged from 16 to 18 years.
The article is devoted to the analysis and prediction of the fourth wave of the coronavirus pandemic in the Lviv region, which shares a border with Poland and has an airport of international importance. In addition, Lviv is an extremely popular tourist destination due to its multicultural heritage and many historical and architectural monuments. For this reason, its numerous visitors might contribute to the rapid spread of a new strain of coronavirus [9][10][11].
The paper consists of the following main sections: Introduction, Related works, Materials and methods, Modeling and results, Comparison and discussion, and Conclusion. Section 1 analyzes the literature on this issue in the world in general and in Ukraine in particular. No identical research for the Lviv region, which is unique in its geolocation, was found in the literature. Section 2 covers the existing mathematical models and machine-learning methods for analyzing and predicting the spread of the COVID-19 pandemic in children. Section 3, entitled Materials and methods, describes in detail the collection and processing of statistical data, the detection of correlation dependences between parameters of the COVID-19 pandemic using the Pearson correlation coefficient, and the preparation of training and test data for SGTM neural-like structures. Section 4 is devoted to modeling and presenting the obtained results. Section 5 compares the results obtained with the developed machine-learning model and presents prospects for further research.

Related works
SIR models and their modifications are most commonly used to predict various scenarios of COVID-19 propagation [12]. Works [12,13] model and analyze various scenarios of COVID-19 propagation using modified multi-agent systems.
In publication [14], the ARIMA model is used to calculate and predict the difference between the solution of the SEIRD model and the observed data. Mathematical modeling of COVID-19 dynamics in Ukraine is based on the latest data on the parameters characterizing clinical features of the disease. These include age parameters of the disease, as well as contact matrices, related to age and location, for representing contacts [15].
Most often, mathematical models are developed using regression models based on power polynomials [16], the segmented regression model [16], multifactor linear regression [2,15], and Ridge regression [18]. Methods based on the autoregressive integrated moving average (ARIMA) model have been developed and successfully used to predict the dynamics of the COVID-19 epidemic in Ukraine [19]. They were evaluated by the mean absolute error, which amounted to 4.7%.
Regression models based on the method of least squares are often used for calculating the total number of confirmed COVID-19 cases and the number of deaths [20]. Correlation analysis is also used effectively [21]. On this basis, machine-learning algorithms and compartmental models of the COVID-19 epidemic process have been developed, and the experimental results of the modeling have been studied.
Two approaches are rarely used [17,22]: mathematical modeling of the wave structure of COVID-19 propagation, in which the wave structure is considered as a complex flow of epidemic events in the form of a set of simple epidemic waves, and a cross-diffusion reaction-diffusion system analyzed with the Lie symmetry method.
Holt's linear model showed high accuracy for short-term prediction (up to 10 days) [23]. Artificial neural networks are often used as a basic element in machine-learning models in conjunction with the above-described methods. In particular, the Support Vector Machine is used to predict the COVID-19 epidemic process in Ukraine and neighboring countries [17]. In combination with correlation analysis methods, they are successfully used for short-term prediction of the spread of the coronavirus pandemic [21]. Publication [24] presents neural network modeling of the transformation of the trajectory of economic and social development caused by quarantine restrictions during the COVID-19 pandemic. An agent-oriented model was also developed to study the COVID-19 epidemic process in Ukraine [25]. Therefore, to increase the accuracy of short- and medium-term predictions, the combined use of mathematical methods, Big Data tools, and neural networks is required. Within the framework of this research, it was proposed to analyze and predict the number of children hospitalized with COVID-19 during the fourth wave of the pandemic in the Lviv region based on data taken from publicly available sources. The analysis is performed using the Pearson correlation coefficient. Short- and medium-term predictions are made using non-iterative SGTM neural-like structures [26].

Collection and processing of statistical data
Statistical data taken from open access resources were used to analyze and predict the spread of COVID-19 in Ukraine [27]. In addition, data on the spread of COVID-19 in the Lviv region were obtained from [28]. The Pandas library for the Python programming language was used to work with the statistics. It was chosen because it specializes in data structures and operations for manipulating numerical tables and time series. An important advantage is that it is free and open-source software. Therefore, the developed software product can be used for any purpose, and copies of it, including the initial version of the source code, can be freely distributed.
Since the file contains information about COVID-19 collected from around the world on a daily basis, the records were filtered with the Pandas library according to the condition "location = Ukraine". To make the data easier to read, they were smoothed with a simple moving average over seven points, because the cyclic component has a period of 7 days. Next, the series of laboratory-confirmed cases was divided into four waves of the pandemic. The day on which the number of laboratory-confirmed cases reached its minimum value was taken as the first day of the next wave. The waves of the COVID-19 pandemic in Ukraine divided in this way were then aligned so that each starts from its first day. Figure 1 presents comparative characteristics of laboratory-confirmed cases for four waves of the COVID-19 pandemic in Ukraine and the Lviv region.
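The filtering, smoothing, and wave-alignment steps described above can be sketched with Pandas as follows. This is a minimal illustration rather than the authors' code: the column names `location`, `date`, and `new_cases`, the helper names, and the toy numbers are all assumptions, and the wave start dates are passed in by hand instead of being detected from the minima.

```python
import pandas as pd


def smooth_country_cases(df, location="Ukraine", window=7):
    """Filter a worldwide daily table to one location and smooth new cases."""
    # Keep only the rows for the requested location (column names are assumed).
    country = df.loc[df["location"] == location, ["date", "new_cases"]].copy()
    country["date"] = pd.to_datetime(country["date"])
    country = country.sort_values("date").set_index("date")
    # Simple moving average over `window` points to suppress the weekly cycle.
    country["new_cases_sma"] = country["new_cases"].rolling(window=window).mean()
    return country


def split_into_waves(series, wave_starts):
    """Cut a date-indexed series at the given start dates.

    Each wave is re-indexed from 0 so that waves can be overlaid day by day.
    """
    starts = [pd.Timestamp(d) for d in wave_starts]
    waves = []
    for k, start in enumerate(starts):
        mask = series.index >= start
        if k + 1 < len(starts):
            mask &= series.index < starts[k + 1]
        waves.append(series[mask].reset_index(drop=True))
    return waves


# Toy data (hypothetical numbers): ten days for two locations.
dates = [f"2022-01-{day:02d}" for day in range(1, 11)]
toy = pd.DataFrame({
    "location": ["Ukraine"] * 10 + ["Poland"] * 10,
    "date": dates * 2,
    "new_cases": list(range(10, 110, 10)) + [1] * 10,
})

smoothed = smooth_country_cases(toy)
waves = split_into_waves(smoothed["new_cases"], ["2022-01-01", "2022-01-06"])
```

The first six smoothed values are undefined because a seven-point window is not yet full; whether the original analysis dropped or padded those days is not stated in the text.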
According to the above graphs for the whole of Ukraine, the maximum daily number of patients in the first, second, and third waves of the pandemic was 16,585, 20,456, and 28,477, respectively. The fourth wave began most rapidly and reached the highest number of laboratory-confirmed cases of COVID-19 among all waves of the coronavirus pandemic in Ukraine, namely 45,022. The Lviv region is analyzed separately, as it is unique in comparison with other regions of Ukraine.
The Lviv region is one of three regions of the historical and cultural region of Galicia and part of the Carpathian Euroregion. Lviv is a tourist city with an international airport and extensive railway connections carrying a large stream of transit passengers. The region also borders Poland and has large, busy checkpoints, including a pedestrian checkpoint. Thus, the large number of tourists in the Lviv region may greatly contribute to the spread of the coronavirus pandemic in this region. Therefore, it is crucial to analyze the spread of the coronavirus pandemic and make short- and medium-term predictions of it.
It is reasonable to assume that there is a correlation dependence between the number of laboratory-confirmed coronavirus cases in Ukraine and the Lviv region. It is expedient to investigate the presence of this dependence using the Pearson correlation coefficient. The coefficient is easy to calculate and informative, as it not only establishes the presence or absence of a correlation dependence but also shows whether the dependence is direct or inverse. Figure 2 depicts comparative characteristics of the number of deaths for four waves of the COVID-19 pandemic in Ukraine and the Lviv region. These data were processed with the Pandas library in the same way as the number of laboratory-confirmed cases.
As shown in Fig. 2a, the maximum number of deaths in Ukraine during the first wave was 297 people per day. During the second wave, this number increased to 486. The largest number of deaths caused by COVID-19 in Ukraine was observed during the third wave of the pandemic, and it amounted to 865. The Delta variant (B.1.617.2) was the dominant strain of COVID-19 during the third wave of the pandemic. It mostly affected elderly people and was characterized by the rapid development of pneumonia. Late diagnosis resulted in a high percentage of fatalities. During the fourth wave, with the Omicron strain predominating, the number of deaths decreased and amounted to 322 as of 18 February 2022. According to Fig. 2b, the maximum number of deaths in the Lviv region during the first, second, third, and fourth waves was 30, 35, 65, and 28, respectively. However, the fourth wave is still going on, so the values may change.

Detection of correlation dependences between parameters of the COVID-19 pandemic
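To detect the correlation dependence between the number of laboratory-confirmed cases of COVID-19 in Ukraine and the Lviv region, the Pearson correlation coefficient between two samples $x^m = (x_1, \ldots, x_m)$ and $y^m = (y_1, \ldots, y_m)$ is used:

$$r_{x,y} = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \sum_{i=1}^{m} (y_i - \bar{y})^2}} = \frac{\operatorname{cov}(x, y)}{\sqrt{S_x^2 S_y^2}} \quad (1)$$

where $\bar{x}$, $\bar{y}$ are the sample means of $x^m$ and $y^m$; $S_x^2$, $S_y^2$ are the sample variances; and $r_{x,y} \in [-1, 1]$.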
If the value of the correlation coefficient is +1, the dependence between X and Y is linear, and all points lie on a line showing the increase in Y with increasing X. If the value is −1, all points lie on a line showing the decrease in Y with increasing X. If the correlation coefficient is equal to 0, there is no linear correlation between the variables. Using Formula 1, we calculate the Pearson correlation coefficient separately for each wave of the pandemic. Table 1 presents the results of calculating the correlation between the number of laboratory-confirmed cases and deaths caused by COVID-19 in Ukraine and the Lviv region.
As can be seen from the results shown in Table 1, all Pearson correlation coefficients are positive. This means that there is a direct linear dependence between the correlated parameters, i.e., an increase in one parameter is accompanied by an increase in the other. For new laboratory-confirmed cases of coronavirus in Ukraine and the Lviv region, the correlation is high, varying from 0.912 to 0.954. The Pearson correlation coefficient during the first and third waves of the coronavirus pandemic was approximately the same and ranged from 0.933 to 0.936. The highest correlation, 0.954, was observed during the second wave. During the fourth wave, the Pearson correlation coefficient decreased slightly compared to the other waves of the COVID-19 pandemic, but still had a high value of 0.912. This indicates that, during all waves of the coronavirus pandemic, there was a direct linear dependence between the number of laboratory-confirmed cases of coronavirus disease throughout Ukraine and in the Lviv region.
The correlation for the number of deaths caused by COVID-19 is slightly lower than for the number of laboratory-confirmed cases: the Pearson correlation coefficient lies within the range from 0.826 to 0.877. The lowest linear correlation was observed during the first wave of the pandemic (0.826). During the second and third waves, the linear correlation was approximately the same (from 0.871 to 0.876), and during the fourth wave it was slightly lower than during the second and third waves and amounted to 0.865. Nevertheless, the correlation between the number of deaths caused by COVID-19 in Ukraine and in the Lviv region is significant: the number of deaths in Ukraine and the number of deaths in the Lviv region increase together.
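As an illustration of how the coefficient behaves at its limits, the standard definition of the Pearson correlation coefficient can be computed directly in plain Python. The function name `pearson_r` and the sample values are assumptions made for this sketch; they are not taken from the data in Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    # Sums of centered products and centered squares; the common 1/m
    # (or 1/(m-1)) factor cancels between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples illustrating the three limiting cases.
r_direct = pearson_r([1, 2, 3], [2, 4, 6])           # points on a rising line
r_inverse = pearson_r([1, 2, 3], [6, 4, 2])          # points on a falling line
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])     # no linear relationship
```

Applied per wave, the two arguments would be the smoothed daily series for Ukraine and for the Lviv region over the same days.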

Preparation of training and test data for SGTM neural-like structures
SGTM neural-like structures (Fig. 3), whose topology and learning algorithm are described in publication [26], are used for short- and medium-term predictions. They are based on the model of successive geometric transformations, can work in supervised and unsupervised modes, and can be used for solving various tasks. A feature of the topology of the linear SGTM neural-like structure is the lateral connections between adjacent neurons of the hidden layer. The procedures for training and operation of this tool are the same. The greedy non-iterative training algorithm allows SGTM neural-like structures to process a variety of data online efficiently. They have a number of advantages over multilayer feedforward networks; in particular, they approximate an arbitrary nonlinear function using only one intermediate layer. The parameters of the linear combination in the output layer can be fully optimized using linear optimization methods, which work quickly and do not suffer from local minima. Therefore, the SGTM neural-like structure learns quickly, and it is appropriate to use it for short- and medium-term predictions. At the same time, it is sensitive to a high dimensionality of the input vector due to poor extrapolating properties. For this reason, high-dimensional inputs should be avoided.
Figure 3 shows the SGTM neural-like structure consisting of three layers: input layer, hidden layer, and output layer. There are additional lateral unidirectional connections between neurons of the hidden layer. The vector of input signals is denoted as $X = (x_1, x_2, \ldots, x_6)$. The output layer consists of a single neuron, to which the value $y$ is applied.
As the number of children admitted to hospitals increased significantly during the fourth wave of the pandemic, it is reasonable to develop a machine-learning algorithm for predicting the number of hospitalized children in the Lviv region.
Taking into consideration the established correlation dependence between the number of patients in Ukraine and the Lviv region, we form a training set for the SGTM neural-like structures.
The following values are applied to the input neurons:
• the total number of beds occupied by children in the Lviv region;
• the number of beds occupied by children with a confirmed COVID-19 diagnosis in the Lviv region;
• the number of beds occupied by children with a suspected COVID-19 diagnosis in the Lviv region;
• the total number of beds occupied by children in Ukraine;
• the number of beds occupied by children with a confirmed COVID-19 diagnosis in Ukraine;
• the number of beds occupied by children with a suspected COVID-19 diagnosis in Ukraine.
All values of the input vector are taken for day i. The number of hospitalized children in the Lviv region on day (i + 1) is supplied to the output neuron. The optimal number of neurons in the hidden layer, equal to 3, was established experimentally. The criteria for assessing the accuracy of training and prediction are two errors: the root-mean-square error and the maximum error, both reduced to the range of values. SGTM neural-like structures are considered trained when both errors are minimal.
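The pairing of day-i inputs with a later target can be sketched as below. The function name `make_training_pairs`, the `horizon` parameter, and the numbers are assumptions for illustration only; the text specifies only the day (i + 1) target, so a larger horizon is shown merely as one possible way to set up week-ahead or two-week-ahead targets.

```python
def make_training_pairs(features, target, horizon=1):
    """Pair each day's input vector with the target `horizon` days later.

    features: list of per-day rows, each holding the six bed-occupancy values.
    target:   list of per-day numbers of hospitalized children (Lviv region).
    """
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    n_pairs = len(features) - horizon
    if n_pairs <= 0:
        return [], []
    inputs = features[:n_pairs]     # day i
    outputs = target[horizon:]      # day i + horizon
    return inputs, outputs


# Hypothetical four-day example; each row is
# [Lviv total, Lviv confirmed, Lviv suspected,
#  Ukraine total, Ukraine confirmed, Ukraine suspected].
daily_rows = [
    [50, 30, 20, 900, 600, 300],
    [54, 33, 21, 950, 640, 310],
    [60, 38, 22, 1010, 690, 320],
    [58, 37, 21, 1000, 680, 320],
]
hospitalized_lviv = [50, 54, 60, 58]

inputs, outputs = make_training_pairs(daily_rows, hospitalized_lviv, horizon=1)
```

Because the pairs are ordered in time, a chronological split (earlier pairs for training, later pairs for testing) keeps future values out of the training set.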

Modeling and results
The main steps of the developed method of machine learning with the support of SGTM neural-like structures for predicting the number of hospitalized children in the Lviv region are schematically presented in Fig. 4.
All steps are implemented using the Python programming language and the Pandas library. The modeling was performed on a personal computer with an Intel Core i7-5500U processor (with an integrated graphics processor), 2.40 GHz, 4 CPUs, 16 GB of RAM, and the Windows 10 operating system.
The results of short- and medium-term predictions of the children's hospitalization rates during the fourth wave of the pandemic in the Lviv region are presented in Table 2. The criterion for assessing the accuracy of prediction is the minimization of the root-mean-square error and the maximum error, both reduced to the range of values.
As can be seen from the results shown in Table 2, the root-mean-square error of the short-term prediction reduced to the range of values does not exceed 0.48%, while the maximum one does not exceed 0.61%. For the medium-term prediction, the root-mean-square error reduced to the range of values does not exceed 1.81%, while the maximum one does not exceed 2.83%. Both prediction horizons therefore stay below 3% of the range of values.
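The text does not give explicit formulas for the two error criteria. One common reading of "reduced to the range of values" is to divide each error by the difference between the largest and smallest observed values and express the result as a percentage; the sketch below follows that assumption. The function name and the sample numbers are hypothetical and do not reproduce Table 2.

```python
import math


def range_normalized_errors(actual, predicted):
    """Return (RMS error, maximum absolute error), each as % of the data range.

    Assumes normalization by max(actual) - min(actual); the paper may use
    a different definition.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("the range of actual values is zero")
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    max_abs = max(abs(e) for e in errors)
    return 100.0 * rmse / value_range, 100.0 * max_abs / value_range


# Hypothetical occupied-bed counts and their predictions.
actual_beds = [10, 20, 30, 40]
predicted_beds = [11, 19, 30, 42]
rms_pct, max_pct = range_normalized_errors(actual_beds, predicted_beds)
```

By construction the maximum error is never smaller than the RMS error, which matches the ordering of the reported pairs (0.48% and 0.61%; 1.81% and 2.83%).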

Comparison and discussion
There are several COVID-19 predictive models that use machine learning. The effectiveness of the developed method was evaluated by comparing its performance with the results of the six most common machine-learning algorithms: logistic regression, support vector machine, k-nearest neighbors, random forest, gradient boosting, and neural network. It is also reasonable to compare the methodologies used to develop the machine-learning methods. The results are presented in Fig. 5. Regression models [15][16][17] revealed a polynomial relationship between the coronavirus infection rate and population density. The machine-learning model developed and presented in this article, based on the Pearson correlation coefficient and SGTM neural-like structures, confirms these results. Machine-learning models based on such methods as Support Vector Machine [24], k-nearest neighbors [25,27,28], Gradient Boosted Decision Tree [29,30], and random forest [30,31] make it possible to predict the spread of the COVID-19 pandemic with high accuracy. However, most of them were employed to predict the number of hospitalized patients with COVID-19 in general, and not children in particular. They did not take into account the peculiarities of the disease in children or their low level of vaccination, and they often did not consider the geolocation of sick children or their higher contact rate compared with adults. The machine-learning methods developed to predict the spread of COVID-19 in children worked on the basis of the clinical picture of the disease after the children had already been hospitalized. Such methods are unsuitable for predicting the number of hospitalizations and, consequently, the required number of "children's" beds in hospitals.
The model developed in this article is free of the disadvantages described above. It also allows online prediction with the ability to retrain as new data become available. To visualize the obtained results, we present graphs of the actually occupied children's beds intended for COVID-19 patients in the Lviv region and the results of short- and medium-term predictions. Figure 6 shows the actually occupied children's beds intended for COVID-19 patients in the Lviv region in blue, the results of the short-term prediction made a week ahead in red, and the results of the medium-term prediction with a 2-week lead in green.
Since the set of parameters is quite simple and has a linear dependence, as shown by the Pearson correlation coefficient, the predictions turned out to be highly accurate. The lowest accuracy was shown by the medium-term prediction with the largest lead of 14 days. For it, the RMS training error reduced to the range of values was 0.71%, and the maximum one was 1.64%. In tests with real values, the RMS error reduced to the range of values was 1.81%, and the maximum one was 2.83%. However, taking into account the significant lead, such accuracy is satisfactory.
The Lviv region is one of the regions of Ukraine with a well-established system for collecting statistics. The medical system that deals with testing of patients for COVID-19 also functions quite effectively. The daily number of unprocessed tests is small (up to 10% of the total number of tests) compared to the number of processed tests with confirmed positive or negative results. Therefore, high-quality statistics make a highly accurate prediction possible.
Thus, during the fourth wave of the COVID-19 pandemic, a new strain of the virus, Omicron, predominates. The following factors influenced its rapid spread in the Lviv region:
• the higher contagiousness of the B.1.1.529 variant compared to the previous ones;
• the low level of vaccination among children in Ukraine in general and in the Lviv region in particular;
• the geolocation of the Lviv region, through which a significant passenger flow passes.
For the first time, a machine-learning model was developed for the Lviv region for this task. It consists of two components: analysis (using the Pearson correlation coefficient) and prediction (using non-iterative SGTM neural-like structures).
Modern software tools were used for the software implementation of the machine-learning method. The Python programming language was chosen because its dynamic semantics suit the rapid development of programs and the combining of existing components. The Pandas library is well suited for manipulating data from open sources and conducting further analysis; its ability to manipulate numeric tables and time series was used. Using the Pandas library, the data regarding the coronavirus pandemic specifically in the Lviv region, including the data starting from January 11, 2022 (the 4th wave of the pandemic), were filtered. To determine the COVID-19 parameters that influence the number of children's hospitalizations in the Lviv region, the Pearson correlation coefficient, an indicator of the degree of linear dependence, was used. As a result, it was found that the six parameters supplied to the input of the SGTM neural-like structures have the highest correlation dependence.

Conclusion
Within the framework of this work, we attempted to use the simplest non-iterative neural networks for online prediction. The developed model can be retrained on new data as soon as they appear in open access resources. The developed machine-learning model can also be used to predict online other parameters that describe the COVID-19 pandemic.
This type of neural network was chosen for its main advantage: non-iterative training with satisfactory accuracy. The RMS and maximum errors, reduced to the range of values, did not exceed 0.48% and 0.61% for the short-term prediction (up to a week) and 1.81% and 2.83% for the medium-term prediction (up to 2 weeks), respectively. The obtained accuracy of prediction is satisfactory in comparison with identical predictions made for other geolocations.
However, if there is no need for retraining the GRNN neural networks in real time, other types of neural networks can be used to obtain higher accuracy of prediction. Since new statistical data on the number of hospitalized children in the Lviv region are recorded at daily intervals, it is possible to use other types of neural networks that are retrained in on-demand mode (as new data become available).
Prediction accuracy can be improved by using GRNN neural networks as a pre-processing tool in a meta-learning approach [32]. Therefore, further research on using other types of neural networks to improve prediction accuracy is worthwhile.