Prediction model of seasonality in the construction industry based on the accidentality phenomenon

The construction industry is an economic sector that is characterized by seasonality. Seasonal factors affect the volume of production, which in turn affects the accident rate. The aim of the research presented in the article was to develop a model for predicting the number of people injured in occupational accidents in the construction industry. Based on the analysis of statistical data and previous studies, certain regularities of the accidentality phenomenon were found, namely a long-term trend over many years, as well as seasonality and cyclicality over the course of a year. These regularities were the basis for the assumptions made when constructing the model. A mathematical model based on non-linear regression was built. The model was validated by comparing its prediction errors with the prediction errors generated by other known models, such as ARIMA, SARIMA, and linear and polynomial models, which take into account the seasonality of the phenomenon. The constructed model enables the number of people injured in accidents in the construction industry in selected months of future years to be predicted with high accuracy. The obtained results can be the basis for making appropriate decisions regarding preventive and prophylactic measures in the construction industry. Commonly known mathematical tools available in the STATISTICA package were used to solve the given task.


Introduction
The construction industry is one of the most important sectors of the economy of every country. Due to the high degree of diversity and complexity of construction works, it is characterized by a high level of hazard to the life and health of employees. The consequence of this is a high accident rate for this sector of the economy, which is measured by the number of people injured in accidents each year. This statement applies to most countries in the world [1][2][3].
The accidentality phenomenon has a negative impact on the economy and causes a serious global public health problem. In many countries of the world, to improve the existing accident situation at workplaces, actions that aim to achieve a zero number of accidents have been initiated [4]. This goal can be achieved by legislative changes in labor law, systematic employee education, shaping the appropriate safety climate and safety culture in enterprises, and with appropriate management. To properly shape the policy of changes that are introduced, and the directions of accident prophylaxis and prevention, it is necessary to observe trends in occupational accidents and to analyze forecasts in this area.
The accident rate in the construction industry depends on many factors of a technical, human, managerial, legislative, environmental and economic nature. The accidentality phenomenon in the construction industry is also influenced by unknown factors that have not yet been investigated [5][6][7][8].
Research conducted by scientists in many countries around the world has shown that the number of accidents in the construction industry varies from year to year, as well as from month to month.
The aim of the research was to develop an accident prediction model in the construction industry. Statistical data published by the Central Statistical Office in Poland were analyzed. The occurrence of certain regularities in the accidentality phenomenon was found, namely, a decreasing long-term trend, as well as seasonality and cyclicality of this phenomenon. The found regularities were the basis for the assumptions that were made to build the model. These assumptions included: non-linear regression, as well as the seasonality and cyclicality of the phenomenon. The STATISTICA package was used for the calculations.
Building the model will allow the number of people injured in accidents in selected months of the coming years to be predicted. The obtained results can be the basis for making appropriate decisions regarding preventive and prophylactic actions. Knowledge about workplace hazards and the adopted methods of preventing accidents, when provided systematically to employees, may change their attitudes toward the perception of dangerous situations at work and the use of safe methods of work. The predictions can also be the basis for forecasting the future costs of occupational accidents, which in turn may have an impact on prices in the construction industry, as well as on financial and insurance policies.

Literature survey
The construction industry is considered to be one of the most dangerous sectors of the economy. Statistical data published in various countries of the world concerning occupational accidents indicate that the problem of accidents in the construction industry is global. The risk of a fatal accident in the construction industry is three times higher than in other industries [9,10]. Each year, the construction industry around the world suffers more than 60,000 deaths. The phenomenon of accidentality in the construction industry should, therefore, be given special attention, as it has a negative impact on the economy, employers, employees, and their families.
When analyzing the accident rate in the construction industry on a monthly basis, it can be seen that the number of accidents differs between individual months of the year [11]. This may indicate the seasonality of this phenomenon. This thesis is confirmed by research conducted by scientists in many countries. In Singapore, for example, the highest number of deaths in accidents in the construction industry occurs in October before the rainy season [1]. In Taiwan, in the period from 2000 to 2009, most accidents occurred in the summer [12]. In turn, in South Australia, the number of accidents in the period from 2002 to 2011 was the highest in autumn [2]. Based on a study of occupational injuries and diseases in the United States in the years 2003-2010, it was found that most of the injuries occurred during the summer months [13]. Accidents in Korea occurred less frequently in winter than in other seasons [14].
The above-mentioned examples show that there are seasonal factors that influence the occurrence of accidents in the construction industry. Research conducted by Traczyk and Trzebski [15], which concerned work-related physical effort, showed that an excessively high or low temperature, high humidity, high and low atmospheric pressure, and also changes in wind speed may cause changes in the body of an employee, even in acclimatized people. Such changes could become a cause of an accident. It was also noted that the number of accidents depends on the climatic zone. In Spain, for example, in the period 2003-2008, the majority of accidents occurred in the Mediterranean and continental zones and accounted for 48% and 35% of all accidents, respectively [3]. The influence of climatic zones and the period of the year was also analyzed in China [16]. It was observed that in the years 2010-2016, August was the month with the highest average number of accidents, and the incidence rate was the highest in regions with a mild climate. Therefore, a constant pattern of the accidentality phenomenon in the construction industry can be observed. The fact that it repeats each year confirms the seasonality of the phenomenon.
The seasonal nature of the accidentality phenomenon in the construction industry is influenced by many factors. W. J. Granger defined a seasonal factor as any pattern that repeats itself every year in the same month, regardless of the cause. He classified the factors influencing seasonality in the construction industry into four categories, namely [17]:
• climate-related factors, including weather changes, which have a direct impact on the efficiency of construction production;
• calendar-related factors, including the number of actual working days in a month, which affect monthly production;
• date-related factors such as the number of public holidays during which production is stopped;
• expected factors that influence seasonal behavior related to, e.g., the date of VAT settlements, the payment of other taxes, the end of the year, and also the need to comply with both the production result and the annual budget.
Seasonal factors affect the amount of construction production, which in turn affects the accidentality rate. Mach et al. [18] identified cyclical fluctuations in relation to the construction industry in 16 European countries. Moreover, the authors of article [19] showed the dependence between the number of accident and production cycles in the construction industry in South Korea. The authors of paper [20] analyzed the influence of seasonal factors on the costs and duration of the implementation of a project. The relationship between the accident rate and the value of production was proved by, among others, Fabiano et al. Research concerning the Italian industry showed that there is a cyclical trend in the number of fatal accidents and that the number of accidents depends to a large extent on the value of production [21]. Similar conclusions were formulated by Dong et al. [22]. Their research, concerning falls from roofs, confirmed the relationship between the number of fatalities and the economic cycle. Research conducted by Hoła and Szostak [23], which referred to the construction industry in the European Union countries, also indicated the existence of a long-term cyclical trend in the accidentality phenomenon.
Analysis of the accidentality phenomenon in the construction industry can also be found in [24][25][26][27]. The research conducted by Hoła [28] regarding the Polish construction industry indicated that the accident rate in the construction industry changes every year, and that there is a clear downward trend. Studies of trends in the accidentality phenomenon and the prognostic models that are built on their basis can be used to predict future events. Moreover, based on this knowledge, decisions in the area of labor law, education and prophylactic and preventive actions can be undertaken.
For the modeling of the accidentality phenomenon in the construction industry, among others, linear mathematical models, descriptive and graphical models, and IT models (e.g., neural networks) can be used. For example, the research included in [28] involved qualitative and quantitative analyses that resulted in the creation of mathematical models for the development of the trend of the accidentality phenomenon in the Polish construction industry. In turn, Indian scientists proposed the use of artificial neural networks (ANNs) to predict the behavior of construction workers in a risk situation [29]. In American studies [30], an approach based on the Bayesian network (BN) was proposed to diagnose the risk of falls from a height. The tests performed on the model enabled the probability of an occurrence of an accident related to various safety hazards to be determined.
Thanks to IT technologies, it is possible to virtually model the construction environment and its hazards. Paper [31] proposes the use of a construction site model developed in a virtual environment for training and educational purposes. In turn, studies [32] propose the use of a virtual reality environment to identify hazards, to determine the rules of movement of employees on a construction site, and to specify the impact of the building environment on its surroundings. In research [33], the authors proposed a model that enables the tracking of construction workers in real time using Real-time locating systems (RTLS). The developed tool can be used to control the exposure of construction workers to hazards that change over time.
The authors of study [34] proposed an approach based on machine learning (ML) to assess occupational safety. According to the authors, all accidents follow certain patterns, which in turn allows them to be predicted. The ML technique was used to predict slip-trip-fall (STF) accidents.
Based on the literature review, it can be concluded that the accidentality phenomenon in the construction industry is seasonal, and that the number of occupational accidents changes cyclically over a period of time covering one calendar year. When conducting studies concerning the accidentality phenomenon during a multi-year period, it is also possible to identify a general trend in the accident rate in the construction industry. The constructed prediction model should, therefore, take into account both of these phenomena simultaneously.
To predict data, various methods are used, such as linear, logistic and Bayesian regression [35]. To analyze the time series in the situations of many explanatory variables, various selection and extraction methods can be used, which will allow a set of features that best discriminate the explained variable to be obtained. Such methods include Principal Component Analysis [35,36].
The review of the methods used to build prediction models shows a certain research gap, namely that all regression methods encounter difficulties when the numerical data exhibit a trend and seasonality simultaneously. Although there are methods that take into account the seasonality and cyclicality of phenomena, such as ARIMA and SARIMA [37,38], their application is limited in many cases. This is because these methods can be used to model stationary series, i.e., series in which there are only random fluctuations around the mean value, or non-stationary series that are reducible to the stationary form [39]. In addition, their ability to predict only events that may occur in the near future can be seen as a limitation.
Taking into account the above observations, the authors attempted to build a model for predicting the number of people injured in accidents in the construction industry, which takes into account both the long-term trend and the seasonal nature of the phenomenon by means of non-linear regression. The information obtained on the basis of the model on the expected number of accidents in particular months of subsequent years may be the basis for making decisions in the field of legislative, educational, prophylactic and preventive actions, as well as for estimating the costs incurred due to accidents. This in turn may affect prices in the construction industry, as well as the financial and insurance policy of enterprises.

Statistical data
To develop the prediction model, statistical data published by the Central Statistical Office in Poland were used. The data concerned the number of people injured in occupational accidents in the construction industry in particular months of the year during the period covering the years from 2007 to 2019 [40]. Figure 1 shows, in the form of a three-dimensional graph, how the total number of people injured in occupational accidents in the construction industry ($x_{i,j}$) changed in the individual months of the analyzed years.
This figure shows that the number of people injured in accidents in the construction industry was high at the beginning of the analyzed period, while from 2012 onwards it shows a clear downward trend. When analyzing the distribution of the number of people injured in accidents in the following months of the studied years, a certain regularity can also be noticed: the accident rate in January is low, then increases, reaches its maximum in July and August, and decreases again in the following months. The above observations suggested examining the correlation between the following years of the analyzed period of time and the number of people injured in the construction industry, and also the correlation between individual months of the year and the number of people injured in those months. The obtained values of the correlation coefficients will indicate whether there is correlative dependency, and what the actual strength of the relationship between these variables is.

Investigation of the correlative dependencies between variables
Because one of the variables assumes ordinal values and the other assumes quantitative values, Spearman's rank correlation was used to determine the strength of the relationship between the studied variables [41,42]. The study of the correlative dependence was carried out for two sets of numerical data, namely the pairs of variables $r_i$ and $x_i$ (subsequent years and the number of people injured in those years) and $m_j$ and $x_j$ (subsequent months and the number of people injured in those months). The following scale for evaluating the strength of the relationship between these variables, expressed by the Spearman Rho coefficient [43], was adopted:
• Rho = 0: the variables are not correlated,
• 0 < Rho < 0.1: very weak correlation,
• 0.1 ≤ Rho < 0.3: weak correlation,
• 0.3 ≤ Rho < 0.5: average correlation,
• 0.5 ≤ Rho < 0.7: high correlation,
• 0.7 ≤ Rho < 0.9: very high correlation,
• 0.9 ≤ Rho < 1: nearly full correlation.
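The rank correlation and the verbal scale above can be illustrated with a minimal sketch. It is not the STATISTICA procedure used by the authors: the function names are hypothetical, the yearly totals are invented for illustration only, and applying the scale to the absolute value of Rho is an assumption made here so that negative coefficients can also be classified.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        tied_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = tied_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's Rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def strength(rho):
    """Map |Rho| onto the verbal scale adopted in the text (assumption: sign ignored)."""
    r = abs(rho)
    if r == 0:
        return "not correlated"
    if r < 0.1:
        return "very weak"
    if r < 0.3:
        return "weak"
    if r < 0.5:
        return "average"
    if r < 0.7:
        return "high"
    if r < 0.9:
        return "very high"
    return "nearly full"  # |Rho| = 1 lies outside the listed ranges and is lumped here


# Hypothetical yearly totals (NOT the Central Statistical Office data).
years = list(range(2007, 2020))
injured = [8200, 9100, 8000, 8500, 8300, 7000, 6200, 5700, 5400, 5100, 5300, 5200, 4700]
rho = spearman_rho(years, injured)
print(round(rho, 3), strength(rho))
```

With the coefficients reported in this section, `strength(-0.847)` falls in the "very high" band and `strength(0.094)` in the "very weak" band.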
The calculations were conducted using the STATISTICA package. Table 1 presents descriptive statistics of variable x i -the total number of people injured in occupational accidents in the construction industry in the i-th calendar year.
The calculated values of the Rho and p statistics (rho = −0.847; p < 0.01) indicate a very high negative correlation. It can be concluded that the total number of accidents decreased over the period from 2007 to 2019. In turn, Table 2 contains descriptive statistics for variable $x_j$, the total number of people injured in occupational accidents in the construction industry in the j-th month of the year. The calculated values of the Rho and p statistics (rho = 0.094; p = 0.245) indicate a very weak correlation between the variables. Because Spearman's coefficient captures only monotonic relationships, this result is consistent with a non-linear dependence between the variables and with the seasonality of the phenomenon.
To illustrate the seasonality of the accidentality phenomenon in the construction industry, graphs of the variability of the total number of people injured in accidents in the construction industry in particular months $m_j$, $j = 1, \ldots, 12$, of consecutive years were developed. Figure 2 shows the correlation curves and the distribution charts of the monthly number of people injured in occupational accidents in the construction industry in individual years of the analyzed time period. Figure 2 shows that each year follows a highly repeatable scenario, in which the total number of accidents over months 1-12 can be approximated by a quadratic function. The maximum value of this function falls in the 7th or 8th month of the year. The shape of the parabola arms is also comparable in individual years, and it can, therefore, be concluded that the number of accidents increases in the summer and decreases in the winter.
As a measure of the fit of the correlation curve (in the form of a quadratic function) to the empirical values, the coefficient of determination $R_i^2$ was adopted. The value of this coefficient is presented in Table 3.
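The fitting step can be sketched as follows: a quadratic function is fitted to twelve monthly values by ordinary least squares, and the coefficient of determination is computed from the residuals. This is only an illustration under stated assumptions. The function names are hypothetical, the monthly counts are invented, and the normal-equation solver stands in for whatever routine STATISTICA applies internally.

```python
def fit_quadratic(ms, ys):
    """Least-squares fit of y = a*m^2 + b*m + c via the normal equations."""
    rows = [[m * m, m, 1.0] for m in ms]
    # Augmented 3x4 system (X^T X | X^T y).
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = A[i][3] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = s / A[i][i]
    return tuple(coef)  # (a, b, c)


def r_squared(ms, ys, coef):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    a, b, c = coef
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * m * m + b * m + c)) ** 2 for m, y in zip(ms, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly counts for one year (NOT the published data).
months = list(range(1, 13))
counts = [310, 350, 400, 440, 470, 490, 500, 495, 470, 430, 380, 330]
coef = fit_quadratic(months, counts)
print(coef, r_squared(months, counts, coef))
```

A negative leading coefficient `a` corresponds to the downward-opening parabola described above, with its vertex at m = −b/(2a).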
When analyzing the results, it can be noticed that the value of the determination coefficients varies from $R^2_{12} = 0.305$ (which means an average correlation) to $R^2_{10} = 0.782$, the value reported for 2016. The charts presented in Fig. 2, when combined with each other (Fig. 3), show how the total number of people injured in occupational accidents in the construction industry changed in individual months in the period from 2007 to 2019. Figure 3 also shows the decreasing trend of the above-mentioned feature, which was confirmed by a very high negative correlation (rho = −0.835, rho² = 0.697, p < 0.01).

Mathematical model of the accidentality phenomenon in the construction industry
When building the model, the following assumption was made: if the changes in the total number of accidents in the same months of subsequent years are comparable, then it is possible to estimate a constant numerical value that will allow a trend for subsequent annual cycles to be determined. For this purpose, the following were calculated: the average annual increase/decrease in the total number of accidents, and the average monthly increase/decrease in the total number of accidents.
The average annual increase/decrease in the total number of people injured in accidents, ∆r, is defined by Eq. (1), in which:
• n is the number of years on the basis of which the model is built (n = 13 for the analyzed data),
• k is the number of months on the basis of which the model is built (k = 12 for the analyzed data),
• $x_{i,j}$ is the total number of accidents in the j-th month of year i,
• i is the subsequent year, i = 1, 2, …, 13,
• j is the subsequent month, j = 1, 2, …, 12.

Similarly, the average monthly increase/decrease in the total number of people injured in accidents, ∆m, is defined by Eq. (2), in which:
• n is the number of years on the basis of which the model is built (n = 13 for the analyzed data),
• k is the number of months on the basis of which the model is built (k = 12 for the analyzed data),
• $x_{i,j}$ is the total number of accidents in the j-th month of year i,
• i is the subsequent year, i = 1, 2, …, 13,
• j is the subsequent month, j = 1, 2, …, 12.
The values of parameters (1) and (2) are presented in Tables 4 and 5. The results multiplied by 100% determine the percentage increase. Table 6 presents the results of the analysis of the correlation between the values of the average annual increases ($\Delta r_i$) and average monthly increases ($\Delta m_j$) in the number of people injured in accidents and the subsequent years and months of the year.
The obtained values of the correlation coefficients indicate: a weak correlation between the average annual increases in the total number of people injured in accidents and the subsequent years, and a weak correlation between the average monthly increases in the total number of people injured in accidents and the subsequent months of the year. The obtained results allow for the conclusion that the increases do not have a trend, and are, therefore, comparable with regards to months and years.
Because the trend and seasonality are the strongest for 2016 ($R^2_{10} = 0.782$; p = 0.007) (Table 4), a regression curve for this year was determined by approximating the seasonal fluctuations in the accident rate in the construction industry using a polynomial function in the form:

$$n_{m,2016} = -3.7234\,m^2 + 52.969\,m + 306.58, \qquad m = 1, \ldots, 12 \qquad (3)$$

The above function refers to the adopted pattern of seasonal changes in the accident rate in the subsequent months of 2016. Because the long-term trend should also be taken into account in the prediction model, the long-term trend coefficient was introduced into formula (3) and is expressed by Eq. (4), where, for empirical data, $\Delta r_{\text{śr}} = 0.0473\%$ (as shown in Table 6) is the average value of $\Delta r_i$. This coefficient shows by how many percent the value of the prediction changes for each month of any year in relation to 2016. Finally, a prediction model of the total number of people injured in occupational accidents in the construction industry, $n_{r,m}$, was obtained for year r and month m, and is expressed by Eq. (5).
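As an illustration of the seasonal component alone, the quadratic pattern for 2016 can be evaluated month by month. The sketch below uses only the coefficients of Eq. (3) as stated in the text. The function name is hypothetical, and the long-term trend coefficient of Eq. (4) and the full model of Eq. (5) are deliberately not reproduced, because their exact form is not given in this section.

```python
# Coefficients of Eq. (3) as given in the text.
A, B, C = -3.7234, 52.969, 306.58


def seasonal_pattern_2016(m):
    """Fitted number of injured people in month m (1..12) of 2016, Eq. (3)."""
    if not 1 <= m <= 12:
        raise ValueError("month must be in 1..12")
    return A * m * m + B * m + C


# Vertex of the parabola: the month at which the fitted curve peaks.
peak_month = -B / (2 * A)

# Fitted values for all twelve months, rounded for display.
monthly = {m: round(seasonal_pattern_2016(m), 1) for m in range(1, 13)}
for m, value in monthly.items():
    print(m, value)
```

The vertex lies between the 7th and 8th month, which agrees with the observation that the accident rate peaks in July and August.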

Results
Using the proposed prediction model expressed by Eq. (5), the prediction error was determined as the difference between the real value of the total number of people injured in accidents in the construction industry and the predicted value obtained from the model. The prediction error values for individual years and months are summarized in Tables 7 and 8, and also in Fig. 4. In Table 7, the average value of the prediction error for each year, as well as other statistics, are calculated for 12 months. In turn, in Table 8, the values are calculated for the same month in individual years. The mean prediction error obtained for all the data is 13.59%, with a standard deviation of 9.95%. The lowest prediction error is 0.07%, and the highest is 41.87%. Figure 4 shows (a) a scatterplot of the prediction errors for all the subsequent measuring points, and (b) a plane plot showing the distribution of prediction errors broken down into months and years.
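The summary statistics quoted above can be computed along the following lines. This is a sketch under two assumptions that the text does not state explicitly: that the percentage error is the absolute difference between the real and predicted value divided by the real value, and that the standard deviation is the sample standard deviation. The function names and the numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def percentage_errors(actual, predicted):
    """Absolute relative errors in percent (assumed definition)."""
    errors = []
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("actual value must be non-zero")
        errors.append(100.0 * abs(a - p) / a)
    return errors


def summarize(errors):
    """Mean, sample standard deviation, minimum and maximum of the errors."""
    return {
        "mean": mean(errors),
        "std": stdev(errors),
        "min": min(errors),
        "max": max(errors),
    }


# Hypothetical real and predicted monthly counts (NOT the published data).
actual = [360, 410, 455, 480, 500, 470]
predicted = [356, 398, 432, 459, 478, 490]
print(summarize(percentage_errors(actual, predicted)))
```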
From the scatterplot of the prediction error presented in Fig. 4a, it can be concluded that the prediction error decreases with each subsequent measurement, and since 2014 has been at the level of 5-10%. The chart showing the prediction error broken down into years and months, which is presented in Fig. 4b, shows that the prediction error for each month is comparable for individual years. However, this error decreases in subsequent years. Based on the obtained results, it can be concluded that in the years after 2014 the prediction error is significantly lower than in the years before 2014. The prediction error after 2016 enables the total number of people injured in accidents to be forecast with high accuracy.

Discussion
The results of the analysis of the number of accidents in the construction industry in Poland in the years 2007-2019, which are included in this study, and the results of studies conducted in other parts of the world, confirmed that accidentality is a seasonal phenomenon. The published statistical data confirm that the number of accidents in the summer season is greater than in the winter months [1,2,12,13]. In the graphs presented in Fig. 2, a similar pattern of variability in the number of people injured in occupational accidents in the construction industry is repeated, which proves the seasonal and cyclical nature of this phenomenon. Similar conclusions with regards to seasonality in the construction industry were formulated by Mach et al. in [18]. In long-term studies concerning accident rates in many countries in Europe and around the world, the formation of long-term trends in accident rates can be noticed. Thanks to the "zero accidents" action carried out in the construction industry around the world, there is a downward long-term trend in many countries. Both these phenomena, namely seasonality and cyclicality, as well as the long-term trend, were included in the developed model of predicting the number of people injured in occupational accidents in the construction industry.
A model based on non-linear regression assumptions was developed. The developed model was verified by comparing the values calculated on the basis of the model with the empirical results. The average prediction error obtained for all the data is 13.59%, with a standard deviation of 9.95%. The lowest prediction error is 0.07%, and the highest is 41.87%. When analyzing Fig. 4a and b, it can be noticed that the prediction error decreases when approaching the end of the analyzed time interval. It can be assumed that in the years after 2014 the prediction error is significantly lower than in the years before 2014.
The developed model was also verified by comparing the prediction calculated on the basis of the model with the results obtained using the following methods: linear regression, polynomial regression, as well as the ARIMA and SARIMA methods. In the calculations carried out with the above-mentioned methods, fivefold cross-validation was adopted as the experimental protocol. Within the same data set, four folds were used for learning, and one fold was used for testing and validation. Thus, in each method, the learning set accounted for 80% of the samples, with 20% of the samples being included in the testing set. The prediction errors obtained using the verifying methods, i.e., linear regression, polynomial regression, and ARIMA and SARIMA, are presented in Table 9.
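The 80%/20% protocol can be sketched as a simple index split. The text does not say whether the folds were contiguous or shuffled, so the contiguous split below is an assumption, as is the hypothetical function name. The sample count of 156 follows from 13 years of 12 monthly observations.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# 13 years x 12 months of observations, as in the analyzed period 2007-2019.
n_observations = 13 * 12
for fold_no, (train, test) in enumerate(k_fold_indices(n_observations), start=1):
    share = 100.0 * len(test) / n_observations
    print(f"fold {fold_no}: train={len(train)}, test={len(test)} ({share:.1f}% held out)")
```

Each observation appears in exactly one test fold, so every model is evaluated once on every month of the data set.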
Using the Wilcoxon test, the significance of the differences in the prediction errors between the different base models was assessed at the level of statistical significance α = 0.05. For example, for model (e), the authors' model, the test indicated models a, b and d. This means that the prediction error of 8.9% for the authors' model is statistically significantly (p < 0.05) lower than in the case of using: a-linear regression, for which the prediction error amounts to 32%; b-polynomial regression, for which the prediction error amounts to 28%; and d-the SARIMA model, for which the prediction error is equal to 11.7%. The error of the authors' model is comparable with the result obtained using the ARIMA method, in which the prediction error is equal to 10%. For the considered task, the developed model turned out to be better than the models of linear and polynomial regression, and also the SARIMA model. The SARIMA and ARIMA models provided similar results and were better than the methods that are based on the least squares method.
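A paired comparison of this kind can be sketched as follows. The text names only "the Wilcoxon test", so treating it as the signed-rank variant applied to paired prediction errors is an assumption. The sketch uses the normal approximation without tie or continuity corrections, which is rough for small samples, and the function name and example values are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Returns (W, p), where W is the smaller of the positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided tail probability of the standard normal
    return w, p


# Hypothetical per-fold prediction errors in percent for two models.
errors_reference = [31.0, 33.5, 30.2, 34.1, 32.4, 29.8, 35.0, 31.7, 30.9, 33.0]
errors_proposed = [8.1, 9.4, 8.8, 9.9, 8.5, 7.9, 10.2, 9.0, 8.6, 9.3]
w, p = wilcoxon_signed_rank(errors_reference, errors_proposed)
print(w, p, p < 0.05)
```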
For the forecast covering the period from 2014 to 2019, the prediction error for the developed model was equal to 5.45%. The ARIMA model generated an error of 7.89%. The other methods generated the following prediction error values: SARIMA 8.6%, linear regression (RL) 26%, and polynomial regression (RW) 23%. The model for predicting the number of people injured in occupational accidents in the construction industry, which was developed and presented in this article, reflects well the attributes of the accidentality phenomenon in the construction industry, such as the long-term trend, seasonality and cyclicality.
When building the model, data concerning the number of accidents in the construction industry in individual months of the year were used. An accident situation is more accurately reflected by an indicator that refers to "the number of accidents in relation to the number of employees". However, in Poland, precise data on the number of employees in the construction industry in particular months of the year are not available. The lack of these data limits our research to only the number of accidents.

Conclusions
The aim of the research that was undertaken and described in this article was to develop a model for predicting the number of people injured in occupational accidents in the construction industry based on statistical data published by the Central Statistical Office in Poland. Based on the analysis of statistical data and previous studies, the occurrence of certain regularities of the accidentality phenomenon was found, such as a decreasing long-term trend over many years, as well as seasonality and cyclicality of the phenomenon over the course of a year. The found regularities were the basis for the assumptions that were made for the purpose of building the model.
Prediction errors were calculated for the data obtained from the model. The obtained results of the prediction errors indicate that the model reflects well the accidentality phenomenon in the construction industry, and is better than the linear and polynomial regression models and the ARIMA and SARIMA models. The average prediction error for the developed model was equal to 8.9%. The information obtained on the basis of the model concerning the expected number of people injured in accidents in particular months of subsequent years may be the basis for making decisions in the field of legislative, educational, prophylactic and preventive actions, as well as for estimating the costs associated with accidents. This in turn may have an impact on shaping prices in the construction industry, as well as on the financial and insurance policy of enterprises. The gained information can also be used to estimate the probability of an occurrence of an accident in the future, and the knowledge obtained as a result of the prediction can be the basis for managing occupational safety in the construction industry.
The model presented in the article was built on the basis of data concerning the number of people injured in occupational accidents in the entire construction branch. Therefore, an analysis of the model's correctness was performed for data covering the entire construction industry. However, the model is universal and can be used to forecast accident rates in individual subsections, e.g., industrial and residential constructions, road infrastructure, bridges, and others.

Availability of data and material Not applicable.
Code availability Not applicable.

Conflicts of interest Not applicable.
Ethics approval Not applicable.

Consent to participate Not applicable.
Consent for publication All the authors have read and agreed to the published version of the article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.