1 Introduction

From the point of view of agriculture, ecology, or environmental engineering, the capability of forecasting meteorological variables in the long and short term is very useful. On the one hand, short-term forecasts enabling the planning of field work in agriculture, management of mass events or tourism are important; on the other hand, long-term forecasts related to the advancing climate change are also very interesting. These changes depend on many global factors, the most important of which is the amount of greenhouse gases in the atmosphere. According to the analyzes, greenhouse gases (mainly carbon dioxide) are responsible for the temperature increase observed in many places around the world. In this case, depending on the RCP (Representative Concentration Pathways) scenario [1, 2], the content of these gases in the atmosphere is forecast using climatic indicators up to the year 2100. The ability to model climate variables can be used for forecasting as well as for generating synthetic data in any time horizon and in any resolution using weather generators. Meteorological variables in the short-term horizon are usually described using models that take into account a large number of atmospheric factors [3, 4]. There are numerous models based on various types of statistical analysis [3,4,5], and from the point of view of new technologies, models based on artificial intelligence are becoming more and more popular [6]. From the point of view of generating meteorological data [7], the most popular models that are in use are models that in the case of precipitation, the model total precipitation by means of the first-order Markov chain to determine the occurrence of wet/dry days, and then for the amount of precipitation, the multidimensional two-parameter gamma distribution is used [2, 8, 9]. For other variables such as daily minimum temperature, maximum temperature, solar radiation, and wind speed, multivariate autoregression models are usually used [10]. In the first generators, it was assumed that the distributions of the described variables could be described by Gaussian distribution. However, it was soon observed that this assumption was not correct. In [11], the multivariate closed skew-normal distributions [12] were used for modeling of the residual skewness observed in climate data.

Following this path, one should ask whether the use of autoregressive Gaussian models to estimate model parameters is correct and whether there are reasons to go from the Gaussian model to the generalized models. Answering this question is the first goal of the paper. The second purpose is to link total precipitation data with temperature time series. In the first generator, precipitation was treated as a variable not correlated with temperature, which, as we will show in the paper, is inconsistent with reality.

To achieve the defined goals, in this paper, we propose a multidimensional approach for climate data description. To take into consideration the relation between the examined climate data (i.e., temperature and total precipitation), we apply a multidimensional vector autoregressive (VAR) model. It should be emphasized that this time series can be used to describe the dependence between analyzed variables, and at the same time, it takes into account the time dependence for single components. The VAR model is considered as the classical multidimensional time series [13]. However, because the analyzed data exhibit non-Gaussian behavior, the classical VAR model used in this paper, is modified to take into consideration the specific characteristics of the data. Thus, we propose to replace the Gaussian distributed innovations in the classical model by a more general class, namely \(\alpha -\)stable distributed variables. This class of distributions was introduced in 20’s [14, 15]. The stable probability laws are important in probability theory. According to the Generalized Central Limit Theorem, the stable laws attract distributions of sums of random variables with a diverging variance. It is a generalization of the Central Limit Theorem, which states that the Gaussian law attracts distributions with finite variance. The \(\alpha -\)stable distribution is considered as the generalization of the Gaussian one. Although the first application of this distribution appeared in the work of Mandelbrot [16], where the financial time series were analyzed, the \(\alpha -\)stable distributions and processes have found various applications, including economy [17,18,19,20,21,22,23], physics [24,25,26,27], signal processing [28,29,30,31], computer science [32,33,34,35], geology and geophysics [36,37,38], biology [39,40,41,42], and many other fields. The \(\alpha -\)stable distributions are also considered for climate data modelling, see, e.g., [43,44,45,46]. We also refer the readers to interesting bibliography positions [47, 48]. The VAR time series based on the \(\alpha -\)stable distribution was considered, for instance, in [49,50,51,52]. This model takes into consideration the relationship between the multidimensional data, and by using the \(\alpha -\)stable distribution instead of the Gaussian one, it allows describing the data with the heavy-tailed behaviour.

The paper is organized as follows. In Sect. 2, we present the used model. Next, in Sect. 3, we describe the analyzed climate data indicating the relationship between variables. The relation between temperature and precipitation data is a starting point to apply the VAR model. The results of the fitting are presented in Sect. 4. In this section, we also discuss the prediction results and compare them with the predictions obtained in the Klimada project. The last section concludes the paper.

2 The \(\alpha -\)stable vector autoregressive model

In this section, we present the model under consideration. Because the general methodology was introduced in the literature, therefore, we remind only the basic information referring to the appropriate bibliography positions.

The model examined in this paper is called the vector autoregressive time series (VAR model) with the \(\alpha\)-stable distribution. This time series can be treated as a generalization of the Gaussian VAR model, [13].

A m-dimensional time series \(\{{\mathbf {X}}(t)\}= \{(X_1(t),\ldots ,X_m(t))^T\}\) is called a VAR model with the \(\alpha\)-stable distribution, and if for each \(t \in {\mathbb {Z}}\) , it fulfills the following equation

$$\begin{aligned} {\mathbf {X}}(t)-\varTheta _1{\mathbf {X}}(t-1)-\ldots -\varTheta _p{\mathbf {X}}(t-p)={\mathbf {Z}}(t). \end{aligned}$$
(1)

In Eq. (1), the sequence \(\{{\mathbf {Z}}(t)\}=\{(Z_1(t),\ldots ,Z_m(t))^T\}\) constitutes a sample of independent m-dimensional vectors with independent components having one-dimensional \(\alpha -\)stable distribution. More precisely, for each \(t\in {\mathbb {Z}}\) \(Z_j(t)\) (\(j=1,2,\cdots ,m\)) has \(\alpha -\)stable distribution and \(Z_j(t)\) and \(Z_i(t)\) are independent for \(i\ne j\). Moreover, \(\varTheta _1,\ldots ,\varTheta _p\) are \(m\times m\) matrices with time-constant coefficients. In our paper, we consider the special case of the model (1) and assume \(p=1\). We remind the one-dimensional random variable Z has \(\alpha -\)stable distribution if its characteristic function is given by [53, 54]

$$\begin{aligned} {\mathbb{E}} \left[ \exp \{i\theta Z\} \right] = \exp \left\{ -\sigma ^\alpha |\theta |^\alpha \left( 1+i\beta w(t,\theta )\right) +i\mu \theta \right\} , \end{aligned}$$
(2)

where

$$\begin{aligned} w(\theta ,\alpha ) = \left\{ \begin{array}{ll} -sign\left( \theta \right) \tan \left( \frac{\pi \alpha }{2}\right) &{} \text {if}\, \alpha \ne 1,\\ \frac{2}{\pi }sign\left( \theta \right) \ln \left| \theta \right| &{} \text {if}\, \alpha = 1, \end{array} \right. \end{aligned}$$
(3)

and \(sign(\cdot )\) denotes a sign function. The parameter \(0<\alpha \le 2\) is the stability index which is responsible for the rate of convergence of the distribution tail. More precisely, for \(\alpha \ne 2\) , the probability tail \(P(Z>z)\sim z^{-\alpha }\) and the corresponding second moment are infinite. Thus, the smaller \(\alpha\), the convergence is slower, and in the consequence, the \(\alpha -\)stable random variables take extreme values more likely than it in the Gaussian case. The \(\alpha -\)stable distribution for \(\alpha \ne 2\) belongs to the so-called heavy-tailed class of distributions. On the other hand, for \(\alpha =2\) , it reduces to the Gaussian distribution, and in this sense, it can be considered as its generalization. The model defined as in Eq. ( 1) can be considered as the generalisation of the VAR model with Gaussian innovations. The other parameters of \(\alpha -\)stable distribution are: the scale parameter \(\sigma > 0\), the skewness parameter \(-1\le \beta \le 1\), and the shift parameter \(\mu \in {\mathbb {R}}\). For more details about the \(\alpha -\)stable distribution, we refer to [53, 55,56,57,58,59].

It should be emphasized that for \(m=1\) , the model defined in Eq. (1) is the one-dimensional autoregressive time series (AR model) with \(\alpha -\)stable innovations, [60]. For more details about the AR model and multidimensional VAR model with \(\alpha -\)stable innovations, we refer the readers to the bibliography positions [49,50,51, 60,61,62].

In the Gaussian VAR model, one of the classical methods used for the estimation of the parameters is the Yule-Walker approach that utilizes the auto-covariance function [13]. For the VAR model with non-Gaussian \(\alpha -\)stable distribution, this classical measure of dependence is infinite, and thus, in the literature, there are considered different approaches to the parameters’ estimation. One of these is the modified Yule-Walker approach that utilizes the so-called auto-covariation function, and one of the dependency measures properly defined for the \(\alpha -\)stable model. This method is described in detail in [49, 63], and in this paper, it is used for multi-dimensional VAR models as well as for one-dimensional AR time series with \(\alpha -\)stable innovations. See also [28, 64, 65].

3 Climate data description

The analyzed data are the daily measurements of the maximum temperature (Tmax), minimum temperature (Tmin), average temperature (Tavg), and total precipitation (P) for the city of Wrocław from 1961–2020. The data come from the Institute of Meteorology and Water Management (IMGW) [66]. Each variable is described by a time series of length 21915 data.

Fig. 1
figure 1

The daily measurements of maximum temperature, minimum temperature, average temperature, and total precipitation for Wrocław from the period 1961–2020

Fig. 2
figure 2

Daily maximum temperature for each month in year for the period 1961–2020

Fig. 3
figure 3

Daily minimum temperature for each month in year for the period 1961–2020

Fig. 4
figure 4

Daily mean temperature for each month in year for the period 1961–2020

Fig. 5
figure 5

Daily precipitation sum for each month in year for the period 1961–2020

Figure 1 shows the graphs of all analyzed data. As can be easily seen, all data exhibit seasonal behavior. Therefore, they were divided into individual months, and thus, 12 time series for each variable were obtained. Figures 23, , 4 and 5 show the behavior of the analyzed variables in the years 1961–2020 for individual months.

For the temperature data, an upward trend characterizing the data in individual months is visible; therefore, before the time series modelling, the long-term trend, based on a moving average averaging data from the moving 10 years, was removed. As it is known from various studies [67], this trend is related to long-term climate changes leading to global warming.

The first step of our study was to analyze the structure of the relationship between the individual time series. For this purpose, the behavior of correlation coefficients for 5-year windows sliding by one day taking into account all combinations of time series was investigated. The period of five years has been taken as long enough to detect a correlation while not being affected by the long-term trend. For the purposes of research, the Pearson, Spearman, and Kendall correlation coefficients were used, see [68]. The Person correlation coefficient is essential to study the linear relationship between two variables; however, its estimatior is sensitive to outliers. Therefore, this measure is useful, especially for the Gaussian (or light-tailed) distributed variables [69]. The Spearman correlation coefficient (called Spearman rank correlation coefficient) measures a monotonic relationship between variables. Its estimator is insensitive to large observations and thus in cases when the analyzed variables are heavy-tailed [70, 71]. The Kendall correlation coefficient (called also Kendall rank correlation coefficient) indicates not only the strength, but also the direction of the dependence. Similarly to the Spearman correlation coefficient, it is resistant to outliers and is used especially for non-Gaussian distributed data [72].

In Fig. 6, there are presented the exemplary correlation coefficients for the sliding 5-year windows for July data and in Fig. 7 for December. The coefficients were calculated for all combinations of the considered data.

Fig. 6
figure 6

The analyzed correlation coefficients for sliding five-years windows in July for each data combination

Fig. 7
figure 7

The analyzed correlation coefficients for sliding five-years windows in December for each data combination

It can be clearly seen that the Spearman and Pearson coefficients coincide for the temperature data. The Kendall coefficient is usually lower due to the way it is calculated. This indicates a linear relationship between the data. The situation is different for the relationship between the temperature and precipitation data. In this case, we observe large discrepancies between the coefficients, which indicates a nonlinear relationship and probably non-Gaussian behavior. An important observation is the fact that all considered data are significantly correlated with each other, which justifies the consideration of multivariate data modeling using VAR-type models. On the other hand, the nonlinear relationship between the data also indicates that considering a model other than Gaussian may be justified.

4 Model fitting

As a set of four time series was analyzed and the correlation coefficients between all data are significant, an attempt was made to fit the 4-dimensional VAR(1) model, with \(\upalpha\)-stable innovations, described in the previous section, to the data under consideration. The VAR(1) model was used as the optimal according to multiple literature positions analysing this type of data [2, 8, 9].

Due to the fact that in the case of the daily sum of precipitation, the VAR model based on maximum, minimum, and average temperatures cannot reflect the exact values and the methodology of forecasting precipitation has a different nature [2, 8, 9] (which does not change the fact that precipitation has a significant impact on daily temperatures), only the estimation of the model and forecasts for maximum and minimum temperature is considered. The two remaining variables will be treated as auxiliary variables that have a significant impact on the quality of the forecast

As the relationships between 4-dimensional data are not always linear, the residuals distribution of the fitted non-Gaussian 4-dimensional VAR(1) model was investigated by testing whether these distributions are Gaussian or \(\alpha\)-stable and fitting the \(\alpha -\)stable distribution. In the Tables 1 and 3, we present the values of the five statistics used for testing Gaussian and/or \(\alpha\)-stable distributions of the residuals, namely Kolmogorov-Smirnov test (T1), Kuiper test (T2), Watson test (T3), Cramer-von Mises test (T4), and Anderson-Darling test (T5), see [73]. In Tables 2 and 4, we can see the parameters of fitted to residuals series the \(\alpha\)-stable distribution. The results are presented for the minimum and maximum temperature.

Table 1 Results of the normality and stability test statistics of non-Gaussian VAR(1) residuals for daily maximum temperatures
Table 2 Fitted \(\upalpha\)-stable distribution parameters for residuals of daily maximum temperatures

In the case of daily maximum temperature, the residuals distributions of the fitted 4-dimensional non-Gaussian VAR (1) model are not Gaussian for all considered months. The results of the stability test statistics show that the residuals distribution is much closer to \(\alpha\)-stable distribution which indicates a justified use of a non-Gaussian model. In Table 2, we can observe that the residuals for maximum temperature, especially for winter months, have heavier tails than in Gaussian distribution. Additionally, for most of the analysed months, the residuals \(\beta\) parameter is significantly positive, which indicates the right skewness of this distribution. Thus, in the case of maximum temperatures, we will observe the extreme values more often, with particular emphasis on the values with high maximum temperature.

Table 3 Results of the normality and stability test statistics of non-Gaussian VAR(1) residuals for daily minimum temperatures
Table 4 Fitted \(\upalpha\)-stable distribution parameters for residuals of daily minimum temperatures

In the case of minimum temperature, the lack of Gaussianity of the residuals distributions is not as visible as in the case of maximum temperature. In some months, statistics for normality are lower than for stability. However, generally we can assume that the residuals distribution is closer to \(\alpha\)- stable than to Gaussian. As in the case of maximum temperatures, the fitted \(\alpha\) factor of \(\upalpha\)-stable distribution indicates heavier tails in winter months, and a negative \(\beta\) for almost all months indicates that special attention should be paid to the more frequent low values of minimum temperature.

4.1 Prediction

Based on the fitted model, the one-step prediction was made (one day ahead) for the minimum temperature and for the maximum temperature. The mean absolute error (MAE) measure, which is appropriate for non-Gaussian considerations, was used to measure the forecast error [74]

$$\begin{aligned} MAE=\frac{1}{n}\sum _{k=1}^n|t_{pred}-t_{obs}|. \end{aligned}$$
(4)

where \(t_{pred}\) means the predicted and \(t_{obs}\) the observed value. The mean absolute forecast error was determined for the last ten years, i.e., for the period 2010–2020. The forecast errors obtained in this way were compared with the one-dimensional model of autoregression of the first-order AR(1) with \(\alpha\)-stable innovations for both minimum and maximum temperature separately. The forecast quality results are presented in Tables 5 and 6.

Table 5 MAE of one day ahead prediction for daily maximum temperatures in case of non-Gaussian, 4-dimensional VAR(1) model and one-dimensional AR(1) model
Table 6 MAE of one day ahead prediction for daily minimum temperatures in case of non-Gaussian, 4-dimensional VAR(1) model and one-dimensional AR(1) model

The results presented in Tables 5 and 6 show that the multivariate model has an advantage over the one-dimensional model in the case of forecasting the daily maximum and minimum temperature values. However, the quality of the forecast for the minimum temperature is much better. The mean absolute forecast error for the minimum temperature for all months is about 0.5 degrees Celsius lower than for the univariate model, while for the maximum temperature, the difference is between 0.15 and 0.3. However, from the prognostic point of view, considering multivariate models, in which the correlation between the series under consideration is taken into account, is absolutely justified. In addition, it is also justified to use models with a nonlinear correlation structure and no assumption of Gaussianity.

Due to the fact that in the case of minimum temperature, the quality of the forecast is much better, and an attempt was made to estimate the future minimum temperature values for the year 2090, assuming the RCP 8.5 scenario [1, 75]. In the RCP 8.5 scenario, it is assumed that the warming of the climate related to the content of greenhouse gases will continue as before. The determined values were presented with confidence intervals and compared with the forecast values in the Klimada project [76].

The calculations were made for 10, 000 trajectories simulated on the basis of the estimated 4-dimensional VAR(1) model using the Monte Carlo methodology for each considered month. Figure 8 presents a chart showing the scenario for the year 2090. In the figure, we can see the monthly average minimum temperature obtained in the Klimada project and using the 4-dimensional VAR(1) model and the \(99\%\) and \(95\%\) confidence intervals from the VAR model.

Fig. 8
figure 8

The monthly average minimum temperature assuming RCP 8.5 scenario for 2090 according to 4-dimensional VAR(1) model and Klimada project plus \(99\%\) and \(95\%\) confidence intervals from the VAR model

It is worth noting that the forecast based on a simple VAR model, in the case of the forecast for 2090 of the average monthly minimum temperature, assuming the RCP 8.5 scenario, gives results not much different from those from the Klimada project, where the model used is certainly much more sophisticated. What is essential is the fact that the confidence intervals are shifted downwards against the mean (the distance between the lower limit of the confidence interval and the mean is in most cases greater than the distance between the mean and the upper limit of the interval). This asymmetry is related to the fact that non-Gaussian residuals are used for projection (thicker tails and left asymmetry). This affects the quality of the projection and the further information that the projection carries with it. As for the distribution of the structure of minimum temperatures, it is also worth noting that despite the asymmetry of the confidence intervals and a slightly lower average minimum temperature in the winter months, the percentage of days with a minimum temperature below \(0^{\circ }\)C in the VAR model is \(10\%\), while in the Klimada project, it is about \(12.6\%\), which indicates a certain narrowing of the confidence intervals. Projection using the VAR model shows that in the RCP 8.5 scenario, we can expect, for example, daily minimum temperatures in December even around \(16^{\circ }\)C , which clearly indicates the already observed upward trend.

5 Conclusions

In this paper, we have analyzed multidimensional climate data related to temperature (minimum, maximum, and averaged) and total precipitation. We have demonstrated the examined time series are related, and the structure of their relation changes over time. Moreover, we analysis of three different correlation coefficients clearly indicates their non-Gaussian behavior. The preliminary research was the starting point for the proposition of the stochastic model that shares similar properties as the data. More precisely, we have proposed the multidimensional VAR time series with non-Gaussian \(\alpha -\)stable innovations. This model was previously used by the authors for financial data modelling. We have shown that the proposed model more precisely reflects the nature of the data, and thus, it is justified to describe the time series by using non-Gaussian systems. Moreover, finally we have demonstrated that the model-based prediction corresponds to the RCP 8.5 scenario-based results. The proposed approach is universal and can be used in a more general case, e.g., to describe climate data corresponding to more variables. One of the main goals was to prove that the non-Gaussian multidimensional model is adequate to the analyzed data. This objective of this paper is fulfilled, and the obtained results clearly confirm this.