1 Introduction

Human influence has warmed the atmosphere, oceans and land, causing rapid and widespread changes in those systems (IPCC 2021). This influence on climate, driven by emissions of greenhouse gases, has been unprecedented in at least the last 2000 years (IPCC 2021). The global surface temperature was 1.09 [0.95 to 1.20] °C higher in 2011–2020 than in 1850–1900, with larger increases over land (1.59 [1.34 to 1.83] °C) than over the ocean (0.88 [0.68 to 1.01] °C) (IPCC 2021).

The focus of this study was to create a homogenised mean monthly air temperature series to produce monthly temperature grids and to derive climate monitoring products to assess the state of air temperature and the observed temperature change in Croatia. The country is part of the Mediterranean, where the mean temperature is expected to increase at a rate larger than the global mean temperature, particularly in summer (Lionello and Scarascia 2020). Data measured at meteorological stations are representative for describing the climate of the surrounding area of the meteorological station, but only remote sensing methods or spatial interpolation methods (SIMs) can provide estimates where there are no ground observations (Vicente-Serrano et al. 2003; Hofstra et al. 2008). As stated in Haylock et al. (2008), the main reasons for producing gridded data are (1) estimation of the values away from the observation stations, (2) the need for the optimal assessment of the areal averages and (3) validation of the regional climate models (RCMs). Although high-quality gridded climate data are available for monitoring climate and climate change in Europe, either derived from observations such as E-OBS (Haylock et al. 2008) at 6 km resolution or reanalysis such as ERA5 (Hersbach et al. 2020) at 31 km resolution and ERA5-Land (Muñoz-Sabater et al. 2021) at 9 km spatial resolution, fine-scale analysis using national observed data is often preferred.

In Li and Heap (2014), 38 SIMs applied in environmental sciences are classified as nongeostatistical, geostatistical and combined. Estimation by nearly all SIMs can be regarded as the weighted average of observations (Webster and Oliver 2007). Hijmans et al. (2005) created monthly temperature and precipitation grids using a thin-plate smoothing spline (TPS) algorithm with latitude, longitude and elevation as independent variables. The dataset is expanded with solar radiation, vapour pressure and wind speed in Fick and Hijmans (2017). Multiple linear regression (MLR) was used to produce the temperature climatology of the greater Alpine region, including Croatia, in Hiebl et al. (2009). Longitude, latitude, elevation and distance from the coast were used as the predictors in MLR. The UK monthly gridded dataset of several climatological variables is derived with multiple regression with geographic factors followed by inverse distance weighting of the residual (Perry and Hollis 2005). The Alpine precipitation dataset combined a regression approach for the interpolation of the long-term mean with the interpolation of the daily anomalies by angular distance weighting (ADW) (Frei and Schär 1998). The gridded monthly CRU dataset of several variables also combines long-term grids with monthly anomalies calculated by ADW (New et al. 2000). All these datasets are derived using combined SIMs, which mostly differ in methods for the interpolation of anomalies (IDW, TPS, ADW). Similar is regression kriging (RK) which combines regression on environmental predictors with kriging of the residuals (Hudson and Wackernagel 1994; Knotters et al. 1995; Odeh et al. 1995; Hengl et al. 2003, 2004, 2007; Meusburger et al. 2012; Brinckmann et al. 2016). The RK interpolation method is used for producing the maps of 30-year climate standard normals in the Climate Atlas of Croatia 1961–1990 and 1971–2000 (Zaninović et al. 2008; Perčec Tadić 2010).

In this study, we focus on the mapping of the mean monthly values for a particular month of each year in the period 1981–2018 (38 years). The RK prediction and normalised error maps are calculated for each month. From monthly grids, we calculated climate monitoring products such as climate standard normals for the 1981–2010 period, anomalies from the normal, spatial temperature trends and spatially aggregated anomalies on a country level.

Prior to deriving gridded data that can be used for climate monitoring and trend estimation, station data need to be checked for inhomogeneities and homogenised if needed. Homogenisation based on long meteorological time series is often regarded as the first stage in climate monitoring and climate variability and change detection (Alexandersson and Moberg 1997). Inhomogeneities in the data series can be regarded as anomalous perturbations not related to climate variability. One type of inhomogeneity is a sudden shift in the mean compared with surrounding sites. Such shifts are often related to relocations of the station, changes in the local environment, observation practices or data processing but can also be the result of digitization and database errors (WMO 2020). Reviews on the different methodologies for dealing with inhomogeneities consider the standard normal homogeneity test (SNHT) widely used and among the better performing inhomogeneity detecting methods (Peterson et al. 1998; WMO 2003; Beaulieu et al. 2008).

SNHT was used in Likso (2003), who examined homogeneity of the annual temperature series from the 1949–1998 period on ten Croatian stations revealing the breaks in all but one station (Zagreb-Maksimir). Pandžić and Likso (2010) analysed the abrupt and gradual linear change in annual mean temperatures from the 1961–2000 period at 22 Croatian stations using the modified SNHT, where breaks were detected at all but five stations. Our preliminary research (Nimac and Perčec Tadić 2017) revealed breaks in homogeneity on half of the 39 air temperature time series from the 1981–2010 period. The change in long-term seasonal and annual temperature means among homogenised and nonhomogenised series was statistically significant at only one station (Nimac and Perčec Tadić 2017). However, the changes in trend revealed that homogenisation changed the amount but also the statistical significance of the trend. Positive trends become significant, while negative ones become not significant after homogenisation. Coefficients of correlation among pairs of stations became higher and more similar due to homogenisation. In this study, the robustness of the homogenisation method was tested, while the dataset was extended to 122 stations and to a longer time period.

The study was organised as follows: in Sect. 2, we describe the main characteristics of the country and the data. In Sect. 3, we describe the clustering, homogenisation and spatial interpolation methods, the accuracy measures and the climate monitoring products to be derived. Sect. 4 presents the results of the applied methods. In Sect. 5, we present selected national climate monitoring products. The discussion and conclusions are presented in Sect. 6. The table with station metadata and a list of breaks is presented in Appendix. The charts of the time series of the original and homogenised data are provided in Online Resource 1.

2 Data

Croatia is a southeastern European country situated on the eastern coast of the Adriatic Sea with a Dinaric ridge separating its maritime from the continental part, which belongs to the Pannonian Basin (Fig. 1). This defines several climatic factors that shape the climate of Croatia: diverse orography ranging from 0 to 1700 m, position at the coast of the Adriatic Sea, longitude and latitude (Figs. 2 and 3 in Perčec Tadić 2010). The result was a large mean annual temperature span over a small country, from 4 °C in the mountains to more than 17 °C along the coast for the normal period of 1981–2010. These spatial temperature differences with the cold elevated locations, slightly warmer continental areas and even warmer Adriatic coast with the warmest southern Adriatic are to be modelled for the entire territory with the proposed spatial interpolation method. Standard deviations of the mean annual temperature are up to 0.8 °C, compared to 0.4 °C at the coast, where variability is lower due to maritime influence.

Fig. 1
figure 1

Map of Croatia and meteorological stations divided into nine regions based on the clustering algorithm: eastern continental (econ), central continental (ccon), western continental (wcon), Lika plateau (lika), mountainous (moun), Istria (istr), Adriatic hinterland (hadr), northern Adriatic (nadr) and southern Adriatic (sadr). The colour of the symbols denotes a region, while the number inside the symbol is the ID of the station

Data used in this study were monthly mean temperatures from 122 Croatian main and climatological meteorological stations from the period 1981–2018 (station positions are denoted by circles in Fig. 1). There were 27% of stations with complete data records, but the stations with missing data were also used if at least 25 years of data are available. Croatian land fits in a square of approximately 466 × 462 km2, but due to its peculiar shape, its area is just 57 750 km2. If the distribution of the stations would be regular in space, there would be one station on each 22 × 22 km2. However, the density of the meteorological stations was uneven across the country and not perfectly balanced across the elevation range (Figs. 4 and 5 in Perčec Tadić 2010).

3 Methods

The present study addresses (1) the homogenisation of monthly temperature data measured at the stations, (2) spatial interpolation to a regular 1 km grid and (3) calculation of the gridded climate monitoring products. The R framework (R Core Team 2020) was used for calculations and visualisations. The main packages used were climatol (Guijarro 2019) for homogenisation, gstat (Pebesma 2004; Gräler et al. 2016) and automap (Hiemstra et al. 2009) for spatial interpolation, raster (Hijmans 2020) for raster manipulations and calculations and spatialEco (Evans 2020) for spatial trend analysis.

3.1 Clustering

For the homogenisation method to be reliable, homogenisation must be performed on data with similar climate regimes (Mamara et al. 2013; Marcolini et al. 2019; Guijarro 2021). Ward’s hierarchical clustering or the method of minimum variance (Ward 1963) was performed to group stations according to the similarity in temperature regimes, which was shown to be a first approximation of the climate regions. We were looking for clusters of stations that have similar sequences of temperature differences for consecutive months. Similarity between sequences was measured by ordinary Euclidean distance. The method started from a single sequence in each cluster and iteratively merges sequences based on minimal distance. Ward’s algorithm assures that intragroup similarity is maximised while intergroup similarity is minimised. The clustering results with a selected number of clusters that are expected to contain stations with the most similar temperature series and hence are suitable for further homogenisation inside the clusters.

3.2 Homogenisation

Homogenisation can be performed by comparing each candidate series with another station series or with a composite reference series. Among several methods used in the large MULTITEST project, the ACMANT method provided the most accurate homogenisation of individual time series, resulting in the reconstruction of local climate variability (Domonkos et al. 2021; Domonkos 2021). This method was closely followed by climatol. Both methods use composite reference series, while pairwise methods using a single reference series seem to be more efficient for the detection of the mean climatic trends over large geographical regions (Coll et al. 2020; Domonkos et al. 2021; Domonkos 2021). Among the homogeneity tests, the SNHT has been used quite extensively (Venema et al. 2012; Mamara et al. 2013; Kolendowicz et al. 2019; WMO 2020; Coll et al. 2020). In the R package climatol used in this paper, the SNHT was applied to the differences between candidate and composite reference series, both in the normalised form, to reconstruct local climate variability (Guijarro 20192021). The normalisation was calculated by subtracting the mean and dividing it by the standard deviation over the entire data series of 456 months (38 years). In the first homogenisation stage, the SNHT was applied iteratively on overlapping windows of 120 terms sliding forward by 60 terms, which is one way of detecting multiple breakpoints in one series. In the second stage, the test was applied to the whole series. In the process, the series were split into two fragments at the position where the test values were above the SNHT threshold. Thresholds for the SNHT statistic T were denoted by SNHT1 for the first stage and SNHT2 for the second stage (Guijarro 2019). For our sample size, the listed critical values of the SNHT statistic T for a 95% confidence level would be 9.33 for a first stage over 120 terms and 10.272 for a second stage over 450 terms (closest to our samples’ size of 456 months), according to Khaliq and Ouarda (2007). These are theoretical thresholds simulated from large sets of random normal numbers with up to 50,000 members by Khaliq and Ouarda (2007), similar to Alexandersson and Moberg’s (1997) thresholds for series up to 200 members. However, the range of the estimated SNHT T values in our real-world dataset was much larger, presumably due to deviation from the theoretical randomness (Mamara et al. 2013), with values comparable to those suggested in Meseguer-Ruiz et al. (2018) and Curci et al. (2021). Consequently, we chose to test several critical T values that were higher than theoretical, where higher values correspond to less strict break detection criteria. There were several options for selecting the final homogenised series, and we selected the series with the original data being retained at the end of the series, as in Marcolini et al. (2019) and Meseguer-Ruiz et al. (2018).

3.3 Regression kriging

Regression kriging (RK) is a combined SIM (Li and Heap 2014) that accounts for large-scale spatial variability through the dependence of the target variable on the spatial predictors. Additionally, it accounts for the local, or small-scale, part of the remaining variability through the model of spatial autocorrelation. The procedure can be described in four steps: (1) assess the relationship of the dependent variable (monthly temperature) with climatic factors of altitude, distance to the sea, latitude and longitude (predictors) through multiple linear regression (MLR), (2) establish the variogram model of spatial autocorrelation of residuals (\(\widehat{e}\)) where residuals are the differences between observed values and the values predicted by MLR, (3) evaluate the accuracy of predictions with leave-one-out cross-validation on station locations, and (4) calculate prediction maps and normalised error maps (Hengl et al. 2004; Perčec Tadić 2010). In the process, only the significant predictors in the MLR model were retained based on a t test applied to the regression coefficients under a significance level of 0.05. The remaining residuals were interpolated by kriging.

The sample variogram was calculated from residuals for regular distance intervals (bins) \(\left[{h}_{j},{h}_{j}+\delta \right]\) by the semivariance:

$$\widehat{\gamma }({\overline{\mathrm{h}} }_{j})=\frac{1}{2{N}_{j}}{\sum }_{i=1}^{{N}_{j}}{(\widehat{e}({s}_{i})-\widehat{e}({s}_{i}+h))}^{2}, \forall ({s}_{i},{s}_{i}+h ):h\in \left[{h}_{j},{h}_{j}+\delta \right].$$
(1)

Here, the spatial locations of stations were denoted by \({s}_{i}\), where \({N}_{j}\) is the number of station pairs separated by a distance \(h\) belonging to interval \(\left[{h}_{j},{h}_{j}+\delta \right]\), and \({\overline{h} }_{j}\) is the average of all Nj h′s (Pebesma 2014) .

Four variogram models were tested by default while fitting the variogram model to the sample variogram: spherical, exponential, Gaussian and Matern. The initial variogram parameters, nugget, partial sill and range, were estimated from the data automatically. The fitting of the variogram model was controlled by adjusting the minimum number of points in a bin, min.np.bin (Hiemstra et al. 2009). The default value of five points was too low for our spatial data configuration. In many months, the first to the third bin in the sample variogram would produce too jittery semivariance, so the optimal value was selected as one that minimises the RMSE of the mean temperature through leave-one-out cross-validation (loocv). For interpolation of the residuals that remain after MLR, we adopted an option of kriging in a local neighbourhood, while point kriging and kriging of the block mean values are also available (Pebesma 2004; Hiemstra et al. 2009; Gräler et al. 2016). The default setting of using all observations for the prediction at each grid cell is not optimal for a peculiarly shaped and climatologically diverse country such as Croatia; hence, we carefully balance the number of the nearest observations, nmax, and the maximum distance from the prediction location to the observations, maxdist, to avoid the stations from a climatologically different region. The optimal values were selected through loocv as for the optimal min.np.bin.

3.4 Accuracy assessment

The performance of the multiple linear regression model is evaluated through the coefficient of determination R2 adjusted for the number of predictors (Draper and Smith 1998). The RK performance at the prediction locations was again assessed through the loocv procedure. The mean error (ME) and root-mean-square error (RMSE) were calculated at prediction locations:

$$ME=\frac{1}{l}{\sum }_{j=1}^{l}\left[z\left({s}_{j}\right)-\widehat{z}\left({s}_{j}\right)\right]$$
(2)
$$RMSE=\sqrt{\frac{1}{l}{\sum }_{j=1}^{l}{\left[z\left({s}_{j}\right)-\widehat{z}\left({s}_{j}\right)\right]}^{2}}$$
(3)

where the observed temperatures at spatial locations \({s}_{j}\) were denoted by \(z\left({s}_{j}\right)\) and the loocv predictions by \(\widehat{z}\left({s}_{j}\right),\) while their differences were called residuals. The standard deviation of the observations is denoted by \({\sigma }_{z}\). Comparison of the RMSE across the months is performed through RMSE normalised by the standard deviation of observations:

$$RMSEr=\frac{RMSE}{{\sigma }_{z}}.$$
(4)

As a rule of thumb, we consider that prediction is accurate if RMSEr < 0.4 (Hengl et al. 2004), in which case the accuracy defined by

$$accuracy=1-{RMSEr}^{2}$$
(5)

is above 84%. The normalised kriging error map is calculated as the square root of the prediction variance map divided by the standard deviation of the prediction variable (Burrough et al. 1998; Hengl 2009; Perčec Tadić 2010). A normalised prediction error lower than 0.4 is considered a satisfactory prediction since in this case, the model predicts more than 84% of the total variation. Otherwise, if this error was above 0.8, the model accounts for less than 36% of the variation (Hengl 2009). Those errors also propagate and affect the derived climate monitoring products. On the stations’ locations, the effect of the errors can be noticed as differences between measured and interpolated values.

3.5 Climate monitoring products

Climate standard normals (monthly means) and trends are calculated from the homogenised station data. Gridded climate standard normals were calculated from the monthly grids as a mean over the period 1981–2010, from which the grids of monthly anomalies are calculated by subtracting the monthly climate standard normal grid from each monthly grid. The derived spatial means of monthly anomalies aggregated over the country were among the most important national climate monitoring products (WMO 2017a).

Temperature trends and their significance were calculated from the monthly grids over the period 1981–2018. Theil-Sen slope was calculated to estimate the trend over 10 years, and the Mann-Kendal test was used to assess its significance (under the significance level 0.05). The nonparametric Pettitt rank test (Verstraeten et al. 2006) was applied to temperature anomalies to estimate a year when a change in the mean temperature appears over Croatia. The areas on the maps where the trends were not significant were displayed using 50% transparency. The normals and trends calculated from the homogenised station series were presented on the corresponding gridded maps.

4 Results

4.1 Clustering

Hierarchical clustering was performed to group the stations according to similarity in temperature regimes, which was a prerequisite for further homogenisation. Starting from the smallest possible number of clusters, i.e., two, we can distinguish the maritime and continental temperature regimes (not shown). Going down the dendrogram tree, more details about the climate characteristics appear. With three clusters, the continental cluster was divided between continental lowland and the mountainous region, leaving the maritime cluster unaffected. With four clusters, differences between the northern and southern Adriatic data regimes appear in maritime clusters. With five clusters, we can see that most of the eastern part of the continental lowland (econ) was different from the rest of the central and western continental regimes. With six clusters, the southern Adriatic cluster was split into the hinterland (hadr) and coast (sadr). With seven clusters, the mountain region was split into the Lika Plateau (lika) and higher mountain tops (moun). Eight clusters revealed that the Istria Peninsula (istr) was distinct from the rest of the northern Adriatic stations (nadr), and finally, nine clusters reveal a small distinction between the western (wcon) and central-northern continental (ccon) lowland (Fig. 1). Even though the temperature series of the last two clusters were similar, we decided to separate them to have a more balanced number of stations across the clusters. We did not proceed with a larger number of clusters since for ten clusters, some microlocation characteristics emerged on southern Adriatic islands that were too local to be investigated in this research. After the nine regions with similar temperature regimes were determined, homogenisation was performed for each region separately. We note that the stations in each cluster were also grouped geographically, although geographical locations were not explicitly included in the algorithm. A notable exception was the region moun covering the mountain tops, which also includes station Puntijarka (ID = 25, h = 988), which geographically belongs to the wcon region but is nevertheless characterised as a mountain station. This showed that the clusters were also well aligned with the orography. Consequently, the obtained clusters can be recognised as a first approximation of the climate regions.

Coefficients of correlation over the pairs of nearest stations in the same region are above 0.9, supporting the similarity inside the clusters and gradually decrease with the distance between the stations.

4.2 Homogenisation

Due to the limitations of the SNHT thresholds derived from synthetic series, we tested the robustness of the test by choosing six combinations (denoted as test1 to test6 in Table 1) of the SNHT thresholds for the two stages of homogenisation. It appeared that this affects the number of detected breaks of homogeneity. Test1 with the highest SNHT threshold allowed for more variation between the candidate and the reference series and hence the smallest number of breaks. In that case, 70 breaks were detected in all regions. Test6, with the lowest SNHT threshold, was the strictest in break detection, with 288 breaks suggested over the regions. Setting the SNHT threshold as in test6 would result in a higher level of homogeneity and smoother spatial temperature fields.

Table 1 Comparison of the number of detected homogeneity breaks depending on the SNHT statistic T for the nine regions (columns from econ to sadr) and the total sum of breaks over all nine regions (sum) for the six tests performed. In the SNHT column, the first number is SNHT1, and the second is SNHT2

Without quality metadata, there is no objective decision on which SNHT threshold combination to choose. However, homogenisation makes the temperature trends more uniform across the region (Nimac and Perčec Tadić 2017), so we compared the trends for datasets where only missing data have been interpolated (OR stands for original in Fig. 2) with trends for test1 to test6, which are expected to be changed by homogenisation. We showed the comparison for test1 (HO stands for homogenised in Fig. 2), which already reduced the largest isolated negative trends in dataset test0 (OR in Fig. 2) in September and October, including one in September that was marked as significant (Station 141). Most positive isolated significant trends in econ and ccon in May were reduced to nonsignificant ones. The same happened in wcon in March and in lika, istr, hadr and sadr in January. Reduced were also isolated positive significant trends in hadr in February and in hadr and sadr in October. Nevertheless, several negative trends, although not significant at the 0.05 level, remain during September, October and December. However, those were the months with the smallest positive and sometimes negative temperature trends compared to other months in the entire Croatian territory. Using more strict SNHT thresholds (test2 to test6) makes trends’ significance more spatially homogenous in some but not in all regions, so test1 SNHT thresholds are used further in the analysis for break detection and homogenisation.

Fig. 2
figure 2

Trends (colour) and significance at the 0.05 level (symbols) in mean monthly and annual temperatures for the stations in continental and mountainous (a) and coastal (b) regions. On the OR (original data) panels, the point symbol denotes whether the trend in the original data is significant. On the HO (homogenised) panels, the symbols also denote the change of the significance of the trends between OR and HO datasets: symbol sign denotes the significant trend in both datasets, gain shows that trend changed from nonsignificant to significant and loss if the opposite happened. For nonsignificant trends, no symbol is used

The inventory of detected breaks for all the stations is presented in Appendix together with the percentages of the original data, missing data that were interpolated and the data created during homogenisation. The breaks were detected in 54 out of 122 data series. Altogether, 70 breaks were identified. There were 41 stations with one identified break, 10 stations with two breaks and three stations with three identified breaks.

It was preferable that stations with many breaks and/or with a substantial percentage of altered original data were not clustered in space. Indeed, Fig. 3 shows a balanced spatial distribution of high-quality stations with no breaks (circles) and a small percentage of interpolated missing data (blue symbols). The less favourable situation can be noticed in ccon, wcon, sadr and hadr regions where there are clusters of stations with more than 40% of altered data (yellow, orange or red symbols in Fig. 3).

Fig. 3
figure 3

Spatial distribution of stations according to the number of breaks (n.breaks denoted by symbols’ shape) and according to the percentage of interpolated and/or homogenised data (INHO, denoted by symbols’ colour and label text). The map in the background shows the orography (DEM)

A comparison of the monthly percentage of the original, homogenised or interpolated (missing) data is presented in Fig. 4. The percentage of missing data in each month shows that from 1981 to July 1991, approximately 10–15% of the overall monthly data from the 122 stations were missing. These data were interpolated. In 1991, the war in Croatia started, and then the largest drop in the percentage of available data occurred, with 30% of stations missing in September 1991 and 35% missing in December 1991 and January 1992. Around July 1994, the percentage of missing data was approximately 20%, dropping to 10% in 1997 and to 5% around October 2000. Until 2015, this was the observation period with the most data available. After that, the percentage of missing data slightly increased due to four stations that were closed in 2014 and 2015.

Fig. 4
figure 4

The monthly percentage of original (OR), homogenised (HO) and interpolated (IN) data

The decision to keep, for climate monitoring purposes, the homogenised series that have the original data at the end of the period was reflected in the monthly share of the data altered by the homogenisation. The percentage of data was larger at the beginning of the period, from 1981 until July 1991, when approximately 30–40% of stations were homogenised in a month. Until 2000, approximately 20–30% of the monthly data were homogenised, dropping to 10–20% from 2000 until 2006. After that, the percentage of the homogenised data was below 10% in a month (Fig. 4).

A detailed comparison of the original and homogenised series for each station is provided in Online Resource 1. Those charts were also used to visually check if there are some large alterations of the data after homogenisation. Without available digitized metadata of dates of relocation, change of the observer or change in the environment to assist break detection, it is advisable to try to confirm at least the largest proposed changes by consulting published breaks or the available paper metadata reports, which is discussed in Sect. 6.

The differences between the original and homogenised series were used to calculate the ME and RMSE at the station and regional levels to assess the effects of homogenisation. The substantial portion of the data without any adjustment is disregarded from ME and RMSE calculations since they would mask the overall size of the adjustments. The ME calculated from differences of HO and OR data (adjustment) was mostly negative with a median of -0.5–0.0 °C (lower HO than OR, mostly in the beginning of the period) except for econ and lika regions where a median of ME was 0.3 °C and istr with 0.4 °C (Fig. 5a). Median RMSEs were 0.2–0.5 °C except for hadr with 0.72 °C (Fig. 5b).

Fig. 5
figure 5

The mean error (a) and root mean square error (b) calculated from differences in measured and homogenised data for the regions. Only values that are changed by homogenisation are used in the calculation of ME and RMSE

4.3 Regression kriging

Due to the differences in temperature regimes across climatological regions, we preferred to develop kriging in a local neighbourhood. Therefore, we needed to select the optimum maximum distance (maxdist) and the number of nearest neighbours (nmax) to be used for kriging in each grid cell. Additionally, for the variogram estimation, the minimum number of point pairs over the bins of the distances was optimised (min.np.bin). Possible values need to consider the approximate size of the region that is a square of 400 × 400 km, the peculiar shape of the country, the number of stations available in each grid point (Fig. 6) and the assumption that based on climatic differences, we prefer not to use stations from the coast to predict temperature in continental Croatia and vice versa (Zaninović et al. 2008).

Fig. 6
figure 6

The number of neighbouring stations available for interpolation in each grid cell for a search radius of 100 km

We determined the number of neighbouring stations for various search radii between 20 and 400 km with a 20 km step. For the example of a search radius of 100 km, there are more than 40 stations available for local interpolation in the western continental region but not more than 10 over peripheral parts of the domain (Fig. 6). This was to be considered when optimising the parameters for kriging in a local neighbourhood.

Optimisation of the RK parameters was conducted for a minimum number of points in a bin for variogram estimation (min.np.bin) and for the maximum number of points (nmax) and maximum distances to the points (maxdist) to be used for kriging predictions. The parameters tested were \(min.np.bin \in \left\{40, 50, 60\}, maxdist \in \left\{80, 120, 160,.., 400\}x1000 m\right.\right.\) and \(nmax \in \left\{5, 15, 20,..,120, 122\}\right.\), giving a total of 351 combinations for each month. The performance was evaluated with loocv RMSE, and all tree min.np.bin values were shown to be equally good (not shown). When setting nmax = 5 or maxdist = 80 km, the RMSEs were above 10 °C, so those were excluded from further consideration. The RMSEs averaged over the locations and min.np.bin for each of the remaining combinations of nmax and maxdist were lower than 0.9 °C (Fig. 7a and b). The selection of the optimal maxdist, among those longer than 120 km, is no longer as sensitive (Fig. 7a). The optimal nmax was found to be 45 or 55 (Fig. 7a and b). Setting it higher than 55 will not affect RMSE too much, while setting it lower than 45 will increase RMSE by approximately 0.2 °C. Finally, 34 out of 351 combinations were selected as the best combinations for all 456 months based on the lowest RMSE. The combination of the parameters min.np.bin = 40, nmax = 45 and maxdist = 160 km (40_045_160) was selected as the best in 28.9% months, followed by 40_055_160, which was the best in 18.9% of months. The RMSEs were slightly higher in winter months and in November.

Fig. 7
figure 7

Loocv RMSE for the optimisation of the RK parameters. Change in monthly RMSE over nmax for different maxdist (a). Change in monthly RMSE averaged over maxdist (b). Colour coding is the same for figures a and b and refers to months

This decrease in the final number of optimal combinations to 34 suggests a possible simplification. Namely, one can choose the best combination of the parameters for each calendar month (12 models) or even a single combination for all months in the dataset. The impact of these simplifications on the loocv RMSE is shown in Fig. 8. Overall, the medians of RMSEs were the lowest from March to July (0.5–0.6 °C), reaching 0.8 °C in Jan and Dec, with bulk values below 0.9 °C (Fig. 8). It is seen that the simplification increased the RMSE by less than 0.05 °C. Based on this, we selected a single set of parameters (40_045_160) to be used for kriging in a local neighbourhood for all months.

Fig. 8
figure 8

Comparison of the monthly loocv RMSE for three sets of experiments: the best RK parameter combination for each month (best), a single combination for each calendar month (month.single) and a single combination for all months (single)

The next step was to select the variogram model. The optimal variogram model for each month is selected among four models and fitted to an experimental variogram. Matern (Ste) is the best model for 63% of the months, followed by Spherical (Sph) in 26% of the months and Gaussian (Gau) in 10% of the months, while the Exponential (Exp) model is selected as the best one only once. The nugget values were 0.0–1.5 °C, the partial sill was 0.0–1.1 °C, and the variogram range was 10–300 km. The nugget was larger than 1.5 °C for cases of 12/1998, 11/2011, 12/2015 and 12/2016, the partial sill was larger than 2 °C for 4/1995, and there were two infinite ranges for 4/1995 and 3/2000. They were used as such for the gridding since they did not blow up the loocv RMSE.

The median of the monthly adjusted R2 is above 0.9 in all months, suggesting that the high proportion of the spatial variability is explained by the climatic factors through MLR (Fig. 9a). The accuracy of RK (Eq. 4 and 5) based on the loocv RMSE was above 0.84 throughout the year (Fig. 9b) and was slightly higher in the cold part of the year. Recalling Eq. 5 and that the accuracy is higher if the RMSE was lower, we would expect a higher accuracy in the warm part of the year (recall Fig. 8). However, since the accuracy was calculated from RMSE normalised by the variance of the data (Eq. 5), a higher variance of the data reduces the RMSE (Eq. 4) and increased the accuracy, leading to higher accuracy during colder months (Fig. 9b).

Fig. 9
figure 9

Adjusted R2 of the MLR Model (a) and loocv accuracy for RK (b)

Final regression kriging prediction and normalised error maps were presented for the two selected months, a winter month and a summer month, which had the largest warm anomalies in the whole dataset (Fig. 10). The regression kriging predictions (maps in Fig. 10a and c) were in line with the observations (circles in Fig. 10a and c). Somewhat higher normalised errors (Fig. 10b and d) were found over isolated mountains in eastern continental Croatia and over the Adriatic coast. We suspected that prediction accuracy can be improved only by additional measurements.

Fig. 10
figure 10

Prediction (a, b) and normalised errors (c, d) for the months with the most pronounced warm temperature anomalies in winter (January 2007, a and c) and summer (June 2003, b and d)

5 Climate monitoring products

In this section, we present and discuss several climate monitoring products that can be operationally derived from homogenised station data as well from a series of monthly maps. Some of the most important national climate monitoring products are the monthly and annual climate standard normals (WMO 2017b). Here, we highlighted the main seasonal and regional temperature characteristics deduced from the homogenised stations’ normals for the 1981–2010 period (Fig. 11).

Fig. 11
figure 11

Annual course of homogenised monthly temperature data. Stations are grouped according to regions. Continental and mountainous regions are on the left, and maritime regions are on the right. Months are differentiated by the colour of the symbols

Winter: In continental Croatia (econ, ccon and wcon), the coldest winter month is January (Fig. 11). Then, follow slightly less cold December and February. In lika and hadr, January was the coldest, while December and February temperatures were almost the same. Over the moun region, January and February are equally cold. The exception was the highest mountain station (Zavižan, ID = 41), where the coldest is February, followed by January and December. Near and on the coast, in istr, nadr and sadr, the coldest months are January, February and December in that order, with January and February temperatures being almost the same at many stations.

Spring and autumn: The spring/autumn pairs of months: Mar/Nov, Apr/Oct, May/Sep have very similar temperatures in continental Croatia, with spring months being slightly warmer than their autumn pairs primarily due to longer days. The exception was Ogulin (ID = 18, alt = 328 m), with monthly temperatures closer to the continental regime but with warmer autumn than spring months, which are the characteristics of the intermittent lika and moun region. This is the result of its position on the border of those regions. In intermittent lika and moun regions, the differences between spring/autumn months are larger compared to continental regions and in favour of warmer autumn pairs (showing already present maritime influence). In maritime Croatia (istr, hadr, nadr, sadr), the autumn is even warmer than spring, compared to lika and moun regions. In spring, the land is getting warmer, but the colder sea is slowing down the heating of the air, while in autumn, the sea is acting as a heat container that makes the autumn cooling slower and hence autumn months warmer than their spring pairs at the coast.

Summer: In all the regions, the warmest summer months are July, August and June in that order. In continental Croatia, the differences in August and July temperatures are less distinct than the differences from June temperatures to those two. In maritime Croatia, the August and July temperatures are almost the same, while the June temperature is 1–2° colder. This change from the continental to maritime temperature regimes of the warmest months can be noticed in lika and moun Croatia by widening the difference in the June temperature from more similar July and August.

There were several stations with special local characteristics. Station Zagreb-Grič (ID = 38, alt = 157 m) was warmer than other close city stations, showing the urban island temperature effect. Coastal Istrian stations (IDs 24, 102, 117, 120, 122) were warmer than inland ones. The highest station, Zavižan (ID = 41, alt = 1594 m), was already mentioned as the station with the coldest temperatures in Feb. Surprisingly, station Split-Marjan (ID = 32, alt = 122 m), which was geographically on the southern Adriatic coast, was classified in the neighbouring hadr region. We suspect that it could be the influence of the microlocation in elevated city forest, which makes its temperature regime closer to the higher elevation stations of the hinterland (hadr).

Next, we present the grids of monthly temperature climate standard normals for 1981–2010 together with the station data (Fig. 12). The same colour bar for all months allows for insight into the annual course of the mean temperature. The agreement between interpolated and measured values on station locations can be visually confirmed, even though the maps of the normals are aggregated from the time series of monthly grids over the period 1981–2010 and not directly interpolated from the station normals. This means that the interpolation method can reproduce aggregated values such as the means.

Fig. 12
figure 12

Maps of mean monthly temperatures for the period 1981–2010. Each map also shows homogenised station normals

The monthly temperature differences between climatological regions based on station normals (Fig. 11) are also confirmed over the grids. For example, subtracting February from the January grid confirms that January was colder than February in continental Croatia (Fig. 13a). Approaching the coast, the mean temperature of those two months was becoming more similar. The spring months have similar or slightly warmer temperatures than the autumn months over continental Croatia, which was slightly more pronounced in March, which was up to 1 °C warmer than November (Fig. 13b). However, in the intermittent Lika and mountain region and towards the coast, those differences start to change in favour of warmer autumn months due to the warming effect of the sea. This is pronounced the most in the case of the Apr/Oct pair (Fig. 13c), where October can be 2–3 °C warmer than April on the coast, while over the continent, the differences were smaller. The least pronounced temperature difference was for May/September, where May was slightly warmer in the continent and September at the coast (Fig. 13d). The two warmest months, July and August, have quite similar temperatures at the coast, while July was slightly warmer than August on the continent. This analysis also confirmed the success of the interpolation method to reproduce the spatiotemporal characteristics of the annual cycle of mean monthly temperatures deduced from station data (Fig. 11).

Fig. 13
figure 13

Differences between mean monthly temperatures for the period 1981–2010. Shown are differences between the two coldest months, Jan-Feb (a), and then differences for the pairs of spring and autumn months with similar temperatures: Mar-Nov (b), Apr-Oct (c) and May-Sep (d). The last map shows the differences between the two warmest months, Jul-Aug (e)

One of the most often used climate monitoring charts is the time evolution of the temperature anomalies calculated from grids and averaged over the regions (Fig. 14). Positive monthly anomalies (red in Fig. 14) were more frequent and lasted over several consecutive months after approximately 2000. The Pettitt test applied to monthly anomalies detected a significant change in the mean in 2000 and an increase of 1 °C in the mean temperature over the later period, indicating the climate change.

Fig. 14
figure 14

Time evolution of the monthly temperature anomalies [°C] over the 1981–2018 period averaged for the country. The anomalies were calculated from monthly grids by subtracting the gridded monthly normals for 1981–2010

Temperature trends based on homogenised monthly temperatures from weather stations give us more detailed insight into this climate change (point symbols in Fig. 15). Positive but nonsignificant warming was observed in January, February and March at most stations. The trend was significant and positive at a few stations in the nadr in January (0.5–0.6 °C/decade), at 15 coastal and Adriatic hinterland stations in February (0.5–0.8 °C/decade) and at a few coastal stations in March (0.3–0.5 °C/decade). Positive and significant trends were observed in April (0.4–0.9 °C/decade). In May, all stations experienced positive trends, but they were significant at only several stations, mainly in the wcon, nadr and sadr regions (0.4–0.7 °C/decade). In June, all stations had positive significant trends (0.5–1.0 °C/decade). In July and August, most of the stations experienced positive significant trends (0.4–0.9 °C/decade), and only a few had positive nonsignificant trends. In September and October, all stations had nonsignificant weak trends. Almost all stations had significant positive trends in November (0.5–1.0 °C/decade). In December, all stations had nonsignificant weak trends.

The overall annual temperature trend was positive and significant at all stations (0.3–0.7 °C/decade).

Furthermore, trends for each calendar month were calculated from the 38 monthly maps from the 1981–2018 period and presented in Fig. 15 together with the station data. Positive but not always significant trends prevail. Significant strong warming was observed and mapped over the entire Croatian territory in April, June, July, August and November. In April, a significant trend of 0.5–0.7 °C/decade was present in econ and ccon and along the coast and was even stronger in wcon and over parts of the moun region, where it reaches 0.7–0.9 °C/decade. In June, the strongest warming was experienced in wcon and over parts of the moun, nadr and istr regions, 0.8–1.0 °C/decade. Econ and ccon experienced June trends of mostly 0.7–0.8 °C/decade. The June trend was 0.5–0.7 °C/decade in continental and southern mountains and in lika and 0.7–0.8 °C/decade over the rest of the territory. July was warming by 0.4–0.6 °C/decade along the coast, up to 0.6–0.8 °C/decade inland. August trends were 0.4–0.6 °C/decade in lika, 0.5–0.7 °C/decade along the coast, 0.6–0.8 in moun, wcon and ccon and up to 0.9 °C/decade in econ region. The July and August maps look similar, but there was noticeably stronger warming during August in the continental regions. The strongest significant trends are measured and mapped in November. The strongest trends were around the capital of Zagreb, reaching 1.0–1.1 °C/decade. The rest of the wcon was experiencing a trend of 0.9–1.0 °C/decade. The ccon and econ were warming by 0.8–0.9 °C/decade, slightly slower over eastern continental mountains. The main mountain Dinaric region that separates the continental part of the country from the Adriatic was warming by 0.7–0.9 °C/decade, while the coast was warming by 0.5–0.7 °C/decade. In May, only the western continental region and smaller parts along the coast experienced a significant trend of 0.3–0.5°/decade. The measured amount, sign and significance of trends estimated at meteorological stations (point symbols in Fig. 15) were accurately captured on the trend maps calculated from the monthly maps.

Fig. 15
figure 15

Monthly temperature trends for the period 1981–2018. The significant trends are presented in full colour (*T trend), while nonsignificant trends are shown with 50% transparency (T trend). The trends from station data are presented as point symbols for each station. The statistically significant trends are circled with black and the nonsignificant with green colour. Months with dominantly significant trends are marked with an asterisk (*)

Finally, the mean annual temperature for the normal period 1981–2010 was calculated from the monthly normals (Fig. 16a); then, annual anomalies were calculated as differences of each annual grid and the annual normal (Fig. 16b) and spatially aggregated over Croatia, while the annual trend was calculated from the 38 mean annual temperature grids (Fig. 16c). Each annual grid was calculated as the mean from the corresponding monthly grids. The annual temperature climate standard normal for 1981–2010 in Croatia ranges from 1 °C on the highest Croatian mountain Dinara to 17 °C in the southern Adriatic (Fig. 16a). Annual anomalies were dominantly positive from 1999, up to 1.4 °C (Fig. 16b). After 1999, only 2004, 2005 and 2010 were colder than average, with 2005 being noticeably colder (for 0.7 °C). The Pettitt test applied to annual anomalies detected a significant change in 1998, with the same amount of 1 °C of the temperature increase as calculated from the monthly anomalies. Annual temperature trends were the highest in the capital of Zagreb and its surroundings, above 0.6 °C/decade (Fig. 16c). The rest of this western continental region, the large part of the northern mountainous region, the most eastern continental region and the farthest southern Dalmatia experience strong temperature trends of 0.5–0.6 °C/decade. The central continental region, Istria, Lika, the southern mountainous region and most of the Adriatic coast exhibit a trend of 0.4–0.5 °C/decade. There were regions in the Adriatic hinterland and southern Adriatic with trends below 0.4 °C/decade.

Fig. 16
figure 16

Mean annual temperature for the period 1981–2010 calculated from the mean monthly grids (a). Temporal evolution of the temperature anomalies from 1981–2010 normal, spatially averaged over the country (b). Annual temperature trend for the period 1981–2018 calculated from the annual grids (c)

6 Discussion and conclusions

This study presents the most extensive set of monthly homogenised station data and the first set of monthly grids at a 1 km spatial resolution suitable for climate monitoring in Croatia. Well-known homogenisation and interpolation methods have been applied in the study, with the extensive analysis and optimisation of the parameters for both methods.

Homogenisation could not be supported by the digitised metadata, so conservative SNHT critical levels were used to avoid oversmoothing the observed monthly temperatures. They were higher than those suggested in Khaliq and Ouarda (2007), who derived the critical SNHTs from synthetic white noise series. Climatological anomalies will show some degree of station autocorrelation, spatial cross-correlation and local or general trends depending on the type of climatic variable, its spatial variability, the density of the observing network and the temporal scale of the data (Mamara et al. 2013; Guijarro 2021; Willett et al. 2014). That is why the default SNHT thresholds in the climatol are higher than those obtained in Khaliq and Ouarda (2007). We compared the climatol visualisation outputs, breaks and derived trends for several SNHT1 and SNHT2 combinations, resulting in higher than default SNHT thresholds, similar to Curci et al. (2021) and Meseguer-Ruiz et al. (2018). Compared to our preliminary homogeneity study (Nimac and Perčec Tadić 2017), approximately 40% of the breaks identified there were in the same year or even month as the breaks detected in this study. We were able to compare the proposed breaks of homogeneity (Appendix) with the breaks and paper metadata for 22 stations documented in Pandžić and Likso (2010). We found that most of the abrupt breaks are successfully detected in both studies, while the differences could be due to a more general version of SNHT in Pandžić and Likso (2010), which detects artificial linear trends in temperature series in addition to abrupt breaks. The remaining differences can be due to different periods and SNHT critical values and annual time scales in Pandžić and Likso (2010) compared to monthly values in the present study.

Regarding homogenisation, negative adjustments over the climate regions agree with the negative adjustment reported for E-OBS (Squintu et al. 2019). However, the sizes of the adjustments with median values of -0.5–0.0 °C were larger than in Squintu et al. (2019) who assessed the statistical performance of the homogenisation using all the data. In our research, ME and RMSE (Fig. 5) were calculated only from the data corrected due to homogenisation resulting in larger, but we believe more objective estimates of the adjustments.

Monthly grids were accompanied by normalised error maps that provide an estimate of accuracy. Parts of the maps with normalised errors above 0.4 were considered less accurate, which was observed in isolated elevated regions without observations available.

Maps of monthly and annual climate standard normals and trends were calculated from the time series of monthly grids. Another approach to calculate such a product would be an interpolation of the station values of the normal or trend. However, with interpolated monthly values of the temperature, we were able to reproduce the normals and trends of the monthly and annual temperatures comparable to those calculated from station data. The interpolation procedure reproduced the spatial characteristics of colder January than February over the continental regions, warmer spring than autumn months in the continental regions, warmer autumn than spring months in the intermittent and coastal regions and warmer July than August inland (Fig. 13), the characteristics that were also observed from the station data (Fig. 11).

The months with significant positive trends in temperature, that is, April, June, July, August and November, experience stronger trends over continental and mountainous regions compared to the coast (Fig. 15). The established land/sea warming ratio of 1.3–1.4 over Croatia follows the global land/sea warming ratio (Sutton et al. 2007), which is in the range of 1.36–1.84, independent of global mean temperature change. Intuitive and older theories would explain this difference by the smaller heat capacity of soil, which as a consequence heats more rapidly. Manabe et al. (1991) questioned this and pointed out that over the oceans, air can cool down by evaporation at the sea surface, while over the land, the moisture content was constrained, so the warming becomes stronger. In dry air, more sensible heat warms the air, slowing cloud formation and enhancing further drying and consequently more heating. More recent theories would rely more on moisture and energy balance examined by several climate models (Sutton et al. 2007). Joshi et al. (2008) examined a large ensemble of climate model integrations and showed that the land/sea contrast in equilibrium but also in transient radiation simulations was associated with local feedbacks such as different changes in the temperature lapse rate over land and sea and the change in a hydrological cycle over land, rather than with externally imposed radiative forcing. The context for interpreting equilibrium climate sensitivity (ECT) and transient climate response (TCR) from the CMIP6 Earth system models is presented in detail in Meehl et al. (2020).

Elevation-dependent warming (EDW) is another observed feature of global climate change, and there is evidence for it over the Alps, Rockies, and the Tibetan Plateau, all of which are in the mid-latitudes (Miller et al. 2021). Pepin et al. (2015) concluded that there is evidence that many, but not all, mountain ranges show enhanced warming with elevation. Higher elevation stations in Croatia (between 750 and 1594 m) showed significant warming annual and summer trends, but similar and even stronger trends are found at lower elevation stations. A similar absence of EDW was also observed over close regions of the mid-latitude eastern Italian Alps (Tudoroiu et al. 2016).

The highest annual trend (0.67 °C/decade) in the capital of Zagreb is not just a result of climate change but also urbanisation. Nimac et al. (2021) observed a strong increase in the number of summer days (9 days/decade), while the number of tropical nights increased by 8 days/decade during the period 1990–2019. Consequently, the summer months are going to be more uncomfortable for citizens.

Annual trends of 0.3–0.7 °C/decade are significant for the entire country. They are stronger than the trends of 0.26 °C/decade over the Pannonian Basin for 1970–2005 from the E-OBS (Table 2 in Lazić et al. 2021), probably due to the temperature rise after 2005. They are comparable with annual trends of approximately 0.4°/decade for the 1980–2019 period for the Abruzzo region in central Italy (Curci et al. 2021), where a similar approach is used for homogenisation of the data series. It seems that the observed trends over the Mediterranean are 2–2.5 times stronger than the global mean warming, especially during summer (van Oldenborgh et al. 2009). Serious warnings come from the climate model results, which predict that the Mediterranean region is likely to warm at a rate approximately 20% larger than the global annual mean surface temperature (Lionello and Scarascia 2018). The values are particularly high in summer and in the continental areas north of the basin, where warming will generally be 50% stronger than at the global scale and locally even twice as large (Lionello and Scarascia 2018). The region will face consequences such as devastating heatwaves, water shortages, loss of biodiversity and risks to food production.

The monthly anomalies from the 1981–2010 average for Croatia (Fig. 14) are comparable with the temperature anomaliesFootnote 1 over Europe derived from ERA-Interim reanalysis data (subsided later by ERA5 (Hersbach et al. 2020)) and provided by the Copernicus Climate Change Service. Anomalies for April, June, July, August and November (not shown) were mostly strong and positive from the beginning of the twenty-first century, which is related to already discussed significant positive trends in station data over the entire country in those months (Fig. 15).

Even though the presented analysis is based on data for a limited area, we believe that the methodology is useful for the development of any similar research and climate monitoring products. We presented the power of clustering in defining the nine Croatian climate regions and showed how homogenisation affects the trends in the data. Additionally, we highlighted the importance of objectively estimating the parameters of local neighbourhoods for kriging and showed, at least for Croatia, that a single set of these parameters can be used for all months with a negligible increase in the RMSE.

The monthly maps are produced for the period 1981–2018 together with the error estimates. Looking at the aggregated statistical values derived from the monthly maps, it was confirmed that the interpolation method can successfully reproduce the means (Fig. 12 and Fig. 16a), annual cycle differences (Fig. 13) and trends (Fig. 15 and Fig. 16c). The maps were used further to derive the monthly anomalies.

This study presents the last four decades of mean air temperature and its change over Croatia, pointing to prevailing positive annual anomalies in the twenty-first century and strong and significant warming trends. From the beginning of the twenty-first century, average monthly anomalies are often positive and up to 4.7 °C (January 2007) warmer than the average and only occasionally negative (Fig. 14). The measured amount, sign and significance are accurately captured on the trend maps calculated from the monthly maps (Fig. 15). The significant strong warming of 0.3–1.0 °C/decade was observed and mapped over the entire Croatian territory in April, June, July, August and November together with the significant annual trends of 0.3–0.7 °C/decade (Fig. 16c) that were stronger inland than on the coast. The EDW was not confirmed for Croatian Dinarides.

The new monthly temperature grids belong to the developing CroMonthlyGrids dataset that is intended to support climate monitoring, climate change detection and adaptation planning on national and local administrative levels. Further grids are in preparation for minimum and maximum temperatures and monthly precipitation. These grids will form the basis for the calculation of climate normals and operational national climate monitoring products with further applications in sectors such as hydrology, forestry, agronomy or health.