1 Introduction

Hungary, located in the Carpathian Basin in Central Europe, lies in the transitional zone between projected increase and decrease in mean summer precipitation by the end of the twenty-first century, according to the ensemble-median of the EURO-CORDEX regional climate model (RCM) simulations (Jacob et al. 2014; Rajczak and Schär 2017). Even the sign of the change in the warmest season is questionable. Hungarian near- and far-future RCM runs also display large uncertainties regarding the prospective shifts in annual and seasonal precipitation amounts (Krüzselyi et al. 2011). In Hungary, four RCMs have been adapted so far: PRECIS and RegCM by the Department of Meteorology at the Eötvös Loránd University (Bartholy et al. 2009; Torma et al. 2011; Pieczka et al. 2017), and REMO and ALADIN-Climate by the Hungarian Meteorological Service (Szépszó and Horányi 2008; Csima and Horányi 2008). Currently, only ALADIN-Climate and RegCM are in use. Lately, we have begun the preparation of the Weather Research and Forecasting model (Skamarock et al. 2008) for regional climate modeling purposes, intending to contribute to the national RCM ensemble, especially to mitigate the summer precipitation inconsistencies. This sensitivity study represents the first step towards this goal by assessing the general performance of multiple WRF configurations over the Carpathian Basin for surface air temperature, precipitation, and global radiation. The simulations and the evaluation cover the entire year of 2013.

Several studies evaluated the WRF model by producing decades-long historical regional climate simulations for various domains across the world, including European regions (e.g., Caldwell et al. 2009; Heikkilä et al. 2011; Warrach-Sagi et al. 2013; Marteau et al. 2015; Marta-Almeida et al. 2016; Raghavan et al. 2016; Kryza et al. 2017; Annor et al. 2018). However, it is advised to assess the sensitivity of the model to different dynamical configurations and physical parameterizations before long-term integration (Giorgi 2019). For this purpose, the WRF model is a convenient tool as it provides plenty of options. It has been extensively shown that the efficiency of the WRF RCM in simulating climatologically relevant variables (e.g., precipitation) highly depends on the representation of physical processes such as cumulus convection, planetary boundary layer (PBL), land-surface, and microphysical processes (Bukovsky and Karoly 2009; Argüeso et al. 2011; Pieri et al. 2015). The effectiveness of the schemes varies between different geographical regions and seasons, so optimization is required for any location of interest.

Mooney et al. (2013) tested the performance of the WRF RCM for several subregions of Europe based on a set of 6-year-long model runs combining different microphysics, PBL, longwave radiation, and land-surface schemes. For most of the configurations, they found an extensive winter-time negative bias exceeding 5 °C in model-derived 2 m temperature over the Eastern parts of Europe and a general overestimation of daily mean precipitation, surpassing 1.5 mm day−1 in the 1990–1995 period. Politi et al. (2018) also confirmed the wet bias conducting one-year WRF simulations with various physical settings for Greece. WRF results validated within the EURO-CORDEX framework for the 1990–2008 period (Kotlarski et al. 2014; Katragkou et al. 2015), and further five-year-long experiments on the EURO-CORDEX domain (García-Díez et al. 2015) also show an underestimation of 2 m temperature in Eastern Europe in the cold season and a positive precipitation bias. The wintertime cold bias, often exceeding 3 °C averaged for the 2002–2006 period, is argued to be related to snow cover and albedo overestimation. Recently, numerous other investigations have been carried out worldwide with the WRF model to select the most appropriate configuration for regional climate simulations (e.g., Ji et al. 2016; Tariku and Gan 2018; Hu et al. 2018; Kuo and Gan 2018; Hui et al. 2019).

In this study, 1-year simulations are performed with WRF using different initial and boundary conditions (ICBC) datasets, dynamical settings, and physical parameterization schemes. Validation is based on gridded observations of 2 m temperature, precipitation, and global radiation. The objective is to evaluate the general performance of the model and examine the impact of changing configurations. This enables us to identify the causes of systematic errors and to point out possible solutions that must be further investigated. The sensitivity tests might help dismiss less efficient parameterization schemes in midlatitude continental climate regimes, guiding the selection of an accurate experimental setup for long-term simulations.

The paper is structured as follows. Section 2 describes the model setup, features of the different WRF experiments, and datasets used in the study. Simulation results are presented and discussed in Sect. 3. The work is summarized in Sect. 4.

2 Data and methods

2.1 Model configurations and input data

A total of fifteen 1-year-long regional hindcast simulations were produced using the non-hydrostatic mesoscale Advanced Research WRF (ARW) model version 3.8 (Skamarock et al. 2008). Two one-way nested domains on a Lambert conformal projection were defined (Fig. 1). The outer (parent) domain, centered at 43°N, 17.5°E, has a horizontal resolution of 50 km × 50 km, contains 143 × 81 grid points, and is in accord with the Med-CORDEX region (Ruti et al. 2016). The inner (child) domain has a grid spacing of 10 km × 10 km. It consists of 210 × 120 grid points and covers the Carpathian Basin as well as the mountain ranges of the Alps, the Carpathians, and the Dinaric Alps. Land use and soil type, divided into 28 and 16 categories, respectively, are based on the United States Geological Survey (USGS) dataset including lakes. The adaptive time step option was turned on for all model runs.

Fig. 1

Geographical coverage and terrain height of the model domains. Red rectangles encompass the outer domain and the one-way nested inner domain, with a horizontal resolution of 50 and 10 km, respectively

The simulations were carried out for 2013. A total precipitation of 650 mm averaged over Hungary characterized this year, which is slightly (14%) more than the 1971–2000 reference value. The annual precipitation in the southwestern parts of the country exceeded 800 mm; meanwhile, the eastern and central parts received less than 550 mm (Fig. 2). The spatially averaged annual mean temperature was 11.1 °C, which is 1.1 °C warmer than the 30-year norm.

Fig. 2

Annual mean temperature (left column) and total precipitation (right column) for the year 2013, based on the 0.25° E-OBS gridded observational dataset, the 0.3° ERA5, and the 0.75° ERAI reanalysis datasets

For most of the simulations, only one feature was changed at a time, compared to the reference run (Table 1). The acronym of the simulations reflects the change in settings. Physics schemes selected for the tests are among the most widely used ones in the literature of the WRF regional climate model. Our computational and storage capacities naturally limited the number of configurations tested.

Table 1 Summary of the experimental configurations

The ERA5 reanalysis (Copernicus Climate Change Service (C3S) 2017; Hersbach et al. 2020) of the European Centre for Medium-Range Weather Forecasts (ECMWF) provided the initial and lateral boundary conditions, as well as the sea surface temperatures (SSTs) for the majority of the experiments, at 0.3° horizontal resolution, updated in WRF at 6-hourly intervals. It has long been known that the quality of the ICBC data crucially affects RCM performance (e.g., Giorgi and Mearns 1991; Giorgi 2019). Hence, to assess the sensitivity of the RCM to the driving fields, the input data for one simulation were derived from the ERA-Interim (ERAI) reanalysis (Dee et al. 2011), at a grid spacing of 0.75°. Both ERA5 and ERAI show an overestimation of annual total precipitation for 2013 compared to the gridded observations (Fig. 2). The annual overestimation averaged over the whole investigation area (see Sect. 2.2) is ≈ 200 mm for ERA5 and ≈ 80 mm for ERAI. This implies that although both driving datasets used in the current study are reanalysis products, they differ in the values of meteorological variables as well as in the features of the assimilation system and the model itself. One must note, however, that a comprehensive assessment of the impact of ICBC data on the RCM results should include reanalysis products from other institutions and general circulation model fields.

WRF was initialized on 1 January 2013 at 00 UTC and ran continuously for 1 year, with 1-week restart intervals. The only exception is the run “spinup”, which was started on 1 December 2012 at 00 UTC and therefore ran 1 month longer, to explore the effects of a 1-month spin-up period. The output files were archived at 3-hourly intervals. In one instance, the hydrostatic approximation was applied instead of the non-hydrostatic dynamics. Thirteen simulations were performed with 43 unstaggered eta model levels in the vertical, whereas runs “lev31” and “lev45” utilized 31 and 45 unstaggered eta levels, respectively, with a top layer at 50 hPa in all cases. For “lev31”, the levels were distributed automatically with equal distance between them. In the other experiments, the vertical grid was denser in the lower layers of the atmosphere. For run “lev45”, two layers were added right above the surface. Figure S1 of the Online Resource provides a schematic of the different model level distributions used in this study.

Microphysical parameterizations tested are the aerosol-aware Thompson scheme (Thompson and Eidhammer 2014), the WRF single-moment 3-class (WSM3; Hong et al. 2004) and 6-class (WSM6; Hong and Lim 2006) schemes. For the PBL and surface layer processes, three configurations were put on trial. Out of the local PBL schemes, mainly the Mellor–Yamada–Janjić (MYJ; Janjić 1994) was employed, together with the Eta model’s near-surface scheme (Janjić 1994). In one experiment, the MYJ scheme was replaced with the University of Washington (UW; Bretherton and Park 2009) parameterization. One simulation applied the Yonsei University (YSU; Hong et al. 2006) nonlocal closure, paired with the MM5 model’s surface layer scheme (Jiménez et al. 2012). In terms of land-surface models, the Noah LSM with multiparameterization options (Noah-MP; Niu et al. 2011) and the Rapid Update Cycle (RUC) LSM (Benjamin et al. 2004; Smirnova et al. 2016) were evaluated. For cumulus convection, the Betts–Miller–Janjić (BMJ; Janjić 1994), the Kain–Fritsch mass-flux (KF; Kain 2004), and the Multi–scale Kain–Fritsch schemes (MSKF; Zheng et al. 2016) were investigated. Shortwave (SW) and longwave (LW) radiation transfer were represented by the scheme of Dudhia (1989) and the Rapid Radiative Transfer Model (RRTM; Mlawer et al. 1997). In one case, the Rapid Radiative Transfer Model for General Circulation Models (RRTMG; Iacono et al. 2008) was applied for both SW and LW processes.
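As an illustration of how the scheme combinations listed above translate into model settings, the following Python sketch assembles the &physics block of a WRF namelist. The reference combination is inferred from the text (aerosol-aware Thompson, MYJ with the Eta surface layer, Noah-MP, BMJ, Dudhia SW, and RRTM LW); Table 1 remains the authoritative source. The option numbers are given as we understand them from the WRF Users' Guide and should be verified for the model version in use. The names SCHEME_OPTIONS, REFERENCE, and physics_block are hypothetical.

```python
# Scheme name -> namelist option number.
# ASSUMPTION: numbers as understood from the WRF Users' Guide; verify before use.
SCHEME_OPTIONS = {
    "mp_physics": {"Thompson aerosol-aware": 28, "WSM3": 3, "WSM6": 6},
    "bl_pbl_physics": {"MYJ": 2, "YSU": 1, "UW": 9},
    "sf_sfclay_physics": {"Eta": 2, "MM5": 1},
    "sf_surface_physics": {"Noah-MP": 4, "RUC": 3},
    "cu_physics": {"BMJ": 2, "KF": 1, "MSKF": 11},
    "ra_sw_physics": {"Dudhia": 1, "RRTMG": 4},
    "ra_lw_physics": {"RRTM": 1, "RRTMG": 4},
}

# ASSUMPTION: reference combination inferred from the text, not copied from Table 1.
REFERENCE = {
    "mp_physics": "Thompson aerosol-aware",
    "bl_pbl_physics": "MYJ",
    "sf_sfclay_physics": "Eta",
    "sf_surface_physics": "Noah-MP",
    "cu_physics": "BMJ",
    "ra_sw_physics": "Dudhia",
    "ra_lw_physics": "RRTM",
}


def physics_block(overrides=None, n_domains=2):
    """Return a &physics namelist block with one value per nested domain."""
    choice = dict(REFERENCE)
    choice.update(overrides or {})
    lines = ["&physics"]
    for key, scheme in choice.items():
        number = SCHEME_OPTIONS[key][scheme]
        values = ", ".join([str(number)] * n_domains)
        lines.append(f" {key} = {values},")
    lines.append("/")
    return "\n".join(lines)


if __name__ == "__main__":
    # A YSU-type run changes the PBL scheme together with its surface layer scheme.
    print(physics_block({"bl_pbl_physics": "YSU", "sf_sfclay_physics": "MM5"}))
```

Changing one dictionary entry at a time mirrors the one-feature-at-a-time design of most experiments, while paired changes (such as YSU with the MM5 surface layer) are passed together.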

One additional simulation was produced, termed “snow”, in which we modified the Noah-MP LSM code so that snow accumulation was prevented on the surface. The aim of this was to investigate the effect of snow cover on modeled temperatures.

2.2 Validation data and methods

To evaluate model performance, results were compared with the ensemble mean of the regular 0.25° grid version of the E-OBS v20.0e observational dataset, produced by the Royal Netherlands Meteorological Institute (KNMI) within the European Climate Assessment and Dataset (ECA&D) framework (Haylock et al. 2008; van den Besselaar et al. 2011; Cornes et al. 2018). The database covers terrestrial Europe, Northern Africa, and the Middle East. It is based on surface observations and contains the ensemble mean and spread of daily mean, minimum, and maximum temperature, averaged sea level pressure, and total precipitation, starting from 1950. In the 20.0e version, global radiation (direct plus diffuse downward SW radiation flux at the surface) has been introduced. Gridded datasets such as E-OBS might have uncertainties caused by the interpolation method, especially over areas with sparse station network and complex terrain (Hofstra et al. 2009). Spatially inhomogeneous variables like precipitation are more affected by these uncertainties. The investigation is carried out for the geographical area extending from 4.125°E to 29.125°E and 42.375°N to 51.625°N, which encompasses the Pannonian Basin including Hungary, the authors’ primary area of interest (Fig. 3). E-OBS is the only surface-based gridded observational dataset available for 2013 that entirely covers this geographical region at a daily temporal and a relatively high spatial resolution. Within this area, two subregions were assigned: a mountainous terrain located in the Eastern Alps (hereafter EA; 9.875°E–13.375°E, 46.125°N–47.875°N) and a lowland area mostly within Hungary, marginally reaching beyond its southern borders, termed the Great Plain (hereafter GP; 18.375°E–21.375°E, 45.375°N–47.625°N). The aim of this is to assess model results over regions with different topographical features. 
After interpolating model-derived 2 m temperature, global radiation, and total precipitation to the E-OBS grid within the investigation area, daily, monthly, and annual means and sums were calculated. Temperature and radiation were interpolated with inverse distance squared weighting, and precipitation with optimal spatial kriging (Kottek and Rubel 2007).
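The inverse distance squared weighting used for temperature and radiation can be sketched as follows. This is a minimal illustration only: the number of neighbours (k = 4), the planar distance in degrees, and the function name idw_squared are assumptions, as these details are not specified in the text.

```python
def idw_squared(points, target, k=4):
    """Interpolate a value at `target` from nearby model grid points.

    points: list of (lon, lat, value) tuples from the model grid
    target: (lon, lat) of the E-OBS grid point
    k:      number of nearest neighbours used (assumed, not from the paper)
    """
    squared = []
    for lon, lat, value in points:
        d2 = (lon - target[0]) ** 2 + (lat - target[1]) ** 2
        if d2 == 0.0:
            return value  # target coincides with a model grid point
        squared.append((d2, value))
    squared.sort(key=lambda item: item[0])
    nearest = squared[:k]
    # Weights are 1 / distance^2, normalized to sum to one.
    weight_sum = sum(1.0 / d2 for d2, _ in nearest)
    return sum(value / d2 for d2, value in nearest) / weight_sum
```

With four equidistant neighbours, the estimate reduces to their arithmetic mean; closer points otherwise dominate the result.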

Fig. 3

Investigation areas used for model validation and terrain height of the inner domain interpolated to the 0.25° E-OBS grid. Red rectangles mark the two subregions: the Eastern Alps (EA; left rectangle) and the Hungarian Great Plain (GP; right rectangle)
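For illustration, the spatial averaging over the two subregions can be sketched with the bounding boxes given above. The unweighted mean and the inclusive box edges are assumptions of this sketch (an area weighting by latitude might also be applied), and the names SUBREGIONS and subregion_mean are hypothetical.

```python
# Bounding boxes (lon_min, lon_max, lat_min, lat_max) as given in the text.
SUBREGIONS = {
    "EA": (9.875, 13.375, 46.125, 47.875),
    "GP": (18.375, 21.375, 45.375, 47.625),
}


def subregion_mean(cells, name):
    """Average the values of all grid cells falling inside a subregion.

    cells: iterable of (lon, lat, value); value may be None where data are missing
    name:  "EA" or "GP"
    """
    lon_min, lon_max, lat_min, lat_max = SUBREGIONS[name]
    inside = [
        value
        for lon, lat, value in cells
        if value is not None
        and lon_min <= lon <= lon_max
        and lat_min <= lat <= lat_max
    ]
    if not inside:
        raise ValueError(f"no valid grid cells inside subregion {name}")
    # ASSUMPTION: simple unweighted mean over the 0.25 degree cells.
    return sum(inside) / len(inside)
```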

Observed snow depth and snowfall were derived from the daily snow water equivalent (SWE) product of the Copernicus Global Land Service (CGLS; https://land.copernicus.eu/global/products/swe, accessed 26 Jun 2020), available at a spatial resolution of 0.05° for the Northern Hemisphere, to evaluate the WRF-simulated physical snow depth and water equivalent snow accumulation. The dataset is based on both satellite (microwave radiometer brightness temperature and snow extent) and synoptic weather station snow depth measurements (Takala et al. 2011; Pulliainen 2006). Snow depth was obtained by dividing the SWE by 240 kg m−3, a constant snow density applied in the retrieval algorithm. The comparison was carried out in terms of field mean values. Thus, the datasets were left at their original resolution and spatial averaging was performed over the same geographical area (namely the GP region).
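The conversion from SWE to snow depth described above is a single division, shown here as a small sketch. It assumes that SWE is supplied in mm of water equivalent (numerically equal to kg m−2); the function name swe_to_depth_cm is hypothetical.

```python
SNOW_DENSITY = 240.0  # kg m-3, constant density stated in the text


def swe_to_depth_cm(swe_mm):
    """Convert snow water equivalent to physical snow depth in cm.

    ASSUMPTION: swe_mm is in mm water equivalent, i.e. kg m-2.
    """
    depth_m = swe_mm / SNOW_DENSITY  # (kg m-2) / (kg m-3) = m
    return depth_m * 100.0
```

For example, an SWE of 24 mm corresponds to a snow depth of about 10 cm under this constant density.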

For most parameters presented below, average values were also calculated from the ERA5 and ERAI reanalysis datasets, to examine how variables simulated by WRF relate to the corresponding ones diagnosed from the forcing fields. This allows us to identify the potential strengths and weaknesses of the regional downscaling.

3 Results and discussion

In this section, WRF-derived 2 m temperature, precipitation, and global radiation are compared consecutively to the E-OBS dataset in terms of daily, monthly, and annual mean values. Spatial averages were calculated for the GP and the EA subregions defined above. One must note that although the 1-year experiments reveal some major model deficiencies, the results cannot be considered statistically robust.

3.1 Temperature

The Taylor plots (Taylor 2001) for the two subregions show the normalized standard deviation, the normalized root mean square error (RMSE), and the Pearson correlation coefficient (r) of daily mean temperature derived from the WRF simulations relative to the E-OBS data (Fig. 4, filled circles). For most cases, r ≈ 0.95; in the GP subregion for runs “wsm3” and “wsm6”, r ≈ 0.9. As measured by the r number, the Thompson microphysical scheme slightly outperforms the WRF single moment schemes for temperature. For the Taylor plots showing the non-normalized statistical values, the reader is referred to Fig. S2 of the Online Resource. The RMSE is mostly within 2.5–3.5 °C. The only exception is run “wsm3” in the GP subregion, which displays the largest RMSE of 5 °C.

Fig. 4

Normalized Taylor diagram for daily mean temperature (filled circles), daily mean global radiation (filled triangles), and daily total precipitation (filled diamonds) simulated by the different WRF configurations (indicated by different colors) in the GP (left) and the EA (right) subregions. Reference: E-OBS; time period: 2013; horizontal resolution: 10 km
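The statistics summarized in a normalized Taylor diagram can be computed as in the sketch below. It uses the centered (bias-removed) RMS difference, which is the quantity represented geometrically in the diagram of Taylor (2001); whether the RMSE values quoted in the text are centered is not stated, so this is an assumption. The function name taylor_stats is hypothetical.

```python
import math


def taylor_stats(model, obs):
    """Return normalized standard deviation, Pearson r, and normalized centered RMSE.

    model, obs: equally long sequences of daily values (e.g., subregion means).
    All quantities are normalized by the standard deviation of the observations.
    """
    n = len(obs)
    if n == 0 or len(model) != n:
        raise ValueError("model and obs must be non-empty and of equal length")
    model_mean = sum(model) / n
    obs_mean = sum(obs) / n
    model_anom = [m - model_mean for m in model]
    obs_anom = [o - obs_mean for o in obs]
    std_model = math.sqrt(sum(a * a for a in model_anom) / n)
    std_obs = math.sqrt(sum(a * a for a in obs_anom) / n)
    covariance = sum(a * b for a, b in zip(model_anom, obs_anom)) / n
    r = covariance / (std_model * std_obs)
    # Centered RMS difference: the mean bias is removed before differencing.
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(model_anom, obs_anom)) / n)
    return {
        "norm_std": std_model / std_obs,
        "r": r,
        "norm_crmse": crmse / std_obs,
    }
```

The three quantities are linked by the law of cosines, norm_crmse² = 1 + norm_std² − 2 · norm_std · r, which is why they can be shown in a single diagram.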

WRF underestimates the annual mean temperature of 2013 regardless of model configuration compared to E-OBS in both subregions (see Fig. 5 for the GP and Fig. S3 of the Online Resource for the EA). In the GP subregion, the cold bias is typically 4 °C or higher and reaches 7 °C in “wsm3”. Changing the boundary layer representation from the MYJ scheme to either the YSU or the UW scheme reduces the underestimation from −4.4 °C (reference) to −2 °C or −0.8 °C, respectively. However, in run “uw”, Noah-MP was also slightly modified besides switching the PBL parameterization (see the footnote in Table 1), so the exact cause of the improvement cannot be determined. Using RUC instead of the Noah-MP LSM or the RRTMG for both SW and LW radiation processes decreases the annual cold bias from −4.4 to −1.9 °C or −2.3 °C, respectively.

Fig. 5

Annual and monthly mean temperature from E-OBS and bias of the ICBC datasets and the WRF simulations. Time period: 2013; horizontal resolution: 10 km; GP subregion

The most considerable negative differences can be found in the first 4 months of 2013, peaking in April, with “wsm3” exceeding −13 °C in the GP (Fig. 5, see also Fig. S4 of the Online Resource for a boxplot of monthly mean temperatures). Starting from May, the underestimation decreases and for a few configurations, slight warm biases of 0.1–4 °C occur. Overall, temperature is most sensitive to the microphysics, land-surface, radiation, and PBL schemes. Reducing or increasing the number of model levels leads to a respective increment or decrement of less than 1 °C in the annual underestimation. Applying a 1-month spin-up period or ERAI ICBC data instead of ERA5 has negligible effects. The hydrostatic approximation and the cumulus scheme of choice have a minor impact on the annual temperature estimation, but the difference can reach 2–3 °C on a monthly timescale (e.g., in May). Temperature biases are generally lower in the EA subregion compared to the GP (see Figs. S3 and S5 of the Online Resource).

The general underestimation is also evident from the spatial distribution of the annual mean temperature bias relative to the E-OBS dataset (Fig. 6). The influence of model settings on bias magnitude corresponds well to the variations between simulations discussed above in terms of field-averaged errors (Fig. 5). The magnitude of the underestimation is the largest over Hungary and the northeastern parts of the investigation area.

Fig. 6

Annual mean temperature derived from E-OBS and bias of the different WRF simulations for 2013. Horizontal resolution: 10 km

The most pronounced cold bias can be found in the first 4 months of the year over the northeastern parts of Europe, aligning with previous studies using the WRF RCM, relating the problem to snowfall and snow cover (Mooney et al. 2013; Katragkou et al. 2015; García-Díez et al. 2015). Indeed, we found that when compared to the CGLS observations, water equivalent snowfall in January, February, and April is overestimated by all simulations in the GP subregion (Fig. 7). This is connected to the systematic overestimation of large-scale precipitation (see Sect. 3.2 below). Moreover, snow depth is significantly overestimated, and snow residence time is too long, as shown in Fig. 8 for three simulations with contrasting performance regarding modeled temperatures. Differences in snow depth between the experiments correspond well to the temperature errors discussed above, with “wsm3” deteriorating and “ysu” improving the results of the reference run. Erroneous snow cover and insufficient melting are particularly apparent in April when temperature underestimation reaches its maximum. Here we present results from the coarse grid (outer domain) simulations as the snow depth variable, initially not being of main interest, could not be extensively archived in high resolution due to the limited storage capacities. However, temperature and snow depth bias patterns are very similar at 50 km and at 10 km grid spacing, with larger error magnitudes at increased resolution (see Figs. S6 and S7 of the Online Resource).

Fig. 7

Boxplot of monthly water equivalent snowfall derived from the different WRF simulations (boxes), CGLS observations, ERA5, and ERAI reanalysis datasets. Time period: 2013; horizontal resolution: 10 km; GP subregion

Fig. 8

Daily mean snow depth derived from the CGLS observations and three of the WRF simulations from January to April 2013. Horizontal resolution: 50 km; GP subregion

To understand the behavior of the “wsm3” run, we examined the mean temperature and hydrometeor mixing ratio profiles for situations when the 3-hourly accumulated snow water equivalent in this experiment is at least four times as much as in “ref” and “ysu”. For space and time averaging, the GP subregion and the January–March period were considered, using only those output times when the water equivalent snowfall exceeds 0.1 mm in at least one of the three simulations. The instances fulfilling these conditions (a list of dates can be found in Table S1 of the Online Resource) are exclusively connected to cyclonic warm front passages and warm sectors, where temperature inversion often favors mixed-phase precipitation. Figure 9 shows the obtained profiles. In both “ref” and “ysu”, layers with temperatures above zero are observable. Thus, rain and snow particles, which the Thompson microphysics scheme treats separately, are present simultaneously in the lowest model levels. On the other hand, WSM3 only predicts three hydrometeor variables, namely water vapor, cloud water or ice, and rain or snow. Within each of the latter two categories, the phase is determined by whether the temperature is above or below the freezing point. Therefore, the scheme does not account for gradual melting and mixed-phase processes (Dudhia 1989; Hong et al. 2004; Skamarock et al. 2008). As seen in Fig. 9, the temperature in “wsm3” is below 0 °C throughout the lowest model layers, so precipitation falls in the form of snow. Even if there were warmer layers aloft, raindrops would immediately convert back to snow after leaving the region with positive temperatures. In conclusion, the WSM3 scheme, originally developed for summertime convective environments, seems to be inappropriate for simulating mixed-phase frontal precipitation related to cyclonic warm sectors, a frequent phenomenon in the Carpathian Basin during the cold months.

Fig. 9

Vertical profile from three of the WRF simulations, averaged for periods when run “wsm3” displays excessive snowfall compared to “ref” and “ysu”: a temperature b hydrometeor mixing ratio, solid lines: all precipitation particles, dashed lines: rain (r) particles, dotted lines: snow (s) particles. See text for details. Time period: January to March 2013; horizontal resolution: 50 km; GP subregion
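The selection of output times used for the averaged profiles can be sketched as follows. The inputs are assumed to be GP-averaged 3-hourly water equivalent snowfall series (in mm) from the three runs; the interpretation that “wsm3” must exceed both “ref” and “ysu” individually, and the function name select_excess_snow_times, are assumptions of this sketch.

```python
def select_excess_snow_times(wsm3, ref, ysu, factor=4.0, min_snow=0.1):
    """Return the indices of output times that satisfy both selection criteria.

    wsm3, ref, ysu: equally long sequences of 3-hourly snowfall (mm w.e.)
    factor:         "wsm3" must reach at least this multiple of "ref" and of "ysu"
    min_snow:       snowfall must exceed this value in at least one run
    """
    selected = []
    for index, (w, r, y) in enumerate(zip(wsm3, ref, ysu)):
        any_snow = max(w, r, y) > min_snow
        excessive = w >= factor * r and w >= factor * y
        if any_snow and excessive:
            selected.append(index)
    return selected
```

The returned indices would then be used to subset the temperature and mixing ratio profiles before averaging.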

Our results underpin that the snow-albedo-temperature feedback plays a crucial role in temperature underestimation. The cold bias present in all simulations appears to be caused by too high albedo values induced by excessive snowfall and too weak melting. These discrepancies point towards a need for detailed inspection of the surface layer, the PBL, and the land-surface parameterizations. The Noah-MP LSM offers plenty of options that influence snow-related physical processes, either directly or indirectly (e.g., precipitation partitioning between snow and rain, snow temperature time scheme, dynamic vegetation, radiative transfer), which could be addressed through further sensitivity studies (Liu et al. 2017; You et al. 2020). The distinction between snow and rain in our experiments was based on the method of Jordan (1991), which is the default setting in Noah-MP.

Results from the hypothetical simulation without snow cover also show that snow-related processes play a crucial role in temperature underestimation. As seen from comparing the bias values in the last row of Fig. 5 with the reference run results, average cold bias is reduced from −6.5 to −1.6 °C in the first 4 months of the year if there is no accumulated snow on the surface. Although this experiment helps to explain most of the model errors from January to April, it must be noted that a negative temperature bias persists in the “snow” simulation for the rest of the year. The overall underestimation is also obvious from the spatial characteristics of the annual temperature bias of the “snow” run (see Fig. S8 of the Online Resource). Therefore, in the warm months, further investigation is required to identify the cause of too low temperatures. It must be noted that a situation with no snow cover is unrealistic and certainly introduces model errors, so this modification should not be used for practical cases.

3.2 Precipitation

Out of the parameters investigated, precipitation shows the greatest variability among the different configurations (Fig. 4, filled diamonds). For simulated daily total precipitation, 0.3 < r < 0.4 and 0.4 < r < 0.5 in the GP and the EA subregions, respectively, compared to the E-OBS data. RMSE values of the experiments correspond to 4–6 mm day−1 and 9–10 mm day−1 for the GP and the EA, respectively (for the Taylor plots showing the non-normalized statistical values, the reader is referred to Fig. S9 of the Online Resource). It must be noted that E-OBS tends to underestimate precipitation over complex topography due to the sparsity of observations and the interpolation method (e.g., Hofstra et al. 2009; Herrera et al. 2019; Kotlarski et al. 2019).

All WRF simulations overestimate the daily mean total precipitation averaged for 2013 by 0.6–1.7 mm day−1 and 1.6–3.8 mm day−1 in the GP (Fig. 10) and the EA (see Fig. S10 of the Online Resource) subregions, respectively. The larger wet bias in the latter might be related to the inadequate representation of orography at 10 km resolution and an underestimation of precipitation in the E-OBS dataset mentioned above. Concerning daily mean precipitation by months, the overestimation is general for the seasons dominated by large-scale weather processes. In the period dominated by convective precipitation (April to August), certain simulations are closer to the observations or even too dry in both subregions (for boxplots of daily mean precipitation by months in the GP and the EA, see Figs. S11 and S12 of the Online Resource). Differences between the configurations are larger in the warm months. In the GP subregion, applying the hydrostatic model dynamics instead of the non-hydrostatic core reduces the annual positive bias from 1.23 mm day−1 (reference run) to 0.61 mm day−1 (Fig. 10). This implies that over plain areas at 10 km resolution, this approximation holds sufficiently. However, over the mountainous EA subregion, run “hydr” introduces an annual bias increase from 1.96 to 3.26 mm day−1, making it the worst-performing run in the region after “uw” (see Fig. S10 of the Online Resource).

Fig. 10

Daily mean precipitation by year and month from E-OBS and bias of the ICBC datasets and the WRF simulations. Time period: 2013; horizontal resolution: 10 km; GP subregion

From now on, we will focus on the GP subregion, representative of the authors’ primary area of interest. Out of the microphysical schemes tested, the WSM6 increases annual wet bias from 1.23 to 1.4 mm day−1 and the WSM3 reduces it to 0.7 mm day−1, compared to the reference run applying the Thompson parameterization. Although WSM3 is closest to the observations for precipitation, as pointed out in the previous section, its overestimation of snowfall and snow depth is the largest of all experiments, which significantly degrades simulated temperatures. Employing the RUC LSM leads to a higher annual error reaching 1.67 mm day−1 compared to 1.23 mm day−1 in “ref” with Noah-MP, even though the temperature estimation is better for the former simulation. With the UW and the YSU PBL schemes, the model generates an annual precipitation bias of 0.9 and 0.8 mm day−1, respectively, both less than 1.23 mm day−1 with the MYJ scheme (“ref”). Applying the RRTMG parameterization for SW and LW radiation yields a wet bias reduction to 0.63 mm day−1. Changing the BMJ cumulus scheme to KF improves model performance, leading to an annual error of 1.01 mm day−1. Decreasing the number of vertical model levels from 43 to 31 or adding two layers right above the surface both result in a better agreement with E-OBS (the error is 1.02 and 0.95 mm day−1, respectively, instead of the 1.23 mm day−1 reference).

The impact of applying a 1-month spin-up period is negligible on an annual timescale. This is because input fields from the ICBC datasets are already consistent for atmospheric as well as soil variables, so the model needs no additional time to reach equilibrium between the soil and the atmosphere. On the other hand, in May and August, the difference is around 1 mm day−1 compared to the reference run. The simulation driven by ERAI instead of ERA5 presents a minor improvement from 1.23 to 1.05 mm day−1.

To summarize, the modeled precipitation is greatly affected by the dynamical assumption, the number of vertical layers, and the microphysical, PBL, land-surface, radiation, and cumulus schemes. It must be pointed out that precipitation derived from either ERAI or ERA5 is also excessive relative to E-OBS, therefore the wet bias might at least partially originate from the driving fields. Moreover, mean sea level pressure calculated from the WRF simulations is underestimated in the months dominated by large-scale processes (not shown), which suggests a more intense cyclonic activity, as formerly argued by Mooney et al. (2013).

Spatial characteristics of daily mean precipitation bias compared to E-OBS for 2013 are shown in Fig. 11. Extensive overestimation can be observed, peaking over the mountain ranges of the Alps, the Carpathians, and the Dinaric Alps. As mentioned earlier, this might be associated with the coarse model resolution and deficiencies in the gridded observations. The wet bias over complex terrain is the smallest in run “lev31” and largest in run “hydr”.

Fig. 11

Daily mean total precipitation derived from E-OBS and bias of the different WRF simulations for 2013. Horizontal resolution: 10 km

The magnitude of precipitation overestimation is larger for low (< 5 mm day−1) amounts (Fig. 12). This implies the overestimation of large-scale (stratiform) rather than convective precipitation and is in accord with the smaller biases during the summer months.

Fig. 12

The annual number of grid points with daily total precipitation exceeding certain threshold values. Time period: 2013; horizontal resolution: 10 km; GP subregion (left), EA subregion (right)
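The exceedance statistic shown in Fig. 12 can be illustrated with the following sketch, which counts grid-point days above each threshold over the year. The threshold values in the usage note and the function name exceedance_counts are hypothetical; the actual thresholds are those of the figure.

```python
def exceedance_counts(daily_fields, thresholds):
    """Count grid points whose daily precipitation exceeds each threshold.

    daily_fields: iterable of daily fields, each a flat sequence of
                  precipitation totals (mm day-1) for the subregion grid points
    thresholds:   sequence of threshold values in mm day-1
    """
    counts = {threshold: 0 for threshold in thresholds}
    for field in daily_fields:
        for value in field:
            for threshold in thresholds:
                if value > threshold:  # strict exceedance
                    counts[threshold] += 1
    return counts
```

Calling, for instance, exceedance_counts(fields, (1, 5, 10, 20)) for both a simulation and E-OBS and comparing the counts per threshold would show at which intensities the overestimation is concentrated.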

The mean diurnal cycle of convective precipitation looks adequate in WRF with a pronounced peak in the afternoon (Fig. 13). The convective scheme of ERA5 is more active compared to that of the simulations, especially during night-time, probably caused by the coarser resolution of the reanalysis. Out of the experiments, run “ruc” produces the most convective precipitation, especially from May to July. The reason for this appears to be a too strong latent heat flux in the period dominated by locally forced processes that stems from differences in the LSM formulation (not shown). Specifically, transpiration in RUC is independent of vegetation type, and the exchange coefficients are derived directly from the PBL scheme without being modified in the surface scheme (Smirnova et al. 1997; Niu et al. 2011). The KF cumulus scheme shows enhanced convective activity compared to BMJ in the evening and at night. Run “mskf” seems to have difficulties in representing the diurnal cycle. The reason for this cannot be determined in the present study as this experiment differs from the reference run in three physics schemes simultaneously, namely the RUC LSM, the YSU PBL parameterization, and the MSKF cumulus scheme. However, as runs “ruc” and “ysu” utilize only one of the settings mentioned above each, and do not display such discrepancies, attention must be drawn towards the MSKF scheme. This parameterization was developed for grid distances less than 10 km (Zheng et al. 2016), therefore the horizontal resolution used in the present study might be too coarse for it to work effectively. See the graph for the EA subregion in Fig. S13 of the Online Resource.

Fig. 13 Mean diurnal cycle of convective precipitation derived from the different WRF simulations (colored lines) and the ERA5 reanalysis (dashed black line) dataset. Time period: 2013; horizontal resolution: 10 km; GP subregion
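A mean diurnal cycle of the kind shown in Fig. 13 can be sketched as follows. The example assumes an area-averaged hourly series that starts at hour 0 and covers whole days. The function name and the synthetic input are hypothetical, and the actual post-processing used in the study may differ.

```python
import numpy as np

def mean_diurnal_cycle(hourly_series, steps_per_day=24):
    """Average a regularly sampled series over all days, per time of day.

    hourly_series: 1-D array whose length is a multiple of steps_per_day and
                   whose first element corresponds to hour 0 (assumption).
    Returns an array of length steps_per_day; NaN samples are ignored.
    """
    x = np.asarray(hourly_series, dtype=float)
    if x.ndim != 1 or x.size % steps_per_day != 0:
        raise ValueError("series must be 1-D and cover whole days")
    # Rows are days, columns are hours of the day.
    return np.nanmean(x.reshape(-1, steps_per_day), axis=0)

# Synthetic example: 30 days with an idealized afternoon maximum plus noise.
rng = np.random.default_rng(1)
hours = np.tile(np.arange(24), 30)
signal = np.exp(-0.5 * ((hours - 15) / 3.0) ** 2) + 0.05 * rng.random(hours.size)
cycle = mean_diurnal_cycle(signal)
peak_hour = int(np.argmax(cycle))  # hour of day with the largest mean value
```

Computing the cycle separately for each simulation and for the reanalysis allows the timing and the amplitude of the afternoon peak to be compared directly.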

3.3 Global radiation

Daily mean global radiation series calculated from the different WRF configurations show a relatively high correlation with the E-OBS values, specifically, 0.8 < r < 0.9 (Fig. 4, filled triangles). The corresponding RMSE numbers are within 50–60 W m−2 (for the Taylor plots showing the non-normalized statistical values, the reader is referred to Fig. S14 of the Online Resource). The simulations displaying somewhat lower correlation and higher RMSE are “wsm3”, “wsm6” and “lev31” in the GP subregion and “hydr” in the EA subregion.
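The statistics summarized in a Taylor plot can be illustrated with a short Python sketch that computes the Pearson correlation, the RMSE, and the normalized standard deviation of two daily series. The function name is hypothetical. The sketch uses the plain (non-centered) RMSE and the population standard deviation; these are assumptions, since the exact definitions used for the figures are not restated here.

```python
import numpy as np

def taylor_statistics(model, obs):
    """Return (correlation, RMSE, normalized standard deviation).

    model, obs: 1-D daily series of equal length (e.g., W m-2).
    Days on which either series is NaN are dropped before the calculation.
    """
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    valid = ~(np.isnan(m) | np.isnan(o))
    m, o = m[valid], o[valid]
    r = float(np.corrcoef(m, o)[0, 1])            # Pearson correlation
    rmse = float(np.sqrt(np.mean((m - o) ** 2)))  # plain RMSE (assumption)
    norm_std = float(np.std(m) / np.std(o))       # model std relative to obs
    return r, rmse, norm_std

# Synthetic example: an idealized annual cycle with day-to-day noise.
rng = np.random.default_rng(2)
day = np.arange(365)
obs_series = 150.0 - 100.0 * np.cos(2.0 * np.pi * day / 365.0) + rng.normal(0, 30, 365)
model_series = obs_series + rng.normal(-10.0, 40.0, 365)
r, rmse, norm_std = taylor_statistics(model_series, obs_series)
```

Note that the plain RMSE includes the mean bias, whereas the centered RMS difference often shown on Taylor diagrams does not.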

Annual mean global radiation for 2013 is mostly underestimated in the GP (Fig. 14) and the EA (see Fig. S15 of the Online Resource) subregions; however, the error magnitude is somewhat smaller in the latter. The exceptions in the GP are runs “uw” and “ysu”, which show a minor positive annual bias, and “lev45”, which has zero difference. The RRTMG scheme also stands out by generating a significant overestimation reaching 33 W m−2. Regarding the annual cycle, overestimation is dominant from May to August, whereas a negative bias persists in the other months in both subregions (for boxplots of monthly mean global radiation in the GP and the EA, see Figs. S16 and S17 of the Online Resource). The negative bias outside the warm season is consistent with the overestimation of large-scale precipitation, probably accompanied by excessive stratiform cloud cover that prevents solar radiation from reaching the surface. Despite the summertime positive radiation bias, temperature underestimation prevails, especially in the simulations using the MYJ parameterization for PBL processes (Fig. 5). This implies that the schemes responsible for the heat exchange between the land and the atmosphere (the LSM, the surface layer, and the PBL parameterizations) should be further studied to improve modeled temperatures (Chen et al. 2019). The results show that the YSU and the UW boundary layer schemes somewhat mitigate the cold bias in summer. It must be noted, though, that with sufficient surface and boundary layer heating, a warm temperature bias would be expected due to the shortwave radiation surplus. The fact that both SW schemes display an overestimation points to insufficient cloudiness, as reported earlier by García-Díez et al. (2015) and Katragkou et al. (2015). Further cumulus schemes will be tested in order to mitigate the summertime positive bias in surface shortwave radiation flux, as both BMJ and KF tend to generate too little cloud cover.

Fig. 14 Annual and monthly mean global radiation from E-OBS and bias of the ICBC datasets and the WRF simulations. Time period: 2013; horizontal resolution: 10 km; GP subregion
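The monthly biases summarized in Fig. 14 can be illustrated with the following minimal Python sketch, which assumes area-averaged daily series for the model and the observations together with the matching calendar dates. The function name and the synthetic data are hypothetical and serve only to show the calculation.

```python
import numpy as np

def monthly_mean_bias(model_daily, obs_daily, dates):
    """Mean model-minus-observation difference for each calendar month.

    model_daily, obs_daily: 1-D daily series of equal length (e.g., W m-2).
    dates: numpy datetime64[D] array with one entry per day.
    Returns an array of 12 values (January to December); NaN days are ignored.
    """
    diff = np.asarray(model_daily, dtype=float) - np.asarray(obs_daily, dtype=float)
    # Months since 1970-01 modulo 12 gives 0..11; add 1 for calendar months.
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1
    return np.array([np.nanmean(diff[months == m]) for m in range(1, 13)])

# Synthetic example for 2013: a surplus in summer and a deficit otherwise.
dates_2013 = np.arange("2013-01-01", "2014-01-01", dtype="datetime64[D]")
doy = np.arange(dates_2013.size)
obs_example = 150.0 - 100.0 * np.cos(2.0 * np.pi * doy / 365.0)
model_example = obs_example - 20.0 * np.cos(2.0 * np.pi * doy / 365.0)
bias_by_month = monthly_mean_bias(model_example, obs_example, dates_2013)
annual_bias = float(np.nanmean(model_example - obs_example))
```

Reporting the monthly values next to the annual mean helps to reveal cases in which biases of opposite sign cancel in the annual average.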

Global radiation averaged over the year in the GP subregion is insensitive to the input dataset, the spin-up period, and the choice of LSM, and it is only slightly affected by the dynamical core (Fig. 14). Adding two model layers above the surface, changing the MYJ PBL scheme to either the YSU or the UW, switching the cumulus scheme from the BMJ to the KF, and using the hydrostatic dynamics all increase the annual mean solar radiation flux and thus improve model performance compared to the reference run. Employing the RRTMG for radiative transfer instead of the SW scheme of Dudhia (1989) and the RRTM LW parameterization results in the largest values of global radiation among the simulations and an overestimation in all months except for October. Applying the WSM6 scheme for microphysics increases the annual negative bias from −12 to −31 W m−2, and the underestimation is present throughout the whole year, in contrast to all the other experiments. As seen from Fig. 5, for the GP subregion, the cold bias at the beginning of the year is the largest in run “wsm3”, even though it is the only experiment that overestimates global radiation in those months. The explanation is that in “wsm3”, very low near-surface temperatures yield dry atmospheric conditions and prevent fog formation. Thus, more solar radiation can reach the surface. Fog is the main cause of the enhanced negative errors in April, as revealed by the mean profiles of cloud water mixing ratio (see Fig. S18 of the Online Resource). The weather situation over Hungary in the second half of April was characterized by anticyclonic conditions that favor the formation of low-level clouds. We hypothesize that the modeled fog is unrealistically frequent and persistent because of the excessively low temperatures. Prevalent fog events during the day presumably contribute to the underestimation of solar radiation at the surface from January to April.

The spatial distribution of the annual mean solar radiation difference from the E-OBS data is presented in Fig. 15 for the WRF experiments. The simulations differ even in the sign of the bias. From the north towards the southern parts of the investigation area, a transition from negative to positive bias or a strengthening positive bias is present in most cases. The underestimation of downward SW radiation reaches its maximum in the northeastern parts, which might contribute to the cold bias found there. The negative differences over mountain ranges in run “hydr” are consistent with the excessive precipitation and cloudiness generated by the hydrostatic approximation over complex topography. It must be highlighted that even the experiments displaying a positive bias in annual SW radiation (Fig. 15) do not eliminate the temperature underestimation (Fig. 6). This confirms that other processes are involved in forming the cold bias, such as those related to snow in winter and possibly to land–atmosphere heat exchange in summer. The reduction of the negative temperature bias by the “rrtmg” simulation can be explained by the significant overestimation of SW radiation, which is an example of false compensation caused by another model error.

Fig. 15 Annual mean global radiation derived from E-OBS and bias of the different WRF simulations for 2013. Horizontal resolution: 10 km

4 Summary and outlook

The mesoscale WRF-ARW model was run in regional climate mode (i.e., integrated continuously with only the lateral boundary conditions updated) to produce one-year-long simulations for the Pannonian Basin region. The grid spacing was 10 km and the integration period covered the year 2013. Each simulation differed either in terms of physical parameterization schemes, dynamical settings, or ICBC data. Model-derived 2 m temperature, precipitation, and downward shortwave radiation were validated against the E-OBS observational dataset. The objective was to assess the general performance of the model with different configurations in order to identify the main model errors, their possible causes, and better performing experimental setups. The analysis was expected to point out directions of further improvement. The main findings of the paper are summarized below.

The current study confirms many results obtained earlier with the WRF RCM for Central and Eastern Europe. The annual mean temperature at 2 m is consistently underestimated. The cold bias is remarkably large from January to April. Total precipitation is overestimated, especially in the period driven by large-scale weather processes (e.g., winter). Differences among the experiments are more pronounced in late spring and summer, when locally forced processes dominate (e.g., moist deep convection). The magnitude of the wet bias is smaller from April to August in some of the simulations. Global radiation is underestimated all year except for the warm months (from May to August) when the model displays a surplus of incoming solar radiation compared to the E-OBS data.

The cold bias at the beginning of the year is mainly caused by the significant overestimation of snow depth, a result of excessive snowfall and too weak melting. Using the WSM3 scheme instead of the Thompson microphysics further amplifies snow-related errors by not accounting for mixed-phase precipitation related to warm fronts. This suggests that for regions where temperature inversions are frequent in the cold season (e.g., midlatitude continental climate regimes), it is beneficial to use more complex microphysics parameterizations with multiple hydrometeor species that can represent mixed-phase processes. These findings help to explain WRF temperature biases over Europe demonstrated by previous studies.

The introduction of surface-based global radiation measurements to the E-OBS dataset enabled us to reveal false error compensations and further deficiencies in the model. For example, the RRTMG scheme reduced the annual cold bias as a result of the overestimation of incoming solar radiation. In summer, global radiation overestimation reaches 40–60 W m−2. This implies insufficient cloudiness, as formerly argued by García-Díez et al. (2015). Therefore, the reduced precipitation bias in summer might be the consequence of a false compensation by a convective scheme that is not active enough. Further inspection is needed to confirm this. Another possible explanation is that sub-grid scale convective clouds are not well represented in the radiation calculations. Despite the excessive downward shortwave radiation, most simulations persistently display too low temperatures from May to August. This indicates unsatisfactory heat exchange between the land and the atmosphere.

Although an overall best-performing configuration cannot be selected, this paper gives a first indication of the influence of each setting modification on model behavior. Based on the results presented, some parameterizations might be excluded from our future experiments (e.g., the MYJ PBL, RRTMG radiation, MSKF cumulus, and the WSM3 and WSM6 microphysics schemes). Subsequent directions of research are the improvement of snow cover treatment and land–atmosphere heat transfer through testing the options available in Noah-MP and evaluating further surface layer and PBL parameterizations. More deep convection schemes will be examined in order to mitigate the excessive summertime solar radiation through enhancing cloudiness, and cloud-radiation interactions will also be addressed.

It is important to point out some limitations of this study. As mentioned before, E-OBS gridded data might have significant uncertainties, especially over mountainous terrain. Integration for longer periods is required to test model performance under various annual climatic regimes (e.g., extremely dry or wet years), and to derive statistically representative annual and monthly results. Changing the physical parameterizations according to seasons or even large-scale weather patterns could be favorable. One must be cautious of generalizing the present results to other areas of the world as region-specific model optimization is always required.