Benefits and added value of convection-permitting climate modeling over Fenno-Scandinavia

Convection-permitting climate models have shown superior performance in simulating important aspects of the precipitation climate, including extremes, and have also been shown to give partly different climate change signals compared to coarser-scale models. Here, we present the first long-term (1998–2018) simulation with a regional convection-permitting climate model for Fenno-Scandinavia. We use the HARMONIE-Climate (HCLIM) model on two nested grids: one covering Europe at 12 km resolution (HCLIM12) using parameterized convection, and one covering Fenno-Scandinavia at 3 km resolution (HCLIM3) with explicit deep convection. HCLIM12 uses lateral boundaries from ERA-Interim reanalysis. Model results are evaluated against reanalysis and various observational data sets, some at high resolutions. HCLIM3 strongly improves the representation of precipitation compared to HCLIM12, most evident through reduced “drizzle” and increased occurrence of higher intensity events as well as improved timing and amplitude of the diurnal cycle. This is the case even though the model exhibits a cold bias in near-surface temperature, particularly for daily maximum temperatures in summer. Simulated winter precipitation is biased high, primarily over complex terrain. Considerable undercatchment in observations may partly explain the wet bias. Examining instead the relative occurrence of snowfall versus rain, which is sensitive to variance in topographic heights, shows that HCLIM3 provides added value compared to HCLIM12 also for winter precipitation. These results, indicating clear benefits of convection-permitting models, are encouraging, motivate further exploration of added value in this region, and provide a valuable basis for impact studies.


Introduction
Projected future warming in northern Europe is among the largest in the world, driven to a large extent by the strong positive feedback involving reduction of snow and ice as the climate warms (Collins et al. 2013). As a result of global warming, the probabilities for winter cold episodes in this region are projected to decrease significantly (Benestad 2011) and summer warm extremes to be more pronounced (e.g. Nikulin et al. 2011). Further, the hydrological cycle intensifies (Bengtsson 2010) leading to more precipitation as well as more intense extreme events (e.g. Vautard et al. 2014). Projected changes in precipitation amounts, snowpack and snow cover will considerably impact surface hydrology through, for example, changed surface runoff as well as timing and amplitude of the spring flood (von Storch et al. 2015).

Several observational studies have assessed extreme precipitation events in the Fenno-Scandinavian region and the associated role of certain atmospheric circulation patterns, on an annual basis (Irannezhad et al. 2017) as well as for the cold (e.g. Azad and Sorteberg 2017;Mazon et al. 2015) and warm (e.g. Hellström 2005) seasons. In fall and winter, when westerly and south-westerly winds dominate, there is a strong orographic control of precipitation distribution and amounts (Isemer et al. 2015). Weather and climate models' ability to reproduce spatial distributions and amounts in the Scandinavian mountains is therefore related to their representation of topography in the model. For example, Pontopiddan et al. (2017) showed the importance of model grid resolution in accurately reproducing a heavy rainfall event occurring in an area of steep topography in southern Norway. Compared to observations a model with kilometer scale grid spacing performed much better than a coarser model and reanalysis.
During summer, heavy rainfall events are still mostly associated with large-scale cyclonic activity (Hellström 2005), however, the relative importance of convective thunderstorms and rain showers increases (Isemer et al. 2015), which may be diurnally forced or embedded in meso-scale or frontal systems. Weather and climate models struggle to accurately represent atmospheric convection since it involves dynamical and thermo-dynamical interactions from the small turbulent scales (< 1 km) to the large synoptic scales O(1000 km) (Bryan et al. 2003; Molinari and Dudek 1992;Arakawa 2004). With the typical grid spacing of O(10-50 km) of regional climate models (RCMs) convective processes are at best only partly resolved and are therefore usually parameterized, even though some recent studies indicate that convection parameterization may not be needed even at those grid spacings (Vergara-Temprado et al. 2020). RCMs with parameterized convection commonly show biases in precipitation characteristics similar to those seen in coarser-resolution global climate models (GCMs) (Liang 2004;Brockhaus et al. 2008). Major deficiencies include too large areas of precipitation when compared to observations and generally too frequent weak intensities (the "drizzle" problem) (e.g. Dai 2006;Stephens et al. 2010), and a premature onset and too early peak of diurnally forced convective precipitation, presumably because of the difficulty of representing convective inhibition (e.g. Brockhaus et al. 2008;Dai and Trenberth 2004;Dai 2006).
The inherent problems associated with parameterization of atmospheric convection have motivated interest in higher-resolution (< 4 km) models that allow treating deep convection as largely resolved rather than parameterized, so-called "convection-permitting" models (Prein et al. 2015). Convection-permitting regional climate model (CPRCM) simulations have been widely shown to alleviate, at least to some extent, these biases. Compared to RCMs with parameterized convection, such improvements have been most evident through a closer match of the diurnal cycle to observations (e.g. Leutwyler et al. 2017; Ban et al. 2014; Prein et al. 2013a; Brisson et al. 2016; Gao et al. 2017; Belušić et al. 2020) and improved representation of subdaily high-intensity precipitation events (e.g. Murata et al. 2017; Lind et al. 2016; Ban et al. 2014; Kendon et al. 2014; Fosser et al. 2015). CPRCMs have also shown higher skill in simulating seasonal characteristics of snow conditions, such as snow pack and snow cover, in mountainous areas (Ikeda et al. 2010; Prein et al. 2013b; Kawase et al. 2018; Rasmussen et al. 2011).
Importantly, uncertainties in future climate responses of precipitation on the regional and local scale, particularly short-duration intense convective precipitation events, are in part related to the inability of coarser resolution climate models to represent these small-scale atmospheric processes, land-sea contrasts, mountain-valley circulations and fine scale surface properties (Kendon et al. 2017; Pontopiddan et al. 2017; Prein et al. 2015; Westra et al. 2014; Langhans et al. 2013). Moreover, CPRCMs have been shown to potentially respond differently to warming, with sometimes stronger increases in precipitation extremes than in coarser scale models (e.g. Kendon et al. 2014; Lenderink et al. 2019). Improved performance and a different response strongly motivate the use of CPRCMs despite the requirement of large computational resources.
High-resolution climate modeling efforts over Fenno-Scandinavia have so far been relatively few. Many of the existing efforts are either part of large pan-European multimodel experiments (Christensen and Christensen 2007;Jacob et al. 2014) where spatial resolution has been limited to at best around 10 km, or short-term single-model experiments over parts of the Fenno-Scandinavian region at higher spatial resolution (e.g. Xu et al. 2019;Pontopiddan et al. 2017;Heikkilä et al. 2011;Larsen et al. 2013;Mazon et al. 2015;van Pham et al. 2016).
In this study, we present a two-decade long simulation using the HARMONIE-Climate regional climate model, cycle 38 (HCLIM38-AROME) run at 3 km grid spacing over Fenno-Scandinavia. Lateral boundary conditions are provided by data from global reanalysis, with an intermediate nesting step at 12 km grid spacing using HCLIM38-ALADIN (Belušić et al. 2020). The simulations have been conducted within the Nordic Convection Permitting Climate Projections project (NorCP). NorCP aims to increase the knowledge of climate processes and changes as well as provide detailed climate information over the Fenno-Scandinavian region using the next generation high-resolution climate models. Scenario simulations have also been conducted within NorCP, downscaling two GCMs with otherwise the same model setup as in this study. These results will be presented in a separate paper.
The finer model grid resolution combined with the extensive simulated time period enables us, for the first time, to assess the performance of a CPRCM on climatological time scales over Fenno-Scandinavia. The first part of the paper is devoted to describe the ability of HCLIM38-AROME and the intermediate step HCLIM38-ALADIN to represent climate conditions at continental to sub-continental and seasonal to multi-annual space and time scales in Fenno-Scandinavia. The second part is focused on the added value of the high resolution and we investigate to what extent HCLIM38-AROME improves the representation of regional to local scale climate features compared to HCLIM38-ALADIN, primarily in terms of precipitation.

Model and experiments
HCLIM38 (Lindstedt et al. 2015;Lind et al. 2016;Belušić et al. 2020) is a regional climate modeling system based on the ALADIN-HIRLAM NWP system (Bengtsson et al. 2017;Termonia et al. 2018). A comprehensive description of HCLIM38 model system is presented in Belušić et al. (2020) and only a brief overview is provided here. The HCLIM38 model system provides flexibility as it contains a suite of different physics packages, each adapted for different horizontal grid resolutions. In this study, two packages are applied; (1) AROME which is designed for convection-permitting scales (< 4 km) and which is used with non-hydrostatic dynamics (Bengtsson et al. 2017;Seity et al. 2011;Termonia et al. 2018), and, (2) ALADIN which is the limited-area version of the global model ARPEGE and the default option in HCLIM38 for grid spacings ≳ 10 km (Termonia et al. 2018).
HCLIM38-ALADIN has been run over a domain covering a large part of Europe and the eastern North Atlantic (Fig. 1) on a grid with a horizontal resolution of 12 km, 65 levels in the vertical and a time step of 300 s. The boundary data were taken from the global ERA-Interim reanalysis (Dee et al. 2011) on a grid with approximately 80 km resolution in the horizontal, available every 6 h. The higher resolution simulation was performed using HCLIM38-AROME on a 3-km grid (Fig. 1) with 65 vertical levels and a time step of 75 s. HCLIM38-ALADIN provided boundary data every 3 h. The model simulations cover the years 1997-2018, with the first year treated as spin-up and not used for model evaluation. For convenience, from now on the shorter HCLIM3 and HCLIM12 acronyms will be used for HCLIM38-AROME and HCLIM38-ALADIN, respectively.

Verification data
HCLIM model simulations are evaluated against a number of different observations and reanalysis products, summarized in Table 1. Observations are associated with uncertainties, sometimes substantial, that originate from multiple sources. These include instrument errors and uncertainties, location representativeness (e.g. point measurements from meteorological stations or areal averages from remote sensing), post-processing (interpolation to a grid, quality checks) and spatial and temporal characteristics of the variable itself (Lundquist et al. 2019; Prein and Gobiet 2017; Herrera et al. 2019). Systematic biases in gauge-based observations of precipitation can be large, especially in windy conditions and for snowfall (Lussana et al. 2018; Adam and Lettenmeier 2003; Rubel and Hantel 2001). The main factors contributing to these biases are the local deformation of the wind field by the gauge (Wolff et al. 2015), wetting and evaporation losses as well as underestimation of trace amounts (Yang et al. 2005; Rasmussen et al. 2012), all conspiring to lower precipitation sums compared to the real values. In general for the Baltic Sea area, Rubel and Hantel (2001) estimate that undercatch may be up to 20-50% in winter while being less than 5% in summer.
Climate model evaluation exercises, wherever possible, rely on gridded reference data sets, which involve spatial analysis and interpolation of point measurements onto a regular grid. The quality of these products largely depends on the availability of stations to base the interpolation on, implying that in regions where station density is low the quality of the gridded product is also lower. Obtaining high-quality observational data over mountainous areas is notoriously difficult due to the large spatial variability over the terrain and the lack of dense networks of stations in these regions (Hughes et al. 2017; Lussana et al. 2018; Lundquist et al. 2019). Lundquist et al. (2019) even conclude that for many mountain ranges, high-resolution atmospheric models actually capture range-wide annual precipitation with larger accuracy than the collection of precipitation gauges. A few existing gridded datasets quantify uncertainty estimates through the production of ensemble values. For example, Newman et al. (2015) created a station-based gridded data set of ensembles of daily precipitation and temperature for the conterminous United States. In addition to providing more rigorous estimation of uncertainty, this also allows uncertainty estimates to be propagated in downstream applications such as hydrological modeling, where uncertainties in input meteorological fields are important. From version 16 and onwards of the pan-European gridded E-OBS data set, a new method of interpolation has been applied, generating a 100-member ensemble for each daily field of precipitation and temperature (mean, maximum and minimum) (Cornes et al. 2018). The uncertainty quantified by the ensemble only relates to interpolation uncertainty and, as such, is more closely related to uncertainty due to station density than uncertainty in the original data.
Fig. 1 (caption fragment): The nested HCLIM3 domain is represented by the inner black rectangle. The color scale represents the altitude above mean sea level in meters. The magenta colored polygon defines the Fenno-Scandinavian region used in the analysis
Here the model data is compared to the ensemble mean of the E-OBS ensemble, using the spread (given by the 5-95% span of the ensemble members) as a measure of the observational uncertainty. Additionally, in some of the figures, the model mean values are related to the inter-annual variability in E-OBS (not to be confused with the ensemble spread), calculated as standard deviations and presented as either grey vertical bars or shadings. Despite the different horizontal resolutions of E-OBS and NGCD (Table 1), the number of stations over Fenno-Scandinavia used in their generation is similar. However, NGCD is based on a different interpolation method, namely Bayesian spatial interpolation, achieving a very high grid resolution of 1 × 1 km. It is important to note that while terrain height in the Scandinavian mountains can reach 2000 m or more, most stations are located below 1000 m, for example in valleys (see Fig. 1 in Lussana et al. 2019). This causes large uncertainty in precipitation and temperature values over high-alpine areas and mountain ridges.
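The use of the ensemble mean as reference and the 5-95% span as an uncertainty measure can be illustrated with a minimal Python sketch. It assumes the ensemble is available as a NumPy array with the member index as the first axis; the function names and the bias-to-spread ratio are hypothetical helpers for illustration, not part of E-OBS or of the analysis code used in this study.

```python
import numpy as np


def ensemble_mean_and_spread(ensemble, lower=5.0, upper=95.0):
    """Return the ensemble mean and the lower-upper percentile span.

    ensemble: array with the ensemble member as the first axis,
              e.g. shape (member, lat, lon) for one daily field.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    mean = ensemble.mean(axis=0)
    p_low, p_high = np.percentile(ensemble, [lower, upper], axis=0)
    return mean, p_high - p_low


def bias_to_spread_ratio(model_field, ensemble):
    """Model bias against the ensemble mean, scaled by the 5-95% spread."""
    mean, spread = ensemble_mean_and_spread(ensemble)
    bias = np.asarray(model_field, dtype=float) - mean
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = bias / spread
    # Grid points with zero spread give no usable ratio.
    return np.where(spread > 0, ratio, np.nan)
```

A ratio with magnitude below 0.5 would correspond to a bias smaller than half of the ensemble spread, which is the kind of comparison made for the regional averages later in the text.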
In the national datasets, the number of stations is similar to or even somewhat larger than that used in E-OBS. However, as illustrated in Fig. S1 in Supplementary material, E-OBS includes very few stations over Denmark, while Klimagrid (Table 1) is based on a much denser network, cf. Wang and Scharling (2010). Still, the hourly datasets for precipitation used here are mostly based on a lower number of stations compared to their corresponding daily records and/or cover shorter time periods, and are therefore associated with larger uncertainties. The HIPRAD dataset differs from SeNorge and Klimagrid as it is based on radar data and uses data from around 700 rain gauges to correct climatological bias (Berg et al. 2016), thus providing good spatial coverage (except in the mountains). HIPRAD covers the years 2000-2014; however, due to relatively frequent and occasionally extended gaps in the data during the first few years, we only consider the time period 2005-2014.
For the evaluation of HCLIM on seasonal and monthly time scales, all data is remapped to the coarsest grid unless indicated otherwise. However, it is equally important to assess the impact on the results of using all available data, especially in assessing the added value of HCLIM at kilometer-scale resolution. Therefore, a section is dedicated to investigating the benefits of HCLIM3 using model data and observations on native grids. Since most verification data is available for land areas only (except ERA5 and CLARA-A2), when data is averaged over the Fenno-Scandinavia region (as defined by the magenta colored polygon in Fig. 1) we consider only land points. The analyzed period is reduced to match the observations (unless otherwise stated). In addition to the data sets in Table 1, station-based observations provided by the Norwegian Meteorological Institute (MET Norway) (Lussana et al. 2018) and the Swedish Meteorological and Hydrological Institute (SMHI) (available from https://www.smhi.se/data) have been used. Standard evaluation metrics such as mean bias, root-mean-square error (RMSE) and Pearson correlation coefficient (PCC) are used in the study.
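The three evaluation metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming model and reference fields are already on a common grid and that missing values (for example sea points) are stored as NaN; the function name is hypothetical and no area weighting is applied, which may differ from the computation behind the tables in this paper.

```python
import numpy as np


def evaluation_metrics(model, reference):
    """Mean bias, RMSE and Pearson correlation over valid data pairs."""
    model = np.asarray(model, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    # Keep only points where both fields are defined (e.g. land points).
    valid = np.isfinite(model) & np.isfinite(reference)
    m, r = model[valid], reference[valid]
    diff = m - r
    return {
        "bias": diff.mean(),
        "rmse": np.sqrt((diff ** 2).mean()),
        "pcc": np.corrcoef(m, r)[0, 1],
    }
```

Masking on both fields keeps the three metrics consistent with each other, since they are then computed over exactly the same set of points.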

Precipitation analysis
Precipitation is characterized by strong heterogeneity, both in space and time: from large multi-day synoptic scales associated with cyclones and frontal systems, through meso-scale features like squall lines and orographic uplifting, down to isolated convective showers with lifetimes of an hour or less. The "Analyzing Scales of Precipitation" (ASoP) method (Klingaman et al. 2017; Berthou et al. 2018) provides a valuable means to evaluate the spatial and temporal aspects of precipitation distributions. In ASoP, the precipitation distribution is separated into discrete bins of different precipitation intensities. The bins are defined in such a way that the number of events in each bin is similar (Klingaman et al. 2017). The contribution of each bin to the total precipitation is expressed as either an actual or a fractional contribution. In the former, for each bin the frequency of events (i.e. counts) is multiplied by its mean precipitation, giving a contribution in units of mm per time unit. The sum of all bins is then equal to the total mean precipitation of the full distribution. The actual contribution provides information on how much each precipitation rate (each bin) contributes to the total mean and which parts of the distribution are responsible for any biases (if compared to another distribution). The fractional contribution is retrieved by scaling each bin's actual contribution by the total mean precipitation, thus providing information on the relative contribution of different precipitation intensities, i.e. the shape of the distribution regardless of total precipitation. A fractional contribution index (FC) is defined following Berthou et al. (2018) and given by

$$\mathrm{FC} = \sum_{i} \left| \mathrm{FC}_i^{\mathrm{mod}} - \mathrm{FC}_i^{\mathrm{ref}} \right|,$$

quantifying the absolute differences in fractional contributions per intensity bin ($\mathrm{FC}_i$) between a model (mod) and reference data (ref). FC values range between 0 for a perfect match and 2 for no overlap at all. We will show percentage differences between FC(HCLIM3, ref) and FC(HCLIM12, ref), which implies that a negative difference means added value in HCLIM3 compared to HCLIM12 and vice versa for a positive difference.
ASoP may be applied to each grid point in a domain, allowing one to assess the spatial patterns of the precipitation distribution. The method is applicable to any input grid and temporal resolution. Here, ASoP is calculated per grid point, both while keeping data on native grids and when remapped to a common grid before calculation. This is done for both daily and hourly time scales.
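The actual and fractional contributions and the FC index described in this section can be sketched for a single grid point as follows. This is a minimal Python illustration, not the ASoP reference implementation: the function names are hypothetical, and the bin edges are passed in by the caller rather than computed with the bin definition of Klingaman et al. (2017).

```python
import numpy as np


def asop_contributions(precip, bin_edges):
    """Actual and fractional contribution of each intensity bin.

    precip:    1-D series of precipitation rates (dry steps included).
    bin_edges: increasing bin edges; they should span all non-zero
               values, otherwise the bins do not sum to the total mean.
    """
    precip = np.asarray(precip, dtype=float).ravel()
    precip = precip[np.isfinite(precip)]
    # Summed precipitation per bin, divided by the number of time steps,
    # equals (frequency of events) x (mean precipitation in the bin).
    sums, _ = np.histogram(precip, bins=bin_edges, weights=precip)
    actual = sums / precip.size
    total = actual.sum()
    fractional = actual / total if total > 0 else np.zeros_like(actual)
    return actual, fractional


def fc_index(frac_model, frac_reference):
    """Sum of absolute differences in fractional contributions (0 to 2)."""
    diff = np.asarray(frac_model, dtype=float) - np.asarray(frac_reference, dtype=float)
    return np.abs(diff).sum()
```

Applied per grid point to a model series and a reference series with the same bin edges, `fc_index` returns 0 when the two fractional distributions coincide and 2 when they share no bins.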

Large-scale circulation, precipitation and temperature
The large-scale circulation over northern Europe and Scandinavia is characterized by strong westerly flow during the cold season when the storm track over the North Atlantic is in its most active phase (Fig. 2a), turning to generally higher mean sea-level pressure (MSLP) and weaker gradients during the summer. The seasonal MSLP patterns in both configurations of HCLIM are similar to ERA5 over Fenno-Scandinavia. Positive anomalies (higher pressure) are visible in summer (June, July, August; JJA) over continental Fenno-Scandinavia, and negative anomalies north of and over northernmost Fenno-Scandinavia in winter (December, January, February; DJF). In winter, when low-pressure systems frequently pass over the region, the variability of daily mean MSLP averaged over Fenno-Scandinavia is larger than in summer (Fig. 2b, top panel). The variability is similar in E-OBS and ERA5 and is well reproduced in HCLIM with the exception of an underestimation of the MSLP associated with the strongest high-pressure situations in winter in both HCLIM12 and HCLIM3. Both HCLIM12 and HCLIM3 have, on average, larger precipitation amounts than observations throughout the year over Fenno-Scandinavia (Fig. 3; Table 2a), although still within E-OBS inter-annual variability (here defined as plus/minus one standard deviation of monthly mean values in Fig. 3). Further, we note that the seasonally averaged 90% E-OBS ensemble spread, reflecting interpolation uncertainties, is relatively large in both seasons (Table 2a; Fig. S1), particularly in summer, and the region average biases in both HCLIM12 and HCLIM3 are less than half of this spread. The differences between the models are largest in late spring and summer when HCLIM12 simulates excessive precipitation amounts compared to HCLIM3.
In the dominating westerly flow the majority of the accumulated winter precipitation falls at the western continental boundary of Fenno-Scandinavia, west and upslope of the Scandinavian mountains, but also with local maxima in western Denmark and southern Sweden (Fig. 4). The lee effect of this barrier gives rise to a strong negative eastward gradient in precipitation amounts. This feature is well captured by HCLIM; however, larger differences are seen over complex topography and over the northernmost parts of Norway, Sweden and Finland, where observations show a minimum. There, simulated precipitation is around 0.5-2 mm/day larger than observations, corresponding to > 50% higher values (Fig. 4 and supplementary Figure S2). However, station density is relatively low there and thus uncertainty levels in E-OBS are higher (Fig. S1). In addition to higher amounts of winter precipitation, HCLIM and ERA5 overestimate the variance of daily mean values over Fenno-Scandinavia compared to E-OBS (Fig. 2b, middle panel). In summer HCLIM12 is on average approximately 25% wetter than E-OBS (Table 2a) with extensive areas of 10-50% higher amounts (Fig. 4, Fig. S2). In HCLIM3, on the other hand, average summer precipitation is represented more accurately; it is somewhat drier in the south and south-east and wetter in the north, again mostly along the mountain range, but on average in good agreement with E-OBS (Figs. 3, 4; Table 2a).
As discussed earlier (Sect. 2), high-resolution (~ 5 km grid spacing or less) atmospheric models have often shown improved temporal and spatial distributions of precipitation over complex terrain (Lundquist et al. 2019; Hughes et al. 2017; Prein et al. 2013b), due to better resolved dynamics and processes and their interactions with the strongly heterogeneous surface. Ikeda et al. (2010) performed nonhydrostatic model simulations at kilometer-scale resolution over the Colorado headwaters in the US for a set of winter seasons, and results showed good agreement with observations. Similar conclusions have also been drawn in studies over other mid-latitude areas with complex terrain, for example Japan (e.g. Kawase et al. 2018). However, in HCLIM3 the wet bias over high-altitude terrain (compared to E-OBS and NGCD) is larger than in HCLIM12. Part of this could be related to model physics, for example the micro-physics (e.g. Liu et al. 2011). Also, in HCLIM fixed values are used for the cloud condensation nuclei (CCN) number concentrations over sea (100/cm³), land (300/cm³) and cities (500/cm³). Preliminary sensitivity results over Norway indicate that using more realistic CCN values can reduce the negative bias in precipitation seen in the coastal regions and the positive bias over mountainous regions (O. Landgren, conference presentation, Joint 30th ALADIN Workshop and HIRLAM ASM 2020).
However, part of the wet bias seen here in winter is likely due to too low precipitation values in E-OBS. In addition to undercatch problems (which are not accounted for in the production of the E-OBS data set), the sparseness of stations in mountainous regions, where most stations are located below high-alpine areas and peaks, leads to high uncertainties and likely underestimated precipitation sums (Isotta et al. 2015). Also, Lussana et al. (2018) argue that the SeNorge observation data set, covering Norway and part of the NGCD data set (see Table 1), underestimates precipitation over southern Norway, a region where HCLIM, especially HCLIM3, has a wet bias compared to E-OBS (Fig. 4). Interestingly, Crespi et al. (2019) were able to improve monthly climatologies of precipitation over Norway, especially in remote mountainous regions, by combining the output of another HCLIM simulation, run at 2.5 km resolution, with in situ observations. As E-OBS and NGCD show similar amounts, this indicates that the apparent bias in HCLIM may partly be due to problems with the observations. Using only grid points located below 500 m altitude (according to orography in E-OBS) reduces the winter wet bias in HCLIM3 by nearly 50% and to a lesser extent in HCLIM12 (Fig. 3). Still, as seen in Sect. 3 below, added value emerges in HCLIM3 over the Scandinavian mountains when comparing snow and rain ratios directly to in situ observations.
Prior to analysis of the near-surface temperature (T2m) the model data have been interpolated to the E-OBS grid and height compensated for altitude differences between the topography of the models and E-OBS, assuming a time-invariant lapse rate of 0.65 K/100 m. Daily mean T2m is overall lower in HCLIM compared to observations and reanalysis in most seasons (Figs. 3, 5; Table 2a), especially in summer when both HCLIM12 and HCLIM3 have larger differences than half of the E-OBS ensemble spread (Table 2a). At the same time, we note that a cold bias in T2m in winter and summer over Fenno-Scandinavia has been seen in other state-of-the-art RCMs as well (see e.g. Figure 2 in Belušić et al. 2020), indicating a common model deficiency. The cold bias in HCLIM is most pronounced in northern Fenno-Scandinavia during the summer months where, on average, the simulated daily mean T2m is around 2 °C too low in both HCLIM3 and HCLIM12 (Fig. 5). In DJF colder temperatures compared to E-OBS are collocated over the Scandinavian mountains, especially over the ridge. The differences are in part related to differences in orographic heights in the model and observations, and with the more accurate representation of orography in HCLIM3 the differences are smaller. Although a lapse-rate correction of T2m has been applied, we note that this can sometimes be problematic during winter when inversions are present in the valleys. East of the mountains, over mid- and northern Sweden and over Finland, winter temperatures are up to 1-2 °C higher in the model. As seen in Fig. 2b (bottom panel), the variance of daily mean temperature in winter is underestimated (more so in HCLIM3 than in HCLIM12). In particular, days with low temperatures (lower quartile and outliers in box plot) are underrepresented, indicating underestimation of the intensity and/or frequency of cold days.
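The height compensation with a fixed lapse rate can be written as a one-line adjustment, sketched here in Python. The function and variable names are hypothetical, and the sign convention (model temperature is raised where the model orography lies above the reference orography) is an assumption that follows from temperature decreasing with height; the text does not spell out the formula.

```python
import numpy as np

# Time-invariant lapse rate of 0.65 K per 100 m, expressed in K per meter.
LAPSE_RATE_K_PER_M = 0.65 / 100.0


def height_correct_t2m(t2m_model, z_model, z_reference,
                       lapse_rate=LAPSE_RATE_K_PER_M):
    """Adjust model T2m from model orography to the reference orography.

    t2m_model:   model near-surface temperature (K or degC).
    z_model:     model surface height (m), on the same grid as t2m_model.
    z_reference: surface height (m) of the reference data set.
    """
    dz = np.asarray(z_model, dtype=float) - np.asarray(z_reference, dtype=float)
    # A model grid point located higher than the reference is too cold
    # for that reason alone, so the correction is positive for dz > 0.
    return np.asarray(t2m_model, dtype=float) + lapse_rate * dz
```

For a model grid point 200 m above the reference orography, this adds 1.3 K. As noted in the text, such a fixed correction can be misleading when winter inversions are present in the valleys.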
The mean diurnal range of T2m is underestimated compared to E-OBS both in winter and summer, except over the Scandinavian mountains, the Baltic states and Denmark where the range is similar or overestimated (Fig. 6). A comparison of daily minimum and maximum temperatures provides further insight. The smaller range in winter is mainly due to higher minimum temperatures than in E-OBS (reaching 3-4 °C higher values in some parts of northern Sweden and Finland), although lower maximum temperatures also contribute to a lesser extent (Fig. S5). In the warm season, especially in summer, HCLIM strongly underestimates the daily maximum temperatures, with biases larger than the entire 90% ensemble spread in E-OBS (Table 2a). The largest differences, up to approximately 4 °C lower maximum temperatures, occur over northern Scandinavia. However, the minimum temperatures are at the same time also lower than observations (not shown), offsetting some of the bias in diurnal range due to maximum temperatures alone. We note that in parts of Denmark, northern Poland and the Baltic states the combination of colder daily minimum temperatures and somewhat warmer daily maximum temperatures results in a significantly overestimated diurnal temperature range. However, the uncertainty in E-OBS daily minimum and maximum temperatures is high in these areas, with a 90% spread of 3-4 °C (Fig. S1), making it difficult to judge the severity of the model bias.
We conclude that:
• HCLIM exhibits a significant cold bias in summer, below reported observation uncertainty, which is to a large part attributed to too low daily maximum temperatures.
• In winter, daily minimum temperatures are higher in HCLIM compared to observations, especially in the northern part of Fenno-Scandinavia.
• Both of the above points lead to an underestimation of the diurnal temperature range.
The too warm daily minimum temperatures in winter may partly be due to an underestimation in HCLIM of the strongest high-pressure situations (Fig. 2b). Such high-pressure conditions are characteristic of a blocking anti-cyclonic circulation pattern, a feature that has been shown to be underestimated also in other RCMs over Europe compared to reanalysis data (Jury et al. 2019). Furthermore, since modeled T2m values presented here are grid point averages, these are weighted values from different surface tiles and patches. In our simulations, two patches per grid point are used over continental natural surfaces, open land and forest (Samuelsson et al. 2011; Li et al. 2015). One reason for the colder conditions over open land is the generally higher albedo over open land compared to forest, with an even stronger albedo difference in the presence of snow (as snow partly resides under instead of on top of the forest canopy). Also, the forest canopy generally has larger roughness lengths than open land, which hinders the development of very stable conditions and the subsequent strong nocturnal cooling that frequently occurs in winter. Using model open land T2m instead of the grid cell average has distinct seasonal impacts on the results. In winter, open land daily mean temperatures are indeed lower, which leads to an even stronger domain average cold bias compared to observations (dashed lines in Fig. 3b; see also supplementary Figure S4). Both daily maximum and minimum temperatures are colder than grid average values. The warm bias in minimum temperatures seen for grid average values over large areas in northern Sweden and Finland turns to a cold bias compared to E-OBS (Fig. S4). A consequence of this is that the amplitude of the diurnal cycle in open land T2m is higher compared to that in the grid cell average and is therefore in closer agreement with observations (Fig. S6). In contrast, in summer the grid average and open land temperatures are overall very similar.

Clouds and radiation
In this section, we make a limited evaluation of cloud cover and surface radiation and heat fluxes in HCLIM, focusing on the summer months. As seen in Fig. 7, compared to ERA5 HCLIM3 has around 20-30 W/m² lower values of net surface radiation (RNS) during the daylight hours in summer. This is mostly due to lower fluxes of incoming solar radiation (see below) that further cause the sensible heat (SH) fluxes to also be underestimated. In HCLIM12 the bias in RNS is also negative but not as large. However, it overestimates the latent heat (LH) fluxes while underestimating SH fluxes, which is most likely related to the overly wet conditions (Figs. 3, 4) leading to stronger near-surface evaporative cooling. HCLIM3 on the other hand has similar LH as ERA5 but underestimates SH, which means that the Bowen ratio is lower in HCLIM3. The annual cycle of monthly mean shortwave down-welling radiation (SWd) in HCLIM was further compared to 17 measurement stations located over Sweden and operated by the Swedish Meteorological and Hydrological Institute (SMHI). This confirmed the above results; in the warm season HCLIM values were most often lower than observations by more than one standard deviation of the measurements (not shown), where standard deviation was defined by the monthly inter-annual variability in station data. On average HCLIM has 10-15 W/m² lower SWd fluxes (at individual sites up to 25-30 W/m² lower), the bias being most evident for stations located in northern Sweden.
Fig. 6 (caption fragment): As Fig. 5 but for mean diurnal range of the near-surface temperature, i.e. the difference between daily maximum and minimum temperatures
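The Bowen ratio referred to in this section is the ratio of sensible to latent heat flux, so a lower SH at similar LH directly gives a lower ratio. A minimal Python sketch follows; the function name and the threshold for near-zero latent heat flux are assumptions made for illustration, and the flux values in the usage note are invented, not taken from the simulations.

```python
import numpy as np


def bowen_ratio(sensible, latent, min_latent=1e-6):
    """Bowen ratio SH/LH; NaN where the latent heat flux is near zero."""
    sensible = np.asarray(sensible, dtype=float)
    latent = np.asarray(latent, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = sensible / latent
    # Mask points where dividing by a vanishing LH would be meaningless.
    return np.where(np.abs(latent) > min_latent, ratio, np.nan)
```

With invented fluxes of SH = 40 W/m² and LH = 80 W/m² the ratio is 0.5; reducing SH to 30 W/m² at the same LH lowers it to 0.375.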
Compared to the CLARA-A2 satellite data, HCLIM and ERA5 both simulate larger total cloud cover (clt) fractions and correspondingly lower fluxes of SWd in JJA over Scandinavia (Fig. 8a, b). The largest differences occur in HCLIM3 (Table 2b), particularly over northern Scandinavia which is also the region with the largest errors in daily maximum T2m (compare Fig. 8 with Fig. 5, see also Fig. S7). The latter is significantly driven by the amount of solar radiation available at the surface which peaks around local afternoon. A comparison of cloud cover for cloud types differentiated by height of occurrence (low-, medium- and high-level clouds) with respect to CLARA-A2 and ERA5 (not shown) indicates that the positive cloud cover bias in HCLIM is primarily due to larger seasonal average fractions of low-level clouds. These clouds often make efficient shields for incoming solar radiation due to their high opacity. The frequency distributions of JJA monthly mean clt fractions differ between HCLIM12 and HCLIM3 (Fig. 8c). In the CPRCM the shape of the distribution is similar to CLARA-A2 but shifted to larger cloud fractions. HCLIM12 on the other hand has a narrower distribution with higher frequency of cloud cover fractions in the 55-75% range (close to the most typical values in CLARA-A2) and lower frequencies elsewhere. In the frequency distributions of JJA monthly mean SWd (Fig. 8d) there is a clear shift to lower values in HCLIM compared to both CLARA-A2 and ERA5, with similar biases in HCLIM12 and HCLIM3. We further note that the seasonal means of down-welling longwave radiation at the surface (LWd) as simulated by HCLIM are similar to ERA5. The differences are generally within the ± 10% range, somewhat weighted towards lower values in HCLIM (not shown).
In conclusion, in the HCLIM simulations there is evidence of:
• Too large summer season cloud cover fractions over Fenno-Scandinavia, especially the northern part where differences reach 15-20% compared to satellite data. HCLIM3 has larger bias than HCLIM12 (RMSE of 6.0 and 3.7 respectively, see Table 2b).
• Concurrent underestimation of short-wave radiation reaching the surface with a net surface radiation energy deficit compared to ERA5.
The overestimated cloud cover is thus most likely the major cause for the lower than observed daily maximum temperatures through a deficit in SWd (compare e.g. …). RCMs inherently struggle to represent cloud structures, often with too large cloud fractions (Kothe et al. 2011), especially for high clouds (Böhme et al. 2011), and do not capture the diurnal cycle of some cloud types particularly well. Using CPRCMs, some of these biases are reduced (Hentgen et al. 2019). For example, several studies have shown decreased cloud fractions (Ban et al. 2014; Brisson et al. 2016) or more frequent cloud-free conditions (Prein et al. 2013a) in CPRCMs compared to RCMs. As a consequence there is an increase in SWd that impacts the near-surface temperatures.
The larger overestimation of cloud cover in HCLIM3 than in HCLIM12 thus contrasts with other studies showing instead reduced cloud fractions in the warm season in CPRCMs compared to RCMs (noting again that HCLIM12, apart from the presence of convection parameterization, differs in model physics for the atmosphere from that in HCLIM3). The lower Bowen ratio in HCLIM3 compared to ERA5 indicates that more energy is used in surface evaporation and less in heating of the atmosphere through sensible heat. This could mean a too strong moistening of the planetary boundary layer and too high cloud cover fractions during the day which would lead to reduced SWd. In accordance with this, in recent NWP versions of HARMONIE-AROME (similar physics as in HCLIM3) there have been problems with the Leaf-Area-Index (LAI) in the model, causing too much daytime evaporation (S. Tijm, personal communication, May 2020). Further investigation is needed to establish the causes for the temperature, cloud and radiation biases; however, this is beyond the scope of this study.
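As a reminder of the quantity involved, the Bowen ratio is the ratio of sensible to latent heat flux, B = SH/LH. The sketch below uses made-up flux values (not taken from HCLIM or ERA5) to show how a lower SH at unchanged LH lowers B; the function name is hypothetical.

```python
def bowen_ratio(sensible, latent):
    """Bowen ratio B = SH / LH, both fluxes in W/m^2 with the same sign convention."""
    if latent == 0:
        raise ValueError("latent heat flux is zero; Bowen ratio is undefined")
    return sensible / latent


# Illustrative daytime summer fluxes in W/m^2 (assumed values, not from the paper).
reference = bowen_ratio(sensible=80.0, latent=120.0)   # stands in for a reanalysis
model = bowen_ratio(sensible=60.0, latent=120.0)       # similar LH, lower SH

lower_in_model = model < reference
```

A lower ratio means that a larger share of the available surface energy goes into evaporation rather than into heating the near-surface air.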

Benefits of high-resolution HCLIM
In this section, we focus on investigating the added value of applying HCLIM in convection-permitting configuration over Fenno-Scandinavia, with emphasis on precipitation. We address three aspects of added value by showing examples of improved performance in representing: (1) different precipitation intensities including extreme precipitation, (2) the diurnal cycle of summertime precipitation, and (3) the fraction of solid precipitation in high-altitude areas.
As shown earlier, HCLIM is on average wetter than the E-OBS and NGCD observations over Fenno-Scandinavia throughout most seasons (Fig. 3a). For winter, the ASoP analysis of daily precipitation shows that almost all precipitation intensities in HCLIM12 and all in HCLIM3 contribute to the higher total mean amounts, with the largest contribution from intensities between 1 and 10 mm/day (Fig. 9a). HCLIM3 overestimates intensities just above 10 mm/day unlike HCLIM12, while both HCLIM3 and HCLIM12 overestimate intensities higher than ca. 30 mm/day, possibly linked to the mentioned observational uncertainty. In terms of fractional contribution, HCLIM3 is in closer agreement to NGCD for low (< 5 mm/day), moderate (5-20 mm/day) and high (> 20 mm/day) precipitation rates (Table 3). Figure 9b shows that the wetter conditions in summer in HCLIM compared to NGCD are due to too large contributions from nearly all intensities; however, the biases are smaller in HCLIM3. Furthermore, the shape of the distribution (fractional contributions per intensity bin) in HCLIM3 is in remarkable agreement with NGCD (Table 3; supplementary Figure S8). HCLIM12, on the other hand, clearly has too large contributions from events with intensities < 10 mm/day compared to higher intensity events.
Fig. 9 Actual contributions per intensity bin to the total mean precipitation, units in mm per time unit. DJF (a) and JJA (b) daily precipitation over Fenno-Scandinavia, and JJA hourly precipitation over Norway (c), Sweden (d) and Denmark (e) compared to national high-resolution data sets (see Table 1). Lower panels in each row show the differences compared to reference observations (given by black lines in respective upper panels). In a, b all data has been remapped to the E-OBS grid prior to analysis. In c-e the data were kept on native grids, except that for HCLIM3 the analysis was made both on the native-grid data (dashed line) and the data remapped to the HCLIM12 grid (solid line)
A significant part of summer precipitation is of convective nature (either "purely" as in meso-scale convective systems or embedded in larger scale features like cold fronts) and tends to be of short duration with moderate or high intensities. The more accurately captured summer daily precipitation distribution in HCLIM3 most likely reflects an improved representation of these convective events. This is further investigated through analysis of summer precipitation on the hourly time scale over three regions where observations are available (Table 1): Norway, Sweden and Denmark. The ASoP analysis (Fig. 9c-e) reveals that for all three regions HCLIM12 overestimates the contribution from low-to-moderate intensities (below ca. 3 mm/h, see also Figure S8), a behavior that is symptomatic of coarser scale RCMs (Leutwyler et al. 2017; Berthou et al. 2018). The too frequent "drizzle" in HCLIM12 is relatively independent of the geographical location; compared to NGCD the model has generally 30-70% higher frequencies of wet days (see supplementary Figure S3). For higher intensity events (5-20 mm/h), on the other hand, HCLIM12 has lower contribution than observed. In HCLIM3 contributions from low-to-moderate intensity events are reduced and total mean differences are small (Fig. 9c-e). As for daily precipitation, the shapes of the distributions in HCLIM3 are markedly closer to observations, with improvements of around 40-70% for low, moderate and high intensities (Table 3; Fig. S8). There is a tendency in HCLIM3 to have larger contributions from more extreme precipitation rates, between 5 and 20 mm/h, compared to observations, especially in Denmark (Fig. 9e). The small differences between HCLIM3 results on native grid and on the HCLIM12 grid (only a marginal shift to higher intensities on the native grid) bear witness to up-scale added value, i.e. that the improved performance of HCLIM3 still remains even if data is spatially aggregated prior to analysis.
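The idea of actual and fractional contributions per intensity bin can be sketched as follows. This is a simplified illustration in the spirit of the ASoP analysis, not the implementation used in the study: the bin edges, the treatment of values below the first edge as dry, and the function name are all assumptions.

```python
from bisect import bisect_right


def intensity_contributions(precip, bin_edges):
    """Return (actual, fractional) contributions per intensity bin.

    actual[i] is the mean precipitation per time step contributed by values in
    [bin_edges[i], bin_edges[i+1]); the last bin is open-ended. Values below
    bin_edges[0] are treated as dry and ignored (an assumption of this sketch).
    The actual contributions sum to the mean of the non-dry precipitation
    taken over all time steps; the fractional contributions sum to one.
    """
    n = len(precip)
    if n == 0:
        raise ValueError("empty precipitation series")
    sums = [0.0] * len(bin_edges)
    for p in precip:
        if p < bin_edges[0]:
            continue
        sums[bisect_right(bin_edges, p) - 1] += p
    actual = [s / n for s in sums]
    total = sum(actual)
    fractional = [a / total if total > 0 else 0.0 for a in actual]
    return actual, fractional


# Illustrative hourly series in mm/h and hypothetical bin edges.
hourly = [0.0, 0.0, 0.2, 0.6, 2.0, 4.0, 0.0, 12.0]
edges = [0.1, 1.0, 5.0, 10.0]
actual, fractional = intensity_contributions(hourly, edges)
```

Comparing `actual` between a model and a reference shows which intensities make up a total mean bias, while `fractional` compares only the shape of the distribution, independent of the total amount.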
The uncertainties in observations are generally larger for more intense precipitation events. In summer, extreme precipitation is often characterized by short-duration localized events and thus may be misrepresented in observations or missed entirely. The larger contributions in HCLIM3 compared to Klimagrid for high precipitation rates (Fig. 9e), where HCLIM12 is in closer agreement, are very likely in part due to underestimation of intense events in the Klimagrid observations. Klimagrid is based solely on station data (rain gauges) and thus intense localized events are expected to be underestimated to some extent due to incomplete spatial coverage.
Table 3 Percentage differences in FC index between HCLIM3 and HCLIM12 (see Sect. 2) for three separate intensity levels (in mm/day or mm/h) corresponding to low, moderate and high precipitation rates. The last column, Total, considers the full distribution. Negative values mean improved performance of HCLIM3 compared to HCLIM12, and vice versa for positive values. Tabulated data is based on results from Figure S8 in Supplement. For daily precipitation NGCD is the reference dataset and for hourly data only JJA season is considered with SeNorge, HIPRAD and Klimagrid as reference datasets respectively
Of particular interest is the question of how precipitation extremes are represented in HCLIM because of their societal and environmental impacts. Figure 10 shows differences with respect to observations of calculated higher percentiles (above the upper quartile) for daily precipitation (DJF and JJA) over Fenno-Scandinavia and for hourly precipitation (only JJA) over the three selected countries. On daily time scales, irrespective of season, HCLIM3 is in closer agreement with NGCD observations than HCLIM12, with differences being within ± 5%. HCLIM12 and E-OBS have overall lower probabilities. Note that all data is kept on native grid resolutions; however, similar conclusions can be drawn when data has been interpolated to the coarsest grid prior to analysis (not shown). E-OBS has around 10-15% lower estimates than NGCD, with largest differences for the highest percentiles, which further emphasizes the importance of high-resolution observations in evaluation of high-resolution CPRCMs (Prein and Gobiet 2017). Larger differences between models and observations occur on the hourly time scale (Fig. 10). HCLIM3 has overall larger probabilities than HCLIM12 and, for Norway and Sweden, is in closer agreement with high-resolution observations. Conversely, over Denmark, HCLIM12 has smaller differences compared to Klimagrid observations.
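A percentile comparison of this kind can be sketched as below: only wet events are retained, a high percentile is computed for model and reference, and the difference is expressed in percent of the reference. The interpolation rule, the function names and the sample values are assumptions for illustration; the 1 mm/day wet-day threshold follows the threshold quoted for the daily percentile calculations.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0-100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no values to compute a percentile from")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def wet_percentile_anomaly(model, reference, q, wet_threshold):
    """Percentage difference of the q-th wet-event percentile, model vs reference."""
    wet_model = sorted(v for v in model if v > wet_threshold)
    wet_ref = sorted(v for v in reference if v > wet_threshold)
    p_model = percentile(wet_model, q)
    p_ref = percentile(wet_ref, q)
    return 100.0 * (p_model - p_ref) / p_ref


# Illustrative daily series in mm/day (made-up values).
reference_daily = [0.0, 0.5, 2.0, 4.0, 6.0, 8.0, 10.0]
model_daily = [0.2, 2.2, 4.4, 6.6, 8.8, 11.0, 0.0]

anomaly_p90 = wet_percentile_anomaly(model_daily, reference_daily, q=90, wet_threshold=1.0)
```

In this toy case every wet-day value in the model series is 10% larger than in the reference, so the 90th percentile anomaly is about +10%.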
Fig. 10 Percentiles of daily (left) and hourly (right) precipitation, given as percentage anomalies with respect to reference data sets. For daily data, values are averages over Fenno-Scandinavia for both DJF (solid lines) and JJA (dashed lines) seasons, and the reference data set is NGCD (Table 1). For hourly data JJA is shown with averages computed over Norway (solid lines), Sweden (dashed lines) and Denmark (dot-dashed lines) and reference data sets are SeNorge, HIPRAD and Klimagrid respectively (Table 1). Only wet days (> 1 mm/day) and wet hours (> 0.1 mm/h) are used in the percentile calculations
Fig. 11 JJA diurnal cycles of hourly precipitation over Norway (left column), Sweden (middle column) and Denmark (right column). Top row shows wet-hour frequencies in percent (%) and bottom row mean precipitation intensities in mm per hour. Grey shading in bottom panels represents observed ± one standard deviation derived from seasonal averages, i.e. the inter-annual variability
The summer mean diurnal variation of precipitation is also more correctly represented in HCLIM3 compared to HCLIM12, both in terms of frequency and intensities (Fig. 11), although the observed inter-annual spread is large. The "drizzle" issue in HCLIM12 again stands out clearly, particularly in the wet-hour frequency. The overestimation is largest during daytime with a distinct and higher peak in precipitation intensity around noon. In all three countries the observed peak in precipitation occurs later in the afternoon and HCLIM3 is able to capture the timing in closer agreement with observations than HCLIM12, most evident in Norway and Sweden. Walther et al. (2013) investigated the sensitivity of the representation of the diurnal cycle of precipitation to the model grid resolution in summer over Sweden, validating against rain gauges. They found that successively refining the grid spacing from 50 to 6 km in a model that parameterizes convection led to an improved timing of the peak by shifting it toward the end of the afternoon. However, even at 6 km resolution the model showed a too early peak. In contrast, HCLIM3 is able to capture the precipitation peak. Comparing HCLIM to 103 rain gauges covering Sweden for the time period 1998-2017 further supports the results above that HCLIM3 improves the diurnal variation of precipitation amounts compared to HCLIM12 (Fig. S9). Differences in performance seen here, especially for the timing of the diurnal peak, are rather typical when comparing models with parameterized convection and CPRCMs (Prein et al. 2015) and similar added value has been seen in other areas in Europe, for example in Central and Western Europe (Ban et al. 2014; Leutwyler et al. 2017; Berthou et al. 2018; Fosser et al. 2015; Belušić et al. 2020). However, over Denmark the shape of the diurnal cycle is not as well represented in HCLIM3 compared to Klimagrid (HCLIM12 has a similar shape as in the other regions). In particular, HCLIM3 exhibits a minimum in precipitation intensity during the morning hours whereas in Klimagrid the minimum occurs earlier during the night. Although there are uncertainties regarding Klimagrid precipitation intensities, particularly high intensities, there is larger reliability in representing the shape of the mean diurnal cycle correctly. Indeed, the minimum during night and the afternoon maximum are expected (ERA5 also shows a minimum during night for the majority of land points in the region, including Denmark; not shown). The reason for the anomalous behavior of HCLIM3 is not clear at this stage and will be further investigated.
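The two diurnal-cycle diagnostics discussed here, wet-hour frequency and mean intensity per hour of day, can be sketched as follows. The 0.1 mm/h wet-hour threshold follows the value quoted for hourly data; averaging intensity over wet hours only, as well as the function name and the sample data, are assumptions of this sketch.

```python
def diurnal_cycle(hours, precip, wet_threshold=0.1):
    """Per hour of day: wet-hour frequency (%) and mean wet-hour intensity (mm/h).

    hours: hour of day (0-23) for each sample; precip: matching values in mm/h.
    Assumption: intensity is averaged over wet hours only.
    """
    counts = [0] * 24
    wet_counts = [0] * 24
    wet_sums = [0.0] * 24
    for h, p in zip(hours, precip):
        counts[h] += 1
        if p > wet_threshold:
            wet_counts[h] += 1
            wet_sums[h] += p
    frequency = [100.0 * w / c if c else 0.0 for w, c in zip(wet_counts, counts)]
    intensity = [s / w if w else 0.0 for s, w in zip(wet_sums, wet_counts)]
    return frequency, intensity


# Two illustrative days of hourly data with afternoon showers (made-up values).
hours = list(range(24)) * 2
precip = [0.0] * 48
precip[15] = 2.0        # day 1, 15:00
precip[16] = 1.0        # day 1, 16:00
precip[24 + 15] = 4.0   # day 2, 15:00

frequency, intensity = diurnal_cycle(hours, precip)
peak_hour = max(range(24), key=lambda h: intensity[h])
```

Comparing `peak_hour` between a model and observations gives the timing error of the diurnal maximum, while `frequency` exposes an excess of weak wet hours such as the "drizzle" issue.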
Fig. 12 Annual mean fraction of solid compared to total precipitation as simulated by HCLIM12 (a) and HCLIM3 (b), while c, d show a regional zoom comparing the simulated results to station-based observations (black circles). e The fractions of solid precipitation as a function of elevation for observations (black) over Norway and the associated nearest grid point from HCLIM12 (blue) and HCLIM3 (red). The top and right panels show the density plots for the fraction of solid precipitation and elevation, respectively
Evaluating simulated snowfall is challenging for several reasons, the main one being the difficulty of measuring snowfall (e.g. Rasmussen et al. 2012). Rasmussen et al. (2012) have shown that for the same area, snow observations can differ largely due to the influence of wind, which makes it difficult to validate simulated snow accumulation even with in-situ observations. To overcome this issue, we use the annual fraction of solid precipitation to validate the simulated results with station-based observations (Fig. 12). This variable is calculated by dividing the number of days with solid precipitation by the total number of wet days. The observations are provided by the Norwegian Meteorological Institute (Lussana et al. 2018). Figure 12b shows a clear increase of the fraction of solid precipitation with higher elevation when compared to Fig. 12a, mostly due to a more realistic representation of the topography in HCLIM3. There are, however, also considerable changes over low elevation areas in Sweden and Finland, indicating that the different physics and microphysical schemes in the two model versions also impact the amount of snowfall. When zooming in on a smaller area (Fig. 12c, d), one can see how the topography has an important impact on solid precipitation; the smoother topography of HCLIM12 results in an underestimation. The observations overlaid on Fig. 12c, d (dots) reveal that HCLIM3 overall has a better representation of simulated solid precipitation, as a combined result of the changed physics schemes and higher spatial resolution. Figure 12e shows the fraction of solid precipitation as a function of the elevation in HCLIM12, HCLIM3 and the observations (blue, red and black, respectively). It appears that while HCLIM3 still does not reproduce the observed snowfall frequency at the station locations, there is a clear shift toward a larger snowfall fraction for the higher resolution model, making HCLIM3 more consistent with the observations. Note though that the approach used to select grid points does not take into account the elevation, which might have resulted in larger differences for some locations due to the highly complex topography of the region.
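The solid precipitation fraction used above, the number of days with solid precipitation divided by the number of wet days, can be sketched as follows. The wet-day threshold, the rule for counting a day as solid (any solid amount above `solid_threshold` on a wet day) and the function name are assumptions; the text does not specify them.

```python
def solid_precip_fraction(daily_total, daily_solid, wet_threshold=1.0, solid_threshold=0.0):
    """Fraction of wet days on which solid precipitation occurred.

    daily_total, daily_solid: matching daily amounts in mm/day.
    Returns None when the series contains no wet days.
    Both thresholds are assumptions of this sketch.
    """
    wet_days = 0
    solid_days = 0
    for total, solid in zip(daily_total, daily_solid):
        if total > wet_threshold:
            wet_days += 1
            if solid > solid_threshold:
                solid_days += 1
    if wet_days == 0:
        return None
    return solid_days / wet_days


# Illustrative five-day series in mm/day (made-up values).
total = [0.0, 2.0, 5.0, 3.0, 1.5]
solid = [0.0, 2.0, 0.0, 1.0, 0.0]

fraction = solid_precip_fraction(total, solid)
```

Because the diagnostic counts occurrences rather than amounts, it is less sensitive to gauge undercatch of snowfall than an accumulated snowfall total would be.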

Summary and conclusions
21-year long simulations covering the years 1998-2018 have been conducted using the HCLIM cycle 38 regional climate model: HCLIM12 using ALADIN physics configuration at 12 km grid spacing with hydrostatic dynamics and a convection parameterization scheme; and HCLIM3 using AROME physics with non-hydrostatic dynamics and convection-permitting horizontal resolution of 3 km. HCLIM12 was applied over a domain covering the main part of Europe (excluding the southernmost regions) while HCLIM3 was applied over an inner nested domain covering the Fenno-Scandinavian region. The HCLIM3 simulation is, to the best of the authors' knowledge, the first long-term simulation at convection-permitting scales covering Fenno-Scandinavia. Our motivation is to investigate the benefits from such high-resolution climate simulations in this region.
The results are summarized as follows:
• The long-term annual and seasonal means of the climate variables analyzed here are reasonably well captured over Fenno-Scandinavia by HCLIM (both on 12- and 3-km grid resolutions), mostly within reported uncertainties in observations.
• One important exception is the inadequate representation of the diurnal variation of the near-surface temperature, particularly during the summer season when HCLIM strongly underestimates daily maximum temperatures over widespread land areas.
• The underestimated daily maximum temperatures are accompanied by a negative bias in the model in shortwave down-welling radiation (SWd) at the surface, compared to satellite, reanalysis and station data alike, which is spatially correlated with a positive bias in cloud cover.
• Over Fenno-Scandinavia, HCLIM is able to reproduce spatial and temporal characteristics of daily mean precipitation, although the coarser model (HCLIM12) has in general wetter conditions in summer with on average 25% more precipitation than E-OBS observations.
• The relative contributions from different daily precipitation rates to the total mean in summer are remarkably well captured by HCLIM3, indicating high skill in representing the underlying processes. HCLIM12, with parameterized convection, instead distinctly overestimates low-to-moderate precipitation rates, a major culprit for the wetter conditions.
• On the sub-daily time scales there is clearer evidence of added value for precipitation in HCLIM3, especially in summer. In addition to improved representation of the contribution of different intensities to the total mean, including extremes, HCLIM3 also shows improved timing and amplitude in the diurnal cycle.
• HCLIM3 has improved fractions of solid to total precipitation in high-altitude areas, for which the high-resolution representation of the complex orography in mountain areas plays an important role.
Observations are inherently associated with uncertainties, and parts of the model biases seen here may be related to observations, especially in the Scandinavian mountains where sparse networks and systematic undercatch of precipitation negatively impact observational products for this region. Compared to daily time scales the uncertainties in observations are larger on the hourly time scale, primarily due to the reduced number of available stations and observed time periods. Results here emphasize the importance of high-resolution observations in evaluation of high-resolution models. There are indications that the strong summer temperature bias in HCLIM is related to surface processes, for example too strong latent heat fluxes relative to sensible heat fluxes (compared to ERA5), which could increase cloud fractions and reduce SWd. Also, further investigation of the role of micro-physics and CCN values for the precipitation biases over the Scandinavian mountains in winter would help gain further insight into the model biases and deficiencies, as would more effort to include additional high-quality observations. This is a prospect for future studies.
The results presented here are generally consistent with other studies applying CPRCMs. We conclude that there is a clear benefit of using HCLIM38 at the convection-permitting scale over northern Europe, in the summer as well as in the winter season. This demonstration of added value of high-resolution modeling clearly indicates that such high-resolution models should be taken into consideration in studies of future climate change in mountain areas and, consequently, in design and implementation of climate services for such regions.
Acknowledgements This study has been undertaken as part of the NorCP project which is a Nordic collaboration involving climate modeling groups from the Danish Meteorological Institute (DMI), Finnish Meteorological Institute (FMI), Norwegian Meteorological Institute (MET Norway) and the Swedish Meteorological and Hydrological Institute (SMHI). The authors acknowledge the use of computing and archive facilities at ECMWF and at the National Supercomputer Centre in Sweden (NSC) which is funded by the Swedish Research Council via Swedish National Infrastructure for Computing (SNIC).

Data availability
The authors assure that all data and materials as well as software application or custom code support the published claims and comply with field standards. All data and material are available upon request.

Code availability
The ALADIN and HIRLAM consortia cooperate on the development of a shared system of model codes. The HCLIM model configuration forms part of this shared ALADIN-HIRLAM system. According to the ALADIN-HIRLAM collaboration agreement, all members of the ALADIN and HIRLAM consortia are allowed to license the shared ALADIN-HIRLAM codes within their home country for non-commercial research. Access to the HCLIM codes can be obtained by contacting one of the member institutes of the HIRLAM consortium (see links at: https://www.hirlam.org/index.php/hirlam-programme-53). The access will be subject to signing a standardized ALADIN-HIRLAM license agreement (https://www.hirlam.org/index.php/hirlam-programme-53/access-to-the-models). Some parts of the ALADIN-HIRLAM codes can be obtained by non-members through specific licenses, such as in OpenIFS (https://confluence.ecmwf.int/display/OIFS) and Open-SURFEX (https://www.umr-cnrm.fr/surfex).

Compliance with ethical standards
Conflict of interest The authors declare that they have no competing interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.