1 Introduction

For more than 30 years, regional climate models (RCMs) have proved to be a valuable tool (Giorgi 2019; Tapiador et al. 2020) for performing long climate change projections at high spatial resolutions that are not computationally affordable for global climate models (GCMs), owing to the use of a limited-area domain on which the available computing power is concentrated. However, like their GCM counterparts, conventional RCMs using grid meshes of 10 km and above suffer from the use of deep convection parameterization, which is a known source of modeling uncertainty (Kendon et al. 2012; Fosser et al. 2020). Recent progress in high-performance computing and in efficient non-hydrostatic atmospheric dynamical cores has led to the development of convection-permitting RCMs (CPRCMs) capable of performing long climate simulations with grid meshes finer than 4 km, in which deep convection can be explicitly simulated, allowing the removal of deep convection parameterizations (Leutwyler et al. 2016; Lucas-Picher et al. 2021). Following those developments, many CPRCMs have emerged in recent years in different climate research institutes (Prein et al. 2013; Prein et al. 2015; Ban et al. 2021; Belusic et al. 2020; Coppola et al. 2020, 2021; Pichelli et al. 2021). For instance, the coordinated multi-CPRCM project FPS-Convection (Coppola et al. 2020) aims to build an ensemble of CPRCM simulations in which many of these recent CPRCMs are intercompared to investigate present and future convective processes and related extremes over Europe and the Mediterranean Sea region.

Taking advantage of the AROME (Applications of Research to Operations at MEsoscale) non-hydrostatic numerical weather prediction (NWP) model (Seity et al. 2011; Bengtsson et al. 2017; Termonia et al. 2018), which has been running operationally at Météo-France since 2008, the CNRM-AROME CPRCM has been developed at Météo-France since 2014. The pioneering work of Déqué et al. (2016) led to the quick release of a first version of CNRM-AROME based on cycle 38t1. This version showed clear added value with respect to the RCM CNRM-ALADIN through improved localization and intensity of extreme Mediterranean heavy precipitation events on daily and hourly time scales (Fumière et al. 2020). Since these early developments, a new version of CNRM-AROME based on cycle 41t1 has been developed and used for several years (Caillaud et al. 2021; Monteiro et al. 2022). For instance, this latest version has been used to perform the long climate simulations over some regions of Europe required for the FPS-Convection (Coppola et al. 2020) and the EUCP H2020 (Hewitt and Lowe 2018) projects, in which the Centre National de Recherches Météorologiques (CNRM) was involved.

In the regional climate modeling literature, several CPRCMs have already been described and evaluated using different observations. Since the CPRCM literature is becoming quite extensive, and because this paper is based on a CPRCM developed in Europe and evaluates a simulation performed over western Europe, the literature cited below mostly refers to studies over Europe. For a more comprehensive review of CPRCMs and related analyses, interested readers are referred to Lucas-Picher et al. (2021). Most CPRCMs are driven by, and often compared with, an intermediate-resolution RCM simulation, which reduces the step change in resolution with respect to their driving field (Matte et al. 2017). Among the first to perform a long climate simulation at convection-permitting scale, Kendon et al. (2012) compared a 10-year 1.5-km CPRCM (adapted from the UKV forecast model) simulation over the southern UK with a 12-km RCM (HadGEM3-RA) simulation. Their 1.5-km simulation gave a much better representation of rainfall duration and spatial extent, and reduced both the long-standing tendency of conventional RCMs to produce too much persistent light rain and their errors in the diurnal cycle. Ban et al. (2014) evaluated a 10-year CPRCM simulation with the COSMO-CLM model over a domain centered over the Alps. They found that the precipitation diurnal cycle and the frequency of heavy hourly events are greatly improved at 2.2 km compared to a 12-km simulation. In their comparison of 30-year COSMO-CLM simulations at 50, 7 and 2.8 km over southwestern Germany, Fosser et al. (2015) showed that their highest resolution significantly improves the representation of both the hourly intensity distribution and the diurnal cycle of precipitation. Brisson et al. (2016b) reached similar conclusions with their convection-permitting COSMO-CLM simulation over Belgium, which clearly improves the representation of precipitation, especially the diurnal cycle, intensity, and spatial distribution of hourly precipitation. However, they found an overestimation of high temperature extremes that was attributed to deficiencies in the cloud properties and to a smaller cloud cover fraction simulated by COSMO-CLM at 2.8 km compared to that observed. In a study over the Maritime Continent, Argueso et al. (2016) revealed that the amplitude of the diurnal cycle of precipitation and the timing of its maximum simulated by WRF are improved at 2 km compared to 10-km and 50-km simulations. Knist et al. (2020) evaluated 3- and 12-km WRF simulations over a large Central European domain using hourly rain gauge stations over Germany and Switzerland. Their study underlined that the 3-km simulation reproduced both the diurnal cycle and the hourly intensity distribution of precipitation more realistically than the 12-km simulation. In their analysis of Mediterranean high-intensity precipitation events in southeastern France, Fumière et al. (2020) and Caillaud et al. (2021), using CNRM-AROME cycle 38t1 and cycle 41t1 respectively, found clear improvements in the localization and intensity of extreme rainfall on daily and hourly time scales compared to the CNRM-ALADIN RCM 12-km simulation.

Refining the grid spacing to convection-permitting scale (< 4 km) to perform multi-year climate simulations is demanding in terms of computing power and storage capacity (Schär et al. 2020). Thus, CPRCM simulations are still mostly limited to subcontinental domains. However, Leutwyler et al. (2017) took advantage of a new COSMO-CLM version capable of exploiting graphics processing unit (GPU) accelerators to perform a 10-year 2.2-km simulation over a domain covering most of Europe. As in previous studies, they found improvements in the diurnal cycle of precipitation, with substantial improvements of the wet-hour precipitation intensity and frequency distributions in summer over complex topographic terrain. Additionally, their analysis showed that the convection-permitting simulation can reproduce the annual cycle of convective activity, which correlates well with the lightning flash density in Europe. Berthou et al. (2020) compared two CPRCMs (Met Office Unified Model and COSMO-CLM) covering a pan-European domain in terms of their representation of the precipitation distribution at a climatic scale. They found the largest improvements at 2.2 km for the hourly precipitation distribution in regions and seasons where deep convection is a key process: in summer across the whole of Europe and in fall over the coasts of the Mediterranean Sea. Moreover, they noticed that mean precipitation is increased over high orography, with an increased amplitude of the diurnal cycle. Belusic et al. (2020) introduced the HCLIM38 CPRCM, which uses a different version of AROME than the CNRM-AROME described hereafter. HCLIM38 is developed by a consortium of research centres from Nordic countries (Sweden, Denmark, Finland, Norway), Spain and the Netherlands and has been used over several regions. Belusic et al. (2020) showed that HCLIM38 can realistically simulate the diurnal cycle and maximum intensity of sub-daily precipitation, something that cannot be accomplished by coarser RCMs or GCMs. Finally, Coppola et al. (2021) introduced a non-hydrostatic version of RegCM4 using three case studies of intense convection events, in which substantial improvements from the 3-km simulations were found when compared to the corresponding 12-km simulations.

In addition to improvements owing to the use of explicit instead of parameterized deep convection, CPRCM climate simulations also benefit from an improved description of the orography. Better-resolved mountains and valleys can improve meteorological variables other than the traditional 2-m temperature and precipitation. Indeed, Rasmussen et al. (2011) showed that the simulation of snow water equivalent in the Rockies is more realistic at 2-km than at 6-, 18- and 36-km grid spacings when compared to a network of stations measuring snow water equivalent. Furthermore, focusing on snow over the Alps, Lüthi et al. (2019) showed that a 2-km simulation clearly outperforms simulations with grid spacings of 12 and 50 km. The cumulative amount of snow water equivalent simulated over Switzerland over the whole annual cycle was underestimated by 33% at 12 km and 56% at 50 km, while at 2 km, the difference with observations was less than 1%. In their study comparing 4-, 12- and 36-km RCM simulations, Mendoza et al. (2016) highlighted the effect of the RCM's horizontal resolution on basin-averaged precipitation amounts, which in turn affects the river flow simulations of three high-elevation catchments of the Colorado River basin. Finally, Monteiro et al. (2022) highlighted clear advantages of the 2.5-km CNRM-AROME compared to the 12-km CNRM-ALADIN for 2-m temperature and accumulated precipitation, especially at higher altitudes, where they recognized that the simulated precipitation could sometimes be more realistic than their high-resolution regional reanalysis reference, but with an excessive accumulation of snow above 1800 m.

Clouds and surface incoming solar radiation, which are critical for the Earth's energy budget and climate, are generally poorly represented in conventional climate models (Hentgen et al. 2019). Indeed, it is recognized that summer high cloud cover, as well as total cloud cover fraction, is particularly overestimated in climate simulations using parameterized deep convection (Brisson et al. 2016b; Prein et al. 2013; Fosser et al. 2015; Hentgen et al. 2019). Generally, CPRCMs contribute to reducing the high cloud cover overestimation and simulate more frequent clear-sky conditions, increasing the solar shortwave radiation reaching the surface, thanks to a better representation of afternoon convective clouds (Keller et al. 2016; Vanden Broucke and van Lipzig 2017) and stronger vertical exchanges (Leutwyler et al. 2017; Hentgen et al. 2019). Sometimes, the reduction of clouds in CPRCMs leads to an adverse overestimation of surface solar radiation (Brisson et al. 2016b; Leutwyler et al. 2017), highlighting the need for a diligent calibration of CPRCMs (Leutwyler et al. 2017). For further details on the benefits of using CPRCMs, Lucas-Picher et al. (2021) provide a comprehensive review of the climate variables and meteorological phenomena that are improved at higher resolutions when deep convection is explicitly simulated instead of parameterized.

While CNRM-AROME simulations are used in different European projects and additional simulations are planned, a detailed evaluation of the performance of CNRM-AROME is missing. Since CNRM-AROME papers in the literature focus mainly on extreme precipitation over southeastern France (Fumière et al. 2020; Caillaud et al. 2021) and the Alps (Ban et al. 2021; Pichelli et al. 2021; Monteiro et al. 2022), the ability of CNRM-AROME to simulate other climate characteristics and variables, and to do so over other regions of Europe, is currently unknown. A proper evaluation of the new generation of RCMs (CPRCMs) is needed and useful, especially for the users of the simulations produced by these models, and also opens avenues for their further improvement. Therefore, this paper aims at filling these gaps by evaluating a long 19-year 2.5-km CNRM-AROME hindcast simulation over a large northwestern European domain covering the UK, France, almost all of Germany, Switzerland, Denmark, a large part of Spain, and the Benelux. In this evaluation, the CNRM-AROME CPRCM simulation is compared with its driving CNRM-ALADIN RCM 12-km simulation and different observation datasets, including several kilometer-scale, hourly, gridded precipitation datasets covering France, Germany, Great Britain, Denmark, Switzerland, Italy, and the Netherlands. Standard climate statistics such as long-term means and mean annual cycles for precipitation and near-surface temperature over different European regions will be presented. Additional analyses will also include sub-diurnal statistics such as hourly extremes and the diurnal cycle of precipitation. Finally, some attention will be paid to other variables such as snow indicators in mountainous regions (Alps), surface incoming radiation, cloud cover and mean sea level pressure. Taking advantage of a large domain centred over France, special attention will be dedicated to the climate characteristics simulated over the whole of France, in contrast to previous CPRCM studies (Fumière et al. 2020; Caillaud et al. 2021) that focused on southeastern France and the fall season only.

This article is structured as follows: in Sect. 2, the CNRM-AROME CPRCM and the CNRM-ALADIN RCM are described and details of the simulations and the observational datasets used for the evaluation are provided. In Sect. 3, the ability of the two models to simulate climate variables such as mean sea level pressure, precipitation, 2-m temperature, snow cover, incoming radiation and cloud cover is evaluated, emphasizing the benefits of finer resolutions. General conclusions and final remarks are reported in Sect. 4.

2 Methodology: description of the models and of the simulations

In this study, a simulation performed at 2.5 km grid resolution by the CNRM-AROME41t1 CPRCM, based on the non-hydrostatic NWP AROME model in which deep convection is explicitly simulated, is compared to a 12-km grid resolution simulation performed by the CNRM-ALADIN RCM based on the hydrostatic dynamics equations and using deep convection parameterization. Both simulations are compared over a large domain covering northwestern Europe. To evaluate both simulations, a collection of national high-resolution sub-daily gridded precipitation datasets based on weather stations, and sometimes including radar estimates, is used. Both climate models and the observational datasets are described below.

2.1 Description of the CNRM-AROME CPRCM

CNRM-AROME41t1 is a high-resolution limited-area CPRCM that has been developed recently at CNRM (Déqué et al. 2016; Caillaud et al. 2021). CNRM-AROME is largely based on the operational short-range NWP model AROME (Seity et al. 2011; Termonia et al. 2018), which has been used at Météo-France to produce daily weather forecasts since 2008. Its standard horizontal resolution is 2.5 km, the resolution previously used in the NWP AROME model, which is now run operationally at 1.3 km resolution. Initial attempts to use AROME in "climate mode" rather than in "weather prediction mode" can be found in Déqué et al. (2016) and in Lind et al. (2016). The current model's name, CNRM-AROME 41t1, is divided into several parts: CNRM for "Centre National de Recherches Météorologiques", AROME for "Applications de la Recherche à l'Opérationnel à Méso-Échelle" and "41" for the cycle inherited from the NWP system developed collectively by Météo-France and the European Centre for Medium-Range Weather Forecasts (ECMWF). The "t1" signifies "Toulouse 1": cycle 41 is shared with ECMWF, and cycle 41t1 corresponds to cycle 41 with improvements provided by Météo-France. The AROME NWP model cycle 41t1 was used operationally by Météo-France to provide weather forecasts over France from December 2015 to December 2017.

The dynamical core of CNRM-AROME is the bi-spectral non-hydrostatic version of the limited-area ALADIN model (Bénard et al. 2010) with a two-time-level semi-Lagrangian, semi-implicit scheme. The prognostic variables of CNRM-AROME are the same as those of CNRM-ALADIN except for the pressure departure, the vertical divergence of the wind, and the solid and liquid phases of water. CNRM-AROME has no deep convection parameterization since it is assumed that deep convection is explicitly resolved by the dynamics of the model at 2.5 km resolution. Most physical parameterizations of CNRM-AROME originate from the sub-kilometric Meso-NH model (Lac et al. 2018), including a bulk one-moment microphysics scheme, which represents five water species (ICE3 scheme, Pinty and Jabouille 1998), and a sedimentation scheme (Bouteloup et al. 2011; Caniaux et al. 1994). The turbulence of the atmospheric boundary layer is represented by the prognostic turbulent kinetic energy (TKE) equation (Bougeault and Lacarrere 1989). The TKE scheme is derived from the second-order moment equations developed by Cuxart et al. (2000). Shallow convection is parameterized as a sub-grid effect (Pergaud et al. 2009) based on the eddy-diffusivity mass-flux scheme (Soares et al. 2004). The radiation scheme of CNRM-AROME comes from the ECMWF radiation parameterizations: the FMR scheme with six bands for shortwave (SW) radiation (Fouquart and Bonnel 1980; Morcrette 2002) and the RRTM scheme for longwave radiation (Iacono et al. 2008; Mlawer et al. 1997). To correct precipitation overestimation and unrealistic divergent winds in the vicinity of convective clouds, the COMAD scheme (Malardel and Ricard 2015) was added to CNRM-AROME. For additional details on CNRM-AROME, the reader is referred to Caillaud et al. (2021) and Seity et al. (2011). The surface scheme of CNRM-AROME is SURFEX 7.3, a soil-atmosphere interface (Masson et al. 2013) that provides a detailed description of continental surfaces with a high-resolution physiographic database. This version of SURFEX uses a force-restore scheme to transfer heat and water in the soil; this scheme has been used for decades and is still in operational use for NWP. However, the force-restore scheme has shown limitations in the representation of surface and soil processes such as the interaction between snow and soil freezing (Le Moigne et al. 2020). Each model grid cell is made of a mosaic of four tiles of different surfaces using different schemes: land (ISBA-3L (Noilhan and Planton 1989) and D95 snow model (Douville et al. 1995)), urban (Town Energy Budget (TEB) (Masson 2000)), sea (COARE3 (Fairall et al. 2003)), and inland waters (lakes and rivers) (Charnock formulation (Charnock 1955)).

In this study, a 2.5 km horizontal resolution with 60 vertical levels and a 60-s time step have been used. A weak relaxation toward CNRM-ALADIN is applied between 15 and 20 km in the vertical for wind divergence, wind vorticity and temperature over the entire CNRM-AROME domain, only for long waves. This nudging compensates for a poor representation of the lower stratosphere in CNRM-AROME in which the 60 vertical levels are mainly located in the troposphere with 21 levels below 2000 m. The monthly sea surface temperatures (SST) used in this simulation are taken from the 80-km ERA-Interim reanalysis (Dee et al. 2011) spatially interpolated to 2.5 km and linearly interpolated in time to obtain SST at daily time step. The aerosol concentrations are taken from the aerosol dataset of Nabat et al. (2013) and the greenhouse gases (CO2, N2O, CH4, CFC11, CFC12) come from those observed before 2005, and then from the Representative Concentration Pathway (RCP) 4.5 scenario (Moss et al. 2010) afterwards.
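The temporal interpolation of the monthly SST described above can be sketched as follows. This is only a minimal illustration, not the procedure actually used for the simulation: the helper names `mid_month` and `daily_sst` are hypothetical, and anchoring each monthly mean on the 15th of its month is an assumption, since the text does not state to which day the monthly values are assigned.

```python
from datetime import date


def mid_month(year, month):
    """Date taken as representative of a monthly mean (assumed here: the 15th)."""
    return date(year, month, 15)


def daily_sst(day, monthly_sst):
    """Linearly interpolate monthly SST values to a given day.

    monthly_sst maps (year, month) to the monthly-mean SST of one grid cell.
    """
    y, m = day.year, day.month
    # Pick the two monthly anchors that bracket the requested day.
    if day >= mid_month(y, m):
        left = (y, m)
        right = (y + 1, 1) if m == 12 else (y, m + 1)
    else:
        right = (y, m)
        left = (y - 1, 12) if m == 1 else (y, m - 1)
    t0, t1 = mid_month(*left), mid_month(*right)
    # Weight of the right anchor grows linearly from 0 to 1 between anchors.
    w = (day - t0).days / (t1 - t0).days
    return (1.0 - w) * monthly_sst[left] + w * monthly_sst[right]
```

With this choice of anchors, the interpolated value equals the monthly mean on the 15th of each month and varies linearly in between; other anchoring conventions would give slightly different daily values.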

The CNRM-AROME simulation covers the mandatory northwestern European domain (Fig. 1) of the EUCP H2020 project that has also been performed by the KNMI with HCLIM38 (Lenderink et al. 2021). The lower left corner is located in Portugal and the upper right corner is located in Sweden. Thus, the domain (Fig. 1) covers the UK, Ireland, France, most of Germany, half of Spain and Portugal, Switzerland, northwestern Italy, western Austria, the Benelux, and Denmark. The full domain consists of 720 × 900 grid cells that include an 11 grid-cell bi-periodization zone and a 2 × 21 grid-cell relaxation zone. CNRM-AROME is driven every hour by the CNRM-ALADIN RCM, which is itself driven every 6 h by the ERA-Interim reanalysis. The 12-km RCM CNRM-ALADIN acts thus as an intermediate driving model to reduce the step change in resolution between ERA-Interim at 80 km and CNRM-AROME at 2.5 km. The CNRM-AROME simulation starts on January 1 1998 and finishes on December 31 2018. The first two years considered as spin-up, allowing the three soil layers that are represented by different water reservoirs to reach equilibrium, are discarded from the analysis. Thus, the analysis takes place over the remaining 19 years (2000–2018). Both models are compared over the CNRM-AROME northwestern European domain and integration period. For the sake of brevity, from this point forward, AROME will be used instead of CNRM-AROME.

Fig. 1 Elevation (m) of the northwestern European domain of the 2.5-km CNRM-AROME simulation (left) and of the European domain of the 12-km CNRM-ALADIN simulation (right). The black polygons on the left figure indicate the climatic regions of interest for the analysis. The rectangle on the right figure indicates the domain of the CNRM-AROME simulation

2.2 Description of the CNRM-ALADIN RCM

CNRM-ALADIN is the RCM that has been used by the CNRM at Météo-France since the early 2000s (Spiridonov et al. 2005). CNRM-ALADIN is based on the NWP model ALADIN (Aire Limitée Adaptation dynamique Développement InterNational), which is developed by a consortium of European research centres now known as ACCORD. ALADIN is a bi-spectral hydrostatic limited-area numerical model with a semi-Lagrangian advection and a semi-implicit scheme. In this study, the recent CNRM-ALADIN version 6.3 (Nabat et al. 2020) has been used. It differs from version 5 (Colin et al. 2010), which has been widely used to date in the CORDEX initiative over the Med-CORDEX, EURO-CORDEX and CORDEX Africa domains (Tramblay et al. 2013; Jacob et al. 2014; Kjellström et al. 2018; Nikulin et al. 2018). The hydrostatic dynamical core of CNRM-ALADIN63 is based on cycle 37t1 of ARPEGE-IFS and the physical package has been largely renewed since version 5. In particular, CNRM-ALADIN63 includes a new turbulence scheme (Cuxart et al. 2000), a new convection scheme including dry, shallow, and deep convection (Piriou et al. 2007; Guérémy 2011), and a new large-scale microphysics scheme with prognostic liquid/solid cloud/rain variables based on Lopez (2002). An updated version (6 bands) of the shortwave radiation scheme is used (Fouquart and Bonnel 1980; Morcrette et al. 2008). The mixing length is based on Bougeault and Lacarrere (1989) and the PDF-based cloud scheme is based on Ricard and Royer (1993). The deep convection of CNRM-ALADIN63, as well as the dry and shallow convection, is parameterized using a unified scheme (PCMT) proposed by Piriou et al. (2007) and Guérémy (2011). Moreover, CNRM-ALADIN takes advantage of the more advanced SURFEX 8 surface scheme (Decharme et al. 2019), compared to SURFEX 7.3 in CNRM-AROME. Therefore, CNRM-ALADIN63 takes into account the more recent maps from the ECOCLIMAP-II dataset (Faroux et al. 2013) used in SURFEX 8 for describing its surface, while CNRM-AROME uses the older ECOCLIMAP-I dataset (Masson et al. 2003) used in SURFEX 7.3. Moreover, SURFEX 8 includes ISBA-DIFF, which solves heat and water transfer in the soil using diffusion equations, rather than the force-restore method of SURFEX 7.3 (Le Moigne et al. 2020). For more details on CNRM-ALADIN63, the reader is referred to Nabat et al. (2020) and Voldoire et al. (2019).

The CNRM-ALADIN63 simulation in this study complies with the EURO-CORDEX standards (Jacob et al. 2014) and its domain, which uses a Lambert conformal projection, covers most of Europe (Fig. 1). The full domain consists of 480 × 480 grid cells of 12-km grid spacing that include an 11 grid-cell bi-periodization zone and a 2 × 8 grid-cell relaxation zone. The atmosphere is divided into 91 vertical levels distributed between 10 m and 1 hPa, giving a much higher model top than for AROME, and the time step is 450 s. Despite having more vertical levels in total, CNRM-ALADIN has fewer vertical levels than CNRM-AROME at lower elevations (17 compared to 21 below 2000 m and 12 compared to 15 below 1000 m). No spectral nudging has been used in this simulation. In this study, CNRM-ALADIN is driven at its lateral boundaries by the ERA-Interim reanalysis, which has an approximate horizontal resolution of 80 km (Dee et al. 2011). As for CNRM-AROME, the SST used in this simulation are taken from ERA-Interim, here interpolated to 12 km, and the greenhouse gas conventions before and after 2005 are the same as for CNRM-AROME. However, the aerosol fields, taken from the TACTIC_v2 climatology (Michou et al. 2020), are different from those used in CNRM-AROME. The CNRM-ALADIN simulation is performed over the period 1979–2018. For the sake of brevity, from this point forward, ALADIN will be used instead of CNRM-ALADIN.

2.3 Description of the observational datasets

To evaluate the AROME and ALADIN climate simulations over northwestern Europe, long-term gridded observations with high spatial and temporal resolution are required. The daily E-OBS gridded 0.1° (~ 11 km) minimum and maximum 2-m temperature and precipitation datasets (Cornes et al. 2018) that cover Europe are available, but the quality of these datasets over many regions is questionable due to the low density of weather stations (Lucas-Picher et al. 2013; Prein and Gobiet 2017). To fill the gaps between sparsely distributed weather stations when producing a gridded product, interpolation is used, which has a smoothing effect on small-scale spatial details. While E-OBS might be adequate for evaluating 2-m temperature, which varies smoothly in time and space, it is questionable for evaluating precipitation because of the fine-scale details and large variability of precipitation in time and space. However, the recent addition of about three thousand precipitation stations, including approximately one thousand stations over France, in the latest E-OBS version 23.1 substantially improved the quality of this product. For this reason, and because it covers the whole of Europe, we decided to include it in the analysis.

Thus, as done in Fantini et al. (2018), Prein and Gobiet (2017), Berthou et al. (2020), and Caillaud et al. (2021), high-resolution national gridded observation precipitation datasets have been combined in a single dataset (hereafter called HROBS) to evaluate the climate simulations. Eight national gridded datasets (Table 1 and Fig. 2) have been interpolated to the 2.5-km AROME grid using the CDO (Climate Data Operator; Schulzweida 2021) first-order conservative remapping in order to have the smallest possible impact on the analysis, as done in recent studies (Ban et al. 2021; Pichelli et al. 2021; Caillaud et al. 2021). The different national high-resolution gridded datasets are summarized in Table 1 and described in more detail in the appendix. In Fig. 2, which shows mean fall (September–October–November: SON) precipitation, HROBS and E-OBS have similar precipitation distributions, but with locally higher values over the Cevennes, northwestern Spain, and western Scotland in HROBS, likely a consequence of the higher density of weather stations, the radar coverage, and the higher resolution of HROBS compared to E-OBS.
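The principle of first-order conservative remapping mentioned above can be illustrated with a one-dimensional sketch. It is not the CDO implementation, which operates on two-dimensional grids on the sphere; the function name `conservative_remap_1d` and the representation of cells by their edge coordinates are assumptions made for this illustration. Each target cell receives the mean of the source cells it overlaps, weighted by the overlap length, so that the integrated amount (e.g. of precipitation) is preserved.

```python
def conservative_remap_1d(src_edges, src_values, dst_edges):
    """First-order conservative remapping on a 1-D grid.

    src_edges: n+1 increasing cell boundaries of the source grid.
    src_values: n cell-mean values on the source grid.
    dst_edges: m+1 increasing cell boundaries of the target grid.
    Returns m cell-mean values on the target grid.
    """
    out = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total, covered = 0.0, 0.0
        for i, value in enumerate(src_values):
            # Length of the intersection between source cell i and target cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                total += value * overlap
                covered += overlap
        # Target cells without any source coverage are flagged as missing.
        out.append(total / covered if covered > 0 else float("nan"))
    return out
```

For example, remapping the values 1, 2, 3, 4 on four unit-length cells onto two cells of length 2 yields 1.5 and 3.5, and the integral over the domain (10) is unchanged. Unlike bilinear interpolation, this approach does not create or remove precipitation when going from a finer to a coarser grid.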

Table 1 List of gridded observed precipitation datasets
Fig. 2 2000–2018 mean fall (September–October–November) precipitation (mm/d) from eight national high-resolution spatio-temporal observed gridded datasets (HROBS: left figure) and the E-OBS 0.1° dataset (right figure). The datasets in red combine radars and stations, while those in black use only stations. All high-resolution datasets are hourly, except SPREADv2, which is daily. The spatial and temporal resolutions and period of each dataset are indicated on the left figure. Note that some of the datasets cover a shorter period than 2000–2018

Furthermore, to help explain the models' behaviour, different variables available via satellite datasets from the EUMETSAT CM SAF website (https://wui.cmsaf.eu/safira/action/viewProduktSearch) were used (see overview in Table 2). The 0.25° global CLARA-A2.1 (hereafter CLARA: Karlsson et al. 2017) provides daily mean surface incoming SW radiation and cloud cover fraction (CCF) for the period 1982–2019. CLARA is based on the Advanced Very High-Resolution Radiometer onboard the polar-orbiting NOAA and Metop satellites. The 0.05° European/African COMET (Stöckli et al. 2019) provides hourly CCF for the period 1991–2015. The 0.05° European/African SARAH-2.1 (hereafter SARAH: Kothe et al. 2017) provides 30-min surface incoming SW radiation for the period 1983–2017. COMET and SARAH are derived from two channels of the MVIRI and SEVIRI instruments onboard the geostationary Meteosat satellites. Finally, to evaluate the snow cover simulated by the two models, the two high-resolution surface reanalyses UERRA MESCAN-SURFEX (Bazile et al. 2017) at 5.5 km resolution and ERA5-Land (Munoz-Sabater et al. 2021) at 9 km resolution were used.

Table 2 List of satellite datasets

2.4 Description of the regions of interest

In addition to maps, the analysis focuses on four climatic regions (see Fig. 1) characterized by different climate types: (1) Mediterranean climate with hot and dry summers, and rainy fall and winter seasons (region centred over the city of Nimes), (2) humid continental climate with four distinct seasons, for which precipitation is equally distributed throughout the year (box centred over the city of Leipzig), (3) alpine tundra climate with no monthly average temperature above 10 °C (referred to as Alps), and (4) oceanic temperate climate (box centred over the city of London). The Alps region corresponds to the ALADIN grid cells above 1500 m over Switzerland. The Mediterranean, humid continental, and oceanic temperate climate regions were determined using 11 × 11 ALADIN grid-cell boxes centred over three cities (Nimes, Leipzig and London respectively), except that the Nimes region contains only land grid cells below 500 m. The humid continental region centred over Leipzig is near the eastern lateral boundary of the CNRM-AROME domain and can be affected by spatial spin-up issues (Brisson et al. 2016a; Matte et al. 2017). However, considering the prevailing west-to-east atmospheric circulation over the northern mid-latitudes, the humid continental region is located close to the domain border where the atmospheric outflow occurs, which is less of a concern than the inflow boundary, where a spatial spin-up is required to produce small-scale details. Table 3 provides details about each region of interest.
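The construction of the box regions described above can be sketched as follows. This is a hypothetical illustration under simplifying assumptions: the function name `region_cells`, the use of row and column indices for the city location, and the nested-list representation of the elevation and land-mask fields are not taken from the study.

```python
def region_cells(center, elevation, land, half_width=5, max_elevation=None):
    """Return the (row, col) indices of a square box of grid cells around `center`.

    half_width=5 gives an 11 x 11 box. elevation (m) and land (True/False) are
    2-D lists of the same shape. If max_elevation is given, only land cells
    strictly below that elevation are kept (as for the Nimes region).
    """
    ci, cj = center
    n_rows, n_cols = len(elevation), len(elevation[0])
    cells = []
    for i in range(ci - half_width, ci + half_width + 1):
        for j in range(cj - half_width, cj + half_width + 1):
            if not (0 <= i < n_rows and 0 <= j < n_cols):
                continue  # part of the box falls outside the domain
            if max_elevation is not None and not (
                land[i][j] and elevation[i][j] < max_elevation
            ):
                continue  # sea cell or cell at or above the elevation limit
            cells.append((i, j))
    return cells
```

Called without `max_elevation`, the function returns the full 11 × 11 box (121 cells), as for Leipzig and London; with `max_elevation=500`, it returns the reduced set of low-lying land cells, as for Nimes. Regional statistics would then be computed by averaging a field over the returned cells.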

Table 3 List of regions of interest for the analysis

3 Evaluation and comparison of the AROME and ALADIN simulations

3.1 Mean sea level pressure

The large-scale atmospheric circulation over the northwestern European AROME domain, located in the mid-latitudes, is characterized by a faster circulation in winter, linked to a larger north-to-south gradient in mean sea level pressure (MSLP) than in summer (see Fig. 3). Deviations of the MSLP spatial distribution of AROME and ALADIN from that of ERA-Interim may affect the simulated large-scale atmospheric circulation and explain some of the systematic biases of the other meteorological variables discussed below. Since the ERA-Interim reanalysis forces only the lateral boundaries of the large European ALADIN domain and its fields are not nudged in the interior, small MSLP differences can develop in the center of the domain. AROME in turn inherits the MSLP biases of its driving model ALADIN, so that both models show similar biases with respect to ERA-Interim (Fig. 3). Some small differences in the MSLP seasonal means can be seen over southern France in summer and over the whole of France in fall and winter in both ALADIN and AROME with respect to ERA-Interim. These differences are potentially linked to the internal variability that can develop owing to the freedom of a model that is run over a limited-area domain and forced only at its lateral boundaries (Lucas-Picher et al. 2008; Sanchez-Gomez et al. 2009). The differences of a few hPa between the ALADIN and AROME MSLP and that of ERA-Interim are small and comparable to those seen in previous RCM simulations over Europe (Sanchez-Gomez et al. 2009).

Fig. 3 2000–2018 mean sea level pressure (hPa) of ERA-Interim and biases of AROME and ALADIN with respect to ERA-Interim for each season

3.2 2-m minimum and maximum temperature seasonal mean and mean annual cycle

Figure 4 shows the 2000–2018 mean minimum (Tmin) and maximum (Tmax) 2-m temperature biases for each season for AROME and ALADIN with respect to E-OBS. The biases are mostly within ± 2 °C for each model and each season, indicating a generally good performance of both models. Both models are too warm (+ 2 °C for Tmin and Tmax) in summer over most of continental Europe, which is linked to an overestimation of surface solar radiation and an underestimation of the total cloud cover fraction (CCF; see Sect. 3.7). This summer warm bias in Tmax is smaller in AROME. However, AROME suffers from a cold Tmax bias (− 2 °C) in spring that could be linked to a wet bias and a tendency of the surface to maintain soil moisture levels that are too high. Finally, the large cold Tmin bias (down to − 5 °C) simulated by ALADIN over the Alps and the Pyrenees in winter and spring is largely corrected by AROME. It is important to mention that the weather stations used in E-OBS are mainly located in valleys, which are warmer than the nearby mountains; this could lead E-OBS to overestimate temperature locally over the Alps and the Pyrenees and partly explain the cold biases of the models over these regions. Overall, both models share a similar spatial distribution of temperature biases, except for Tmax in spring, but biases tend to be smaller for AROME.

Fig. 4 2000–2018 2-m minimum (Tmin) and maximum (Tmax) temperature biases (°C) for AROME and ALADIN with respect to E-OBS for each season
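
The seasonal biases discussed here are differences between multi-year seasonal means of a model and of the reference dataset. The short Python sketch below shows one way such a bias could be computed at a single grid cell from monthly values. The data layout (a dictionary keyed by year and month), the function names, and the plain average of monthly values (which ignores the differing lengths of the months) are assumptions made for illustration only.

```python
# Months belonging to each meteorological season.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_mean(monthly):
    """Multi-year seasonal means from a dict {(year, month): value}."""
    means = {}
    for name, months in SEASONS.items():
        values = [v for (year, month), v in monthly.items() if month in months]
        if values:
            means[name] = sum(values) / len(values)
    return means


def seasonal_bias(model_monthly, obs_monthly):
    """Seasonal bias (model minus observations) for the seasons present in both."""
    model = seasonal_mean(model_monthly)
    obs = seasonal_mean(obs_monthly)
    return {name: model[name] - obs[name] for name in model if name in obs}
```

With this sign convention, a positive value indicates a model that is warmer than the reference, as for the summer biases described above.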

In Fig. 5, the multi-year mean annual cycles of Tmin and Tmax from AROME, ALADIN and E-OBS for each of the four climatic regions of interest confirm the biases discussed in the previous paragraph. The large cold bias (− 5 °C) of ALADIN over the Alps in winter and spring is partly corrected by AROME for both Tmin and Tmax. In addition, ALADIN has warm Tmax biases in summer for Nimes and Leipzig that are reduced in AROME. Probably because of the strong influence of sea surface temperature, the temperate climate of London is well simulated by both models. In all four regions, the cold spring Tmax bias of AROME seems to have a prolonged effect that reduces the warm Tmax bias in summer. Overall, AROME has smaller biases than ALADIN, especially in summer, when ALADIN is too warm for London, Nimes and Leipzig, and in winter over the Alps, where ALADIN is too cold. However, AROME Tmin is too warm and AROME Tmax is too cold in winter and spring over Nimes, where ALADIN is closer to E-OBS.

Fig. 5 2000–2018 mean annual cycle of 2-m minimum and maximum temperature (°C) for AROME (red) and ALADIN (blue), and E-OBS (black) for the four climatic regions of interest

3.3 Precipitation seasonal mean and mean annual cycle

Figure 6 shows the seasonal mean precipitation from HROBS and E-OBS and simulated by AROME and ALADIN, together with the relative biases of both models with respect to HROBS for each season. The two observational datasets are similar, except over the Alps, the Cevennes and western Scotland, where HROBS has larger values than E-OBS. For this reason, HROBS, which incorporates more weather stations as well as radars, was chosen to compute the biases. Both models clearly have similar biases, being too wet in winter and spring and too dry in summer over France, while the wet biases are smaller in fall. Because a relative precipitation bias metric is used, biases may sometimes seem exaggeratedly large where the climate is dry, such as the dry bias over southeastern France in summer and the wet bias over northeastern Spain in fall and winter. The wet bias in spring is larger in AROME than in ALADIN, which partly explains the cold bias of AROME in spring. ALADIN simulates too much precipitation over the Alps, a feature that is mostly corrected by AROME for all seasons, likely because orographic precipitation is better simulated by AROME at 2.5 km. In more detail, the precipitation mean annual cycles of the four climatic regions in Fig. 7 highlight the summer dry biases of both models over Nimes, while the wet biases over London and Leipzig persist almost all year long, AROME being especially wet over Leipzig. However, the wet bias of ALADIN over the Alps is clearly reduced by AROME. In general, the two observational datasets have similar mean annual cycles, except over the Alps, where HROBS is probably more reliable than E-OBS owing to the radar coverage and additional weather stations. One should nevertheless keep in mind that HROBS only covers the period 2004–2010, compared to 2000–2018 for E-OBS. Overall, both models have similar biases, except over elevated regions (Alps, Pyrenees, and northwestern Scotland) where AROME has smaller biases than ALADIN.

Fig. 6 2000–2018 seasonal mean (mm/d) precipitation for HROBS, E-OBS, AROME, and ALADIN and the relative biases of AROME and ALADIN with respect to HROBS for each season

Fig. 7 2000–2018 mean annual cycle of precipitation (mm/d) for AROME (red), ALADIN (blue), E-OBS (black), and HROBS (grey) for the four climatic regions of interest
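
The remark in this section that relative biases can look exaggerated in dry climates can be made concrete with a small Python sketch. The definition below (model minus reference, divided by the reference, in percent) is the usual form of a relative bias and is assumed here, since the exact formula is not spelled out in the text; the numerical values are purely illustrative and are not results from the simulations.

```python
def relative_bias(model, reference):
    """Relative bias in percent: 100 * (model - reference) / reference."""
    if reference <= 0:
        raise ValueError("the reference value must be positive")
    return 100.0 * (model - reference) / reference


# The same absolute excess of 0.5 mm/d gives very different relative biases
# in a dry season (reference 0.5 mm/d) and a wet season (reference 4.0 mm/d).
dry_season = relative_bias(1.0, 0.5)
wet_season = relative_bias(4.5, 4.0)
```

In this hypothetical case the dry-season relative bias is eight times larger than the wet-season one, although the absolute error is identical.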

3.4 Daily precipitation statistics (frequency and 99th percentile)

Both models simulate wet days (> 1 mm/d) too frequently in winter, spring and, to a lesser extent, fall (Fig. 8). The excessive wet-day frequency simulated by ALADIN over elevated regions (Alps, Pyrenees, and northwestern Scotland) is largely corrected by AROME. Concerning daily extreme precipitation (99th percentile), shown in Fig. 9, both models generally overestimate daily precipitation extremes, except over southeastern France and the Po Valley in summer and fall, where the underestimation by ALADIN is to some extent corrected by AROME. Again, the overestimation of precipitation extremes by ALADIN over the Alps and the Pyrenees is partly corrected by AROME. The extent to which AROME overestimates precipitation extremes over elevated areas is difficult to assess because of the large uncertainties in the observations. Indeed, over elevated areas, even high-resolution observational datasets are known to underestimate precipitation extremes because of precipitation undercatch by rain gauges and of mountain shielding, which affects radar estimates (Prein and Gobiet 2017; Piazza et al. 2019; Caillaud et al. 2021). Overall, improvements of AROME over ALADIN are limited for these two daily indicators, except over high-elevation regions such as the Alps and northwestern Scotland.

Fig. 8 2000–2018 frequency (ratio) of wet days (> 1 mm/d) for HROBS, E-OBS, AROME, and ALADIN and the biases of AROME and ALADIN with respect to HROBS for each season

Fig. 9 2000–2018 99th percentile (mm/d) of daily precipitation for HROBS, E-OBS, AROME, and ALADIN and the relative biases of AROME and ALADIN with respect to HROBS for each season
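
The two daily indicators used in this section can be sketched in a few lines of Python. The wet-day threshold of 1 mm/d comes from the text; the choice to compute the percentile over all days (wet and dry) and to interpolate linearly between sorted values is an assumption of this sketch, as are the function names.

```python
import math


def wet_frequency(series, threshold):
    """Fraction of time steps with precipitation strictly above `threshold`."""
    if not series:
        raise ValueError("empty series")
    return sum(1 for value in series if value > threshold) / len(series)


def percentile(series, q):
    """q-th percentile (0 to 100) with linear interpolation between sorted values."""
    if not series:
        raise ValueError("empty series")
    if not 0 <= q <= 100:
        raise ValueError("q must lie between 0 and 100")
    data = sorted(series)
    position = (len(data) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    weight = position - lower
    return data[lower] + (data[upper] - data[lower]) * weight
```

For a daily series `pr` in mm/d at one grid cell, `wet_frequency(pr, 1.0)` and `percentile(pr, 99)` would give the two indicators; the same functions apply to hourly series with a threshold of 0.1 mm/h and `q = 99.9`.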

3.5 Hourly precipitation statistics (frequency, intensity and 99.9th percentile)

The frequency of wet hours (> 0.1 mm/h) in Fig. 10 is generally improved by AROME compared to ALADIN, which simulates too many wet hours, especially in winter and spring and over Great Britain and the Alps. In Fig. 11, which shows the wet-hour precipitation intensity (the mean of precipitation values > 0.1 mm/h), the large ALADIN underestimation is generally corrected by AROME, especially in spring, summer, and fall when convective precipitation occurs. The overestimation of mean daily precipitation in spring by AROME and ALADIN (Fig. 6) appears inconsistent with the wet-hour intensity (Fig. 11), which is underestimated by ALADIN and well simulated by AROME. This inconsistency between hourly and daily precipitation indicators stems from the smallest amount of precipitation (threshold) recorded in the observed gridded hourly precipitation datasets, which is often 0.1 mm/h: the observed datasets have no values between 0 and 0.1 mm/h, whereas AROME and ALADIN do simulate values in this range, and these values contribute to the daily and seasonal means but not to the wet-hour statistics. Hourly precipitation extremes (99.9th percentile) in Fig. 12 are generally underestimated by ALADIN. AROME corrects this feature and sometimes even overestimates the extremes in spring and summer, likely a long-term memory consequence of the wet bias in spring. Notable improvements of hourly precipitation extremes with AROME are associated with the Mediterranean heavy precipitation events that take place in fall over southeastern France, the Cevennes, and the western Mediterranean coast of Italy; these events are underestimated by ALADIN but much better represented by AROME (Fumière et al. 2020; Caillaud et al. 2021).

Fig. 10 2000–2018 frequency (ratio) of wet hours (> 0.1 mm/h) for HROBS, AROME, and ALADIN and the biases of AROME and ALADIN with respect to HROBS for each season

Fig. 11 2000–2018 intensity (mm/h) of wet hours (> 0.1 mm/h) for HROBS, AROME, and ALADIN and the relative biases of AROME and ALADIN with respect to HROBS for each season

Fig. 12 2000–2018 99.9th percentile (mm/h) of hourly precipitation for HROBS, AROME and ALADIN and the relative biases of AROME and ALADIN with respect to HROBS for each season

Compared to daily and seasonal statistics, for which the improvements with AROME are smaller, the frequency, intensity and extremes of hourly precipitation are clearly improved by AROME, indicating added value for sub-daily precipitation at higher resolution with the explicit simulation of deep convection. These findings agree with Kendon et al. (2012), who showed that heavy precipitation was too weak in their 12-km RCM and that precipitation tended to be too persistent and widespread, whereas their 1.5-km CPRCM, despite a tendency to overestimate heavy rain, provided a much better representation of precipitation duration and spatial extent. However, improvements of AROME over ALADIN are not systematic, and there are regions and periods, such as spring, for which ALADIN performs better than AROME most of the time.
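
The effect of the 0.1 mm/h observation threshold on the consistency between hourly and daily indicators, discussed earlier in this section, can be illustrated with a toy Python example. The two 24-h series below are invented for illustration: the "observed" series has no values between 0 and 0.1 mm/h, whereas the "model" series contains light precipitation of 0.05 mm/h during the otherwise dry hours. The function names are likewise hypothetical.

```python
def wet_hour_stats(hourly, threshold=0.1):
    """Return (frequency, intensity) of wet hours, i.e. values > threshold.

    Intensity is the mean over wet hours only (mm/h); it is 0.0 if no hour is wet.
    """
    wet = [value for value in hourly if value > threshold]
    frequency = len(wet) / len(hourly)
    intensity = sum(wet) / len(wet) if wet else 0.0
    return frequency, intensity


def mean_daily_total(hourly):
    """Mean daily precipitation (mm/d) from an hourly series in mm/h."""
    return 24.0 * sum(hourly) / len(hourly)


# Invented example: 20 "dry" hours followed by 4 hours at 1 mm/h.
observed = [0.0] * 20 + [1.0] * 4
model = [0.05] * 20 + [1.0] * 4  # drizzle below the 0.1 mm/h threshold

same_wet_hour_stats = wet_hour_stats(observed) == wet_hour_stats(model)
daily_excess = mean_daily_total(model) - mean_daily_total(observed)
```

In this constructed case the wet-hour frequency and intensity of the two series are identical, yet the model daily total is about 1 mm/d larger, which is the kind of apparent inconsistency described above.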

3.6 Summer precipitation diurnal cycle

Another feature that is generally improved by CPRCMs is the summer precipitation diurnal cycle (Argueso et al. 2016; Brisson et al. 2016b; Berthou et al. 2020; Lucas-Picher et al. 2021), illustrated in Fig. 13 for the four climatic regions of interest. For London, the later rise of precipitation in AROME compared to ALADIN is in good agreement with HROBS, despite an overestimation of the amplitude of the cycle. Similarly, even though AROME overestimates the diurnal cycle for Leipzig, it performs much better than ALADIN, which has a relatively flat diurnal cycle. For Nimes, the flat diurnal cycle simulated by ALADIN is improved by AROME, but the amplitude remains too weak compared to the observations. For the Alps region, the general overestimation of precipitation throughout the day by ALADIN is corrected by AROME, with a rise of the diurnal cycle that is in good agreement with the observations but with a shifted maximum (16 UTC for AROME and ALADIN compared to 19 UTC for the observations). In Fig. 14, the time of the maximum precipitation in the diurnal cycle of AROME, at about 17 UTC over land, is much improved compared to ALADIN, which simulates a much earlier maximum between 6 and 11 UTC over land. The amplitude of the diurnal cycle, which is largely underestimated by ALADIN, is corrected and sometimes overestimated by AROME when compared with HROBS (Fig. 14). In general, AROME shows clear added value compared to ALADIN by simulating a better timing of the diurnal cycle maximum in the late afternoon, with an amplitude that, though mostly overestimated, is in better agreement with HROBS.

Fig. 13 2000–2018 mean diurnal cycle of precipitation (mm/h) for AROME (red), ALADIN (blue) and HROBS (grey) for the four climatic regions of interest in summer (JJA)

Fig. 14 2000–2018 time of the maximum (UTC) and amplitude (mm/h) of the precipitation diurnal cycle for AROME, ALADIN and HROBS in summer (JJA)
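
The two diagnostics mapped in Fig. 14 can be sketched as follows in Python for a single grid cell. The mean diurnal cycle is obtained by averaging all values valid at the same hour (UTC); the amplitude is taken here as the difference between the maximum and the minimum of that mean cycle, which is an assumption of this sketch because the exact definition is not given in the text. Function names and the input layout are also hypothetical.

```python
def mean_diurnal_cycle(hourly, start_hour=0):
    """Mean value for each hour of the day (0 to 23 UTC).

    `hourly` is a continuous hourly series whose first value is valid at
    `start_hour` UTC; it must cover every hour of the day at least once.
    """
    sums = [0.0] * 24
    counts = [0] * 24
    for step, value in enumerate(hourly):
        hour = (start_hour + step) % 24
        sums[hour] += value
        counts[hour] += 1
    if min(counts) == 0:
        raise ValueError("the series does not cover all 24 hours")
    return [total / count for total, count in zip(sums, counts)]


def peak_hour_and_amplitude(cycle):
    """Hour (UTC) of the maximum and amplitude (max minus min) of a 24-value cycle."""
    peak_hour = max(range(24), key=lambda hour: cycle[hour])
    return peak_hour, max(cycle) - min(cycle)
```

Applied to summer (JJA) hourly precipitation only, the first returned value would correspond to the time of the maximum and the second to the amplitude in mm/h.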

3.7 Cloud cover and surface radiation

To help explain the different biases seen in both models, this section evaluates the cloud cover and the surface incoming shortwave radiation. Unfortunately, only the total cloud cover fraction (CCF) variable was saved when the AROME simulation was performed. The analysis therefore focuses only on that cloud variable and cannot differentiate the radiative behaviour of clouds at different heights.

Figure 15 shows the CCF of the reference satellite dataset COMET (0.05°) and the biases of the AROME and ALADIN simulations and of another satellite dataset, CLARA (0.25°), with respect to COMET. The difference between AROME and ALADIN is also displayed in Fig. 15. CLARA and COMET exhibit differences in their CCF estimates that vary from one season to another, giving an indication of the uncertainty of the satellite products. In general, AROME and ALADIN underestimate the CCF over land in summer, while they overestimate it in winter and spring. In all seasons, AROME generally simulates more clouds than ALADIN, except over high-elevation regions (Alps, Scotland, and Spain). With respect to the satellite data, this leads to a degradation of CCF in AROME compared to ALADIN in winter and spring, and to an improvement in summer. Additional variables such as low and high CCF would be needed to better understand this behaviour in AROME.

Fig. 15 2000–2018 total cloud cover (%) of COMET and biases of AROME, ALADIN and CLARA with respect to COMET, and comparison of AROME and ALADIN for each season

As indicated in Fig. 16, the summer underestimation of CCF in both models (Fig. 15) results in an overestimation of summer surface shortwave (SW) radiation of the same order of magnitude in both models, except over northwestern France, Benelux and northwestern Germany, where the overestimation of SW radiation in spring and summer is smaller in AROME than in ALADIN. A smaller overestimation of SW radiation by AROME and ALADIN is also present in spring and fall (Fig. 16) despite an overestimation of the CCF for these seasons (Fig. 15). This could imply that the CCF biases in both models do not affect clouds at the same heights. For ALADIN, Nabat et al. (2020) have shown an underestimation of the low CCF, which could explain the overestimation of SW radiation.

Fig. 16 2000–2018 surface down shortwave radiation (W/m²) of SARAH and biases of AROME, ALADIN and CLARA with respect to SARAH, and comparison of AROME and ALADIN for each season

In more detail, for the four climatic regions of interest, a difference in CCF of around 5% between the two satellite datasets (COMET and CLARA) can be seen in summer over London, Leipzig and Nimes (Fig. 17). AROME overestimates CCF in winter, spring and fall compared to the satellite datasets for London and Leipzig, but underestimates it for the Alps. All the datasets are in good agreement all year long for Nimes, which has a strong annual cycle with lower values in summer. For the surface SW radiation (Fig. 18), ALADIN and AROME have larger values than the satellite datasets, which agree well with each other; ALADIN has larger biases than AROME for London and Leipzig, while the opposite holds over the Alps. The summer diurnal cycles of SW radiation of AROME and ALADIN (Fig. 19) are similar, both showing an overestimation compared to SARAH, especially at the daily maximum. It is important to note that in Fig. 19 only 3-hourly values are available for ALADIN, compared to hourly values for AROME and SARAH, which affects the visual comparison of the diurnal cycles.

Fig. 17 2000–2018 mean annual cycle of cloud cover fraction (%) for AROME (red), ALADIN (blue), CLARA (grey) and SARAH (black) for the four climatic regions of interest

Fig. 18 2000–2018 mean annual cycle of surface down shortwave radiation (W/m²) for AROME (red), ALADIN (blue), CLARA (grey) and SARAH (black) for the four climatic regions of interest

Fig. 19 2000–2018 surface down shortwave radiation mean diurnal cycle (W/m²) for AROME (red), ALADIN (blue), and SARAH (grey) in summer (JJA) for the four climatic regions of interest

3.8 Snow cover

To evaluate the snow simulated by ALADIN and AROME, the mean snow cover fraction in winter and spring (Fig. 20), the mean annual duration of snow cover (# days per year) (Fig. 21), and the mean annual cycle of snow cover fraction for the Alps region (Fig. 22) from both models are compared to the values from two surface reanalyses (ERA5-Land and UERRA MESCAN-SURFEX). With its higher resolution allowing a better representation of the Alpine peaks and valleys (Fig. 1), AROME (2.5 km) simulates a smaller snow cover in the Alpine valleys (Figs. 20 and 21) than the three coarser-resolution datasets (ALADIN: 12 km; ERA5-Land: 9 km; MESCAN-SURFEX: 5.5 km), which have a continuous high snow cover throughout the Alps. The larger snow cover (Fig. 20), which translates into a longer snow cover duration (Fig. 21), simulated by AROME compared to ALADIN over the Massif Central (south-central France), the higher mountains of Spain, the Pennines (northern England), the Ardennes (southern Belgium), and the Eifel region (western Germany) is in good agreement with the two reanalyses (MESCAN-SURFEX and ERA5-Land). This larger snow cover in AROME is related to its higher resolution, which better represents higher elevations and thus yields colder conditions, allowing precipitation to fall more often as snow rather than rain and snow to persist longer on the ground than in ALADIN. In Fig. 22, which shows the mean annual cycle of snow cover over the Alps region, the snow cover of AROME remains higher than that of ALADIN in summer because of the quasi-permanent snow on the mountain tops in AROME. In winter, by contrast, the lower snow cover in the Alpine valleys keeps the AROME snow cover at around 0.9, rather than 1 for ALADIN, ERA5-Land and MESCAN-SURFEX. However, the large spread between the reanalyses, AROME and ALADIN emphasizes the challenge of observing and simulating snow cover over mountainous regions.

Fig. 20 2000–2018 snow cover fraction (ratio) for AROME, ALADIN, MESCAN-SURFEX and ERA5-Land in winter (DJF) and spring (MAM)

Fig. 21 2000–2018 mean annual duration of snow cover (# days per year) for AROME, ALADIN, MESCAN-SURFEX and ERA5-Land

Fig. 22 2000–2018 annual cycle of snow cover fraction (ratio) for AROME, ALADIN, MESCAN-SURFEX and ERA5-Land for the Alps climatic region

4 Discussion and conclusions

Convection-permitting regional climate models (CPRCM) emerged about 10 years ago, showing promising results in their ability to improve the simulation of precipitation characteristics, especially at the sub-daily timescale (Kendon et al. 2012; Ban et al. 2014). Since then, a few international projects such as FPS-Convection (Coppola et al. 2020) and HORIZON2020 EUCP (Hewitt and Lowe 2018) have been launched to compare climate simulations produced by different CPRCMs from several climate research centres and universities in Europe. This paper presents in detail the CPRCM CNRM-AROME (version 41t1), which has been developed over recent years at the Centre National de Recherches Météorologiques at Météo-France and was used in the projects mentioned above. To evaluate the ability of CNRM-AROME to simulate fine-scale climate features, a 19-year (2000–2018) 2.5-km resolution simulation was performed over a large northwestern European domain in the so-called evaluation or hindcast mode. This simulation was compared, using different gridded observation datasets, with the simulation providing its lateral boundary conditions, a 12-km CNRM-ALADIN (version 6.3) simulation performed over the EURO-CORDEX domain. To evaluate these simulations and facilitate the identification of added value over the large CNRM-AROME domain, an hourly kilometric precipitation dataset was built from a collection of seven national datasets (France, Germany, U.K., Italy, Switzerland, Netherlands, Denmark), four of them using radar estimates. Additional gridded datasets of 2-m minimum and maximum temperature, satellite estimates of surface incoming radiation and cloud cover, snow cover from surface reanalyses, and mean sea level pressure were used to perform a comprehensive evaluation of CNRM-AROME and CNRM-ALADIN. While the analysis focuses on maps showing the spatial and seasonal variability of the biases, some attention was paid to four regions of Europe with contrasting climate conditions (continental, oceanic, Mediterranean, mountainous).

The most important findings of this evaluation are summarized as follows:

  • The mean seasonal spatial distribution of mean sea level pressure (MSLP) of CNRM-AROME is close to that of CNRM-ALADIN, likely because of the rather small CNRM-AROME domain. However, small differences were seen between the MSLP spatial distributions of both models and that of ERA-Interim, because the CNRM-ALADIN domain covering most of Europe is rather large and the CNRM-AROME domain is relatively far from the lateral boundaries of CNRM-ALADIN.

  • Both models show similar multi-year mean seasonal biases of 2-m minimum and maximum temperature that are within a range of − 2 to + 2 °C, except in summer, when the warm biases are larger for some regions. Major differences between the models occur in spring, when CNRM-AROME exhibits a widespread cold bias for maximum temperature (− 1 to − 2 °C) compared to a more reasonable performance of CNRM-ALADIN, with biases between − 0.5 and + 0.5 °C. However, CNRM-AROME does reduce the cold bias produced by CNRM-ALADIN over elevated regions such as the Alps and Pyrenees for each season.

  • Both models are generally too dry in summer and too wet in the other seasons. However, the summer dry bias remains small when the absolute bias is computed. The wet biases of 50 to 100%, which are even higher in spring for CNRM-AROME, are substantial but relatively common when compared to other RCMs (Kotlarski et al. 2014) and CPRCMs (Fosser et al. 2015; Leutwyler et al. 2017; Lind et al. 2020; Ban et al. 2021). CNRM-AROME clearly reduces the excessive wet biases of CNRM-ALADIN over elevated regions.

  • In association with their wet biases in winter, spring and fall, both models overestimate the frequency of daily precipitation. Daily precipitation extremes are only locally improved by CNRM-AROME, over the mountains all year long and over southeastern France in fall. In contrast, CNRM-AROME substantially improves the wet-hour precipitation intensity and extremes, which are largely underestimated by CNRM-ALADIN, as well as the hourly precipitation frequency, which is overestimated by CNRM-ALADIN. Fall Mediterranean heavy precipitation events, which are underestimated by CNRM-ALADIN, are more realistically simulated by CNRM-AROME, as noted by Fumière et al. (2020), Caillaud et al. (2021), and Pichelli et al. (2021).

  • The summer diurnal cycle of precipitation, which is rather flat in CNRM-ALADIN, is greatly improved in CNRM-AROME, with a more realistic maximum that occurs later in the afternoon and an amplitude that, though overestimated, is in better agreement with the observations than that of CNRM-ALADIN. The overestimation of the summer diurnal cycle amplitude is probably connected to the too warm and too dry conditions, a direct consequence of the lack of cloud cover and of an overestimation of incoming shortwave solar radiation (Hentgen et al. 2019; Leutwyler et al. 2017).

  • Generally, CNRM-AROME and CNRM-ALADIN underestimate the total cloud cover fraction (CCF) in summer over land, while they overestimate it in winter and spring. CNRM-AROME produces more clouds than CNRM-ALADIN all year long, a feature that contrasts with other CPRCMs (Leutwyler et al. 2017; Hentgen et al. 2019). Unfortunately, since only total CCF was saved in the CNRM-AROME simulation, it is not possible to extend the analysis to clouds at different heights. Locally, the deficit of clouds in CNRM-AROME is particularly severe over the Alps, especially in summer.

  • In agreement with the underestimation of CCF in summer, CNRM-AROME and CNRM-ALADIN severely overestimate (+ 30 W/m²) the incoming shortwave radiation reaching the surface, especially over the Alps (+ 50 W/m²). The overestimation is smaller for CNRM-AROME over northwestern France and Germany, but more severe over the Alps. The overestimation is around + 75 W/m² at noon in summer for London and Leipzig for both models. This overestimation may explain the warm summer biases.

  • The snow cover simulated by CNRM-AROME is in better agreement with the two surface reanalyses than that of CNRM-ALADIN, especially over mountainous areas, thanks to a more accurate representation of the elevation, which contributes to colder conditions that are more favorable to snowfall and to the preservation of snow on the ground for longer periods. Since CNRM-AROME does not include a glacier model, it suffers from excessive accumulation of snow at high altitude (> 2000 m). For more details on the performance of CNRM-AROME over the Alps, interested readers are invited to read Monteiro et al. (2022).

Overall, the analysis showed that CNRM-AROME is a suitable CPRCM that provides added value over its RCM counterpart CNRM-ALADIN, especially over mountainous areas (Lucas-Picher et al. 2017) and for precipitation indicators based on hourly values for which convection is a key process (Berthou et al. 2020), such as precipitation extremes, intensity and diurnal cycle. No major degradation of the CNRM-ALADIN performance by CNRM-AROME has been identified, despite the novelty of using CNRM-AROME in climate mode at CNRM and the limited tuning imposed by its heavy computational cost. The collection of seven national sub-daily kilometric precipitation datasets covering a large part of the northwestern European domain made it easier to identify the added value of CNRM-AROME at sub-daily time scales than would have been possible with individual weather stations. One should keep in mind that the observed gridded datasets depend on the distribution and quality of radars and weather stations, which vary in space and time; this is even more true for sub-daily datasets, which rely on only a subset of weather stations. Thus, in more remote locations such as mountainous or sparsely populated regions, the precipitation and temperature datasets are subject to uncertainties that can affect the analysis. Further, it is now recognized that the skill in modelling mountainous rain and snow can be better than the skill of a collective network of precipitation gauges (Lundquist et al. 2019). However, despite the challenges of measuring precipitation with radars in mountainous areas, the improvement brought by precipitation radars over station data is recognized (Piazza et al. 2019; Caillaud et al. 2021).

Despite the satisfactory performance of CNRM-AROME, this model suffers from systematic biases, some of which are shared with CNRM-ALADIN and with other CPRCMs (Ban et al. 2021). The excessive precipitation produced by CNRM-AROME in spring and winter over continental Europe stands out compared to CNRM-ALADIN, which also produces too much precipitation in these seasons, but less than CNRM-AROME. The excess of precipitation over continental Europe potentially explains the underestimation of maximum temperature by CNRM-AROME in spring, owing to the large amount of energy required to evaporate the excess water, which in turn seems to help limit the warm bias of CNRM-AROME in summer. The too cold and too wet biases of CNRM-AROME in winter and spring are common to many other RCMs and CPRCMs for different regions of Europe (Ban et al. 2014, 2021; Belusic et al. 2020; Kotlarski et al. 2014; Leutwyler et al. 2017; Lind et al. 2020). The excess of precipitation in winter and spring in CNRM-AROME seems related to the intermediate CNRM-ALADIN simulation, which also has a wet bias. A comparison of the seasonal mean vertically integrated water vapor simulated by CNRM-ALADIN with that of ERA5 indicated that the CNRM-ALADIN atmosphere is too humid in spring and winter (not shown). Thus, when this overly humid air is advected into the CNRM-AROME domain through its lateral boundaries, CNRM-AROME accentuates the overestimation of precipitation simulated by CNRM-ALADIN. Further characterizing and correcting this humidity bias in CNRM-ALADIN would be the next steps to reduce the overestimation of precipitation in CNRM-ALADIN and CNRM-AROME in winter and spring.

Both CNRM-AROME and CNRM-ALADIN suffer from warm biases in summer, but the warm bias is smaller in CNRM-AROME, likely because of the colder conditions in spring. Even though CNRM-AROME produces more clouds than CNRM-ALADIN all year long, except over the Alps, both models underestimate the cloud cover in summer, which directly leads to an overestimation of incoming solar radiation and likely explains the summer warm bias of both models. This warm summer bias, seen in other RCMs and CPRCMs (Ban et al. 2014, 2021; Berthou et al. 2020; Kotlarski et al. 2014; Leutwyler et al. 2017), is often connected to a lack of precipitation and a dry bias of soil moisture, especially in southern Europe, where the surface scheme can produce soil conditions that are too dry in summer (Boberg and Christensen 2012). CNRM-AROME includes an older version of SURFEX (7.3), which uses the force-restore scheme to transfer heat and water in the soil, whereas CNRM-ALADIN includes SURFEX 8.0, which uses the more advanced ISBA-DIFF scheme based on the soil diffusion equation. The force-restore scheme used in CNRM-AROME is known to lead to soil moisture conditions that are too dry (Le Moigne et al. 2020). Therefore, updating the land surface scheme and correcting the cloud cover bias are the most promising ways to reduce the CNRM-AROME summer warm bias in future versions. For the cloud cover representation, this has already been partially achieved by tuning the condensation threshold for undersaturated conditions in preliminary tests reported in Lemonsu et al. (2023).

Other important biases of CNRM-AROME are the quasi-constant overestimation of precipitation and underestimation of temperature over the Alps, which are also common to other CPRCMs (Ban et al. 2014, 2021; Lüthi et al. 2019). These biases are more severe in CNRM-ALADIN and partially corrected in CNRM-AROME. However, this correction seems to occur for the wrong reasons, since CNRM-AROME severely underestimates the cloud cover, which is associated with a large overestimation of the incoming solar radiation. These cloud cover and incoming solar radiation biases are milder in CNRM-ALADIN. However, one should keep in mind that the gridded observations probably underestimate the orographic precipitation because of the lack of stations at high elevations, precipitation undercatch, and the difficulty of measuring precipitation with radars over mountainous regions. CNRM-ALADIN is affected by an excessive frequency of precipitation that likely maintains wet or snowy surface conditions and alters the albedo and soil moisture.

This evaluation paper focused on the ability of CNRM-AROME to simulate some climate characteristics through a comparison with gridded observations. Beyond the identification of biases, it is important to continue improving CPRCMs such as CNRM-AROME. Though beyond the scope of the current study, this improvement can be pursued through sensitivity studies that shed more light on the causes of the deficiencies of CNRM-AROME and thereby help find ways to correct such systematic biases. One should keep in mind that the model presented in this paper is the first operational CPRCM developed at Météo-France. Improvements are expected in the next version, which will take into account the developments of the NWP model AROME, itself in continuous development within the new ACCORD international consortium. Efforts are underway to develop the next version of CNRM-AROME, which should benefit from the substantial progress achieved up to cycle 46 of the AROME NWP model. Updating CNRM-AROME to that cycle would bring it in phase with CNRM-ALADIN and ARPEGE-Climat, reducing as much as possible inconsistencies between climate models operating at different resolutions (Boé et al. 2020). CNRM-AROME should improve further with time by including interactive couplings with other components (regional sea, anthropogenic and natural aerosols, surface hydrology) of the regional climate system, as in CNRM-RCSM (Sevault et al. 2014; Nabat et al. 2020), as well as more sophisticated land and hydrology surface processes recently developed in SURFEX (Le Moigne et al. 2020). Additional information on the behavior of CNRM-AROME is expected from its contribution to the international FPS-Convection and EUCP projects, where its skill will be compared to that of other CPRCMs.