Modelling Mediterranean heavy precipitation events at climate scale: an object-oriented evaluation of the CNRM-AROME convection-permitting regional climate model

Modelling the rare but high-impact Mediterranean Heavy Precipitation Events (HPEs) at climate scale remains a largely open scientific challenge. The issue is addressed here by running a 38-year-long continuous simulation of the CNRM-AROME Convection-Permitting Regional Climate Model (CP-RCM) at a 2.5 km horizontal resolution and over a large pan-Alpine domain. First, the simulation is evaluated through a basic Eulerian statistical approach via a comparison with selected high spatial and temporal resolution observational datasets. Northwestern Mediterranean fall extreme precipitation is correctly represented by CNRM-AROME at a daily scale and even better at an hourly scale, in terms of location, intensity, frequency and interannual variability, despite an underestimation of daily and hourly highest intensities above 200 mm/day and 40 mm/h, respectively. A comparison of the CP-RCM with its forcing convection-parameterised 12.5 km Regional Climate Model (RCM) demonstrates a clear added value for the CP-RCM, confirming previous studies. Secondly, an object-oriented Lagrangian approach is proposed with the implementation of a precipitating system detection and tracking algorithm, applied to the model and the reference COMEPHORE precipitation dataset for twenty fall seasons. Using French Mediterranean HPEs as objects, CNRM-AROME’s ability to represent the main characteristics of fall convective systems and tracks is highlighted in terms of number, intensity, area, duration, velocity and severity. Further, the model is able to simulate long-lasting and severe extreme fall events similar to observations. However, it fails to adequately reproduce the precipitating systems and tracks with the highest intensities (maximum intensities above 40 mm/h), and the model’s tendency to overestimate the cell size increases with intensity.


Introduction
The northwestern Mediterranean is affected by Heavy Precipitation Events (HPEs), mainly during the fall season (Nuissier et al. 2008, 2011). These extreme events, with rainfall amounts greater than 100 mm recorded in less than a day and often within just a few hours, lead to devastating flash flooding and floods. They have most often been studied in the framework of HyMeX (Hydrological cycle in the Mediterranean eXperiment; Drobinski et al. 2014; Ducrocq et al. 2014). The occurrence of these events can be explained by several factors:
- propitious slow-evolving synoptic-scale situations (Toreti et al. 2010; Nuissier et al. 2011; Ricard et al. 2012): southwesterly and southerly upper-level flows associated with potential vorticity anomalies;
- conditional convective instability and moisture supply through evaporation from the Mediterranean Sea, which remains warm after summer, and/or low-level moist marine flow associated with the synoptic situation (Ducrocq 2011, 2013; Krichak et al. 2015);
- convection triggering through orographic lifting, cold pools (Bresson et al. 2012), and mesoscale low-level convergence (Nuissier et al. 2008; Bresson et al. 2009).
The different combinations of these ingredients and the slow-evolving nature of the synoptic situation often lead to the formation of quasi-stationary Mesoscale Convective Systems (MCSs), which can cause the most significant damage in the Mediterranean region (Hernandez et al. 1998; Ducrocq et al. 2008). At climate scale, General Circulation Models (GCMs) cannot properly represent Mediterranean HPEs: their coarse spatial resolution, around 150 km, does not allow them to correctly represent the topography and mesoscale processes involved in these phenomena (Ruti et al. 2016). Since the late 1980s, the climate modelling community's solution has been to increase the resolution in specific regions and implement Regional Climate Models (RCMs) (Giorgi and Bates 1989; Giorgi 2019), with current resolutions ranging from 50 km down to 12 km. The use of these high-resolution limited-area models provides a clear added value compared to 150 km GCMs, especially for precipitation (Ruti et al. 2016; Prein et al. 2016; Fantini et al. 2018). Multimodel studies based on Euro-CORDEX, ENSEMBLES and/or Med-CORDEX simulations also show the larger added value of 12 km RCMs compared to 50 km RCMs (Jacob et al. 2014; Prein et al. 2016; Fantini et al. 2018). In addition, RCMs can locally modify the expected climate change signal compared to GCMs, for example over North America (Di Luca et al. 2013), the Alpine region (Torma et al. 2015; Giorgi et al. 2016) and all of Europe (Rajczak and Schär 2017; Coppola et al. 2020a). The climate change signal can also be modified when the RCM resolution is increased from 50 to 12 km (Jacob et al. 2014; Luu et al. 2018). However, even with a 12 km resolution, deep convection parameterization is recommended, which limits the representation of precipitation patterns and distribution: a shift in the diurnal cycle of precipitation (Hohenegger et al. 2008), overestimation of the frequency of light precipitation, and underestimation of extremes (Fumière et al. 2020).
Moreover, the resolution remains too coarse to represent the fine scale processes involved in the triggering of sub-daily precipitation (Berg et al. 2019) and convective phenomena correctly. It therefore limits confidence in the expected evolution of extreme precipitation in the context of climate change (Bony et al. 2015).
Beginning in the late 1990s, limited-area Convection Permitting Models (CPMs) have been developed in Numerical Weather Prediction (NWP) to predict extreme events such as HPEs. Combining a spatial resolution of around 1 to 3 km and high-resolution data assimilation, these models allow a realistic description of convection and the associated triggering mechanisms. They benefit from the implementation of high-resolution topography, a detailed description of surfaces and sophisticated physical parameterisations such as microphysics, turbulence and shallow convection. Moreover, deep convection is explicitly simulated by the model's nonhydrostatic dynamics. Prognostic equations for hydrometeors allow the transport and memory of cloud and precipitation from one time step to the next: advection, development and decay of organised convection such as MCSs are correctly represented by the model dynamics. Therefore, CPMs provide a clear improvement in representing convective phenomena compared to 10 km convection-parameterised models (Richard et al. 2007; Khodayar et al. 2016).
At the time when CPMs could not be implemented by the climate community due to computing cost limitations, studies mixing dynamical and statistical approaches were used to conduct high-resolution climate studies on extremes, such as Beaulant et al. (2011) or Meredith et al. (2018) for precipitation or Najac et al. (2009) for wind. However, these methods were based on a limited number of cases, and did not yet provide a robust answer to the question of the future evolution of extreme rainfall. In recent years, as societal needs for predictions of future changes in extreme precipitation coincided with an increase in computing power, the regional climate modelling community began to run Convection Permitting Models for decadal-long simulations, creating a new family of regional models called Convection-Permitting Regional Climate Models (CP-RCMs), with resolutions of around 1 to 3 km. The use of CPMs in climate mode led to the need for studies to validate the models and demonstrate the expected added value of CP-RCMs compared to RCMs (Prein et al. 2015). Indeed, while CPMs have already shown their added value for NWP, which will necessarily benefit the CP-RCMs, the use of these models in climate mode still requires evaluation. For example, in the NWP community, data assimilation has shown its importance in improving the prediction of extreme precipitation events (Gustafsson et al. 2018). In climate mode, the model is initialized only once at the beginning of the simulation period and only boundary conditions are given to the CP-RCM, which can express its own internal variability and biases within its domain. If we focus on precipitation, pioneer evaluation studies over parts of Europe show realistic rainfall characteristics, especially at an hourly time scale with CP-RCMs: the first multi-model study over the eastern part of the Alpine region (Prein et al. 2013) and mono-model studies over the entire Alpine region (Ban et al. 2014; Lind et al. 2016; Reder et al. 2020), Central Europe (Knist et al. 2020), the United Kingdom (Kendon et al. 2014), the southern part of Germany, Belgium (Brisson et al. 2016), Fenno-Scandinavia and the southeast of France (Fumière et al. 2020). Leutwyler et al. (2017) confirm these results over a larger pan-European domain with, in particular, an improved precipitation diurnal cycle and improved summer and fall hourly precipitation distribution compared to RCMs. European multi-model studies were also carried out over a common pan-Alpine domain within the framework of the CORDEX Flagship Pilot Study (FPS) on Convection program (Coppola et al. 2020b; Ban et al. 2020; Pichelli et al. 2020).
Few studies have been conducted specifically for the fall HPEs in the Mediterranean region using CP-RCMs. In the first European multi-model CP-RCMs intercomparison (Coppola et al. 2020b), one case study focuses on a Mediterranean HPE referred to as HyMeX-IOP16. In , improvement in the representation of the daily fall extreme precipitation in CP-RCMs compared to RCMs is shown for a ten-year period and also through a detailed case study. For a ten-year period focusing on the Cevennes region, the southern part of the Massif-Central most affected by fall HPEs (cf. Fig. 1), Fumière et al. (2020) use the new French reference hourly and kilometric observational dataset COMEPHORE to demonstrate the added value of CNRM-AROME CP-RCM, compared to CNRM-ALADIN RCM, for daily and hourly precipitation.
To our knowledge, until now, all CP-RCM and RCM studies dedicated to Mediterranean HPEs have focused on local statistics of the surface rainfall without considering them as organised precipitation systems. However, HPEs are suitable for going beyond standard climate statistics. Indeed, they can be defined as moving objects with specific characteristics, such as life time, area, maximum intensity or velocity. It is now possible to implement object-oriented approaches as done by Prein et al. (2020) in the US to validate the representation of MCSs by a CP-RCM. The use of a precipitating system detection and tracking algorithm will allow us to evaluate the model's ability to reproduce the main characteristics of these events, with the longer-term goal of applying these methods in order to characterise the change in future convective events.
Similarly, until now, studies have been limited to relatively short periods due to computational limitations for the model's simulations and the limited availability of reliable high temporal and spatial resolution observational datasets. We now have a 38-year-long simulation of the new version of the CNRM-AROME CP-RCM with a 2.5 km resolution over the pan-Alpine domain defined through the CORDEX FPS on Convection program (Coppola et al. 2020b). The new French reference hourly and kilometric observational dataset COMEPHORE has recently been extended to 20 years.
Our study takes advantage of these opportunities. Its main objectives are:
(a) to confirm the main results of Fumière et al. (2020) over a longer period, a larger domain and with an improved version of the CNRM-AROME CP-RCM by comparing it to observations and to the CNRM-ALADIN RCM;
(b) to consider French Mediterranean HPEs as objects and apply a precipitating system detection and tracking algorithm in order to evaluate whether the CP-RCM is able to reproduce the main characteristics (duration, area, intensity, severity) of the convective systems observed in COMEPHORE.
Section 2 presents the observational datasets, models, simulation setup and methods used in the study. Section 3 proposes standard climate statistics and a comparison between models and observations, focusing on extreme precipitation and interannual variability at daily and hourly time-scales. Section 4 presents the object-oriented approach applied both to CNRM-AROME and COMEPHORE, allowing comparison between the CP-RCM and observations for the main characteristics of the precipitating systems and trajectories. Discussion is provided in Sect. 5 and a conclusion is given in Sect. 6.

Observational precipitation datasets
Fig. 1 CNRM-AROME topography in the physical central zone for the pan-Alpine domain. Delimited by grey lines, the regular ALP3 domain is used as the evaluation target grid. The tracking domain is delimited by black lines and the French MED Area by black dotted lines

In our study, we need high-resolution and high-quality observational datasets for three main reasons:
- we want to correctly evaluate a model simulation with high spatial and temporal resolution, i.e. kilometric and hourly;
- our evaluation study focuses on precipitation, a parameter particularly difficult to estimate because of its significant variability in space and in time;
- we are dealing with extreme rainfall, meaning the tail of the rainfall distribution, which is often poorly represented in observed datasets.
Particular attention must therefore be paid to the choice of reference precipitation observed datasets (Zolina et al. 2014;Prein and Gobiet 2017;Fantini et al. 2018;Kotlarski et al. 2019). The selected hourly and daily precipitation gridded datasets are described in Table 1.
Hourly precipitation accumulations are based on gridded datasets merging radar and rain gauge observations over France (COMEPHORE), Germany (RADKLIM) and Switzerland (RdisaggH), and only on rain gauge observations over Italy (GRIPHO).
Daily precipitation accumulations are based on the available hourly data from COMEPHORE, RADKLIM and GRIPHO over France, Germany and Italy. To extend the covered area, we also use the APGD-EURO4M daily dataset on the Pan-Alps domain. Over Switzerland, where RdisaggH is only available for a period of 8 years, we prefer the daily APGD-EURO4M with a longer period available.
As COMEPHORE is the reference for our extreme precipitation evaluation study over the French Mediterranean in Sects. 3.3 and 4, we propose a detailed description in appendix A.

Models and simulations
To evaluate Regional Climate Models, the current practice is to run a so-called "evaluation run" in which the model is driven by "perfect" lateral boundary conditions, i.e. a reanalysis. In order to evaluate the CP-RCM, we perform a long past simulation of CNRM-AROME with a horizontal resolution of 2.5 km, driven by the 80 km ERA-Interim global reanalysis (Dee et al. 2011), described in Fig. 2. In order to limit the resolution jump (Laprise et al. 2008; Prein et al. 2015; Matte et al. 2016, 2017), a two-tier forcing strategy is chosen: the intermediate step is the CNRM-ALADIN RCM with a 12.5 km horizontal resolution.
In this study, CNRM-ALADIN has a 12.5 km horizontal resolution and 91 vertical levels from 10 m to 1 hPa. The time step is 450 s. The CNRM-ALADIN simulation is driven by the 80 km global reanalysis ERA-Interim every 6 hours, on a Med-CORDEX domain (Ruti et al. 2016). A spectral nudging technique (von Storch et al. 2000; Radu et al. 2008) is applied to CNRM-ALADIN (Colin et al. 2010; Herrmann et al. 2011) so that the large-scale conditions imposed on the CP-RCM are as close as possible to reality.
CNRM-ALADIN is a bi-spectral, hydrostatic limited-area RCM with a semi-Lagrangian advection scheme and a semi-implicit time discretisation. This model requires a parameterisation of deep convection. While Fumière et al. (2020) used an earlier version of the model, this study uses CNRM-ALADIN v6.2 (Daniel et al. 2019; Nabat et al. 2020) with a new convection scheme including dry, shallow and deep convection (Piriou et al. 2007; Guérémy 2011). The previous deep convection scheme was a mass-flux scheme based on a moisture convergence closure (Bougeault 1985): deep convection with condensation, precipitating according to the rule "everything that condenses precipitates". The new PCMT scheme (Prognostic Condensates Microphysics and Transport) allows convection to be represented with or without condensation, precipitating or not, with a closure based on the relaxation of the Convective Available Potential Energy (CAPE). In the following, CNRM-ALADIN will refer to CNRM-ALADIN v6.2.
In this study, CNRM-AROME presents a horizontal resolution of 2.5 km, which was the resolution used in NWP AROME from the beginning of its operational use until April 2015. The number of vertical levels is 60, with hybrid coordinates levels  from 10 m to about 1 hPa and 21 levels under 2000 m, which allows a good description of the lower layers of the atmosphere. The time step is 60 seconds.
The version of the CNRM-AROME model used in this study (and also in Coppola et al. (2020b), Ban et al. (2020) and Pichelli et al. (2020)) is related to the cycle 41t1 of the NWP AROME (Termonia et al. 2018) in operational use at Meteo-France between December 2015 and December 2017. The version of CNRM-AROME used in Déqué et al. (2016), Fumière et al. (2020), Belušic et al. (2020) and Lind et al. (2020) is related to cycle 38.
The dynamical core is the non-hydrostatic ALADIN bispectral core (Bénard et al. 2010) with a semi-Lagrangian advection scheme and a semi-implicit time discretisation. With high-resolution and non-hydrostatic dynamics, deep convection is no longer parameterised, but sub-grid shallow convection still requires parameterisation.
With the exception of the radiation scheme, most of the physical parameterisations come from the Meso-NH research model (Lafore et al. 1998;Lac et al. 2018) and are presented in Table 2.
Due to a first vertical layer at 10 m, the Surface Boundary Layer scheme CANOPY-SBL (Masson and Seity 2009) is activated. This parameterisation implements additional prognostic atmospheric layers between the ground and the lowest level of the atmospheric model and makes it possible to explicitly calculate the 2 m temperature and 10 m wind. Physiographic data is based on the global ECOCLIMAP I database (Masson et al. 2003) at 1 km resolution.
The CNRM-AROME41t1 version used in climate mode in this study is very close to the NWP version. The only model differences are:
- the parameterisation of the turbulence fluxes above the sea, ECUME (Belamari and Pirani 2007), is replaced by the COARE3 scheme;
- the relaxation zone where the atmosphere lateral boundary conditions are imposed is extended from 8 to 21 grid points to avoid losing information from the driving model when entering the CP-RCM domain.
The main modifications between the cycle 38 used in Fumière et al. (2020) and the cycle 41 used in this study are the following:
- a new database for orography with higher resolution: the GTOPO30 with 1 km resolution (Survey 1993) is replaced by the GMTED2010 with 250 m resolution (Carabajal et al. 2011);
- a new version of SURFEX: from V7.2 to V7.3;
- some minor modifications in the dynamics in order to reduce the numerical cost;
- two physical parameterisation modifications: the increase in the self-conversion threshold of primary ice crystals to snow and the addition of orographic shadowing and slope parameterisations (Senkova et al. 2007);
- the main modification is the implementation of the COMAD scheme (Malardel and Ricard 2015). Indeed, in the vicinity of convective clouds, AROME had the general tendency to produce unrealistic divergent winds at the edges of the cold outflows generated by the precipitation evaporation. More conservative semi-Lagrangian horizontal weights were proposed in order to take into account the deformation of air parcels along each direction. Through the use of these new weights, the problem of "grid-point storms" was corrected.
In Fumière (2019), a comparison between the two versions of the model (cycle 38 and 41) showed an improvement in the representation of fall rainfall in the southeast of France with CNRM-AROME41t1 (not shown). In the following, CNRM-AROME will refer to CNRM-AROME41t1.

Evaluation simulation setup
We performed a 38-year CNRM-AROME evaluation simulation from 1981 to 2018 which is, to our knowledge, the longest run ever performed at 2.5 km with a climate model. It is worth mentioning that the simulation was technically carried out in two time slices: from 1997 to 2018 with a 3-year spin-up period (1994-1996) and then extended from 1981 to 1996 with a 2-year spin-up period (1979-1980). Thanks to the spin-up strategy and the limited-area model framework, however, we consider the break to be very limited.
The CNRM-AROME evaluation simulation is presented in Fig. 2.

Domain
The limited-area CP-RCM domain is the common pan-Alpine domain (cf. Fig. 1) defined through the CORDEX FPS on Convection program (Coppola et al. 2020b). The 2.5 km CNRM-AROME computational domain consists of 345600 grid points (640 points of longitude and 540 points of latitude) with a conformal-Lambert projection. On the north and east sides, the extension zone (E zone) where artificial periodic extensions of the spectral fields are inserted contains 11 grid points. On each side of the domain, the intermediate zone (I zone) where the lateral boundary conditions are imposed by a relaxation comprises 21 grid points. The physical central zone (C zone) consists of 587 × 487 points, meaning around 1500 km × 1200 km at 2.5 km resolution. On the Meteo-France supercomputer, 5 days are needed to run one year of CNRM-AROME on the pan-Alpine domain (142kh TEI = number of cores × number of hours per simulated year), which is 16 times longer than for a CNRM-ALADIN 12.5 km simulation on the Med-CORDEX domain.
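The zone bookkeeping above can be checked with a few lines of arithmetic. The following is only an illustrative sketch with hypothetical variable names; it assumes, as the text describes, that the E zone is counted once per direction (north and east sides) and the I zone twice (both sides of each direction).

```python
# Grid bookkeeping for the CNRM-AROME pan-Alpine domain (values from the text).
NX, NY = 640, 540   # computational domain: points of longitude x latitude
E_ZONE = 11         # extension zone, on the north and east sides only
I_ZONE = 21         # relaxation zone, on each side of the domain
DX_KM = 2.5         # horizontal resolution in km


def central_zone(n_total: int) -> int:
    """Number of points left in the C zone along one direction."""
    return n_total - E_ZONE - 2 * I_ZONE


cx, cy = central_zone(NX), central_zone(NY)
print(NX * NY)                 # total number of grid points
print(cx, cy)                  # C zone size in grid points
print(cx * DX_KM, cy * DX_KM)  # C zone extent in km
```

The result (587 × 487 points, i.e. 1467.5 km × 1217.5 km) matches the rounded figures of around 1500 km × 1200 km quoted above.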

Initial conditions
The initial atmospheric and surface conditions are interpolated fields from the CNRM-ALADIN RCM simulation and are given to the CNRM-AROME CP-RCM over the entire pan-Alpine domain, only once at the beginning of the simulation.

Lateral and upper boundary conditions
CNRM-ALADIN, driven by the ERA-Interim reanalysis, gives the lateral boundary conditions to the CNRM-AROME CP-RCM every hour. Upper boundary conditions are also given by the RCM every hour for levels between 15 and 20 km with weak relaxation coefficients and only for long waves, applied to wind divergence, wind vorticity and temperature all over the CP-RCM domain. These upper boundary conditions are added to avoid numerical stability problems linked to a low number of vertical levels in the high troposphere and stratosphere.

Table 2 (fragment): Fouquart and Bonnel (1980) and FMR 6 bands for shortwave; Morcrette (2001).

Surface forcing
For the surface boundary conditions, we impose the Sea Surface Temperatures (SSTs) from interpolated monthly ERA-Interim SSTs (around 80 km) as recommended in the CORDEX-FPS on Convection simulation protocol. The Mediterranean Sea plays an important role in supplying moisture and convective instability during HPEs and in fine-scale interactions occurring between the low-level atmosphere and the ocean mixed layer (Lebeaupin Brossier et al. 2008; Rainaud et al. 2017; Bouin and Lebeaupin Brossier 2020). Higher resolution SSTs than the ERA-Interim ones could provide a better representation of the sea surface conditions with finer-scale patterns. A step forward to take advantage of a higher resolution SST would be to have a high-resolution, high-frequency coupling between atmosphere and ocean to represent the complex air-sea interactions, but this is not yet affordable in long climate simulations.

Radiative forcing
In addition to water vapour directly simulated by the model, the concentrations of five greenhouse gases are imposed: CO2, N2O, CH4, CFC11 and CFC12, with one value per year and per species, homogeneous in the spatial dimension. The values come from observations before 2005 and from the RCP4.5 scenario afterwards (Moss et al. 2010).
As the differences in greenhouse gas radiative impacts between observations (Friedlingstein et al. 2019) and the RCP4.5 scenario are very small, the use of RCP4.5 concentrations after 2005 is not expected to have significant impacts on the results. The Nabat et al. (2013) aerosol dataset, a mixture of remote sensing and model data, provides monthly evolving two-dimensional maps for dust, sea salt, organic carbon and black carbon. Due to an error in the preparation of the simulation, the sulfate forcing is missing in this simulation. Moreover, the optical properties of the aerosols have not been updated as advised in Table 6 of Nabat et al. (2013).

Common grids and interpolation method
Model evaluation requires the choice of common grids where comparisons are performed. The first grid is the common regular 0.0275° resolution (≈ 3 km) ALP3 grid, which is the evaluation target grid defined in the framework of the CORDEX FPS on Convection program (cf. Fig. 1). In addition, for a fair comparison between the 2.5 km CP-RCM and the 12.5 km RCM, and following Hong and Kanamitsu (2014), the comparisons must also be made on the coarsest resolution between compared datasets. The second grid is the pan-Alpine part of the common regular Euro-CORDEX Europe high-resolution grid, EUR-11i (0.125°, i.e. around 12 km), called ALP11. All model and observational datasets are interpolated on these common grids. Among the different ways to interpolate horizontal fields, we select the cdo first-order conservative remapping, a common method chosen in recent studies (Pichelli et al. 2020).
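To illustrate what first-order conservative remapping does to a precipitation field, the sketch below treats the simplest possible case: a fine regular grid nested in a coarse one with an integer ratio and equal cell areas, where the area-weighted average reduces to a block mean. This is an assumption made for illustration only; it is not the cdo implementation, which handles arbitrary cell overlaps on the sphere, and the function name is hypothetical.

```python
def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list of floats.

    For equal-area cells this is a first-order conservative remapping:
    the area integral of the field is the same on both grids.
    """
    ny, nx = len(field), len(field[0])
    if ny % factor or nx % factor:
        raise ValueError("grid size must be a multiple of the factor")
    coarse = []
    for j in range(0, ny, factor):
        row = []
        for i in range(0, nx, factor):
            block = [field[jj][ii]
                     for jj in range(j, j + factor)
                     for ii in range(i, i + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Toy hourly precipitation field (mm/h) on a 4 x 4 fine grid.
fine = [[0.0, 4.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
        [2.0, 2.0, 0.0, 8.0],
        [2.0, 2.0, 0.0, 0.0]]
coarse = block_mean(fine, 2)
print(coarse)
```

In this toy case the total amount is conserved, but the isolated 8 mm/h maximum is reduced to 2 mm/h on the coarse grid, which is one reason why comparisons on the coarsest common grid matter when extremes are evaluated.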
The spatial spin-up effect, i.e. the zones of the domain that may be subject to boundary effects, should also be taken into account in limited-area modelling. Following Matte et al. (2017) and Fumière et al. (2020), we estimate it to be around 30 grid points from each border of the domain at the 2.5 km resolution. The chosen ALP3 or ALP11 evaluation target grids allow us to implicitly remove the required spatial spin-up zone from the CNRM-AROME computing pan-Alpine domain.

Composite of datasets
As we can see in Fig. 3, the lengths of the available periods of the different observational datasets are not homogeneous; some are constrained by the availability of radar archives, and others are limited by databases not extended after the end of production projects. But extreme precipitation events in the northwestern Mediterranean present high temporal variability, and the chosen study period might strongly impact the calculation of extremes.
Therefore, when calculating differences between models and observations, we choose only common periods. Further, in order to take full advantage of the databases and simulations, we use common time periods that can be spatially different. The figures will be composites with various time periods depending on the datasets used (cf. Fig. 4).

Percentiles
In Sect. 3, percentiles are calculated for each model and observational dataset over each common period. The calculation takes into account all-day precipitation (both dry and wet days) following Schär et al. (2016).
Interpolation on the common ALP3 and ALP11 grids is performed before percentile computation in order to compare model and observation percentiles. For each grid point, absolute differences are then calculated and preferred to relative differences in order to highlight the areas with high percentile values.
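The practical consequence of the all-day convention can be seen on a toy series. The sketch below is illustrative only: the linear-interpolation percentile, the 1 mm wet-day threshold and the function names are assumptions, not details taken from the study.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("empty series")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def wet_day_percentile(values, q, wet_threshold=1.0):
    """Same percentile restricted to wet days (threshold is hypothetical)."""
    return percentile([v for v in values if v >= wet_threshold], q)


# Ten days of daily precipitation (mm/day) at one grid point: 8 dry, 2 wet.
obs = [0.0] * 8 + [10.0, 30.0]
model = [0.0] * 7 + [5.0, 20.0, 40.0]

p_all = percentile(obs, 90)            # all-day percentile (dry days included)
p_wet = wet_day_percentile(obs, 90)    # wet-day percentile, for contrast
abs_diff = percentile(model, 90) - p_all   # absolute model-minus-observation difference
print(p_all, p_wet, abs_diff)
```

The all-day value depends only on the precipitation amounts, whereas the wet-day value also depends on how many days are classified as wet, which makes model and observation percentiles harder to compare when their wet-day frequencies differ.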
CNRM-AROME and observations are compared on both 3 and 12 km grids. When considering CNRM-ALADIN, data are interpolated only on the ALP11 grid for a fair comparison.
A bootstrapping method (described in appendix B1) is used to validate the statistical significance of model bias differences at the 90% confidence level. On the figures representing percentile differences on the 12 km grid (cf. Figs. 6d, e and 8d, e), only data with significant differences between model and observations are plotted.
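The bootstrapping procedure itself is described in appendix B1 and is not reproduced here. As a generic illustration of the idea only, the sketch below resamples whole years with replacement, recomputes the model-minus-observation difference of a statistic for each resample, and declares the difference significant when the central 90% interval excludes zero. The year-block resampling, the number of resamples and all names are assumptions.

```python
import random


def bootstrap_significant(model_by_year, obs_by_year, stat,
                          n_boot=1000, level=0.90, seed=0):
    """Return (significant, (low, high)) for stat(model) - stat(obs).

    model_by_year / obs_by_year: lists of per-year lists of values,
    covering the same years in the same order.
    """
    rng = random.Random(seed)
    n_years = len(model_by_year)
    diffs = []
    for _ in range(n_boot):
        # Draw years with replacement; the same years are used for both datasets.
        picks = [rng.randrange(n_years) for _ in range(n_years)]
        model = [v for y in picks for v in model_by_year[y]]
        obs = [v for y in picks for v in obs_by_year[y]]
        diffs.append(stat(model) - stat(obs))
    diffs.sort()
    tail = (1.0 - level) / 2.0
    low = diffs[int(tail * (n_boot - 1))]
    high = diffs[int((1.0 - tail) * (n_boot - 1))]
    return not (low <= 0.0 <= high), (low, high)


# Toy example: ten years, two values per year, model always 10 mm above obs.
obs_years = [[1.0 + y, 2.0 + y] for y in range(10)]
model_years = [[11.0 + y, 12.0 + y] for y in range(10)]
significant, interval = bootstrap_significant(model_years, obs_years, max, n_boot=200)
print(significant, interval)
```

With such a test, a grid point would be plotted only when the first returned value is true.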
Fig. 3 The available periods of the different precipitation observational datasets used for hourly and daily precipitation evaluation, compared to the period of the model evaluation simulation

Added value
A spatial added value index of the CNRM-AROME CP-RCM compared to the CNRM-ALADIN RCM (cf. Fig. 6f and 8f) is computed. As for the calculation of differences, the absolute added value is preferred in order to highlight the areas with high percentile values. The added value index is defined on each grid point as the difference of the absolute values of the differences between the RCM and observations on the one hand and the CP-RCM and observations on the other hand, which means that for the studied statistic Q:
$$\mathrm{AV}(Q) = \left| Q_{\mathrm{RCM}} - Q_{\mathrm{OBS}} \right| - \left| Q_{\mathrm{CP\text{-}RCM}} - Q_{\mathrm{OBS}} \right|$$
A positive added value index means that the CP-RCM bias is smaller than that of the RCM: we can therefore conclude that the CP-RCM behaves better than the RCM for the studied statistic. A negative added value index corresponds to a larger bias for the CP-RCM with respect to the RCM.
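The definition in words translates directly into code. The following minimal sketch evaluates the index at a single grid point; the function name and the numerical values are hypothetical and chosen only to show the sign convention.

```python
def added_value(q_rcm, q_cprcm, q_obs):
    """Absolute added value of the CP-RCM over the RCM for one statistic Q.

    Positive: the CP-RCM is closer to the observations than the RCM.
    Negative: the CP-RCM has the larger absolute bias.
    """
    return abs(q_rcm - q_obs) - abs(q_cprcm - q_obs)


# Hypothetical 99th percentiles (mm/day) at one grid point.
q_obs = 100.0
av_better = added_value(q_rcm=70.0, q_cprcm=90.0, q_obs=q_obs)   # RCM bias 30, CP-RCM bias 10
av_worse = added_value(q_rcm=95.0, q_cprcm=120.0, q_obs=q_obs)   # RCM bias 5, CP-RCM bias 20
print(av_better, av_worse)
```

Because absolute values are used, an overestimation and an underestimation of the same magnitude are treated identically.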

Definition of the French MED area
To compute statistics on French Mediterranean HPEs for both standard climate statistics and precipitating system detection and tracking, a specific area is defined, called French MED (cf. Fig. 1). Following Ribes et al. (2019), we computed the annual maximum 1-day rainfall RX1d for each grid point for the 20 years of the COMEPHORE observational dataset. Then we selected the rectangular limits of the area where the all-period average of RX1d is higher than 60 mm, the threshold used in Ribes et al. (2019) to select stations affected by Mediterranean events. As the COMEPHORE data do not cover the sea, the grid points over the sea are also removed for the model dataset. We excluded Corsica from this study area because of its specific orography, the difficulty of capturing trajectories on this small mountainous area of land, and the fact that COMEPHORE is not able to represent precipitation correctly on a large part of the island (cf. Fig. 16 in appendix A). The French MED area is used, for example, in Figs. 9 and 13.
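The two selection steps described above (averaging RX1d over the years, then taking the rectangular limits of the area above 60 mm) can be sketched as follows. This is an illustrative sketch with hypothetical function names and toy values; the sea mask and the exclusion of Corsica are deliberately left out.

```python
def rx1d_mean(daily_by_year):
    """All-period average of the annual maximum 1-day rainfall at one grid point."""
    return sum(max(year) for year in daily_by_year) / len(daily_by_year)


def bounding_box(mean_rx1d, threshold=60.0):
    """Rectangular limits (row_min, row_max, col_min, col_max) of the grid
    points whose mean RX1d exceeds the threshold, or None if there are none."""
    hits = [(j, i)
            for j, row in enumerate(mean_rx1d)
            for i, value in enumerate(row)
            if value > threshold]
    if not hits:
        return None
    rows = [j for j, _ in hits]
    cols = [i for _, i in hits]
    return min(rows), max(rows), min(cols), max(cols)


# One grid point, two years of daily values (mm/day).
point_mean = rx1d_mean([[1.0, 50.0, 3.0], [70.0, 2.0, 2.0]])

# Toy 3 x 3 map of mean RX1d (mm).
rx1d_map = [[10.0, 20.0, 30.0],
            [40.0, 65.0, 70.0],
            [50.0, 61.0, 20.0]]
box = bounding_box(rx1d_map)
print(point_mean, box)
```

Note that a rectangular selection also keeps grid points inside the box that are below the threshold, such as the 20 mm corner in this toy map.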

Precipitating system detection and tracking algorithm
The object-oriented approach used here to study HPEs involves the implementation of a precipitating system detection and tracking algorithm. The chosen method developed at CNRM and already applied in precipitation nowcasting and for AROME evaluation follows Morel and Senesi (2002a). For this study, the tool is applied to the 1-h accumulated precipitation field, instead of the 5-min radar reflectivities used in its nowcasting version. This is possible thanks to the specific characteristics of Mediterranean HPEs, which are organised convective systems with a much longer lifetime than those of ordinary cells.
The method can be summarised as follows:
- Step 1: interpolation and smoothing;
- Step 2: detection of the precipitating systems every hour with a minimum surface of 20 km² and seven intensity thresholds (2, 5, 10, 15, 20, 25 and 30 mm/h);
- Step 3: tracking of system trajectories by identifying links between systems at different time steps according to overlapping and correlation conditions;
- Step 4: diagnostics.
A further description of the algorithm can be found in appendix C and Fig. 18.
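To give a concrete idea of Steps 2 and 3, the sketch below detects contiguous areas above an intensity threshold in an hourly field and measures the overlap between systems found at two successive hours. It is a simplified illustration, not the Morel and Senesi (2002a) algorithm: the 4-neighbour connectivity, the 9 km² cell area (3 km grid), the use of "greater than or equal" for the threshold and the overlap measure are all assumptions, and the correlation condition is omitted.

```python
from collections import deque


def detect_systems(field, threshold, cell_area_km2=9.0, min_area_km2=20.0):
    """Return a list of systems, each a set of (row, col) cells >= threshold."""
    ny, nx = len(field), len(field[0])
    seen = [[False] * nx for _ in range(ny)]
    systems = []
    for j in range(ny):
        for i in range(nx):
            if seen[j][i] or field[j][i] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue, cells = deque([(j, i)]), set()
            seen[j][i] = True
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx
                            and not seen[yy][xx]
                            and field[yy][xx] >= threshold):
                        seen[yy][xx] = True
                        queue.append((yy, xx))
            # Keep only systems reaching the minimum surface.
            if len(cells) * cell_area_km2 >= min_area_km2:
                systems.append(cells)
    return systems


def overlap_fraction(a, b):
    """Shared cells divided by the size of the smaller system."""
    return len(a & b) / min(len(a), len(b))


# Two successive hourly fields (mm/h); the system moves one column east.
hour0 = [[0, 0, 0, 0, 0],
         [0, 12, 11, 0, 0],
         [0, 10, 3, 0, 6],
         [0, 0, 0, 0, 0]]
hour1 = [[0, 0, 0, 0, 0],
         [0, 0, 12, 11, 0],
         [0, 0, 10, 0, 0],
         [0, 0, 0, 0, 0]]
s0 = detect_systems(hour0, threshold=10)
s1 = detect_systems(hour1, threshold=10)
link = overlap_fraction(s0[0], s1[0])
print(len(s0), len(s1), link)
```

In this toy case the isolated 6 mm/h cell is discarded at the 5 mm/h threshold because a single 9 km² cell is below the 20 km² minimum surface, and the two 10 mm/h systems would be linked into one track if the overlap condition were satisfied by one shared cell out of three.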

Severity index
To characterise the most severe tracks, we introduce an index with a hydrological impact point of view, called severity index (cf. Fig. 14f). Considering the track of a cell exceeding a given intensity threshold, the severity index is defined as the sum, over the whole lifetime of the cell, of the product of mean intensity and area at each time step with a weight coefficient that takes into account the effect of the cell's moving velocity. The severity index therefore corresponds to the amount of surface precipitation during the track of a given precipitating system weighted by the system velocity. Indeed, for convective systems of comparable precipitation intensity, the given areas concerned by the systems will be affected by higher rainfall accumulations if the system moves more slowly or is quasistationary Bresson et al. 2012). Thanks to the introduction of a coefficient inversely proportional to the system horizontal moving velocity, a quasi-stationary system will have an higher severity index than the same system moving fast. Therefore, the formula of the severity index is as follows: with severity in m 3 , = 1∕1000 coefficient to convert mm.km 2 to m 3 , I mean the mean intensity of each cell of the track in mm/h, A its area in km 2 , V its moving velocity in m/s, V max =35m/s, dt = 3600s.
The V_max value is set to 35 m/s, consistent with the maximum velocity observed in COMEPHORE during the twenty fall seasons (1997-2016) and with convective systems' motion speeds (Johnson and Ciesielski 2020). For the numerical implementation of the severity index and within the framework of the study with a horizontal resolution of 3 km, the velocity V is taken equal to max(0.4, V) (with 0.4 m/s = 1.5 km/h corresponding to the minimum velocity for a system to leave the 3 km grid point within an hourly time step).

Figure caption: Spatial distribution of the 99th percentile for daily precipitation (mm/d) during an extended fall (SOND) for a CNRM-AROME, b CNRM-ALADIN and c observation for the 12 km grid. The second line presents d, e absolute differences between models and observation (mm/day) and f the added value of CNRM-AROME compared to CNRM-ALADIN (mm/day). Differences are plotted only when significant at the confidence level of 90% (bootstrapping method). Added value is plotted in green if CNRM-AROME is better than CNRM-ALADIN, in brown otherwise. On all figures, orography (500 m and 1000 m) is plotted in brown lines.
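The severity index described above can be sketched as follows. The exact form of the velocity weight is not recoverable from the text here, so the ratio V_max / max(0.4, V) used below is an assumption chosen only because it is inversely proportional to velocity and equals 1 at V_max. Hourly time steps are also assumed, so dt does not appear explicitly, and the function name and input layout are hypothetical.

```python
V_MAX = 35.0          # m/s, maximum velocity quoted in the text
V_MIN = 0.4           # m/s, velocity floor quoted in the text
ALPHA = 1.0 / 1000.0  # conversion coefficient quoted in the text


def severity_index(steps):
    """Severity of one track under the assumptions stated above.

    steps: iterable of (mean_intensity_mm_h, area_km2, velocity_m_s),
    one tuple per hourly time step of the track.
    """
    total = 0.0
    for intensity, area, velocity in steps:
        # ASSUMED weight: inversely proportional to the floored velocity.
        weight = V_MAX / max(V_MIN, velocity)
        total += intensity * area * weight
    return ALPHA * total


# The same cell scores higher when quasi-stationary than when moving fast.
fast_track = [(10.0, 100.0, 35.0)] * 3
slow_track = [(10.0, 100.0, 0.0)] * 3
print(severity_index(slow_track) > severity_index(fast_track))
```

With this weighting, the floor on V bounds the weight, so a strictly stationary system does not produce an infinite index.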

Climate statistics with a focus on extreme precipitation
We first focus on standard climate statistics: extreme precipitation can be studied through high percentiles at daily and hourly time scales or through the definition of events exceeding daily and hourly thresholds. The studied period is an extended fall season (from September to December), the period in which most of the heavy precipitation events occur in the Mediterranean region.
Percentiles are presented on the 3 km grid for CNRM-AROME and observations, then on the 12 km grid for CNRM-AROME, CNRM-ALADIN and observations.

Daily precipitation statistics
The 99th percentile for daily precipitation is a good representation of daily extreme precipitation. On average, for a period of 20 years and our 4-month selection, it corresponds to the daily precipitation amount exceeded about once per year for each grid point.
As we can see on the observed 99th percentile figure (cf. Fig. 5b), in the fall season, daily extreme precipitation is focused on areas affected by Mediterranean HPEs and more specifically the foothills of the Massif Central, Alps and Northern Apennines subject to southwesterly to southerly flows. The four regions most affected are the Cevennes in France, the Swiss Alps, the Carnic and Julian Alps between Italy, Austria and Slovenia and the Italian Liguria in the foothills of the Northern Apennines, with amounts exceeding 100 mm/day. We also find heavy precipitation amounts along the coasts, especially on the French Riviera and Croatian coasts, and in Corsica.
When considering CNRM-AROME on the 3 km grid and comparing with observations (cf. Fig. 5), we can observe a good representation of daily extremes by the model, with well localised small-scale features and maxima of comparable intensity. For instance, values above 120 mm/day are present in both the CP-RCM and observations in the Cevennes region. The map of differences between CNRM-AROME and observations shows a good match between the CP-RCM and observations. We note, however, a small overestimation inland and a small underestimation in the foothills of the Cevennes or in Croatia. Over mountains, the signal is noisier: the overall tendency of overestimation by the model must be put into perspective given a possible observational rainfall underestimation in mountainous areas. It is therefore difficult to draw conclusions for these areas because of observational uncertainties. On the 12 km grid (cf. Fig. 6acd), similar comments can be made when comparing the CP-RCM and observations. The statistical significance of the differences between model and observations is added using the bootstrapping method defined in Sect. 2.3.3: the differences are plotted only when significant at the confidence level of 90%. The underestimation of CNRM-AROME in the foothills of the Cevennes is spatially more limited when considering significance.
Analysing the figures with the CNRM-ALADIN RCM (cf. Fig. 6bce) with a native resolution of 12.5 km, as expected, the daily 99th percentile is smoother, with lower values. A clear underestimation is observed in the Cevennes region and in a large part of the plains of Provence, French Riviera, Liguria and in the foothills of parts of the Alps such as the Carnic and Julian Alps. When comparing the differences between CNRM-AROME and observations and CNRM-ALADIN and observations and when calculating added value defined in Sect. 2.3.3, we can observe a strong added value of the Convection-Permitting model compared to the deep convection parameterised model where the highest values of quantiles are observed; that is in the foothills and near the coasts of the Mediterranean region. The signal is noisier in the mountains, where it is even more difficult to draw conclusions due to observational discrepancy. The added value of the CP-RCM on fall extreme precipitation confirms the results of Fumière et al. (2020) on the Cevennes and extends it to the northwestern Mediterranean.
At the daily time scale, we also calculate mean precipitation and the number of wet days (days with precipitation above 1 mm/d), presented in appendix D (Figs. 19 and 20). Although this was less expected than for the 99th percentile, the use of the CP-RCM also improves the fall precipitation climatology: mean precipitation and the number of wet days, clearly overestimated over mountains by CNRM-ALADIN, are closer to observations with CNRM-AROME.

Hourly extreme precipitation statistics
At the hourly time scale, we calculate the 99.9th percentile. On average, for a period of 20 years and our 4-month selection, it corresponds to the hourly precipitation amount exceeded about three times per year for each grid point.
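The return frequencies quoted for the daily 99th and hourly 99.9th percentiles follow from simple arithmetic, sketched below. The sketch assumes that percentiles are computed over all time steps of the season (wet and dry) and that the extended fall has 122 days (September to December); the function name is hypothetical.

```python
def exceedances_per_season(percentile, samples_per_day, days_per_season=122):
    """Expected number of samples per season above a given percentile."""
    n_samples = samples_per_day * days_per_season
    return (1.0 - percentile / 100.0) * n_samples


daily = exceedances_per_season(99.0, samples_per_day=1)
hourly = exceedances_per_season(99.9, samples_per_day=24)
print(f"daily 99th percentile: {daily:.2f} exceedances per fall season")
print(f"hourly 99.9th percentile: {hourly:.2f} exceedances per fall season")
```

The daily case gives slightly more than one exceedance per season and the hourly case slightly fewer than three, consistent with "about once per year" and "about three times per year".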
On the observed fall hourly 99.9th percentile figure (cf. Fig. 7b), the list of regions most affected by heavy precipitation differs slightly from the list of those observed at the daily time scale. But, in the southeast of France, we observe an extension of the highest values from the Cevennes foothills to the plains, with maxima exceeding 20 mm/h. The Italian Ligurian coast, the eastern part of Corsica and the foothills of the Carnic and Julian Alps also stand out with the same range of values. Unfortunately, there is a lack of hourly observed data for Croatia; otherwise, we would be able to extend the study to another area which is also heavily affected by HPEs (Ivusic et al. 2020).
When focusing on the CNRM-AROME 99.9th percentile (cf. Fig. 7 on the 3 km grid and Fig. 8acd on the 12 km grid), we observe an overall good representation of hourly extremes by the model, with similar maxima locations and similar ranges of high values. In the comparison between model and observations, we can however conclude that CNRM-AROME has a tendency to overestimate the hourly heavy precipitation (except local underestimation in eastern Switzerland). This tendency is present mainly over regions where low values of quantiles are observed. If we focus on regions with high quantile values (for example, the Cevennes foothills and the French Riviera in France or Liguria in Italy), CNRM-AROME is able to represent the 99.9th percentile correctly, close to observations. The slight underestimation observed in Fig. 7c is not significant when using the bootstrapping method on data on the 12 km grid (cf. Fig. 8d).
When comparing CNRM-AROME and CNRM-ALADIN (cf. Fig. 8), we can draw similar conclusions as for the daily 99th percentile, but with a larger added value of the CP-RCM compared to the RCM. Indeed, the RCM seems unable to reproduce realistic hourly high precipitation. While CNRM-ALADIN sets maxima locations quite well inland, we notice a frequent shift towards areas with high relief while the heaviest rains are observed in the foothills or the plains (of the Cevennes or the Carnic and Julian Alps, for example). In terms of intensity, CNRM-ALADIN clearly underestimates hourly quantile values with around 10 mm/h maxima compared to the observed 20 mm/h: the convection-parameterised model fails to correctly reproduce the highest intensity maxima.
Over the sea, we can also observe a real difference in behaviour between CNRM-AROME and CNRM-ALADIN, with a much more active CP-RCM. Active sub-daily convection often takes place over the Mediterranean Sea in the fall season, but CNRM-ALADIN is unable to simulate strong hourly convective rainfall, all the more so over the sea, far from topographic lifting. This difference in precipitation intensities between the two models over the sea, away from orographic or coastal effects, shows that the CP-RCM is able to modify the representation of sub-daily convective phenomena.

Interannual variability of events exceeding thresholds
After considering rainfall intensities for a given frequency, we now focus on frequencies of given intensity thresholds. We consider events defined by the rainfall amount exceeding thresholds over a defined period and area. For this study, we follow Ribes et al. (2019), who calculated the annual numbers of daily events in order to characterise the evolution of observed extreme precipitation in the southeast of France. The chosen area is therefore the French MED area. This specific interannual evolution study is possible thanks to the use of both the ERA-Interim reanalysis driving and spectral nudging in the CNRM-ALADIN simulation: the synoptic scales favourable to Mediterranean events are well represented in the RCM, which also benefits the CP-RCM, run without spectral nudging but on a smaller domain. This allows us to compare models and observations year by year.

Daily French Mediterranean events
First, we focus on the daily time scale and count the annual numbers of days where at least one grid point exceeds given thresholds in the French MED area for an extended fall season, for CNRM-AROME, CNRM-ALADIN and COMEPHORE. The common period is 1997-2016. We performed this calculation for thresholds from 100 to 250 mm, every 50 mm, on a common 3 km grid for CNRM-AROME and COMEPHORE and on a 12 km grid when considering CNRM-ALADIN.
We present the interannual variations of the number of events for only the 100 and 150 mm/day thresholds in Fig. 9. For all thresholds, the mean annual numbers of events are presented in Table 3, as well as interannual correlations between models and observations. The tests used to validate the statistical significance of mean differences and correlation are described in appendix B.
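The event definition used here (a day counts as an event when at least one grid point of the area exceeds the threshold) can be sketched as below. The data layout, a dictionary mapping each year to a list of daily 2-D fields already restricted to the area of interest, and the function names are assumptions made for the example; the sample values are synthetic.

```python
def count_events(fields, threshold):
    """Number of time steps with at least one grid point above threshold.

    fields: list of 2-D lists (one per day, or per hour for hourly events),
    already restricted to the area of interest.
    """
    return sum(
        1 for field in fields
        if any(value > threshold for row in field for value in row)
    )


def annual_counts(fields_by_year, threshold):
    """Apply count_events to each year of a {year: fields} dictionary."""
    return {
        year: count_events(fields, threshold)
        for year, fields in fields_by_year.items()
    }


# Synthetic example: three days in one year, one day in the next (mm/day).
sample = {
    2001: [[[0.0, 120.0], [5.0, 0.0]],
           [[10.0, 20.0], [0.0, 0.0]],
           [[101.0, 150.0], [0.0, 0.0]]],
    2002: [[[0.0, 0.0], [0.0, 0.0]]],
}
print(annual_counts(sample, 100.0))
```

The same function applies to hourly events by passing hourly fields and an hourly threshold.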
If we focus on Fig. 9ab after interpolation on the 3 km grid, we first notice a high observed interannual variability of daily extreme precipitation events in the French MED area, with years with a very low number of events and years with a very high number of events. For example, for the 150 mm/day threshold, fewer than 5 events are observed in 1998, 2004, 2007, 2009 and 2012 whereas 10 or more events are observed in 2003, 2008, 2011 and 2014. When considering the CP-RCM behaviour and comparing to observations, we notice a good match and a good representation of interannual variability by the CP-RCM. The years with a low number of events are the same for model and observations. In the same way, 2014, a particularly active year in the French Mediterranean region (Vautard et al. 2015), with an observed maximum of HPEs, stands out both for COMEPHORE and CNRM-AROME, especially for the 100 mm threshold (24 events for COMEPHORE and 23 for CNRM-AROME). Interannual correlations between CP-RCM and observations are significant at the confidence level of 99% for thresholds up to 200 mm/day. For the 100 mm/day threshold, the model reproduces the annual mean number of events well: the differences between model and observations are not significant at the confidence level of 95% for data on both the 3 and 12 km grids. For the highest thresholds, even if the differences are sometimes not statistically significant, it should be noted that CNRM-AROME underestimates the number of daily events, with differences between model and observations of −30% for 200 mm/day and −62% for 250 mm/day on the 3 km grid.
For the CNRM-ALADIN RCM, we focus on the data on the 12 km grid (cf. Fig. 9cd and second part of Table 3). Unlike the CP-RCM, the CNRM-ALADIN RCM is not able to reproduce the interannual variability of the high-intensity daily events: interannual correlation between RCM and observations is much lower than between CP-RCM and observations and only significant at the confidence level of 99% for the first 100 mm/d threshold. For all thresholds, the

Table 3 Annual mean numbers of daily events for COMEPHORE, CNRM-AROME and CNRM-ALADIN (in bold when the differences between model and observation means are not significant at a confidence level of 95%) and interannual correlation between models and observation (in bold if significant at a confidence level of 95% and underlined if significant at a confidence level of 99%). The results are presented for the 3 and 12 km grids, for all daily thresholds and in the French MED area during the extended fall (SOND) between 1997-2016

Hourly French Mediterranean events
We perform the same exercise for hourly precipitation: hourly events are defined by counting the number of hours per year where at least one grid point exceeds given thresholds in the French MED area during an extended fall season for the 1997-2016 period. We choose thresholds from 10 to 50 mm/h every 5 mm, and calculate the number of events both on the 3 km grid for COMEPHORE and CNRM-AROME and on the 12 km grid for all models and observations.
We present the interannual variations of the numbers of events for only the 10 and 20 mm/h thresholds in Fig. 10 and the main results for all thresholds in Table 4.
Similar conclusions can be drawn as for daily events: we observe a high interannual variability of the number of hourly events. The year 2014 stands out once again with a maximum of observed events for all thresholds (411 events for COMEPHORE for the 20 mm threshold). CNRM-AROME demonstrates its ability to represent this interannual variability: interannual correlations between CNRM-AROME and COMEPHORE are above 0.8 for thresholds up to 30 mm/h and are significant at a confidence level of 99% on the 3 km grid for all studied thresholds. In the model, 2014 also presents the maximum number of events (407 for CNRM-AROME for the 20 mm threshold). Considering the annual mean number of events, we notice an overestimation by the model of the number of events for thresholds between 10 and 20 mm/h (+40% and +27%, respectively) and an underestimation for those above 40 mm/h. The 25 and 30 mm/h thresholds are those with statistically similar annual numbers of events between model and observations (relative differences of +11% and −4%).
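The two summary quantities discussed here, the interannual correlation and the relative difference of annual mean numbers of events, can be computed from two series of annual counts as in the sketch below. A Pearson correlation is assumed; the significance tests of appendix B are not reproduced, and the sample series are synthetic, not values from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def relative_difference(model, obs):
    """Relative difference (%) of the model annual mean w.r.t. observations."""
    mean_model = sum(model) / len(model)
    mean_obs = sum(obs) / len(obs)
    return 100.0 * (mean_model - mean_obs) / mean_obs


# Synthetic annual numbers of events (one value per year).
obs_counts = [10, 25, 8, 40, 15]
model_counts = [12, 30, 9, 44, 20]
print(f"correlation: {pearson(model_counts, obs_counts):.2f}")
print(f"relative difference: {relative_difference(model_counts, obs_counts):+.0f}%")
```

A high correlation with a non-zero relative difference corresponds to the situation described in the text: the year-to-year variations are reproduced even when the mean number of events is biased.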
The interannual variability is once again not reproduced by the CNRM-ALADIN RCM, with an interannual correlation between model and observations significant only for the 10 mm/h and 15 mm/h thresholds. Moreover, we notice a clear and statistically significant underestimation

Intermediate conclusion on climate statistics
The analysis of daily and hourly quantiles in Sects. 3.1 and 3.2 and the study of interannual variability of daily and hourly events in Sect. 3.3 show that CNRM-AROME performs quite well in reproducing location, intensity, frequency and interannual variability of fall heavy precipitation events, in particular in the French MED area. However, we observe a tendency in CNRM-AROME to overestimate the low values of high precipitation percentiles, except in the Mediterranean area, and to underestimate the daily and hourly numbers of events for thresholds exceeding 200 mm/day and 40 mm/h, respectively.
With these overall positive results and in order to go beyond counting daily or hourly events exceeding thresholds, we can now study convective phenomena through a Lagrangian approach. Indeed, the basic statistical approach (Eulerian) neglects the spatial and temporal connections that may exist within a given event. An object-oriented approach with precipitating system detection and tracking will therefore allow us to study extreme precipitation events by evaluating the model's ability to reproduce the main characteristics of the convective events (duration, intensity, area...) represented by the convective systems, and not only by local rainfall amounts.
We show that the CNRM-ALADIN 12.5 km RCM, unlike the CP-RCM, is not able to reproduce realistic location and intensity values for extreme precipitation, and clearly underestimates the number of heavy precipitation events. The comments made for the daily time scale hold even more strongly for hourly convective precipitation. As CNRM-ALADIN is not able to produce realistic precipitating systems, the tracking method will not be applied to this model in the following section.
4 Object-oriented approach: precipitating system detection, characteristics and tracking

Implementation
The precipitating system detection and tracking algorithm described in Sect. 2.3.5 and appendix C is applied to the 1-h accumulated precipitation field of both the COMEPHORE observational dataset and CNRM-AROME on the overlapping domain, called the "tracking domain" and plotted in black in Fig. 1. Focusing on fall Mediterranean HPEs, we apply it to twenty extended fall seasons (from September to December) from 1997 to 2016. On a given area, for each time step, the algorithm first detects the precipitating systems, defined by contiguous grid points above given thresholds from 2 to 30 mm/h.
In Sect. 4.2, we first focus on the precipitating system characteristics, mainly as a function of the different thresholds.
In Sect. 4.3, we focus on fall Mediterranean HPEs and present the characteristics of selected trajectories. To select the tracks, we have to keep in mind that one Mediterranean HPE can correspond to a combination of several convective systems (Duffourg et al. 2016), and therefore to several tracks. Similarly, the large diversity of HPEs in terms of duration and surface extension must be taken into account. To select tracks taking part in these heavy precipitation events, the following criteria are finally set:
- temporal and spatial criteria: all the tracks crossing the French MED area (defined in Sect. 2.3.4) in an extended fall season (September to December) are taken into account;
- a criterion to select convective precipitation: system trajectories with mean intensities above 10 mm/h are selected;
- a criterion to select tracks occurring during days with heavy precipitation: we select all tracks occurring during days with 24-h accumulated precipitation exceeding the 100 mm threshold at least once over the French MED area (the 24-h accumulations are calculated on 24-h sliding windows in order to take into account events occurring between two calendar days).
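The selection criteria above can be illustrated with the following sketch, which flags the hours covered by a 24-h sliding window exceeding 100 mm at one grid point or more, then filters tracks. The data layout, the dictionary keys and the rule that a track is kept when at least one of its hours is flagged are assumptions made for the example, not details taken from the study.

```python
def heavy_precip_hours(hourly, window=24, threshold=100.0):
    """Hours covered by a sliding window whose accumulation exceeds threshold.

    hourly[t][p]: 1-h rainfall (mm) at hour t and grid point p of the area.
    Only complete windows of `window` hours are considered.
    """
    flagged = set()
    n_points = len(hourly[0])
    for p in range(n_points):
        running = 0.0
        for t in range(len(hourly)):
            running += hourly[t][p]
            if t >= window:
                running -= hourly[t - window][p]
            if t >= window - 1 and running > threshold:
                flagged.update(range(t - window + 1, t + 1))
    return flagged


def select_tracks(tracks, flagged_hours, min_mean_intensity=10.0):
    """Keep tracks crossing the area, convective, and on heavy-rain days.

    Each track is a dict with hypothetical keys 'crosses_area' (bool),
    'mean_intensity' (mm/h) and 'hours' (iterable of hour indices).
    """
    return [
        track for track in tracks
        if track["crosses_area"]
        and track["mean_intensity"] > min_mean_intensity
        and any(hour in flagged_hours for hour in track["hours"])
    ]


# Synthetic case: 120 mm falling across midnight (hours 22 to 25), i.e.
# 60 mm on each calendar day, which only a sliding window can capture.
rain = [[30.0] if 22 <= t <= 25 else [0.0] for t in range(48)]
flagged = heavy_precip_hours(rain)
tracks = [
    {"crosses_area": True, "mean_intensity": 14.0, "hours": [22, 23, 24]},
    {"crosses_area": True, "mean_intensity": 6.0, "hours": [23, 24]},
    {"crosses_area": False, "mean_intensity": 20.0, "hours": [24, 25]},
]
print(len(select_tracks(tracks, flagged)))
```

The synthetic case shows why sliding windows are used: neither calendar day reaches 100 mm, yet the event is retained.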
We are aware that other options would have been possible, but we tried to choose a reproducible one, consistent with the rest of the study and based on the Mediterranean HPEs' characteristics.
We are also aware that the trajectories starting or ending over the Mediterranean Sea, outside the tracking domain, will not be complete trajectories for these events but will be present in the tracking database. As the same information is missing in both the observations and the model, this does not bias the evaluation study, but as there is a missing element in the life cycle of some systems, the tracking database cannot be considered as a climatology of precipitating system trajectories.

Mean behaviour
A first step in studying the characteristics of precipitating systems is to plot the spatial distribution of the annual mean number of cells for the twenty fall seasons (cf. Fig. 11 for the 10 mm threshold). When focusing on the figure for COMEPHORE observations, the southeast of France stands out in the fall season with a maximum exceeding 25 cells per year in the Cevennes region and the surrounding plains. The French Riviera and parts of Corsica also present a high number of cells above 10 mm.
For CNRM-AROME, the southeast of France is also the region with maximum values exceeding 25 cells per year above 10 mm, close to observations. We notice, however, an overestimation over central France in the western part of the tracking domain (to be related to the overestimation of the hourly 99.9th percentile in Sect. 3.2). The position of the maximum in the southeast of France is shifted slightly to the southeast and extended eastward with CNRM-AROME. We also count fewer cells over Roussillon and Corsica.
In the Alps, the absence of convective cells in COMEPHORE could be related to the known underestimation of rainfall by rain gauges and radar in the observational dataset over mountainous regions. If so, CNRM-AROME could present a more realistic behaviour in these areas, but the lack of better quality observed hourly databases prevents us from coming to a definitive conclusion on this point.
The main statistics above threshold (annual mean number of cells, mean intensity and mean area) are presented in Fig. 12, both for COMEPHORE and CNRM-AROME, on the French MED area, where we find the major part of the convective cells in the fall season (around 70% of the total number of cells for the 20 mm threshold).
We first consider the annual mean numbers of cells and use a logarithmic scale to highlight the tail of the distribution for high thresholds. As expected, the number of observed cells strongly decreases when thresholds increase. We can see an overall good match between the number of cells of CNRM-AROME and observations. However, when examining details and computing statistical significance tests (described in appendix B), we can observe an overestimation of the number of cells by the model for the lowest thresholds and an underestimation for the highest thresholds.
The two other main characteristics of precipitating cells, mean intensity and mean area, also seem well represented by the model whatever the threshold and domain. However, model mean intensity biases are all significant at the confidence level of 95%: CNRM-AROME has a tendency to underestimate the intensity of the cells, with higher biases for higher thresholds. We can put the systematic errors of model intensity for low thresholds into perspective if we compare these biases (for example 0.1 mm for the 2 mm threshold and 0.5 mm for 10 mm) to observational measurement errors, which are of the same order of magnitude since the uncertainties in the precipitation measurement are about 5%. We also notice that the model produces overly small cells for low-intensity rainfall (2 mm threshold) and overly large cells above 10 mm. The areas are statistically comparable only for the 5 and 30 mm thresholds at a confidence level of 95%.

Interannual variability
Focusing on the French MED area, we present the variation of the annual numbers of cells in Fig. 13 for the 10 and 20 mm thresholds and the interannual correlations between model and observation for all thresholds in Table 5.
We notice a strong interannual variability of the number of cells exceeding defined thresholds, to be compared to the variability observed in the hourly precipitation event-based study (cf. Sect. 3.3). The year 2014 also appears here as an exceptional year for heavy rainfall in the French MED area, with more than twice the mean number of cells above 10 mm.
CNRM-AROME is able to reproduce the interannual variability of the numbers of cells, with interannual correlations significant for all thresholds at a confidence level of 99% and values exceeding 0.8 for the thresholds from 5 to 25 mm. 2014 also appears as the year with the highest number of cells simulated by the model in the French MED area. However, when we look at the highest thresholds, a smaller number of cells is simulated by the model with respect to observations. This underestimation may be related to the underestimation found in the event-based study in Sect. 3.3 and in the previous subsection about the mean behaviour of the precipitating systems.

Track characteristics
The comparison of tracks between observations and the model allows us to evaluate the model's ability to produce realistic trajectory characteristics, such as number, duration, maximum intensity, mean intensity, maximum area, horizontal moving velocity and severity (cf. Eq. (2) in Sect. 2.3.5) for a track selection focusing on Mediterranean fall HPEs, as defined in Sect. 4.1.

Figure caption: Statistics above threshold of annual mean number of cells, mean intensity (mm/h) and mean area (km²) during the extended fall (SOND) between 1997-2016 for CNRM-AROME in red and COMEPHORE in blue for the French MED area. A blue star is added when the differences between model and observation means are not significant at a confidence level of 95%.
Variations of the number of tracks as a function of these different characteristics are presented in Fig. 14a-f. The logarithmic scale for the number of tracks used in all the bar plots highlights the tail of the distribution but should be analysed with caution for the extreme part of the tail with a very low number of tracks. Similarly, the fact that the differences between model and observation are systematically not significant at the confidence level of 95% for the tail of the distributions should not be overinterpreted, because of the low number of tracks.
We count 5054 tracks in the French MED area within the COMEPHORE observational dataset (around 250 tracks/year). If we focus on extreme events, we can observe around 10% of the observed trajectories lasting more than 5 h (with a maximum duration of 31 h), 10% presenting maximum intensities exceeding 30 mm/h (reaching nearly 100 mm/h for the most intense one) and 10% with maximum areas above 1200 km² (with a maximum of 13,000 km²). The two most severe tracks in COMEPHORE as defined through our severity index correspond to two extreme events known as hydrological reference events, including the Gard event of September 8-9, 2002 (Delrieu et al. 2005; Chancibault et al. 2006).
We now focus on the track characteristics simulated by the CNRM-AROME CP-RCM. The first important result is the similar total numbers of tracks between the model and observations for the twenty years of extended fall: 4291 tracks simulated by CNRM-AROME compared to the 5054 observed tracks (215 tracks/year for CNRM-AROME compared to 250 tracks/year for COMEPHORE).
For the duration plotted in Fig. 14a, we can see that the track durations are very similar for CNRM-AROME and COMEPHORE with differences between model and observation almost always not significant at a confidence level of 95%: the model is able to simulate not only short events, but also long ones. Indeed, frequencies of tracks lasting more than 12 hours are close together, with 1.1% for COMEPHORE and 0.9% for CNRM-AROME. Likewise, the maximum duration reaches nearly the same value, 30 hours for the model and 31 for observations. If we now examine the intensity (cf. Fig. 14bc), we observe that the CP-RCM seems able to represent maximum intensity values below 40 mm/h correctly. However, above 40 mm/h, there is an increasing underestimation of the maximum intensities by CNRM-AROME, which fails to reproduce the most intense events, with a maximum of 75 mm/h for the model compared to the observed maximum of nearly 100 mm/h. From a statistical point of view, the underestimation by the model is significant for most of the intensity values. Similar comments can be made for mean intensity.
For the maximum areas of the tracks in Fig. 14d, we notice an overall good representation by the CP-RCM with differences between model and observation almost always not significant at a confidence level of 95%. However, we observe a tendency of CNRM-AROME to overestimate the largest areas above 5500 km². The maximum area is much larger for the model (18,000 km²) than for observations (13,000 km²).
For the maximum horizontal velocity plotted in Fig. 14e, we can see a good match between model and observation with similar moving speed distribution. However, a significant underestimation of the number of tracks by the model is observed for the lowest velocities corresponding to slow-moving systems. More generally, CNRM-AROME tends to simulate tracks with higher velocities than COMEPHORE.
The severity index (cf. Fig. 14f) shows very similar distributions for the model and observations. The most severe tracks (severity above 10,000 m³) represent around 0.4% of the total number of tracks for both the model and the observations. The most severe track in CNRM-AROME also corresponds to the hydrological reference event of the Gard on September 8-9, 2002 (duration: 30 h, maximum intensity: 72 mm/h, mean intensity: 17.5 mm/h, maximum area: 9921 km², maximum velocity: 7 m/s, severity: 29,303 m³). Unlike for other characteristics, and except for three severity values standing out in the observations and linked with the highest intensities, we notice similar severity values between model and observation even for the tail of the severity index distribution. We should nevertheless question whether the good results of the CP-RCM in representing this tail of the distribution are obtained for the wrong reasons: the intensity underestimation might be compensated by the model's overestimation of cell areas or by differences in system motion.

Intensity duration frequency (IDF) and intensity area frequency (IAF) plots
In order to combine two track characteristics, we present an IDF plot: I for maximum Intensity on the y axis, D for Duration on the x axis and F for Frequency in color (cf. Fig. 14g for CNRM-AROME and COMEPHORE). Frequency is defined as the combined occurrence of maximum intensity and duration values in chosen bins, normalised by the total number of tracks. In the same way, we present an IAF (maximum Intensity, Area, Frequency) plot in Fig. 14h.
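The frequency shown in an IDF plot, as defined above, is a two-dimensional histogram normalised by the total number of tracks. The sketch below illustrates this with hypothetical bin edges and synthetic tracks; bins are taken as half-open intervals [edge_i, edge_i+1), and tracks falling outside the bins still count in the total, which are choices made for the example.

```python
import bisect
from collections import Counter


def joint_frequency(tracks, duration_edges, intensity_edges):
    """Joint frequency of (duration, maximum intensity) over chosen bins.

    tracks: list of (duration_h, max_intensity_mm_h) tuples.
    Returns {(duration_bin, intensity_bin): frequency}, each frequency
    being a count divided by the total number of tracks.
    """
    counts = Counter()
    for duration, intensity in tracks:
        i = bisect.bisect_right(duration_edges, duration) - 1
        j = bisect.bisect_right(intensity_edges, intensity) - 1
        inside_duration = 0 <= i < len(duration_edges) - 1
        inside_intensity = 0 <= j < len(intensity_edges) - 1
        if inside_duration and inside_intensity:
            counts[(i, j)] += 1
    total = len(tracks)
    return {bin_index: n / total for bin_index, n in counts.items()}


# Synthetic tracks: (duration in h, maximum intensity in mm/h).
sample_tracks = [(1, 12), (2, 15), (7, 45), (30, 72)]
idf = joint_frequency(
    sample_tracks,
    duration_edges=[0, 5, 12, 48],
    intensity_edges=[10, 30, 60, 100],
)
print(idf)
```

Replacing the duration by the maximum area, with suitable bin edges, gives the corresponding IAF frequencies.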
The comments on maximum intensity and duration bar plots can also be applied when analysing the IDF plots, which have the advantage of presenting a combination of the two parameters. The patterns of IDF plots are very similar between model and observations with the same range of values for the two studied parameters. Some long events are present in both figures, but the most extreme events (above 40 mm/h) are missing within CNRM-AROME. An additional piece of information provided by these figures is that the long-lasting tracks correspond to intense events for both the model and the observations (with maximum intensities above 30 mm/h), which can be related to mesoscale convective organisation such as large MCSs involved in Mediterranean HPEs (Hernandez et al. 1998).
The IAF plots show similar patterns for CNRM-AROME and COMEPHORE, but we can see a shift towards larger areas, increasing as the intensities increase: the model tends to simulate cells that are too large compared to observations. Just as we noticed the correspondence between the long-lasting and intense events on the IDF plots, we observe here a relationship between the tracks presenting the largest areas and the most intense ones.

Bubble plot
In order to combine more track characteristics, we also present a bubble plot inspired by Darmaraki et al. (2019): four characteristics of each individual track are plotted with the maximum intensity on the y axis, the duration on the x axis, different sizes of circle corresponding to the maximum area and different colors as a function of the severity index (cf. Fig. 15).
These plots are of little interest over all trajectories, but can be relevant for a defined subset of tracks. Here, we choose to plot the ten most severe tracks for CNRM-AROME and COMEPHORE in order to examine the possible compensation of model intensity underestimation and area overestimation in the tail of the severity index distribution. The plotted tracks correspond to around 0.2% of the total number of tracks for both subsets. The majority of the most severe tracks (7/9 for COMEPHORE and 7/7 for CNRM-AROME) are related to known HPEs after comparing the dates with the established list of HPEs up to 2010 by Ponzano et al. (2020). Similarly, most severe tracks of COMEPHORE correspond to events present in the Meteo-France database of extreme precipitation (http://pluiesextremes.meteo.fr/) and four of these are related to events with accumulated rainfall exceeding 400 mm. The three observed tracks with duration around ten hours and moderate intensities correspond to quasi-stationary events, showing the relevance of adding velocity in the severity index definition. In COMEPHORE, three tracks present much higher severity values than the others: these high values are linked with very high maximum intensities for two of them and stationarity associated with long duration for the third one.

Fig. 14 Variation of the number of tracks as a function of a duration, b maximum intensity, c mean intensity, d maximum area, e velocity and f severity for CNRM-AROME in red and COMEPHORE in blue. A blue star is added when the differences between model and observation means are not significant at a confidence level of 95%. g presents IDF plot and h IAF plot for CNRM-AROME and COMEPHORE. In all the figures, we consider all tracks of cells exceeding 10 mm/h occurring during days exceeding 100 mm/day in the French MED area during the extended fall (SOND) between 1997 and 2016
As expected, we observe larger cell areas with CNRM-AROME than with COMEPHORE for the ten most severe tracks, and a larger range of intensities for the observations, with three tracks whose maximum intensity exceeds 80 mm/h. With the CP-RCM, there is a rather good match between long-lasting and intense events, which is less true in the observations, where maximum intensities or stationarity can prevail in the most intense events. These figures allow us to summarise the main results concerning the tail of the track distribution: CNRM-AROME is able to simulate long-lasting and severe fall extreme events in the French MED area. However, it does not produce maximum intensities in the same range as those observed, and it tends to overestimate the maximum areas of cells.

The need for reliable precipitation datasets
High temporal and spatial resolution observational datasets are necessary for carrying out climate model evaluation studies dealing with kilometric-scale CP-RCMs. The study of hourly precipitation extremes at the kilometric scale imposes strong constraints on the choice of observational reference databases because of the high variability in space and time of surface rainfall. For example, with daily and 25 km resolution data, the European EOBS database (Haylock et al. 2008; Cornes et al. 2018) is not relevant (Hofstra et al. 2010; Lenderink 2010; Flaounas et al. 2012; Prein and Gobiet 2017). Similarly, the satellite precipitation analyses of CMORPH (Joyce et al. 2004), TRMM (Huffman et al. 2010) and GPM (Hou et al. 2011) have a too-coarse resolution of 25 km. A problematic bias can also be introduced in datasets that use a convection-parameterised model to disaggregate hourly fields from the daily amount (Bosilovich et al. 2008; Parker 2016), such as the SAFRAN precipitation analysis (Durand et al. 1993; Quintana-Segui et al. 2008; Vidal et al. 2011). One way to improve precipitation datasets and compensate for the insufficient density of rain gauges is to merge rain gauge data with high-resolution (around 1 km), temporally and spatially continuous weather radar data. Relevant hourly and kilometric gridded datasets of this kind already exist in France, Switzerland and Germany, but not yet in Italy, Spain or Croatia. The limitations of observations should also be kept in mind when comparing models and observations, before jumping to conclusions. For example, rain gauge measurement errors, such as the possible underestimation of rainfall due to undercatchment in case of strong wind and/or low rainfall intensity or snowfall conditions (Neff 1977; Yang et al. 1999), are particularly marked in mountainous areas.
Radar measurement also presents errors such as a masking effect leading to precipitation underestimation in complex topography areas or overestimation in case of hail. In the study, temporal and spatial discontinuities inside and between datasets must be taken into account as much as possible. For example, it is worth mentioning that in the COMEPHORE product, the number of radars increases over time and the processing method has evolved since 2007 (cf. appendix A). Its quality and therefore its ability to detect precipitating events are improving over time, which can affect the temporal homogeneity of the results.
It is also important to note the difference between the grid resolution (i.e. grid spacing) and the effective resolution of observational datasets. For gridded datasets based on stations, the effective resolution depends mainly on the local station density. Isotta et al. (2014) indicate an effective resolution of around 10-20 km for the 5 km daily precipitation dataset APGD-EURO4M. To our knowledge, no studies of effective resolution have been carried out for the other datasets. This difference between grid resolution and effective resolution should prevent us from over-interpreting the results, because some details and phenomena cannot be represented in the datasets.
In the near future, improvements are expected in precipitation measurement and datasets. For example, the recent use of X-band dual-polarization radar (Anagnostou et al. 2018) limits the problematic underestimation over mountains. We can also imagine a European precipitation gridded product based on radar data from the OPERA European operational Weather Radar Network (Saltikoff et al. 2019) and European rain gauge data. Over the sea, rainfall estimation products based on satellite imaging should increase temporal and spatial resolutions.

Added value of the CP-RCM compared to the RCM
The added value of the CNRM-AROME 2.5 km CP-RCM for representing fall Mediterranean HPEs is demonstrated relative to the CNRM-ALADIN 12.5 km RCM, extending the added value found over the Cevennes region by Fumière et al. (2020). This study also confirms the better representation of Mediterranean fall intense events previously reported for two 2.2 km CP-RCMs compared to their driving RCMs. Focusing on fall HPEs, it also illustrates the improvements in the representation of fine-scale details of heavy precipitation in CP-RCMs compared to RCMs found in the recent multi-model studies on the common pan-Alpine domain (Pichelli et al. 2020). Disentangling the origin of the CP-RCM added value is beyond the scope of this study and would require additional and costly experiments. One reason is undoubtedly the higher horizontal resolution, which allows a more detailed representation of topography and surfaces. This improvement likely leads to stronger mesoscale convergence and orographic circulations. However, the higher horizontal resolution of the CP-RCM cannot explain all the observed differences between CNRM-AROME and CNRM-ALADIN. The explicitly simulated deep convection and the physical parameterisations adapted to mesoscale processes in the CP-RCM probably also contribute to improving the simulation of organised convection by CNRM-AROME. For example, CNRM-AROME reproduces the location of hourly extreme rainfall in the foothills and plains, while CNRM-ALADIN tends to confine it to mountain peaks. This can probably be related to the CP-RCM's ability to simulate mesoscale processes such as cold pools, which can trigger convection far from the topography. In addition, the CP-RCM seems to better represent hourly organised convection over the sea, away from topographic triggering. The processes involved, such as cold pools and mesoscale circulations, are explicitly simulated by the CP-RCM dynamics, whereas they are subgrid processes and therefore parameterised in the RCM.
Further, it is worth mentioning that the CP-RCM also clearly improves precipitation means and the number of wet days over mountains at the daily time scale with respect to the RCM (cf. appendix D). The overestimation in the RCM could be linked to a too-strong orographic lifting effect in the convection parameterisation, while it is correct in the CP-RCM. We consider that extending the above assumptions with further process-oriented analyses of the CP-RCM/RCM differences may guide improvements in the RCM physical parameterisations.

The CP-RCM weaknesses
While the CP-RCM's behaviour is mainly positive, the study also highlights some limitations of CNRM-AROME. The main one is that the model cannot correctly reproduce very high precipitation intensities, above 200 mm/d at the daily scale and above 40 mm/h at the hourly scale. This limitation was also shown in the previous study by Fumière et al. (2020): on a Cevennes box, the model is no longer able to reproduce realistic percentiles within the COMEPHORE confidence interval above 230 mm/d for daily rainfall and 30 mm/h at the hourly scale. This underestimation of high-intensity rainfall is also found in Brousseau et al. (2016) with the 2.5 km AROME model used in NWP. However, this CPM bias cannot be generalised, as it depends on the model: for example, Lean et al. (2008) highlighted that the 4 km UM model simulated too-heavy rainfall, the opposite of the CNRM-AROME behaviour. The diversity of CP-RCM behaviours concerning precipitation intensity is also demonstrated in the CP-RCM multi-model evaluation by Ban et al. (2020). Other CNRM-AROME limitations include a tendency to overestimate hourly extreme precipitation over regions with low quantile values, a tendency to overestimate low precipitation and a tendency to produce too-large cells, especially for high intensities. The latter weakness can similarly be seen in the studies by Brousseau et al. (2016) for AROME, and by Lean et al. (2008) and Stein et al. (2015) for the UM model. CNRM-AROME also tends to overestimate the horizontal propagation speed of precipitating systems, whereas the translation speed of MCSs is remarkably well simulated by the 4 km WRF model over the US in the study by Prein et al. (2020).
It is difficult to determine whether the CNRM-AROME weaknesses are all linked and to understand their origin, but two main explanations can be proposed: the need for improvements in the physical parameterisations and the need for an increase in resolution. Indeed, while deep convection is explicitly resolved by the model dynamics in convection-permitting models, subgrid processes such as shallow convection, turbulence and microphysical processes still need to be parameterised. Moreover, no specific changes were made to these parameterisation schemes when moving the AROME version from NWP to climate mode, while the assimilation of observed data can no longer correct the model. Improvements or adaptations in these parameterisations could be a way to address the model limitations. Another possible explanation comes from the 2.5 km horizontal resolution of CNRM-AROME and its 60 vertical levels, which may not be quite sufficient to explicitly resolve deep convection and correctly represent the characteristics of some convective cells. Indeed, 2.5 km is the grid resolution and not the effective resolution, which is the scale below which processes are not adequately represented by the model. A study by Ricard et al. (2013) based on kinetic energy spectral analysis (Skamarock 2004) shows that the effective resolution of AROME at 2.5 km resolution is around 9-10 Δx and that the smallest wavelength represented in the model is 5 km, i.e. twice the grid resolution. The efficient semi-implicit semi-Lagrangian numerical scheme used limits the model's resolving capability and prevents some phenomena from being correctly represented by the model. An increase in resolution might alleviate these problems. Indeed, Brousseau et al. (2016) showed the improvement of AROME when the horizontal grid resolution increases from 2.5 to 1.3 km and the vertical resolution from 60 to 90 levels: higher resolution allows more realistic convective cells in terms of size, number, intensity and lifetime.
We can assume that increasing the resolution in CNRM-AROME could also improve the representation of cell characteristics in the model, but computational capabilities do not yet allow the resolution of the climate convection-permitting model to be increased.
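As a quick check of the scales quoted above from Ricard et al. (2013), the two figures can be written out for the CNRM-AROME grid spacing Δx = 2.5 km. The symbols λ_eff (effective resolution) and λ_min (smallest represented wavelength) are notation introduced here for illustration only.

```latex
\lambda_{\mathrm{eff}} \approx (9\ \text{to}\ 10)\,\Delta x
  = (9\ \text{to}\ 10) \times 2.5\ \mathrm{km}
  \approx 22.5\ \text{to}\ 25\ \mathrm{km},
\qquad
\lambda_{\min} = 2\,\Delta x = 2 \times 2.5\ \mathrm{km} = 5\ \mathrm{km}.
```

In other words, structures between roughly 5 km and 22.5-25 km are present on the grid but are not adequately represented by the model.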

Conclusion
A 38-year continuous simulation of the CNRM-AROME Convection-Permitting Regional Climate Model (CP-RCM) at a resolution of 2.5 km is performed over a large pan-Alpine domain; it is, to our knowledge, the longest existing CP-RCM simulation. This long continuous climate simulation with a high-resolution CP-RCM makes it possible to carry out robust climate studies of extreme events such as Mediterranean Heavy Precipitation Events (HPEs).
The main objectives of the study are:
- to extend the results of the previous study using CNRM-AROME and focusing on French Mediterranean HPEs (Fumière et al. 2020);
- to consider these high-impact events as objects by applying a precipitating system detection and tracking algorithm to assess whether the CP-RCM is able to reproduce the HPEs' main characteristics.
Therefore, the CNRM-AROME CP-RCM simulation is compared to selected high spatial and temporal resolution observational datasets, such as the French kilometric and hourly COMEPHORE dataset now available for a 20-year period, and to its forcing simulation of the CNRM-ALADIN convection-parameterised Regional Climate Model (RCM). The comparison between the two models and observations leads to the following results:
- CNRM-AROME is able to represent northwestern Mediterranean fall extreme precipitation at a daily scale and even better at an hourly scale: CNRM-AROME demonstrates its ability to represent HPEs in terms of location and intensity of extreme rainfall, with daily 99th and hourly 99.9th percentile values close to observational values, especially in areas with high percentile values. Daily and hourly event-based studies also show CNRM-AROME's ability to represent extreme precipitation frequency and interannual variability;
- outside the Mediterranean region, CNRM-AROME tends to overestimate hourly precipitation extremes. In the French Mediterranean, the model underestimates the number of events for high thresholds, above 200 mm/day at the daily scale and above 40 mm/h at the hourly scale;
- the study shows the added value of the CNRM-AROME CP-RCM for fall extreme precipitation compared with the CNRM-ALADIN RCM, probably thanks to the combined effects of high resolution, explicitly resolved deep convection and physical parameterisations adapted to mesoscale processes.
This study verifies and extends the main results of Fumière et al. (2020) on a longer period, a larger domain and with an improved version of the CNRM-AROME CP-RCM, making the results more robust.
The good behaviour of CNRM-AROME allows us to go beyond the basic Eulerian statistical approach and to set up an object-oriented Lagrangian approach in order to take into account the spatial and temporal connections that may exist within a given event. A precipitating system detection and tracking algorithm is applied to 1-h accumulated precipitation fields over a period of twenty fall seasons, both for the CNRM-AROME CP-RCM and for the reference COMEPHORE precipitation dataset. This approach turns out to be an efficient tool for evaluating high-resolution climate models. Considering French Mediterranean HPEs as objects, CNRM-AROME's ability to represent the main characteristics of fall convective events is highlighted in terms of location, number, mean intensity and area for precipitating systems and in terms of number, duration, intensity, area, velocity and severity for tracks. Its main limitation is the underestimation of high precipitation, with a substantial bias in the number of cells for high thresholds and a lack of tracks with maximum intensity values exceeding 40 mm/h. CNRM-AROME also tends to overestimate moderate rainfall, with too many cells for thresholds below 5 mm/h. An overestimation of cell areas that increases with intensity is also noted. It is not yet possible to limit these biases by increasing horizontal and vertical resolutions in climate models, but improving and adapting physical parameterisations could be a way to reduce these weaknesses.
Knowing CNRM-AROME's strengths and weaknesses in reproducing northwestern Mediterranean extreme fall precipitation, we can now consider applying the detection and tracking tool to historical and scenario CP-RCM simulations in order to characterise possible changes in the organisation of convection in the context of climate change. Another interesting possibility would be to apply the object-oriented approach to other CP-RCMs to determine whether the results obtained in the current study are robust to the choice of CP-RCM. Finally, taking advantage of this long simulation, the study could be completed by an analysis of trends in the main characteristics of Mediterranean heavy precipitation events, extending observation-based studies such as Ribes et al. (2019).

A COMEPHORE
COMEPHORE (COmbinaison en vue de la Meilleure Estimation de la Précipitation HOraiRE) is a high-resolution (1 km × 1 km), hourly gridded precipitation dataset merging radar and rain gauge observations, provided by Meteo-France over metropolitan France (Tabary et al. 2012; Fumière et al. 2020). It is available from 1997 to 2016. The first ten years of COMEPHORE (1997-2006) were already used as an observational reference in the first evaluation study of the CNRM-AROME CP-RCM (Fumière et al. 2020). COMEPHORE is also used in Ban et al. (2020) and Pichelli et al. (2020). As this dataset is the reference for our extreme precipitation evaluation study over the southeast of France, we include a detailed description below.
The COMEPHORE observational dataset is based on local radar 5 min reflectivities and rain gauge data (approximately 4000 rain gauges with a daily time step, including approximately 1200 rain gauges with an hourly time step, cf. Fig. 16). The beginning of the period, 1997, is constrained by the availability of archived radar data for the French ARAMIS network. The processing chain is detailed in Tabary et al. (2012). To summarise, after corrections linked to radar measurement errors, daily accumulations of local radar reflectivities are computed and completed by rain gauge data. Afterwards, a temporal interpolation to hourly data is carried out, taking into account hourly measurements. Since 2007, the processing method has benefited from some improvements in the radar data processing. In addition to the use of radar measurements at different elevations and of a satellite cloud classification to determine the probability of rain, the daily fusion with rain gauges uses a new daily reference obtained via the ANTILOPE method, which merges radar and rain gauge data with a separation between stratiform and convective rainfall (Laurantin 2008). Then, twenty-four intermediate rainfall accumulations calculated with ANTILOPE are used for the temporal interpolation to an hourly time step.
Advantages
COMEPHORE allows a good-quality climatology of daily surface precipitation accumulation over long time periods (seasonal) to be reproduced, with values similar to those obtained with the Lassegues (2018) dataset, a high-resolution dataset resulting from a kriging of rain gauge daily rainfall data over France (Roger 2018). Thanks to the use of radar data with high spatial resolution and coverage, COMEPHORE provides information in regions where the density of stations is low and allows a better representation of extreme values, with a refined representation of temporal and spatial precipitation fields and therefore a better localisation of heavy rainfall peaks and higher maximum intensity values (Roger 2018).
With a focus on the southeast of France, Fumière et al. (2020) show that COMEPHORE is a more suitable product for the study of fall extreme precipitation than the SAFRAN database (Durand et al. 1993;Quintana-Segui et al. 2008;Vidal et al. 2011).
Limitations
Although corrections of errors related to radar measurement are made in the COMEPHORE processing, a masking effect in high-altitude areas leads to the underestimation of precipitation amounts in mountainous areas above 1500 m (Roger 2018). COMEPHORE does not yet include a correction taking altitude into account, and these regions also suffer from a low density of rain gauges and from undercatchment problems. The regions with the lowest-quality data are limited to parts of the Alps, limited parts of the Pyrenees and the western part of Corsica (cf. Fig. 16). While the number of rain gauges is more or less constant over time, the number of radars over France has increased, especially in the first few years and in the southeast: from 10 radars in 1997 to 24 from 2006. From November 2015, 3 X-band radars located in the Alps were added to the network. Moreover, beginning in 2007, the use of ANTILOPE intermediate hourly accumulations has improved the processing method, with a better calibration and consideration of convective rainfall and a decrease in underestimation in mountainous areas (Roger 2018). These last two points lead to an improvement of the COMEPHORE precipitation representation over time, but can also lead to temporal inhomogeneity that must be kept in mind when using it.
In order to verify whether COMEPHORE's expected quality improvement over time can affect our results, we calculate the spatial distribution of the mean number of cells per year above the 10 mm threshold during the extended fall (SOND) over several time periods (cf. Fig. 17). We do not find a real change in COMEPHORE's distribution over mountains, even for 2016, when X-band radars were implemented in the Alps. Therefore, we might suppose that measuring heavy precipitation over mountains is still challenging (Yu et al. 2018). However, over Corsica, with the implementation of a new radar in 2003, an increase in the number of observed cells above 10 mm between 1997-2006 and 2007-2016 is noticed. CNRM-AROME clearly underestimates the number of cells in this region.
Despite these limitations, COMEPHORE is the best available hourly and kilometric precipitation product over France and over a long period (20 years), which makes it especially suitable for studying fall Mediterranean HPEs.

B1 Bootstrapping method
The following bootstrapping method is used to assess the statistical significance of model bias differences at the 90% confidence level. For each observation/model common period defined in Sect. 2.3.2, observation and model datasets are simultaneously resampled with replacement for the same time period, so that the resampled datasets contain the same meteorological conditions for all datasets. The resampling is done for time blocks of one season in order to avoid over-confident estimates due to temporal correlation. We then calculate percentiles for each resampled dataset at each point of the domain and compute the 5th-95th percentile range of the differences between the percentiles of the two datasets. For each grid point, if the zero value is included in the range, we conclude that the difference between the studied percentiles of the two datasets is not significant at the 90% confidence level. For computational reasons, we apply the bootstrapping method only to data on the 12 km grid, and we use 100 resamples. A test with 1000 resamples carried out for the daily 99th percentile for a pair of datasets gives very similar results.
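The procedure can be illustrated for a single grid point with the minimal Python sketch below. It is not the code used in the study: the function names, the linear-interpolation percentile estimator and the data layout (one list of values per season) are assumptions made for illustration.

```python
import random


def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, by linear interpolation (assumed estimator)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_diff_range(obs_by_season, mod_by_season, q, n_resamples=100, seed=0):
    """5th-95th percentile range of the (model - observation) difference in percentile q.

    obs_by_season and mod_by_season are lists of per-season value lists for the
    same seasons. Whole seasons are drawn with replacement, and the same draw is
    applied to both datasets so that they share the same meteorological conditions.
    """
    rng = random.Random(seed)
    n = len(obs_by_season)
    diffs = []
    for _ in range(n_resamples):
        picks = [rng.randrange(n) for _ in range(n)]  # one-season blocks
        obs = [v for i in picks for v in obs_by_season[i]]
        mod = [v for i in picks for v in mod_by_season[i]]
        diffs.append(percentile(mod, q) - percentile(obs, q))
    return percentile(diffs, 5), percentile(diffs, 95)


def is_significant(lo, hi):
    """The difference is significant at the 90% level if zero lies outside [lo, hi]."""
    return not (lo <= 0.0 <= hi)
```

In practice the same computation would be repeated at every point of the 12 km grid, for example with the daily 99th percentile (`q=99`).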

B2 Interannual correlation and significance
To evaluate the interannual variability of the models in terms of the number of daily and hourly events (cf. Sect. 3.3) or the number of precipitating cells (cf. Sect. 4.2), interannual correlations are calculated for n pairs of model and observational annual data, computing the Pearson sample linear cross-correlation coefficient r. This correlation coefficient for n pairs of independent data (we can indeed consider each year to be independent) is then tested against the null hypothesis (i.e. no correlation) using Student's t-test with two levels of confidence, 95% and 99%.
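A minimal sketch of this test is given below, assuming the standard statistic t = r sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The function names are illustrative, and the critical values of Student's t have to be supplied from a table: for n = 20 years, the two-sided values are about 2.101 (95%) and 2.878 (99%).

```python
import math


def pearson_r(x, y):
    """Pearson sample correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """Student t statistic for the null hypothesis of no correlation (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def critical_r(t_crit, n):
    """Smallest |r| that is significant for a given tabulated critical t value."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))
```

With these tabulated values, `critical_r(2.101, 20)` and `critical_r(2.878, 20)` give the correlation thresholds for twenty annual pairs, roughly 0.44 and 0.56.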

B3 Significance of mean differences
In Sects. 3.3, 4.2 and 4.3, the paired-sample Student's t-test with a confidence level of 95% is used to compare the means of two related groups of samples (i.e. annual means for model and observation) for different parameters, such as the number of events or tracks, intensities or areas per threshold.
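For reference, the paired-sample statistic can be sketched as follows, with d the year-by-year model minus observation differences and t = mean(d) / (sd(d) / sqrt(n)) on n - 1 degrees of freedom. The function name and the sample values in the usage note are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(model, obs):
    """Paired-sample Student t statistic for annual model and observation values."""
    d = [m - o for m, o in zip(model, obs)]
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

The difference of means is taken as significant at the 95% level when |t| exceeds the two-sided critical value for n - 1 degrees of freedom (about 2.776 for n = 5, for instance).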

C1 Choice of the algorithm
The object-oriented approach used here to study HPEs involves the implementation of a precipitating system detection and tracking algorithm. Different methods already exist, some coming from NWP model verification or from nowcasting methods, and others developed specifically for this purpose.
Several methods have already been applied to CP-RCMs. In the US, the object-oriented method used to verify the operational precipitation predictions of high-resolution models, called MODE (Davis et al. 2006, 2009), has been applied by Prein et al. (2020) to 4 km WRF climate simulations in order to evaluate the model's ability to reproduce MCSs; it is based on 1-h accumulated precipitation exceeding the 5 mm/h threshold. The same algorithm has been applied to characterise precipitation features in western Canada by Li et al. (2020). In Germany, a method for tracking convective cells on 5 min precipitation fields exceeding the 8 mm/h threshold has been developed to verify whether the COSMO-CLM CP-RCM (2.8 km) reproduces sub-daily convective cell characteristics well (Brisson et al. 2018). A storm-tracking algorithm developed by Stein et al. (2014) and using a threshold of 1 mm/h is applied to regional climate model simulations of the Met Office Unified Model and to observations to assess the representation of West African storm lifecycles by convection-permitting simulations (Crook et al. 2019). Fitzpatrick et al. (2020) apply a similar method to outgoing longwave radiation fields coming from a CPM of 4.4 km resolution in order to track MCSs in the West African Sahel. Studies on MCSs in the Midwest US were also carried out by Haberlie and Ashley (2019) using a tracking method based on observed and simulated radar reflectivities.
Further, other tracking algorithms have already been applied to the AROME model. The TOOCAN algorithm (Fiolleau and Roca 2013), developed to detect MCSs and based on satellite infrared images, is applied by Beucher et al. (2014) to hourly precipitation fields of AROME in the tropics. In order to evaluate high-resolution precipitation forecasts and avoid double-penalty problems, an approach based on the identification of homogeneous rainfall areas through similarity constraints of the distribution has recently been developed at CNRM (Arbogast et al. 2016;Raynaud et al. 2019), but with no temporal tracking at this point.
Object-oriented tools have been used for many years in nowcasting. In CNRM, the ISIS (Instrument de Suivi dans l'Imagerie Satellitaire) tracking algorithm developed by Morel and Senesi (2002a) and initially based on satellite infrared imagery to characterise European MCSs (Morel and Senesi 2002b) is still applied to 5 min radar reflectivities to monitor precipitating active areas in storms. Brousseau et al. (2016) use this algorithm to characterise improvements in the convective cell simulation of AROME-simulated radar reflectivities between 2.5 and 1.3 km resolutions.
The last method following Morel and Senesi (2002a), developed in CNRM and already applied in precipitation nowcasting and for AROME evaluation, is chosen for the evaluation of the CNRM-AROME CP-RCM.

C2 Methodology
The complete description of the detection and tracking algorithm is provided in Morel and Senesi (2002a). For our study, the tool is adapted to a 1-h accumulated precipitation field, instead of the 5-min radar reflectivities used in its nowcasting version. The method can be briefly described as follows and in Fig. 18:
- First step: interpolation and smoothing. Hourly precipitation fields are interpolated to a common regular 3 km grid. Specific smoothing is performed and allows the precipitation field to be less noisy, while still preserving high values. A first smoothing replaces each grid point by a weighted average of the eight adjacent grid points. Then, a second smoothing consists in applying a Gaussian filter with a small standard deviation (0.5) on a moving window of 5 × 5 grid points. This small value of standard deviation allows for a slight smoothing.
- Second step: detection of the cells. Cells are defined every hour at time (t). A cell is defined as contiguous grid points above a given threshold with a minimum area of 20 km². To represent the different ranges of precipitating cell intensity, seven thresholds are chosen: 2, 5, 10, 15, 20, 25 and 30 mm/h.
- Third step: tracking of the cells. Trajectories are built by identifying links between cells of different time steps. A complex algorithm mixing overlapping and correlation conditions is applied. The overlapping condition uses the cells on the image at time (t-1h), moved with the velocity calculated between the images at times (t-1h) and (t-2h), and compares them to the cells on the image at time (t): if there is a cell on the image at time (t) overlapping the moved (t-1h) cell, with a criterion on the minimum recovery rate (15%), the matching is done. This condition based on the velocity allows the specificity of each cell in each meteorological situation to be taken into account. Another condition is based on the calculation of the spatial correlation between the cells at times (t) and (t-1h) in a search box whose size varies according to the size of the cell and the presence of potential parent cells after the first overlapping condition: the match is done for the highest correlation found, with a minimum correlation of 0.4. An example of trajectories for CNRM-AROME for precipitating cells above 10 mm is given for the 2003 fall season in Fig. 18.
- Fourth step: diagnostics. For each threshold, the tool provides information about each cell, schematised as an ellipse: centre of gravity, length of the minor axis and the major axis, angle, velocity, area, mean and maximum intensity. A post-processing algorithm assembles the cells belonging to the same trajectory and calculates the main characteristics of each trajectory (duration, mean and maximum intensity, mean and maximum area,...). Diagnostics are computed for all tracks with a minimum duration of one hour, a minimum threshold of 2 mm/h and a minimum area of 20 km².

D Spatial distribution of the daily precipitation mean and wet days
Fig. 19 Spatial distribution of the daily precipitation mean (mm/d) during an extended fall (SOND) for a CNRM-AROME, b CNRM-ALADIN and c observation for the 12 km grid. The second line presents d, e the differences between models and observation (mm/day) and f the added value of CNRM-AROME compared to CNRM-ALADIN (mm/day). Added value is plotted in green if CNRM-AROME is better than CNRM-ALADIN, in brown otherwise. On all figures, orography (500 m and 1000 m) is plotted in brown lines
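As a rough illustration of the cell detection and overlap-matching steps of the methodology in Appendix C2, the Python sketch below works on a small 2-D precipitation field stored as nested lists. It is not the Morel and Senesi (2002a) implementation: the function names, the 4-neighbour definition of contiguity, the use of "greater than or equal to" for the threshold, the displacement expressed in whole grid points and the definition of the recovery rate as the fraction of the moved cell covered by the candidate are all assumptions. The smoothing and correlation conditions are omitted.

```python
from collections import deque


def detect_cells(field, threshold, dx_km=3.0, min_area_km2=20.0):
    """Return cells as sets of (i, j) grid points at or above the threshold.

    Contiguity is taken over the 4 direct neighbours (assumption); cells smaller
    than min_area_km2 are discarded.
    """
    ni, nj = len(field), len(field[0])
    seen, cells = set(), []
    for i in range(ni):
        for j in range(nj):
            if (i, j) in seen or field[i][j] < threshold:
                continue
            cell, queue = set(), deque([(i, j)])
            seen.add((i, j))
            while queue:  # breadth-first flood fill of one cell
                a, b = queue.popleft()
                cell.add((a, b))
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= c < ni and 0 <= d < nj and (c, d) not in seen
                            and field[c][d] >= threshold):
                        seen.add((c, d))
                        queue.append((c, d))
            if len(cell) * dx_km ** 2 >= min_area_km2:
                cells.append(cell)
    return cells


def match_by_overlap(prev_cell, shift, current_cells, min_overlap=0.15):
    """Index of the current cell that best overlaps the moved previous cell, or None.

    shift = (di, dj) is the displacement over one hour in grid points, assumed to
    have been estimated from the two previous images.
    """
    di, dj = shift
    moved = {(i + di, j + dj) for i, j in prev_cell}
    best, best_frac = None, 0.0
    for k, cell in enumerate(current_cells):
        frac = len(moved & cell) / len(moved)  # assumed definition of the recovery rate
        if frac >= min_overlap and frac > best_frac:
            best, best_frac = k, frac
    return best
```

Running `detect_cells` once per threshold (2 to 30 mm/h) and chaining `match_by_overlap` from hour to hour would yield trajectories, to which the correlation condition and the diagnostics of the fourth step would then be added.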

Fig. 20 Spatial distribution of the mean number of wet days (%) during an extended fall (SOND) for a CNRM-AROME, b CNRM-ALADIN and c observation for the 12 km grid. The second line presents d, e the relative differences between models and observation (%) and f the added value of CNRM-AROME compared to CNRM-ALADIN (%). Added value is plotted in green if CNRM-AROME is better than CNRM-ALADIN, in brown otherwise. On all figures, orography (500 m and 1000 m) is plotted in brown lines