On the potential of ensemble forecasting for the prediction of meteotsunamis in the Balearic Islands: sensitivity to atmospheric model parameterizations

This study investigates the potential of ensemble forecasting using full realistic high-resolution nested atmosphere–ocean models for the prediction of meteotsunamis in Ciutadella (Menorca, Spain). The sensitivity of model results to the parameterizations of the atmospheric model is assessed considering the ten most significant recent meteotsunami events for which observations are available. Different schemes adapted to high-resolution Weather Research and Forecasting model simulations were used for the representation of cumulus, microphysics, planetary boundary layer and longwave and shortwave radiations. Results indicate a large spread of the ensemble simulations in terms of the final magnitude of the meteotsunamis. While the modeling system is shown to be able to realistically trigger tsunamigenic atmospheric disturbances in more than 90% of the situations, the small-scale characteristics of these disturbances are significantly modified with the change of parameterizations, leading to significant differences in the magnitude of the simulated sea-level response. No preferred set of parameterizations can be identified that leads to either the largest or the most realistic magnitudes in the majority of situations. Instead, the performance of a given set of parameterizations is found to change with the meteotsunami event under consideration. Importantly, the observed magnitude of the extreme sea-level oscillations lies within the range of a nine-member ensemble in 70% of the cases. This ensemble approach would then allow to generate a realistic range of possibilities in the majority of events, thus improving the realism of meteotsunami predictions compared to single deterministic forecasts.


Introduction
Advancing toward the prediction of the occurrence and magnitude of meteotsunamis is of recognized importance to limit the harmful consequences of these natural hazards (Pattiaratchi and Wijeratne 2015; Vilibić et al. 2016). The most critical effects of meteotsunamis are very strong currents associated with large high-frequency sea-level oscillations, which produce flooding, significant damages to harbor structures and boats and even fatalities (Jansà 1986;Rabinovich and Monserrat 1998;Monserrat et al. 2006;Jansà et al. 2007;Vilibić and Šepić 2009;Orlić et al. 2010;Williams et al. 2019;Linares et al. 2019).
Due to the nature of the phenomenon involving long oceanic waves generated by atmospheric disturbances and multi-resonant amplification processes, the operational prediction of meteotsunamis requires to include both atmospheric and oceanic components. A very high-resolution atmospheric model should ideally first be able to describe the highfrequency sea surface atmospheric pressure variations which force the generation of long waves at the surface of the ocean. The ocean model then needs to represent the physical processes which successively amplify the sea-level oscillations. These are: (1) Proudman, Greenspan and shelf resonances (Proudman 1929;Greenspan 1956;Rabinovich 1997), (2) shoaling effects over the sloping topography in the very coastal areas (Dean and Dalrymple, 1991) and (3) the resonance of the harbor or inlet (Rabinovich 2009).
The first meteotsunami forecasting system was implemented more than 30 years ago in the Spanish National Institute of Meteorology in the Balearic Islands (Jansà 1987) based on the analysis of synoptic conditions in the atmosphere. Today, the Spanish Meteorological Agency (AEMET) continues delivering official meteotsunami alerts (or "rissaga", which is the local name of the phenomenon) with different levels when the predicted synoptic conditions have similar characteristics to the conditions that led to significant meteotsunamis in the past. While this approach is generally able to detect situations propitious to rissagues a couple of days in advance, it remains qualitative. Several numerical systems have been implemented over the recent years with the objective to provide more quantitative estimates.
Complementary to the AEMET system, the Balearic RIssaga Forecasting System (BRIFS; Renault et al. 2011;Ličer et al. 2017;www.socib .es/syste ms/forec ast/brifs ) uses the WRF atmospheric model in a 2-nested grid configuration over the Western Mediterranean Sea with a 4 km resolution in the inner grid to successively force two ROMS model simulations (Fig. 1). The first one covers the Balearic shelves and aims at representing sea-level amplifications until reaching the entrance of Ciutadella inlet in Menorca Island. It then feeds the second simulation focused on the harbor of Ciutadella. This system in its current form has been providing predictions since 2015. Mourre et al. (2019) reported that the system was able to generate some significant signal (> 50 cm) in 60% of the thirty recent cases with observed oscillations larger than 70 cm amplitude, which is the first warning level of the Spanish Meteorological Agency. The BRIFS system was found to provide an overall underestimation of the magnitude of the phenomenon. However, this underestimation was not systematic and it also occurred that the prediction overestimated the observed oscillation.
Other systems were developed more recently. Denamiel et al. (2019) presented the Adriatic Sea and Coast modeling suite, including a coupled WRF-ROMS configuration, which then forces the barotropic unstructured model ADCIRC. The model was shown to be able to capture atmospheric disturbances, but with a misrepresentation of the timing and details of the perturbations in turn affecting the magnitude of the modeled meteotsunamis. Romero et al. (2019) presented a computationally cost-effective modeling system which is now run operationally (https ://meteo .uib.es/rissa ga/), allowing to produce probabilistic predictions for rissagues in Ciutadella. Instead of running full atmospheric and oceanic models, this system only retains the most important features focusing on the physical mechanisms responsible for the generation of the meteotsunamis, simplifying both the atmospheric and oceanic components. This system was tested for the 128 registered rissagues and showed successful skill to indicate the actual warning level.
While the oceanic response to specific atmospheric pressure disturbances is reproduced in the numerical models with a satisfying accuracy, provided the representation of an accurate bathymetry and very high-resolution coastal morphology (Vilibić et al. 2008;Whitmore and Knight 2014;Ličer et al. 2017), the challenge of meteotsunamis prediction mainly lies in the atmospheric component. Small-scale pressure disturbances related to squall lines, gravity waves or convective nucleus are at the origin of meteotsunamis (Monserrat et al. 1991Jansà et al. 2007;Belušić et al. 2007;Šepić et al. 2009;Horvath et al. 2018). While these processes can be theoretically generated in state-of-the-art mesoscale atmospheric models with sufficiently high resolution, the uncertainties associated with the resulting pressure signals are still generally large, especially in terms of the detailed location and timing of the small-scale processes (Horvath and Vilibić 2014;Belušić et al. 2007;Belušić and Mahović 2009;Denamiel et al. 2019). Moreover, it can be expected that due to the nonlinear and chaotic nature of the atmosphere, small differences in initial conditions or model parameters yield different representations of these disturbances and as a consequence different magnitudes of the modeled meteotsunamis.
In numerical weather prediction, one common approach to cope with this sensitivity and address forecast uncertainty is to use ensemble prediction (e.g., Molteni et al. 1996;Toth and Kalnay 1993;Kalnay 2003). Ensembles of simulations are generated to depict the sensitivity of the results to initial conditions or model characteristics (e.g., Houtekamer  Teixera and Reynolds 2008). Considering the Boothbay 2014 meteotsunami, Horvath and Vilibić (2014) investigated the sensitivity of high-resolution atmospheric conditions to model parameterizations, lead time, initial and boundary conditions and nesting strategy. They found a large scatter of results in their experiments. Analyzing modeling experiments of a meteotsunami event in 2003 in the Adriatic Sea, Belušić et al. (2007) found that the generation of wave-convection system was very sensitive to the microphysics and convective parameterizations as well as the initialization time.
In this context, the questions we want to address in this paper are the following. How sensitive are the meteotsunamis predictions to the parameterizations of the atmospheric model? Is there a set of optimal parameterizations for all meteotsunamis events? Does ensemble prediction allow to improve meteotsunamis forecasts and provide an efficient approach to characterize the associated uncertainty?
The BRIFS modeling system representing the whole process of meteotsunamis generation, propagation and amplification is used in this work, which focuses on the meteotsunamis occurring in Ciutadella (Menorca, Spain). Ciutadella is a place where significant meteotsunamis (magnitude larger than 1 m) are observed several times per year, and where some of the most worldwide catastrophic events were reported (Jansà 1986;Jansà et al. 2007;Pattiaratchi and Wijeratne 2015). The 15-June-2006 event, which is the largest rissaga reported in the last 30 years, will be considered first. Then, the sensitivity of the predictions is investigated considering the nine other recent cases for which observations are available with a measured sea-level oscillation larger than 1.20 m.
The paper is organized as follows. The modeling system and experimental approach are introduced in Sect. 2. Section 3 presents the results of the experiments, which are further discussed in Sect. 4. Finally, Sect. 5 concludes the paper.

BRIFS system
The atmosphere-ocean modeling setup of the Balearic RIssaga Forecasting System (BRIFS; Renault et al. 2011;Ličer et al. 2017) is used in this study. BRIFS was initially developed at the Balearic Islands Coastal Observing and Forecasting System (SOCIB; Tintoré et al. 2013) in 2011 and provides operational daily predictions of the magnitude of high-frequency sea-level oscillations in Ciutadella, Menorca (www.socib .es/syste ms/forec ast/brifs ). The predictions are deterministic, covering a 48-h forecast horizon every day. Since the sea-level oscillations at a given time are described by two simulations (forecast of the day and forecast of the day before), it provides a flavor of the sensitivity of the results to the initial conditions and lead time of the simulations.
BRIFS configuration is illustrated in Fig. 1. BRIFS presently employs the WRF model (Weather Research and Forecasting Model, ARW core, v3.6; Skamarock et al. 2008) with a two-domain two-way nested grid configuration to represent the atmospheric pressure variations over the Western Mediterranean Sea. WRF is a fully compressible non-hydrostatic state-of-the-art mesoscale atmospheric model. The initial and boundary conditions come from the NCEP FNL analysis (http://rda.ucar.edu/data/ds083 .2, resolution of 6 h and 1°) for the retrospective simulations considered in this paper. This product comes from the same data assimilation and forecast system as the GFS product which is used for initial and boundary conditions of the operational BRIFS predictions, but is slightly delayed to 1 3 ingest a larger number of observations than GFS. The parent grid covers a region spanning from Gibraltar strait to the longitude of Corsica-Sardinia and from 33° to 45° N approximately, with a horizontal resolution of 20 km. The nested domain is centered around the Balearic Islands, covering the usual areas of generation and propagation of tsunamigenic atmospheric disturbances, with a resolution of 4 km. The vertical grid is common to both domains with ninety-seven terrain-following levels. A 12-h spin-up period is considered in these simulations to get rid of the spurious waves generated after model initialization. The default setup presently implemented in the operational system uses the following parameterizations for sub-grid scale processes: Mellor-Yamada Nakanishi and Niino scheme for Planetary Boundary Layer (Nakanishi and Niino 2006), Thompson scheme for microphysics (Thompson et al. 2008), Grell-Devenyi ensemble scheme for cumulus on the parent domain (Grell and Devenyi 2002), Goddard scheme for shortwave radiation (Chou and Suarez 1994), Rapid Radiative Transfer Model for longwave radiation (Mlawer et al. 1997), Monin-Obukhov scheme (Monin and Obukhov 1954) for surface layer physics and Noah land-surface model. Moreover, the time between successive radiation physics calls is fixed to 4 min. The time step for the higher resolution nested domain is 10 s. The topography is represented by the 2-min-resolution USGS database. The horizontal diffusion is parameterized to act along model levels, with 2D Smagorinsky dynamic coefficients. The atmospheric pressure fields extracted from WRF outputs with a 2-min resolution are then used to force the simulations of the ocean model.
The ROMS model version v3.7 (www.myrom s.org; Shchepetkin and McWilliams 2005) is used to represent the ocean response to the atmospheric pressure variations. ROMS is a three-dimensional free-surface model solving the primitive equations using Boussinesq and hydrostatic approximations over terrain-following coordinates. It uses a split-explicit hydrodynamic kernel allowing a stable and accurate resolution of the barotropic motions.
In particular, ROMS was demonstrated to provide a proper representation of the Proudman resonance (Renault et al. 2011;Ličer et al. 2017;Bubalo et al. 2018).
Due to the two-dimensional nature of the processes of interest, our configuration is essentially barotropic, i.e., it does not consider any density variations and has a limited number of vertical levels. A parent grid covers the shelves around Mallorca and Menorca Islands with a horizontal resolution of 1 km. It is able to represent the amplification of the wave amplitude under the effects of Proudman or Greenspan resonances, as well as the final topographic amplification. It provides the boundary conditions at the entrance of Ciutadella inlet in terms of magnitude and periods of sea-level oscillations and associated barotropic flows. This information is transferred to the child grid, which covers the whole inlet with a 10 m resolution, through Flather (1976) and Chapman (1985) boundary conditions. Since the ROMS modeling chain is essentially barotropic, the substantial jump in the resolution (by a factor of 100) between parent and child grids (which could cause instabilities in a baroclinic setup) amounts to variable-in-time, but uniform-in-space sea-level forcing at the lateral boundary of the child domain. Sea-level disturbance propagation across the parent-child boundary was explicitly tested for any spurious reflectances, discontinuities or other instabilities, but no unusual behavior was found. The reason for using the three-dimensional version of ROMS is that it includes a time-averaging of the barotropic mode over several time steps (Shchepetkin and McWilliams 2005), which allows to dampens instabilities in our very high-resolution configuration. More concretely, the time-averaging is applied here over 6 time steps, i.e., 1 s, the barotropic time step being 1/6 s. Two different child grids are used in this study. The first one, used for the simulation of meteotsunamis events in 2006 and 2008, is similar to that used in Renault et al. (2011) and Ličer et al. (2017). It does not include Son Blanc dike, which was built in 2011 approximately 1 km south off Ciutadella inlet. The second one, used for the rest of the simulations, incorporates the new dike, as represented in Fig. 1. The bottom stress in the child domain is represented using a quadratic drag coefficient of 0.001. While the bathymetry of the parent model is provided by the GEBCO 30″ database (www.gebco .net), that of the child model is generated from high-resolution bathymetric estimates from the University of Cantabria.
The time series of sea-level and atmospheric pressure in Ciutadella are extracted at the location of the instruments deployed by SOCIB and Ports de les Illes Balears (PortsIB) (3.8314° E-39.9999° N), i.e., approximately 400 m from the end of the inlet (point 2 in Fig. 1) for all cases except the 15 June 2006 event. In that case, the sea-level time series are extracted from the location at the end of the inlet (point 1 in Fig. 1), consistent with previous works by Vilibić et al. (2008) and Renault et al. (2011), given that the only available observations for that date are from eyewitnesses. The time series at the entrance of the harbor are extracted from the child model at 3.8189° E-39.9927° N (point 3 in Fig. 1).
In the analysis of the sea-level time series, the magnitude of the meteotsunamis is defined as the maximum sea-level oscillation in a time window of 12.5 min. This ensures that the minima and maxima are considered within the same oscillation, which period is approximately 10.5-10.6 min, i.e., the fundamental period of the harbor (Rabinovich and Monserrat 1998;Liu et al. 2003).

WRF sensitivity experiments
In the WRF model, the main parameterizations used to represent the effects of atmospheric processes occurring at the model sub-grid scale are the following (Skamarock et al. 2008): • the cumulus parameterization (cu), used to mimic the effects of convection in grid cells with unstable vertical profiles. The use of this parameterization is recommended for grids with a spatial resolution coarser than ~ 10 km (e.g., Molinari and Dudek 1992). It is only applied for the parent simulation in our case. • the microphysics parameterization (mp), used to represent the physics of clouds, with direct effects on condensation, precipitation and evaporation rates. • the parameterization of shortwave and longwave radiations (ra_lw and ra_sw), applied to compute surface radiative fluxes. Notice that these radiative parameterizations are not called every time step so as to optimize the computing time, but at regular intervals which are sufficiently short to resolve the cloud-cover changes. The radt parameter is used to specify these intervals (1 min interval per km of grid size is generally assumed to be satisfactory). • the parameterization of the atmospheric surface layer (sfclay) and the land/water surface model (surface), used to determine the exchanges of heat, moisture and momentum between the atmosphere and ground surfaces of land or water. • the planetary boundary layer (pbl) parameterization, used in conjunction with the sfclay and surface parameterizations to determine boundary layer fluxes of heat, moisture and momentum between the lowest part of the atmosphere influenced by the topography and affected by rapid changes of air properties and vertical mixing, and the free atmosphere above it.
WRF possesses a wide range of available schemes (ten on average) for each of these classes of parameterizations (Skamarock et al. 2008). Moreover, these parameterizations interact in complex ways, affecting simultaneously the representation of convective processes, cloud effects or heat and moisture fluxes for instance (Stensrud 2007). While these parameterizations provide a natural way to tune the model, their selection is a particularly difficult task, often guided in practice by the results of sensitivity tests with regards to the specific objectives of the simulations, considering also that testing all possible combinations is generally prohibitive (e.g., Jankov et al. 2005;Gallus and Bresch 2006;Borge et al. 2008;Argüeso et al. 2011;Ratnam et al. 2017).
In this work, we perform sensitivity experiments selecting a subset of cu, mp, ra and pbl parameterizations that are suited for high-resolution simulations and that have been used in previous meteotsunamis studies using high-resolution non-hydrostatic atmospheric models (Belušić et al. 2007;Šepić et al. 2009;Tanaka 2010;Orlić et al. 2010;Renault et al. 2011;Horvath and Vilibić 2014;Horvath et al. 2018;Denamiel et al. 2019). The radt parameter is also considered here. An extended set of 22 simulations is first generated for the case of the 15 June 2006 rissaga, which was the largest event of the last 30 years. Due to the computational burden, a reduced subset of 9 combinations is then selected and applied for the 9 largest events recorded in the recent years.

Observations and meteotsunamis events
Both PortsIB and SOCIB have sensors installed in Ciutadella harbor, measuring sea-level and atmospheric pressure signals with a resolution of 1 min and 30 s, respectively. These have allowed to precisely monitor all recent extreme events (since 2010 for PortsIB instrument, and since September 2014 for SOCIB sensor). Moreover, extensive fields experiments were carried out in 2007(Marcos et al. 2009), also providing accurate sealevel time series in Ciutadella harbor for the 25-May-2008 rissaga. On 15 June 2006, while the only available measurements were the barometers time series at Palma and Mahon airports, eyewitnesses reported a 4-m rissaga magnitude in Ciutadella harbor. Table 1 summarizes the main events considered in this article. These are the 15-June-2006 and 25-May-2008 rissagues, as well as all the events with a magnitude of the high-frequency sea-level oscillations larger than 1.20 m, as observed by SOCIB and PortsIB instruments since August 2014.
The time series of high-pass filtered atmospheric pressure variations provided by SOCIB (all dates after Sept 2014) and PortsIB (for the 18 Aug 2014 case) barometers in Ciutadella are shown in Fig. 3 for the eight major events since 2014. These time series are accompanied by their corresponding wavelet power spectrum. The time series have been filtered with a fourth-order Butterworth filter using a cutoff period of 3 h to remove the low-frequency component of the signal and focus on the high-frequency disturbances of relevance for the meteotsunami problem. The wavelet power spectra have been computed using the Morlet wavelet. The central line in these graphs represents the time of the maximum observed sea-level oscillation in the different cases. It allows to detect the specific atmospheric pressure signals responsible for the major sea-level oscillation. The figure illustrates the diversity of tsunamigenic signals leading to significant rissagues. Pressure variations associated with gravity waves with periods around 1 h are present on 16-Jul-2018, 01-Aug-2015, 09-Jul-2019, 01-Apr-2016 and 23-Jul-2017, as highlighted by the wavelet spectra. Significant pulses of short duration and with a period closer to the 10.5-min period are also found in specific cases, coinciding in time with the maximum of the observed sea-level oscillation. The events on 16-Jul-2018 and 18-Aug-2014 are the most remarkable examples. This also happens on 14-Jul-2019, 22-Apr-2015 and 23-Jul-2017. Convective activity is the most likely cause of these short-term atmospheric variations. While significant pressure variations with magnitudes around 3-4 hPa were observed in the two most intense cases on 14 July 2019 and 16 July 2018, variations of the order of 2 hPa, and even 1 hPa in the case of 1 April 2016 were sufficient to produce a significant sea-level response in Ciutadella in other cases. The pressure gradients, the period of the oscillations and the characteristics of the spatial propagation are other parameters that significantly affect the final magnitude of the meteotsunami. Short temporal scales of variations are a common feature in these time series, allowing to excite oscillations with periods sufficiently close to the fundamental period of the harbor to generate large amplifications (Rabinovich 2009). The case of 1 April 2016 is quite surprising since no marked short scale signal is present in the atmospheric pressure time series. Looking into further details of neighboring barometers for this date, larger high-frequency variations with a magnitude around Fig. 3 High-pass filtered atmospheric pressure time series observed in Ciutadella during the eight major meteotsunami events since 2014. For each date, the 12-h time series is centered on the time of the maximum observed sea-level oscillation, which is indicated by red marks on the lower and upper x-axis. The corresponding wavelet power spectrum is shown on the lower panels. The 10.5 min period associated with the fundamental mode of the harbor is indicated by a dotted line 2 hPa were observed at Colonia Sant Pere station on the northeastern coast of Mallorca Island. This indicates that a significant signal was present over the Menorca Channel, probably limited on its eastern side, but nevertheless able to generate a significant sea-level oscillation reaching the mouth of the inlet, despite the weak signature in the atmospheric pressure measured in Ciutadella.
It is interesting to notice that the maximum rissaga does not generally coincide with the pressure oscillation of maximum magnitude, but rather with the specific variations that combine a sufficient amplitude with a period producing a large amplification in the inlet, so close to the fundamental period of the harbor. This also indicates that the small-scale details of these signals are of major importance for the final magnitude of the phenomenon. The success of the atmospheric-oceanic simulations will then depend on the capability of the model to realistically represent these small-scale processes, which is fundamentally extremely challenging.

15-June-2006 rissaga
The meteotsunami event of 15 June 2006 is the largest event reported over the last 30 years. It has naturally drawn special attention, with specific efforts to understand and reproduce the phenomenon (Jansà et al. 2007;Monserrat et al. 2006;Vilibić et al. 2008;Renault et al. 2011). The only available observations at that time were two atmospheric pressure time series collected at Palma and Mahon airports. Eyewitnesses and local media reported a 4-m-magnitude rissaga. A roughly 7 hPa pressure jump was observed both at Palma and Mahon, associated with the northeastward propagation of an intense squall line at a speed around 25 m/s Vilibić et al. 2008).
We use this spectacular case to perform an initial extended sensitivity analysis including 22 combinations of the WRF model parameterizations mentioned above. The results will then be used to select a reduced ensemble of 9 sets of parameterizations which will be applied to the remaining cases of meteotsunamis. Figure 4 presents the results of modeled sea-level oscillations in Ciutadella and at the entrance of the inlet, together with the magnitude of high-frequency atmospheric pressure variations. As explained in Sect. 2.1, the sea-level differences are computed here over a 12.5 min period for the sea-level time series inside the inlet, which are mainly constrained by the fundamental period of the inlet. Outside the inlet and concerning the atmospheric pressure, the variations are computed over a 25 min time window. The atmospheric pressure time series are first filtered using the same 3 h-cutoff period fourth-order Butterworth filter previously used for the observations. The simulations are ranked by the magnitude of the modeled rissaga in Ciutadella. In each column, the values are colored by the percentage of the maximum value in that specific column to facilitate the visualization of the maxima and variations with a common color scale. Moreover, the absolute values of the maximum sea-level and pressure variations are indicated in each cell. For that event, previous numerical experiments led to a magnitude of the wave of 2 m in Renault et al. (2011) using a pilot configuration of the BRIFS system, and of 2.30 m in Vilibić et al. (2008), propagating the atmospheric pressure signal observed in Mahon with the corresponding estimated speed and direction.

3
The time series of filtered atmospheric pressure and sea-level anomalies obtained in Ciutadella from the simulation leading to the largest rissaga are illustrated in Fig. 5. Figures 4 and 5 show that the system is able to generate significant high-frequency sea-level oscillations in Ciutadella, with a maximum magnitude of 2.70 m. This is still underestimating the magnitude observed by eyewitnesses, but sufficient to motivate a warning for an extreme event (red color in AEMET warning levels). Figure 4 also highlights the sensitivity of the modeling results to the choice of the set of parameterizations, leading to final magnitudes of the meteotsunami from 0.64 to 2.70 m, the lowest magnitude being then only 24% of the largest one. Two of the 22 simulations generate a rissaga larger than 2 m, 16 of them between 1 and 2 m, and 4 of them less than 1 m.  While the WRF model is able to represent high-frequency pressure variations larger than 5 or 6 hPa in several simulations, it is striking that these are not the simulations that generate the most intense rissagues in the ocean model. The two rissagues larger than 2 m are obtained with high-frequency pressure variations of 4.8 and 2.5 hPa magnitude, respectively. This reflects the fact that not only the magnitude plays a significant role here, but also the oceanic area affected by the atmospheric disturbances and their direction of propagation which both modulate the Proudman resonance (Vilibić et al. 2008;Ličer et al. 2017), as well as the exact period of the pressure oscillations which affects the sea-level signal at the entrance of the inlet and so the signal amplification in the inlet (Rabinovich 2009). In particular, the simulations with pressure variations larger than 5 hPa led to rissagues of 1.93 m, 1.71 m and only 1.15 m in the last case. Also, the rank of the magnitude of the oscillations at the mouth of the inlet differs from the rank of the magnitude of the oscillations inside the harbor, which highlights the importance of the period of these external oscillations and their proximity to the fundamental modes of the inlet for the determination of the final magnitude of the rissaga.
Ranking the simulations from the largest to the lowest result in Fig. 4 aims at allowing to evaluate the performance of individual parameterization schemes. The tendency of a specific parameterization scheme to generate larger (smaller, respectively) rissagues compared to others would be highlighted by a tendency of the simulations using this scheme to be in the upper (lower, resp.) part of the figure. Based on these considerations, the only two tendencies that are revealed in this representation of the 15-June-2006 rissaga are the following: (1) the radiation parameterization using Goddard and RRTM schemes for shortand longwave (red diamond) seem to perform better than that using RRTMG (blue diamond), and (2) in the two simulations in which it was employed, the Tiedtke cu scheme (green circle) was less efficient than Grell-Devenyi and Kain-Fristch schemes.
The rest of the parameterization schemes provide an overall mixed performance, producing both relatively small and large rissagues depending on the combination of schemes that is considered. This could indicate that the specific combinations of parameterization schemes have in this experiment a more important impact than the individual schemes themselves. The impact of the radt parameter (the time interval between two calls of the radiation scheme) is quite surprising. It reduces for instance the final magnitude from 2.70 to 1.93 m when changing from 4 to 10 min, all other schemes remaining the same. There is no clear physical meaning of this effect. We rather interpret it as an effect of the chaotic behavior of the model dynamics, where small modifications in the numerical approach can lead to significant changes in the details of the small-scale structures which then impact the simulated rissaga.
Due to the cost of running the BRIFS system (around 7 h for a 24-h forecast simulation on SOCIB computing facilities using 32 cores running at 2.8 GHz), the ensemble of sets of parameterizations needed to be reduced for the analysis of the other nine cases of rissaga. A reduced ensemble of nine combinations was defined, as indicated in Fig. 4. To characterize the potential systematic impact of single parameterizations, if any, this selection includes the combination leading to the largest simulated rissaga (#1) as well as the combinations changing only one parameterization scheme with respect to #1. For the 2006 case, this subset selects the four simulations leading to the largest rissagues, as well as five other less efficient scenarios. Figures 6 and 7 illustrate the sensitivity of the WRF model to represent the squall line and associated small-scale atmospheric processes for this reduced 9-member ensemble. Figure 6 shows the distribution of high-pass filtered atmospheric pressure anomalies over the Balearic Islands at 20:00 UTC. High-frequency instabilities are present in all the simulations, with a different representation of the patterns and magnitude of the small-scale atmospheric disturbances. The squall line is present with a marked signature in 6 of the 9 simulations. While simulations #1 and #3, which only differ by the radt interval, have a very similar behavior, the smallest scales are slightly different in these simulations. Simulation #5 also shows a similar pattern, indicating a close behavior of Kain-Fritsch and Grell-Devenyi cumulus schemes here. Notice, however, that the timing of the squall line is not exactly the same, the main pressure jumps reaching Ciutadella around 15 min before with the Kain-Fritsch scheme. Simulations #4 and #6, which only differ in the microphysics scheme, also show similar representations of the squall line. In general, simulations #4, #6 and #9 develop an extended squall line which travels over the Menorca Channel a bit earlier than in simulation #5. The simulations #2, #7 and #8 do not represent such a marked squall line, but still generate high-frequency pressure variations of large magnitude. A wider view of the propagation of the high-frequency atmospheric disturbances larger than 3 hPa is presented in Fig. 7. The 9 simulations provide a remarkable agreement in the position and timing of the origin of the disturbances (i.e., a few tens of kilometers south-west of Algiers around 15:00 UTC on 15 June 2006). These results are in good agreement with the atmospheric pressure values included in the METAR and SPECI reports (World Meteorological Organization 2014) issued at Algiers airport, reporting a ~ 6 hPa atmospheric pressure jump between 17h00 and 17h50 UTC. The perturbations then travel northeastwards, with differences in the propagation path and speed across the simulations. New disturbances are also generated later (around 19:30-20:00 UTC) south of Ibiza Island in some of the simulations (#4, #6 and #9). These new disturbances then also propagate northeastwards. As a result, the tsunamigenic atmospheric pressure perturbations reach the Menorca Channel with different shapes and at slightly different times in the ensemble of simulations, as  Fig. 6. Despite the overall satisfactory capacity of the model to generate and propagate these perturbations, their small-scale characteristics differ over the Menorca Channel in the ensemble of simulations, which in turn significantly affects the final magnitude of the simulated meteotsunamis,

All cases
The magnitude of the modeled high-frequency sea-level oscillations obtained for all ten major rissaga events with the selected sets of WRF parameterizations is presented in Fig. 8. As in Fig. 4, in each column the values are colored by the percentage of the maximum value in that specific column to facilitate the visualization of the maxima and variations with a common color scale.
A set of parameters systematically leading to the largest (lowest, respectively) sealevel oscillations among the ensemble would result in a row with red (dark blue, resp.) color. This is not the case and the matrix rather shows red colors scattered among the different rows, which indicates that there is no systematic best (worse, resp.) performance of a specific set of parameterizations. The simulations that provide the most significant rissagues change with the event that is considered. In particular, the two sets of parameterizations leading to the most significant rissagues on 15-June-2006 do not provide the largest oscillations in none of the other nine cases. On the contrary, the simulations leading to the lowest oscillations in 2006 (simulations #8 and #9) provide the largest oscillations in four other events. While the simulations #7 (using Mellor-Yamada-Nakanishi-Niino planetary boundary layer scheme) and #8 (using the RRTMG short-and longwave radiations) provide the maximum values each of them for 3 of the simulated meteotsunamis, the set of parameterizations #4 (using the WSM6-class microphysics scheme) is the one that never reaches more than 75% of the maximum value obtained in these ensemble experiments. For each set of parameterization, values both smaller and larger than 50% of the maximum values are generated over the different dates.
The results are now compared to the observed values presented in Table 1. For each simulation, the cell in Fig. 9 is now colored according to the percentage of under/overestimation, computed as: 100 × simulated magnitude−real magnitude real magnitude . It allows to visualize the realism of the simulations for the different dates and sets of parameterizations with the same colorbar. While the overall tendency is toward underestimation, which occurs in 77% of all the simulations, this underestimation is not systematic. Multiple cases of overestimation of the rissaga are also obtained in six of the ten significant events analyzed in this study. The most realistic results are spread among the rows of the matrix, indicating that there is no preferred set of parameterizations that systematically leads to a better match with the observations. All sets of parameterizations considered here provide some cases of underestimation and overestimation among the simulated events, indicating that the performance of a specific set of parameterizations depends on the specific situation. The sets of parameterizations #1, #5 (using Kain-Fritsch cumulus scheme) and #7 (using Mellor-Yamada-Nakanishi-Niino scheme for planetary boundary layer) provide the most realistic results within the ensemble in two occasions for each of them. The mean of the percentage of under/overestimation closest to zero is provided by the set of parameterization #8 (− 16.8%), mainly affected by a case of strong overestimation for the 25-May-2008 event.
In 30% of the events analyzed in this study, none of the simulations is able to reproduce a rissaga with a magnitude larger or equal to the observation. The case of 14 July 2019 is particularly striking. The maximum simulated rissaga is only 0.52 m high, significantly underestimating the 1.74 m observed value. The majority of the simulations do not even produce a 10 cm signal. Looking into further details, only three of the simulations generate high-frequency atmospheric pressure disturbances for that date (#3, 4 and 6). In the best case, the magnitude of these variations is less than 1 hPa, which is not sufficient to trigger a rissaga with the observed magnitude. All simulations also underestimate the 22 April 2015 event, although with a relatively better accuracy, some of the simulations producing a rissaga larger than 0.7 m which is generally considered as a threshold for a significant signal. In that case, while most of the simulations represent high-frequency pressure variations with a magnitude larger than 2 hPa in Ciutadella, the conditions of propagation and the period of these oscillations are not favorable to the large Proudman and harbor amplifications that occurred that day.
In spite of these deficiencies, in the remaining cases the ensemble provides at least one simulation which overestimates the real signal. In other words, the observation lies within the range of the ensemble in 70% of the cases of meteotsunamis simulated in this study. This is an encouraging result, suggesting that the phenomenon is predictable with such atmosphere-ocean models in the majority of cases, provided that a sufficient number of simulations can be performed. This aspect is further illustrated in Fig. 10 which plots the results in the form of a boxplot. While the observation falls within the range of the ensemble in 7 of the 10 cases of study, it falls within the interquartile range in 5 of these 7 cases. The median is affected by the model tendency to underestimate the final magnitude of the phenomenon and is therefore almost systematically underestimating the observed value (in 9 out of the 10 cases).

Discussion
The capacity of the modeling system to generate significant signals of meteotsunamis in the large majority of cases reveals that the high-resolution atmospheric model is able to generate tsunamigenic atmospheric pressure disturbances associated with squall lines, gravity waves and convective activity. This indicates that the type of atmospheric instabilities at the origin of these high-frequency pressure disturbances is predictable from the synoptic conditions provided by the large scale input model. This is a first requisite for a successful prediction of the magnitude of the meteotsunamis. But this is not sufficient. The characteristics of propagation of the signal, affecting the effect of the Proudman resonance over the shelf, and the "period" (defined here as the time between two relative minima/ maxima, even for a single oscillation) of these disturbances, affecting the period of the sea-level oscillations at the mouth of Ciutadella inlet, and so the magnitude of the harbor amplification, are of critical importance for the final magnitude of the sea-level oscillations in Ciutadella. The importance of these details was highlighted by the wavelet analysis of the atmospheric pressure signals observed during the main recent events in Ciutadella. It is also consistent with the results obtained in the Adriatic Sea, where small failures in the representation of the speed, direction and intensity of the atmospheric disturbances resulted in significant misrepresentations of the simulated meteotsunamis (Denamiel et al. 2019). This is where the challenge of meteotsunami forecasting really lies. The chaotic nature of the atmosphere implies that two almost identical situations lead to non-negligible differences in the small-scale characteristics of the atmospheric pressure disturbances in terms of pattern, timing and magnitude. The representation of these small-scale features in highresolution models is then found to be extremely dependent on the effects of sub-grid processes which are represented through physical parameterizations. A striking illustration of this sensitivity is the impact provided in this study by the change of the time interval applied between two calls of the radiation schemes. The overall pattern of the disturbance is very similar in these simulations, but the small-scale characteristics differ, impacting in turn the magnitude of the simulated rissaga. Ensembles of simulations including changes in the atmospheric model parameterizations then provide a significant spread in terms of the magnitude of the sea-level oscillations. It improves the output that could be provided by a deterministic forecast, which is limited by the fact that it only gives one single realization within an ensemble of possible simulations which are all consistent with the large scale meteorological situation.
Ensemble prediction is a practical approach to represent uncertainty in numerical model forecasts, which is highly necessary when the target is the magnitude of meteotsunamis due to the importance of small-scale atmospheric processes. In this study, we have introduced uncertainty by changing atmospheric model parameterizations which describe some of the most important sub-grid scale processes. Another usual way to generate the ensemble would be by perturbing the initial conditions (Molteni et al. 1996;Toth and Kalnay 1993). This has not been considered here and would probably lead to a further extended spread of the ensemble. Predictions of the NCEP or ECMWF global ensemble forecasting systems would allow to evaluate the impact of the uncertainty affecting the initial conditions in future experiments. Note, however, that these two approaches (i.e., introducing uncertainty in the model parameterizations or initial conditions) are not independent. In particular, the perturbation of the physical parameterizations during the 12-h spin-up period leads to different initial states at the beginning of the predictions analyzed in this study, so that the initial conditions are also effectively altered in our experiments.
Even if multiple cases of overestimation have been obtained in this study, the BRIFS modeling system was found to tend to underestimate the final magnitude of the meteotsunamis. This characteristic was also highlighted in a recent evaluation of 4 years of BRIFS daily predictions (Mourre et al. 2019). Further experiments increasing the frequency of the atmospheric forcing inputs to the ocean model have revealed that the magnitude of the simulated meteotsunamis was increased by more than 15% when considering 1-min resolution atmospheric pressure forcing compared to 2-min, indicating that the present 2-min resolution forcing is a clear factor contributing to the overall tendency of the system to underestimate the magnitude of the meteotsunamis. Even if the increase in the forcing frequency will further add to the computational burden of the simulations, it should definitely be considered in the future to improve the realism of the predictions.

Summary and conclusion
We have evaluated the capacity of an ensemble of full realistic high-resolution atmosphere-ocean modeling system representing the full life cycle of meteotsunamis, to simulate the magnitude of meteotsunamis en Ciutadella, by changing the main physical parameterizations of the atmospheric model. Different schemes adapted to high-resolution WRF model simulations were used for the representation of cumulus, microphysics, planetary boundary layer and longwave and shortwave radiations. A set of 22 combinations was first applied to the extreme case of the 15-June-2006 rissaga, which was generated by an intense squall line accompanied by some convective activity. While the modeling system was found to be able to properly trigger instabilities and generate high-frequency tsunamigenic pressure disturbances, the results also showed a large sensitivity of the magnitude of the extreme sea-level elevations to the choice of the parameterizations. The maximum simulated sea level reached 2.70 m, which is sufficient to motivate an extreme warning, but still underestimates the 4 m magnitude reported by eyewitnesses. While all parameterizations were found to impact the results, only two of them seemed to tend to negatively impact the performance for this particular event (RRTMG scheme for radiation and Tiedtke scheme for cumulus).
A reduced ensemble of nine sets of parameterizations was then considered to analyze the model performance for the other nine most significant rissaga events observed these last years. Here again, a significant spread was obtained in the ensemble, without any clear positive/negative impact of a specific parameterization scheme on the system performance. No set of parameterization was systematically, or in a majority of cases, leading to the largest meteotsunamis, nor to the most realistic forecast. Instead, the simulation leading to the best performance was changing with the event under consideration. Overall, the simulations were able to trigger instabilities leading to tsunamigenic pressure disturbances in more than 90% of the cases, but the details of these disturbances were altered by the changes in the WRF model parameterizations. This generated a large spread of the ensemble given that these details have a crucial impact on the final magnitude of the meteotsunamis.
While the overall tendency was found to be toward underestimation of the observed magnitude, multiple cases of overestimation were also obtained in 60% of the significant events analyzed in this study. Importantly, the observed magnitude of the extreme sea-level oscillations in Ciutadella fell within the range of the nine-member ensemble in 70% of the main rissaga events.
Ensemble forecasting outputs including the range of the ensemble and the probability of generating a magnitude of sea-level oscillations in the different warning levels defined by AEMET would then allow to provide reliable (in the sense that the simulated range of possibilities contains the real magnitude of the rissaga) probabilistic predictions in the majority of events. These kinds of probabilistic predictions would complement the probabilistic predictions provided by the University of Balearic Islands using simplified models initialized from an ensemble of representative soundings (Romero et al. 2019), whose low computational cost allows to generate larger ensembles. BRIFS ensemble predictions would outperform the outputs of a single deterministic forecast, which are limited to provide one realization within a large ensemble of possibilities and are so inevitably affected by some degree of randomness. The present challenge is to implement such a reduced ensemble for timely operational predictions, which will require both optimizations of the modeling system and increased computing resources.