1 Introduction

Information about key near-surface meteorological variables at regional and/or local levels can be obtained by regional climate models (RCMs), which simulate climate over limited areas of the globe by applying the dynamical downscaling technique (Giorgi and Gutowski 2015). The common simulation framework of the Coordinated Regional Climate Downscaling Experiment (CORDEX) can provide projections of different working groups over specific domains that are useful for regional climate change assessments (Zittis et al. 2019).

A number of studies have dealt with the evaluation of the CORDEX RCM output against observations, revealing biases of the modelled temperature and precipitation climatology (Kotlarski et al. 2014; Gbobaniyi et al. 2014; Fotso-Nguemo et al. 2017) and extremes (Vautard et al. 2013; Diallo et al. 2016; Klutse et al. 2016) influenced by the choice of physics parameterizations. For example, a EURO-CORDEX domain study indicated systematic temperature and precipitation biases for the Weather Research and Forecasting (WRF) model, linked to different physical mechanisms related to convection, radiation and land surface (Katragkou et al. 2015). Davin et al. (2016) also highlighted the effect of land surface schemes on the summer temperature bias simulated over southern Europe. More recently, multi-model studies have also investigated how land surface parameterization can affect the climate over the CORDEX-Africa (Soares et al. 2019) and EURO-CORDEX (Knist et al. 2017) domains.

The region of the Middle East and North Africa (MENA) has historically been exposed to rising background temperatures and diminishing precipitation, which have been intensifying in the recent past and are projected to be even more enhanced in the twenty-first century (Lelieveld et al. 2012, 2016; Zittis et al. 2016; Almazroui et al. 2016, 2017). Research on how the simulated climate is influenced by the model physics in the MENA region has been the focus of only a few studies, mainly as part of regional climate model optimisation within the CORDEX initiative. Zittis et al. (2014) and Zittis and Hadjinicolaou (2017) tested the ability of the WRF model to realistically represent the observed climatology under different convection, microphysics and radiation schemes in simulations at 50 km resolution. Zittis et al. (2014) investigated the performance of different physics configurations in WRF by testing combinations of planetary boundary layer, cumulus and microphysics schemes, showing that the cloud microphysics setup has the strongest impact on temperature biases, while precipitation is most sensitive to the cumulus parameterization scheme, mainly in the tropics. Zittis and Hadjinicolaou (2017) explored the sensitivity of the WRF model to the short- and long-wave radiation parameterizations by testing two schemes and revealed that which radiation scheme performs best depends on the season, location and dominant land use type of each of the model’s grid points. Bucchignani et al. (2016) analysed the performance of COSMO-CLM with respect to changes in physical and tuning parameters related to surface, convection, radiation and cloud parameterizations. By incorporating new parameterizations of albedo and aerosols, the model obtained mean absolute error values of \(\sim\) 1.2 °C for temperature and \(\sim\) 15 mm/month for precipitation.

The need for more systematic evaluation of the representation of land surface processes in climate models is highlighted by Davin et al. (2016), especially over the MENA domain, which has not been broadly studied regarding land surface schemes (LSS), as mentioned in Bucchignani et al. (2018). The effect of different land surface schemes coupled with RCMs was investigated by Almazroui (2016), revealing a non-negligible role, especially for precipitation. Specifically, in that study the BATS and CLM LSS were tested with the RegCM4 model, with BATS performing better for the MENA-CORDEX domain. In a recent study, Constantinidou et al. (2020) examined how the WRF modelled climate over the MENA region varied according to different land surface treatment, with the incorporation of four different land surface schemes in six numerical experiments. In that sensitivity analysis, the LSS evaluation was carried out against Noah, the most commonly used LSS in WRF, without considering any observational data. The authors quantified a variation of 1–2 °C in air temperature that resulted from the six simulations and an overall MENA domain climate sensitivity of 0.1 °C per W/m\(^2\) due to an implied surface energy balance forcing from the different schemes when compared to the reference Noah run.

In this study, we complement the work of Constantinidou et al. (2020) by evaluating the WRF model output against observations of air and land temperature, precipitation, net radiation and soil moisture. Data from Constantinidou et al. (2020) are used, comprising six WRF simulations over the MENA domain, coupled with four LSS [Noah, NoahMP (with dynamic vegetation off and on), CLM4.0 and RUC (with six and nine soil layers)] for 2000–2010, driven by ERA-Interim re-analyses at 50 km horizontal resolution. These are directly compared to observational datasets to quantitatively (and objectively) identify the best performing LSS. For this purpose, monthly time series of modelled and observed data are statistically analysed, and a ranking (based on least bias) is applied accordingly for the whole MENA-CORDEX domain, as well as for seven sub-domains.

2 Data and Methodology

2.1 Data

2.1.1 Regional Climate Model Simulations

The simulations are performed over the MENA region by the WRF model (version 3.8.1) (Skamarock et al. 2008) with horizontal resolution of 0.44° (\({\approx }\) 50 km) and 30 vertical levels, following the guidelines of CORDEX (Giorgi et al. 2009).

The model setup includes the Yonsei University (YSU) Planetary Boundary Layer scheme, the Kain–Fritsch (KF) cumulus scheme, the WSM6 cloud microphysics scheme and the RRTMG scheme for long- and short-wave radiation parameterizations, a model configuration that is suggested by Zittis et al. (2014) and Zittis and Hadjinicolaou (2017) for climate applications in the MENA region. This configuration remains unchanged for all six simulations performed for this study, and the only component of the configuration that is altered is the land surface scheme (LSS). The four different LSSs used here are Noah, NoahMP (multi-physics) (Niu 2011), Community Land Model (CLM) (CGD 2010) and the Rapid Update Cycle (RUC) (Benjamin et al. 2004).

Table 1 Main characteristics of the Noah, NoahMP, CLM4.0 and RUC land surface schemes employed in WRF

The main features of the four LSSs, which differ in their complexity of the treatment of the land surface and associated processes (Table 1), are utilised in six simulations driven by the ERA-Interim re-analyses (Table 2) for the period 2000–2010, as detailed in Constantinidou et al. (2020).

Noah, used for the first simulation (run 1), is the most commonly used LSS among the WRF community and it is also the simplest scheme of the four used in this study. The calculations are performed over the whole grid box, considered as one combined surface layer with four vertical levels of soil, and the surface parameters are taken from look-up tables. NoahMP, an advanced Noah scheme, has a dynamic vegetation model option that can be turned off or on. When this option is off, the monthly leaf area index (LAI) is prescribed for various vegetation types and the vegetation greenness fraction (GVF) comes from monthly GVF climatological values, while when it is turned on, LAI and GVF are calculated using a dynamic leaf model. Runs 2 and 3 are carried out using NoahMP with the dynamic vegetation option off and on, respectively. The CLM scheme comprises ten soil layers, takes the required surface parameters (e.g., LAI) from satellite-based (MODIS) data and is used in experiment 4. The RUC scheme includes up to nine soil layers, takes the vegetation fraction and LAI from MODIS and is used for the last two runs (5 and 6), which differ only in the number of soil layers (six and nine, respectively). Experiments 3, 4, 5 and 6, due to their more detailed treatment of land processes, can be considered as the most “advanced” simulations compared to runs 1 and 2. The list of the six experiments performed is presented in Table 2, which also displays the number of soil layers considered by each LSS. In terms of computational time required to perform a model year simulation (not shown), RUC with nine soil layers is the least and CLM the most computationally expensive scheme.

Table 2 List of the six performed experiments

2.1.2 Observation Datasets

The simulated climate needs to be evaluated against observations in order to reveal the best performing simulations over the MENA region. For this comparison, several observational datasets are used and are described in the following paragraphs.

Datasets produced by the Climatic Research Unit (CRU) at the University of East Anglia are used here for the validation of mean temperature and precipitation regimes. In particular, the TS 3.22 dataset is employed, which is a monthly high-resolution gridded field (0.5°) based on daily values (Harris et al. 2014). Satellite information regarding the land surface temperature is obtained from the MODIS (Moderate Resolution Imaging Spectroradiometer)/Terra satellite. The data comprise daily composites and monthly means of the land surface temperature as derived from infrared radiances measured with the MODIS-TERRA sensor with 0.05° grid resolution (Wan and Hulley 2015).

The parameterized and/or observed satellite data from the “Clouds and the Earth’s Radiant Energy System” (CERES) experiment by NASA (Kato et al. 2013) are used to evaluate the net radiation produced by the WRF model. Data are available from the NASA Langley Research Center (http://ceres.larc.nasa.gov/index.php) on a monthly timescale and at 1° horizontal resolution, covering the period from March 2000 to today. The parameters provided by CERES and used for deriving the net radiation flux analysed in this work are surface (SFC) longwave (LW) and shortwave (SW) fluxes under clear and all-sky conditions.
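The text does not spell out how the net radiation flux is assembled from the CERES components. A minimal sketch, assuming the conventional surface definition (net shortwave plus net longwave, counted positive towards the surface), is given below; the function name and the sample flux values are illustrative assumptions and are not taken from the CERES product.

```python
import numpy as np

def net_surface_radiation(sw_down, sw_up, lw_down, lw_up):
    """Net surface radiation in W/m^2, positive towards the surface.

    Assumed definition: (SW_down - SW_up) + (LW_down - LW_up).
    Inputs can be scalars or arrays (e.g. monthly gridded fields).
    """
    sw_down, sw_up, lw_down, lw_up = (
        np.asarray(x, dtype=float) for x in (sw_down, sw_up, lw_down, lw_up)
    )
    return (sw_down - sw_up) + (lw_down - lw_up)

# Illustrative (made-up) monthly-mean fluxes for a single grid box.
example = net_surface_radiation(250.0, 60.0, 380.0, 450.0)
```

The same function can be applied to the corresponding WRF output fluxes so that the modelled and observed net radiation are built in the same way before biases are computed.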

The SMAP (Soil Moisture Active Passive) dataset, produced by the Department of Geodesy and Geoinformation, Technische Universität Wien, is used to evaluate soil moisture. It is a product released by the European Space Agency (ESA) in 2012 as part of its Climate Change Initiative (CCI) programme and it combines various single-sensor active and passive microwave soil moisture products, with a horizontal resolution of 0.25° (Dorigo et al. 2017).

2.2 Evaluation Metrics and Ranking

An evaluation framework is applied based on different metrics and ranking approaches, to identify the best performing LSS over the MENA region. The meteorological variables that may be influenced by the imposed change of the parameterization of land surface processes, and used in the analysis here are mean 2-m air temperature, land surface temperature, precipitation, net radiation and soil moisture. The choice of these variables is justified on their relevance for surface climate evaluation and is constrained by the availability of gridded observational datasets that cover the MENA domain (see previous sub-section).

For all variables studied, the difference of simulated climate minus observed is calculated first for the statistical metrics and subsequently, a 3-way ranking of the different experiments is performed based on least bias. To assist this, the MENA-CORDEX domain is divided into sub-domains representing most of the different climatic zones [Fig. S1 of Lelieveld et al. (2016)] and land characteristics (Fig. 1) identified in the region for which all the metrics and methods described next are also applied.
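The bias and sub-domain averaging step can be sketched as follows. This is only an illustration under stated assumptions: model and observations are taken to be already on a common grid, the sub-domain is a simple latitude/longitude box whose bounds are hypothetical (the actual boxes A to G are defined in Fig. 1), and the spatial mean is unweighted, since the text does not say whether area weighting was applied.

```python
import numpy as np

def subdomain_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Unweighted mean of a 2-D field over a lat/lon box, ignoring NaNs.

    field, lat and lon are 2-D arrays of the same shape;
    lat_bounds and lon_bounds are (min, max) tuples in degrees.
    """
    field = np.asarray(field, dtype=float)
    inside = (
        (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
        & (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    )
    return np.nanmean(field[inside])

# Toy 4 x 4 grid with made-up values (not the MENA-CORDEX grid).
lat, lon = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
model = lat + 2.0          # hypothetical model climatology
obs = np.ones_like(lat)    # hypothetical observed climatology
bias = model - obs         # simulated minus observed, as in the text
box_bias = subdomain_mean(bias, lat, lon, (1.0, 2.0), (0.0, 3.0))
```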

Fig. 1 Model Land-Use Index (at 50 km grid size) of the MENA domain used in the analysis and the seven sub-regions (A, B, C, D, E, F, G)

Initially, to check the performance against observations of the different LSS-driven simulations across the different sub-domains and the MENA as a whole for the investigated variables, Taylor diagrams are produced. These were based on monthly time series for the 10-year period. These diagrams provide a concise statistical summary of how well patterns of the model output match observations in terms of correlation, root mean square difference and the standard deviation (Taylor 2001). This visual overview is further supported by analysis using different metrics that diagnose specific characteristics of model climate behaviour, as described next.
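The three quantities summarised in a Taylor diagram can be computed from a pair of monthly series as sketched below. The normalisation by the observed standard deviation is an assumption consistent with the normalized values quoted in the results; the function and key names are illustrative.

```python
import numpy as np

def taylor_statistics(model, obs):
    """Correlation, normalized standard deviation and normalized
    centred RMS difference of a model series against observations."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~(np.isnan(model) | np.isnan(obs))   # use months present in both
    m, o = model[valid], obs[valid]
    sigma_m, sigma_o = m.std(), o.std()          # population std (ddof=0)
    corr = np.corrcoef(m, o)[0, 1]
    # Centred RMS difference: means are removed before differencing.
    crmsd = np.sqrt(np.mean(((m - m.mean()) - (o - o.mean())) ** 2))
    return {
        "correlation": corr,
        "normalized_std": sigma_m / sigma_o,
        "normalized_crmsd": crmsd / sigma_o,
    }
```

With these definitions the three statistics satisfy the law-of-cosines relation of Taylor (2001), \(E'^2 = \sigma_m^2 + \sigma_o^2 - 2\sigma_m\sigma_o R\), which is what allows them to be shown on a single diagram.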

Various statistical metrics are derived from the monthly time series (December 2000–November 2010) for each grid box of the MENA domain, for the above-mentioned climatic variables, and for both model output (for each LSS) and observed data. The metrics are annual and seasonal 10-year averages (for long-term mean conditions overall, i.e. climatology), 95th and 5th percentiles (to represent the upper and lower bands of the monthly distribution in the warm/dry and cold/wet parts of the year), standard deviation (for variability) and linear trend (for long-term tendency). The derived biases (model run minus observations) for each metric are then spatially averaged over the seven selected sub-domains, their average, and the whole MENA domain.
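As an illustration of these metrics, the sketch below computes them for a single monthly series and forms the model-minus-observation bias of each. It is a simplified sketch under assumptions: seasonal averages are omitted, missing values are not handled, the trend is an ordinary least-squares slope expressed per decade, and all names are hypothetical.

```python
import numpy as np

def monthly_metrics(series):
    """Summary metrics of one monthly time series (e.g. 120 months)."""
    x = np.asarray(series, dtype=float)
    months = np.arange(x.size)
    slope_per_month = np.polyfit(months, x, 1)[0]   # least-squares linear trend
    return {
        "mean": x.mean(),                 # long-term climatology
        "p95": np.percentile(x, 95),      # upper band of the monthly distribution
        "p05": np.percentile(x, 5),       # lower band of the monthly distribution
        "std": x.std(ddof=1),             # variability
        "trend_per_decade": slope_per_month * 120.0,
    }

def metric_biases(model_series, obs_series):
    """Bias of each metric, defined as model minus observations."""
    m = monthly_metrics(model_series)
    o = monthly_metrics(obs_series)
    return {name: m[name] - o[name] for name in m}
```

In the procedure described above these biases would be computed per grid box and then averaged spatially over each sub-domain.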

The ranking procedure is applied as follows. Three different methods are used as described below, using the bias results of the different metrics mentioned in the previous paragraph, and a fourth combines them to obtain an overall ranking. This ranking methodology was successfully used in Hadjinicolaou et al. (2011) for the selection of the most appropriate model grid-box among several neighbouring ones, to represent a particular location, as part of a multi-model RCM evaluation exercise for Cyprus. The implementation of the three ranking methods described in the next paragraph occurs for each of the sub-domains and for every climatic parameter studied.

The “ranking summation” method consists, first, of ranking the obtained biases for the six schemes for each metric, and then summing the ranking values for all metrics and generating a final ranking. In the “multiplication” method, the absolute values of the biases are multiplied and the resulting numbers are ranked. The “most wins” method counts the number of first places obtained by the six schemes and sorts them accordingly. The overall ranking sums those from these three methods for each climatic variable considered, for the whole MENA domain and for the average of the seven sub-domains.
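The three ranking methods and their combination can be sketched as below. Several details are assumptions, because the text does not specify them: biases are ranked by absolute value in every method, ties share the lowest rank, and the overall ranking is obtained by ranking the sum of the three method ranks. The bias values in the example are made up for illustration.

```python
import numpy as np

def rank_ascending(values):
    """Rank 1 = smallest value; tied values share the lowest rank."""
    values = np.asarray(values, dtype=float)
    return np.array([1 + int(np.sum(values < v)) for v in values])

def overall_ranking(biases):
    """biases: array of shape (n_schemes, n_metrics), model minus observed."""
    abs_bias = np.abs(np.asarray(biases, dtype=float))
    # Rank the schemes separately for each metric (column).
    per_metric = np.column_stack(
        [rank_ascending(abs_bias[:, j]) for j in range(abs_bias.shape[1])]
    )
    summation = rank_ascending(per_metric.sum(axis=1))      # "ranking summation"
    multiplication = rank_ascending(abs_bias.prod(axis=1))  # "multiplication"
    wins = (per_metric == 1).sum(axis=1)                    # first places per scheme
    most_wins = rank_ascending(-wins)                       # more wins = better rank
    overall = rank_ascending(summation + multiplication + most_wins)
    return {
        "summation": summation,
        "multiplication": multiplication,
        "most_wins": most_wins,
        "overall": overall,
    }

# Made-up biases for three hypothetical schemes and three metrics.
toy = [[0.1, -0.2, 0.3],
       [0.5, 0.1, -0.4],
       [-1.0, 0.9, 0.2]]
result = overall_ranking(toy)
```

In this toy case each scheme wins exactly one metric, so the “most wins” method cannot separate them and the overall order is decided by the other two methods, which shows why combining the methods can be useful.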

3 Results

3.1 Air Temperature

The mean 2-m air temperature is the first climatic parameter that is analysed and shown in Fig. 2. Higher observed (CRU) values are found in the southern part of the domain, which decrease northward, a pattern that comprises three zones of different temperature intervals (> 25 °C, 15–25 °C, < 15 °C). All six difference maps from the LSS experiments exhibit warm biases of > 10 °C over the eastern part of Turkey and the Caucasus, western Iran and Morocco, and cold biases over the eastern part of the domain and Sudan. The large warm biases are uniform for all LSS and are due to the lower model elevation at 50 km resolution, as they occur over the main mountain ranges of the domain [areas that are also known to be under-sampled by the CRU TS.3 dataset as shown in Fig. 3 of Harris et al. (2014)]. Hence, other smaller, but non-negligible differences in other areas are not easily seen with the applied colour scale. For example, the two NoahMP runs and the CLM run are warmer by 2–3 °C than Noah over the northern African mainland (especially over 20 °E and 20 °N), as shown in Constantinidou et al. (2020).

The Taylor diagram produced for air temperature is presented in Fig. 3, where basic statistics for the different simulations are denoted with different colours for the LSS and symbols for the sub-domains. The results produced by all six schemes and for every region have very small differences from each other. The correlation coefficients for the six experiments are generally high, between 0.7 and 0.95 for all the sub-regions, while for the whole MENA domain the correlation is highest under the WRF/Noah simulation (\(\sim\) 0.89). Regions E and F are closer to observations, with a correlation of 0.95, normalized standard deviation (presented with blue arcs in Fig. 3) very close to 1.0 and root mean square error (RMSE) less than 0.5. The experiment using the Noah scheme is closest to observations for region E, and the one using CLM for region F.

The results of the more detailed spatial bias analysis in the seven MENA sub-domains are presented in Fig. 4 in the form of matrix plots. The presented information includes the biases (colour bars) calculated for annual climatology, 5th and 95th percentiles, standard deviation and linear trend of the six simulations, ranked (numbers) for each sub-region, their average and the whole MENA domain. From a sub-region inspection of the figure it is evident that box D, which includes the Syria/Iraq area, exhibits the largest biases for climatology and extremes consistently for all six LSS, while the standard deviation is consistently underestimated in sub-domains A and B and overestimated in area G. A comparison among the schemes reveals that CLM is warmer for climatology and extremes in most of the sub-regions, while for the other metrics none of the schemes performs in a consistent manner.

Fig. 2 Annual climatology of 2 m air temperature; a observations from CRU (top), WRF biases of the six experiments; b Noah; c NoahMP (dyn.veg.= OFF); d NoahMP (dyn.veg.= ON); e CLM; f RUC (six soil layers); g RUC (nine soil layers) (bottom)

Fig. 3 Taylor diagram of 2 m air temperature of the seven sub-domains and MENA (different symbols) simulated by the six experiments [Noah; NoahMP (dyn.veg.= OFF and ON); CLM; RUC (six and nine soil layers)] (different colours)

Fig. 4 Two-metre air temperature. Colours: biases from observations of the LSS runs in a annual climatology, b 95th and c 5th percentiles; d standard deviation; e linear trend. Numbers: ranking according to least bias for each sub-domain, their average (All) and the whole domain (MENA). The sub-domains are defined in Fig. 1

3.2 Land Surface Temperature

The satellite-based information for land surface temperature is presented in Fig. 5 (top map). Higher values are observed over the southern part of the MENA domain while lower temperatures are noted in the northern (European) part of the region. Over almost the whole study area, WRF simulates with all LSS colder conditions than observed. An exception is noted over the southwestern area of the model domain where all schemes simulate higher values than the observations.

The Taylor diagram in Fig. 6 shows that all schemes have correlation coefficients between 0.4 and 0.7 for several sub-domains, smaller overall than for air temperature. Exceptionally, box F (Saharan desert) has \(\sim\) 0.94. The normalized standard deviation for sub-domain F is close to 1.0 for all runs except WRF/CLM (\(\sim\) 0.8), and is less than 0.8 for the rest of the investigated areas. The RMSE obtained for all simulations and domains is less than 1.0, with \(\sim\) 0.9 for the whole MENA and < 0.4 for sub-region F.

The quantitative and sub-regional comparison included in Fig. 7 confirms the widespread cold model biases shown in Fig. 5. It can be seen that overall Noah is the best performing scheme regarding annual climatology over the MENA and its sub-regions. The 95th and 5th percentiles are mostly underestimated by the different LSS used for the simulations, except that Noah slightly overestimates the warmest conditions and CLM overestimates the coldest conditions. The different performance of CLM in the 5th percentile, especially over the vegetated areas (including those with forests), could be due to the shading effects operating in this LSS (two-stream canopy radiation transfer scheme), where canopy and ground surface temperatures are separately computed. CLM also considers canopy gaps and calculates fractions of sunlit and shaded leaves together with the absorbed radiation, which may lead to the land surface temperature overestimation noted in box C, where the Land Use index considered by the model includes grassland, shrubland, croplands and mixed forest.

Fig. 5 Annual climatology of land surface temperature; a observations from MODIS/Terra (top), WRF biases of the six experiments; b Noah; c NoahMP (dyn.veg.= OFF); d NoahMP (dyn.veg.= ON); e CLM; f RUC (six soil layers); g RUC (nine soil layers) (bottom)

Fig. 6 Taylor diagram of land surface temperature of the seven sub-domains and MENA (different symbols) simulated by the six experiments [Noah; NoahMP (dyn.veg.= OFF and ON); CLM; RUC (six and nine soil layers)] (different colours)

Fig. 7 Land surface temperature. Colours: biases from observations of the LSS runs in a annual climatology, b 95th and c 5th percentiles; d standard deviation; e linear trend. Numbers: ranking according to least bias for each sub-domain, their average (All) and the whole domain (MENA). The sub-domains are defined in Fig. 1

3.3 Precipitation

In Fig. 8, the CRU observations show that drier conditions (rainfall less than 25 mm/month annual average) prevail over most of the MENA domain (in northern Africa and the Middle East), while the northern part of the domain (Europe, Anatolia, Caucasus) as well as the southern part of the Sahel region and the tropics are wetter. The six simulations strongly overestimate precipitation in the tropics (south of 15°N), but this is a region where different observational datasets tend to vary a lot (Tanarhte et al. 2012). Precipitation is underestimated by all LSS in large parts of Europe (except the Balkan Peninsula) and around the Mediterranean Sea.

All six simulations exhibit low correlation (0.0–0.4) with observations, as presented in Fig. 9. Normalized standard deviations of more than 1 are obtained for all runs and regions, reaching values of more than 4 for the simulations with both options of NoahMP and with CLM over sub-domain F, which is also the case for the RMSE. These large values (in contrast to air and land temperature, for which they are not greater than 1) indicate, not surprisingly, the high month-to-month variability of precipitation.

Box C stands out in the detailed bias map of Fig. 10 as the area with the largest biases in most of the metrics and for all LSS. This figure also reveals that, while the Noah scheme achieves the least bias for several metrics (e.g. annual climatology and 95th percentile) in the whole MENA domain, for specific sub-regions (e.g. box G) its bias is the largest among the schemes (and of opposite sign). The scheme suitability can be assessed for individual areas; for example, in sub-domain D (Levant and Mesopotamia), Fig. 10 demonstrates that the best performing LSS in simulating precipitation is RUC.

3.4 Net Radiation

The annual climatology of net radiation observed by CERES is shown in the top map of Fig. 11. Most parts of the MENA domain have net radiation < 100 W/m\(^2\), except for the coastal areas of northern Africa and the Arabian Peninsula, where it is greater than 100 W/m\(^2\). Looking at the comparison of the simulations with the observations, it is evident that all six WRF model options of LSS underestimate net radiation over the areas where observed values are \(>100\) W/m\(^2\) (Fig. 11). This distinct difference pattern is also recorded in the respective upward short-wave map for winter (not shown).

Figure 12 summarizes the statistical outcome of the six experiments, visualised in the form of a Taylor diagram for net radiation compared to CERES satellite observations. The correlation of the different runs lies in the range of 0.4–0.7 and for the whole domain of interest is about 0.6. The root mean square error for all the options studied here takes values close to 1.0, while the standard deviation varies from 0.8 to 1.4. The compactness of these results suggests that the different LSS in the six runs do not have a discernible effect, overall, on the model net radiation.

Fig. 8 Annual climatology of monthly precipitation; a observations from CRU (top), WRF biases of the six experiments; b Noah; c NoahMP (dyn.veg.= OFF); d NoahMP (dyn.veg.= ON); e CLM; f RUC (six soil layers); g RUC (nine soil layers) (bottom)

Fig. 9 Taylor diagram of monthly precipitation of the seven sub-domains and MENA (different symbols) simulated by the six experiments [Noah; NoahMP (dyn.veg.= OFF and ON); CLM; RUC (six and nine soil layers)] (different colours)

Fig. 10 Precipitation [mm/month]. Colours: biases from observations of the LSS runs in a annual climatology, b 95th and c 5th percentiles; d standard deviation; e linear trend. Numbers: ranking according to least bias for each sub-domain, their average (All) and the whole domain (MENA). The sub-domains are defined in Fig. 1

Fig. 11 Annual climatology of net radiation; a observations from CERES (top), WRF biases of the six experiments; b Noah; c NoahMP (dyn.veg.= OFF); d NoahMP (dyn.veg.= ON); e CLM; f RUC (six soil layers); g RUC (nine soil layers) (bottom)

Fig. 12 Taylor diagram of net radiation of the seven sub-domains and MENA (different symbols) simulated by the six experiments [Noah; NoahMP (dyn.veg.= OFF and ON); CLM; RUC (six and nine soil layers)] (different colours)

Fig. 13 Net radiation. Colours: biases from observations of the LSS runs in a annual climatology, b 95th and c 5th percentiles; d standard deviation; e linear trend. Numbers: ranking according to least bias for each sub-domain, their average (All) and the whole domain (MENA). The sub-domains are defined in Fig. 1

From the sub-regional analysis in Fig. 13, Noah appears to be the best performing scheme to simulate the net annual radiation climatology with relatively small annual climatology biases. In other metrics, the same scheme performs worse (relative to the rest), for example, in box G (Maghreb) for the 95th percentile and standard deviation, although all LSS have distinctly different biases here compared to the other sub-domains and metrics. The largest differences of the standard deviation (simulated minus observed) are calculated for region B (which includes the Balkans). The biases in linear trends of all sub-regions and simulations are positive with the largest obtained with the CLM run.

3.5 Soil Moisture

In Fig. 14 (top map) of the annual mean climatology of soil moisture from satellite observations (SMAP), it is evident that the northern part of the MENA domain is moister than the southern part. When comparing the six simulations with observations (Fig. 14), all schemes (except CLM) have relatively small biases (between – 0.05 and + 0.05 m\(^3\)/m\(^3\)) with an overall underestimation of soil moisture in the African continent and the Arabian peninsula and overestimation in the northern part of the domain. CLM exhibits larger biases both positive and negative.

In the Taylor diagram (Fig. 15), the normalized standard deviation and RMSE take large values, between 2 and 8, for all sub-domains, and the correlation varies from 0.05 (box F with CLM) to 0.65 (MENA with Noah). These results appear more scattered in the diagram, implying a more variable performance, which can also be seen in Fig. 16, where more sub-regional features are revealed for the additional metrics. An example is the opposite behaviour that CLM shows at the upper and lower ends of the monthly distribution, where it takes the first and the last place in the ranking for the 95th and the 5th percentile, respectively, over the whole MENA region. Overall, there seems to be little affinity between these biases and the respective ones for precipitation where, for example, the RUC scheme performs better in sub-domain D (and worse for soil moisture).

Fig. 14 Annual climatology of soil moisture; a observations from SMAP (top), WRF biases of the six experiments; b Noah; c NoahMP (dyn.veg.= OFF); d NoahMP (dyn.veg.= ON); e CLM; f RUC (six soil layers); g RUC (nine soil layers) (bottom)

Fig. 15 Taylor diagram of soil moisture of the seven sub-domains and MENA (different symbols) simulated by the six experiments [Noah; NoahMP (dyn.veg.= OFF and ON); CLM; RUC (six and nine soil layers)] (different colours)

Fig. 16 Soil moisture. Colours: biases from observations of the LSS runs in a annual climatology, b 95th and c 5th percentiles; d standard deviation; e linear trend. Numbers: ranking according to least bias for each sub-domain, their average (All) and the whole domain (MENA). The sub-domains are defined in Fig. 1

3.6 Overall Ranking

A three-method intermediate ranking is applied and a further ranking, following the procedure described in Sect. 2.2, generates a final ranking. The results from the grand ranking are presented in Table 3 for each climatic variable considered, for the whole MENA domain (labelled “MENA”) and for the average of the seven sub-domains (labelled “all”). The latter distinction allows a scheme ranking based on the selected sub-regions of interest without considering the tropics (which are included in the whole “MENA” domain).

For most of the variables and schemes, the two ranking results (“all”, “MENA”) coincide. Generally, Noah ranks first for most variables, with the exception of air temperature. Both options of NoahMP (an augmented version of Noah) follow, succeeded by RUC (nine soil layers) and CLM, which also consider a more detailed scheme than Noah. The least performing LSS overall is RUC (six soil layers).

Excluding radiation, the final ranking results for the other four climatic parameters are further grouped in two different ways (“air” vs “land” and “thermal” vs “humid”) to provide another perspective on the assessment, as follows: air [mean 2-m air temperature (Tmean) and precipitation (prcp)] vs land [soil moisture (smois) and land surface temperature (Tland)]; thermal [mean 2-m air temperature (Tmean) and land surface temperature (Tland)] vs humid [precipitation (prcp) and soil moisture (smois)]. The “air” variables are better simulated using RUC (nine soil layers), whereas the Noah LSS performs best when considering “land” and “humid” variables. The group of air and land temperatures (“thermal”) is best simulated using NoahMP with the dynamic vegetation option turned on.

Table 3 Overall ranking, see Sect. 2.2 for explanation

4 Summary and Conclusions

The WRF-generated climatology of six simulations using four different LSS for the period of 2000–2010 has been compared with observations. The simulation period was limited to 10 years due to the availability of computational resources. It can be considered as a minimum time period that allows representative and, therefore, adequate climatological averages to be obtained, although a period longer than 20 years would certainly be desirable for statistical robustness. Since we are interested in extra-tropical latitudes, where the year is mostly separated into a winter and a summer season, we also assess the warmest and coldest parts of the investigated period using the 95th and 5th percentiles of the monthly time series. Hence, the current evaluation does not consider (and the conclusions below are not based on) daily extremes.

The maps showing the biases of the different variables show largely similar patterns for the six simulations, so any conclusive statements from these large-scale differences cannot be drawn only from visual inspection. Also, it has not been possible to discover any clear spatial patterns in the bias maps among the different variables that could explain physically which fundamental bias in a particular land surface scheme propagates itself across different variables. This may be due to the fact that the observational datasets of the surface climate variables used are from different and independent sources and, therefore, not physically consistent with each other. A spatially detailed assessment was carried out, involving statistical summary with the help of Taylor diagrams and a sub-regional bias breakdown for which several metrics were calculated and compared with observations. The initial ranking applied for each sub-region and variable exhibits varying results (where one scheme for a specific variable and metric, e.g. annual climatology of precipitation, may be the best performer for the whole MENA domain but the worst performer for a certain sub-region). Although this exercise does not unambiguously point to one superior scheme, this sub-regional perspective for several climate variables can be useful for WRF applications that focus on a particular area (or climate statistics aspect) of the MENA region.

One limitation of this study is the horizontal resolution of 50 km, which turned out to be too coarse to allow the more detailed LSS to produce any substantial differences and tangible improvement in the simulated climate. Another limitation is the use of only one observational dataset per meteorological variable; gridded precipitation datasets, for example, are known to contain uncertainties (Zittis 2018). Thus, the evaluation and ranking might be sensitive to the choice of the reference observations (Gómez-Navarro et al. 2012). This is partly compensated by the fact that we have looked at different surface climate variables (in addition to air temperature and precipitation) that are not commonly assessed in RCM evaluation studies.

Notwithstanding the above, the overall ranking, based on three different intermediate ranking methods, provided a MENA-wide suitability estimation. This last step objectively identified the Noah suite of schemes as the best performing LSS, for the sub-regions average (“All”) and the whole MENA domain, occupying the top three ranks: Noah 1st, NoahMP (dynamic vegetation off) 2nd and NoahMP (dynamic vegetation on) 3rd. The other three schemes (CLM and RUC with six and nine soil layers) follow with lower ranking. Note that this conclusion is not sensitive to the spatial definition of the evaluation sub-domains (as confirmed by a separate test, not presented here, with different sub-domains). The predominance of the Noah scheme may not be unexpected since this land surface model has been at the core of the WRF model development and hence appropriately tuned.

Hence, for the composite performance of WRF at a horizontal resolution of 50 km over the MENA region and for the climatic variables considered, the recommended land surface scheme is Noah. This information may be useful for estimates of climate change impacts in the region (e.g. Constantinidou et al. 2019) that use this particular model and horizontal resolution.