Seasonal and regional contrasts of future trends in interannual arctic climate variability

Future changes in interannual variability (IAV) of Arctic climate indicators such as sea ice and precipitation are still fairly uncertain. Alongside global warming-induced changes in means, a thorough understanding of IAV is needed to more accurately predict sea ice variability, distinguish trends and natural variability, as well as to reduce uncertainty around the likelihood of extreme events. In this study we rank and select CMIP6 models based on their ability to replicate observations, and quantify simulated IAV trends (1981–2100) of Arctic surface air temperature, evaporation, precipitation, and sea ice concentration under continued global warming. We argue that calculating IAV on grid points before area-averaging allows for a more realistic picture of Arctic-wide changes. Large model ensembles suggest that on shorter time scales (30 years), IAV of all variables is strongly dominated by natural variability (e.g. 93% for sea ice area in March). Long-term trends of IAV are more robust, and reveal strong seasonal and regional differences in their magnitude or even sign. For example, IAV of surface temperature increases in the Central Arctic, but decreases in lower latitudes. Arctic precipitation variability increases more in summer than in winter; especially over land, where in the future it will dominantly fall as rain. Our results emphasize the need to address such seasonal and regional differences when portraying future trends of Arctic climate variability.


Introduction
Alongside a severe decline in Arctic sea ice, it is generally acknowledged that Arctic surface temperature (SAT), precipitation (PR) and evaporation (EVAP) will continue to increase until the end of this century (Overland et al. 2019;Bintanja and Selten 2014;Bogerd et al. 2020;Perovich et al. 2014;Rennermalm et al. 2009;Screen and Simmonds 2010). Compared to this established confidence around future mean trends of Arctic climate indicators, relatively little is known about the future of Arctic climate variability on interannual time scales. Changes in climate are generally a result of internal processes (e.g. variability of large-scale modes) and external forcing such as volcanic eruptions or anthropogenic increases in greenhouse gas (GHG) emissions on longer time scales. The Arctic climate is characterized by significant climate change-induced long-term trends, but also large decadal variability and interannual variability (IAV, or yearto-year fluctuations). Future changes in the magnitudes of these fluctuations are largely uncertain-as are the governing causes and consequences (Liu et al. 2020a). We briefly mention the main processes associated with IAV, including the modulating effect of longer-term fluctuations, but focus our analysis on changes in the magnitude of IAV under future warming across different seasons and Arctic regions.
Often, processes governing mean trends or longer-term climate variability differ from those responsible for annual fluctuations. For example, Bintanja et al. (2020) and Bogerd et al. (2020) highlight the dominant role of poleward moisture transport (PMT) on increased PR-IAV in the future, whereas the increase in mean PR has been mainly linked to local thermodynamic processes (Higgins and Cassano 2009;Bogerd et al. 2020). Gimeno-Sotelo et al. (2019) point towards strong differences in the patterns that govern the long-term trend versus the IAV of Arctic sea ice and further 1 3 highlight seasonal differences. Generally, trends and deviations of the mean Arctic climate occur in response to both local and remote climate changes, which are briefly summarized in the following.
The retreat of Arctic sea ice is an example of a local process that has been repeatedly linked to increased SAT and PR in the Arctic (Overland and Wang 2010;Stroeve et al. 2011;Jaiser et al. 2012;Ford and Frauenfeld 2022;van der Linden et al. 2016b;Reusen et al. 2019). Direct local effects of sea ice loss result from increased heat and moisture fluxes over open ocean areas, which favour cloud formation and hence PR. Ford and Frauenfeld (2022) suggest that Arctic PR caused by sea ice loss will increase in a warmer climate. Sea ice itself is likewise influenced by local SAT and PR. Positive SAT anomalies generally reduce sea ice concentration (SIC) through surface melting, while positive snowfall anomalies protect sea ice through increased insulation and a higher surface albedo (Post et al. 2019;Kajtar et al. 2021;Reichler and Kim 2008). Rainfall on the other hand reinforces sea ice melting by lowering the albedo of (snowcovered) ice (Aoki et al. 2003). SIC anomalies have also been linked to storm activities, both of local and remote origin (Screen et al. 2011).
Remote influences can be subdivided into atmospheric versus oceanic meridional transport. The shorter-term variability of Arctic climate is mainly influenced by atmospheric meridional transport through the poleward advection of heat and moisture (Graversen 2006;Hofsteenge et al. 2022;Kapsch et al. 2019). Reusen et al. (2019) suggest that IAV of atmospheric meridional transport across 70 • N increases along with a warmer climate, partly as a result of increased mean latent heat fluxes (Graversen and Burtu 2016;Feldl et al. 2020;Reusen et al. 2019). Regarding longer-term variability, Corti et al. (1999) found a strong influence of atmospheric circulation regimes on the enhanced the warming of the Northern Hemisphere during the later decades of the last century. Changes in oceanic meridional transport are also more important on decadal than interannual time scales due to the larger inertia of the ocean compared to the atmosphere. Oceanic meridional transport likely dominates decadal variations in the Arctic (Reusen et al. 2019), and can thereby alter the magnitude of IAV (Liu and Barnes 2015;Goosse and Holland 2005;van der Linden et al. 2016b). Furthermore, differences in the rate of oceanic warming and poleward oceanic heat transport can significantly influence the climate response represented by GCM (Shu et al. 2022;Semmler et al. 2021).
Clearly separating remote from local processes is difficult, as remote processes (e.g. atmospheric moisture transport) can enhance or compensate local processes (e.g. latent heat fluxes caused by sea ice variations). Several studies also suggest that the dominant drivers of Arctic climate IAV have changed in the past and/or may change in the future (Holland 2003;Francis and Hunter 2006;Reusen et al. 2019). Table A1 summarizes the dominant mechanisms and climate variability modes associated with interannual variations of Arctic-wide SAT, PR and SIC, based on a literature review of observation-and numerical-based studies. The spatial and seasonal differences in the Arctic climate response to these driving mechanisms highlight the need to distinguish between regions and seasons when analyzing and quantifying future changes in Arctic IAV.
Most studies that address climate variability are focused on low-frequency variability instead of annual variations (van der Linden et al. 2016b, a;Goosse and Holland 2005;Yang et al. 2020;Halloran et al. 2020;Day et al. 2013;Chylek et al. 2009). Additional knowledge on shorter-term (interannual) variability in the Arctic is needed for a better prediction of the likelihood of extreme events (van der Wiel and Bintanja 2021) and to be able to assign annual anomalies to climate variability and long-term trends Bogerd et al. 2020). This study examines differences in the year-to-year fluctuations of annual and seasonal means across the Arctic region. After describing our methodological choices, we start the analysis with an assessment of the relative importance of natural variability versus model uncertainty (Sect. 3.1). Next, we present a selection of models that best represent the Arctic climate (reference period: 1981-2010) based on seasonal means (Sect. 3.2). This selection allows us to quantify and discuss future trends in summer and winter IAV of Arctic climate based on models that are better at simulating the reference period (Sect. 3.3). We also compare these results to future projections based on the multi-model mean (MMM) of all 31 CMIP6 models to suggest where most CMIP6 models may over-or underestimate future means and IAV of Arctic climate (Sect. 3.4).

Data sources
Gridded monthly output data of 31 CMIP6 models (Table 1, accessed via https:// stora ge. googl eapis. com/ cmip6/ pangeo-cmip6. json) that provide output for SAT, PR, EVAP and SIC were investigated from 1851 to 2100. We investigated the shared socio-economic pathway 5 − 8.5 scenario (SSP5 − 8.5) to show the clearest response to global warming. For SAT, PR and EVAP all areas north of 65 • N were examined, while sea ice area (SIA) was calculated as the product of SIC and grid areas summed over 45 • N-90 • N . We examined the first member of each model, and also investigated large ensemble models to assess internal variability. Our analysis focuses on future Arctic climate IAV of models whose summer and winter mean climate fields compare relatively well to those of reanalyses. The evaluation method is described in Sect. 2.3.
To assess CMIP6 models, 3 reanalysis products for each variable were used (reference period: 1981-2010). We compared simulated SAT and PR fields with the JRA55 (Kobayashi et al. 2015), the 20CRv3 (Slivinski et al. 2019) and the ERA5 (Copernicus, 2017). To evaluate SIC, we also used ERA5, in addition to HadISST (Rayner et al. 2003) and GIOMAS (Zhang and Rothrock 2003). All data fields of CMIP6 models and reanalyses were regridded to a 360x180 longitude/latitude grid.

Natural variability and model uncertainty
Natural variability (NV) and model uncertainty (MU) are quantified based on grid-point based spreads of 1 Multi-Model-Ensemble (M-Mod-E, different models with only one member each) and 8 Multi-Member-Ensembles (M-Mem-E, one model with multiple members). We define spreads as the weighted area average of grid-point intermodel (-member) standard deviations (STDs). We find that the minimum amount of members needed to achieve a relatively constant value of the STD is 5 (i.e. using 5 or more members does not  Table 1 lists the 31 models and highlights the 8 models with 5 or more members which will be used in Sect. 3.1 to discuss the impact of NV on Arctic climate. For a fair comparison, spreads were calculated as the STD of the same number of members and models. We then repeatedly (50 times) randomly constrained the members of these 8 models (or 31 models of the M-Mod-E) to 5. Then we calculated the average STD on these 50 chunks (consisting of 5 members or models each), which represents the spread of each M-Mem-E or M-Mod-E. Finally, the relative contribution of NV alone was calculated by dividing the average spread of the 8 M-Mem-E by the M-Mod-E spread. For every variable v and season s, the final area-weighted ( w n with n=grid points) NV ratio can then be described by Eqs. 1 and 2 for the mean and IAV of variables respectively, with where mem denotes the M-Mem-E, mod the M-Mod-E, m the used amount of models with multiple members, and t the 1981-2010 time interval. We assume that the remaining fraction of uncertainty occurs due to differences in model design. Model uncertainties (MU) are therefore described as and

Model evaluation
The method used to evaluate models is an adapted version of an approach from Murphy et al. (2004) and Reichler and Kim (2008), who evaluated models based on their ability to simulate multiple variables compared to a reference reanalysis on a grid-point basis. Sea ice is usually assessed as an area average; e.g. Massonnet et al. (2012) compared the sea ice extent (SIE) and sea ice volume of CMIP5 models to those of reanalyses.
assess each first member of the 31 CMIP6 models. Further, we did not evaluate annual mean fields, but averages of the summer and winter season (JJA and DJF for SAT and PR), or months with minimum and maximum SIC (September and March). First, the squared difference of the mean value of each variable was calculated for every grid point. In contrast to Reichler and Kim (2008), who then normalized the squared grid point differences by the interannual variance at each grid point, we normalized the differences by the area-weighted average interannual variance of the assessed region ( 65 • N-90 • N ). This different approach allowed us to assess SIC where in the models SIC is nonzero, but in the reanalyses SIC is zero. A sensitivity test that compared the use of an area-averaged uncertainty versus a grid-point-uncertainty of SAT and PR confirmed the applicability of this adaption, as results were almost identical. The area-averaged error e 2 vtm for each model m, variable v and time period t (either season or month) was then computed as follows: with n referring to the grid point and w to the area weight of n. s vtmn and r vtmn indicate the simulated (CMIP6) and averaged reanalyses fields respectively, and 2 vtn n denotes the area-averaged uncertainty of the reanalyses.
The normalized error variance of each variable and season was then scaled by the mean error of all CMIP6 models to ensure that all variables have comparable weights: The final performance index (PI) is defined as the mean of the 6 errors (3 variables x 2 seasons). The summer and winter criteria were weighted equally (e.g. the final index is the mean of the scaled indices for JJA/Sep and DJF/Mar): The reason for why only mean fields, and not IAV fields, were evaluated is because the latter were overly affected by natural variability. This was tested by applying the evaluation method on 46 CanESM5 members, which resulted in a small spread of PIs based on means of individual members, but a large spread of PIs based on IAV of individual members. The IAV error spread of the CanESM5 ensemble was almost identical to that of the M-Mod-E. The same result was attained with MIROC6 (42 members). The final IAV error of all models therefore strongly depends on the choice of member, and a closer resemblance of IAV fields of one model or member to that of the reanalysis data may as well imply a different phase of natural variability of the IAV.

Local calculation of IAV and trends
We first compute the IAV for each individual reanalysis or model before averaging the IAV values. Furthermore, all mean and IAV results of gridded variables (SAT, PR, EVAP and SIC) were calculated on a grid-point-basis before area-averaging (except for PMT-IAV, which we defined as area-averaged PR-IAV minus area-averaged EVAP-IAV).
For each grid point and model/reanalysis, IAV was defined as the STD of linearly detrended 30 year moving windows shifted forward by 5 years. The moving IAV of a time series of length j for any variable v and grid point n can therefore be described as with (vn) denoting the linear least-squares fit to vn for each 30 year time interval [x, y].
Compared to the conventional area-average method (AAmethod), where IAV is defined as the STD of area-averaged variables, this grid-point-method (GP-method) returns greater IAV values of SAT and PR (Fig. 1). For example, IAV based on the AA-method of the multi-reanalyses mean [1,30] − (vn [1,30] )], [vn [6,35] − (vn [6,35] )] , ... , [vn [j−29,j] − (vn [j−29,j] )]} returns SAT-IAV values of 0.28 K in summer and 0.76 K in winter (similar to results from Reusen et al. (2019) and Bintanja et al. (2018)), while the GP-method returns much higher values in summer (0.75 K) and winter (1.96 K) (Fig. 1a). In general, the reason for these differences is that in any given year, the AA-method allows for compensation of positive anomalies at one grid point with negative anomalies at another. In reality, not all grid points experience a positive or negative anomaly at the same time, which the AA-method 'assumes'. Likewise, the IAV of winter surface temperature from 2010 to 2100 (Fig. 1a) decrease is more noticeable based on the GP-method, but the relative decrease is similar to that of the AA-method (not shown). The driving processes that cause these trends are discussed in Sect. 3.3.
Regarding PR, IAV calculated using the AA-method is not only smaller, but also yields higher winter than summer variability, both based on reanalysis and CMIP6 data ( Fig. 1b). Here, IAV based on the multi-reanalysis mean and the AA-method is 23.68 mm/yr in summer and 30.14 mm/yr in winter, while with the GP-method IAV is higher in summer (123.2 mm/yr) than in winter (91.39 mm/yr).
Since the mechanisms that govern the variability of many climate variables usually do not affect each region similarly, we argue that the GP-method yields a more accurate presentation of climate IAV, which is especially relevant for studying local impacts. On the other hand, IAV on the basis of the AA-method is easier and cheaper to calculate, and may still be useful for studies investigating dominant responses to large-scale climate variability.
For showing time series as well as the trends summarized in Tables 2 and A3, we also calculated SIA to allow for comparisons to other studies. SIA is defined as the sum of the ice-covered fraction of each grid cell multiplied by the area of the respective grid cell. IAV of SIA was then calculated as described above, except over just one dimension.
Future trends of mean and IAV changes were defined as linear regression slopes from 1981 to 2100 (yielding centered IAV values from 1995 to 2085). For calculating multimodel (-member) means of trends, we first computed means and IAV for each individual model and then averaged over the members or models, yielding gridded mean and IAV fields over time. For each grid point, we then calculated the linear trend slope and significance value (p-value). Figures A5 and A6 visualize the area-averaged mean and IAV trends and their significance for each model and variable.

Results and discussion
Before we go into detail about natural variability (3.1), the performance of CMIP6 models (3.2), and future changes in summer and winter IAV (3.3 and 3.4), we briefly address annual mean trends of SAT, PR, and SIA variability (Fig. 2). The decrease in temperature variability along with Arctic warming (Fig. 2b) has already been noted in previous studies which have attributed this negative SAT-IAV trend to winter sea ice loss (Reusen et al. 2019;Screen 2014;Landrum and Holland 2020). In Sect. 3.3.2 we will show that SAT-IAV in fact increases in summer over most regions, which counteracts the SAT-IAV decrease in the colder months (Screen 2014). Regarding sea ice variability, there are significant summer and winter differences (as well as strong regional variations, Sect. 3.3.1), that are hidden in the annual mean signal (Fig. 2f). Annually averaged SIA slightly increases between 2000-2050 caused by the thinning of sea ice, which results in a larger sea ice growth and melt area (Holland et al. 2008;Goosse et al. 2009). However, most models depict an eventual decrease in SIA variability towards the end of the century due to the significant reduction in annually averaged sea ice area. PR-IAV increases in both summer and winter ( Fig. 2d and Fig. 7c), which has been attributed to the increase in the moisture holding capacity of the atmosphere and poleward moisture transport (Pendergrass et al. 2017;Bintanja and Selten 2014;Bogerd et al. 2020).

Impact of natural variability on IAV
To understand the role of natural variability (NV), we compare the spread of the 3 large ensemble models CanESM5, MIROC6, and MIROC-ES2L with that of the first ensembles of All-CMIP6 (M-Mod-E). The spreads of annual means of these large ensemble models are noticeably lower than that of the M-Mod-E (Fig. 2a,c,e). This suggests that natural climate fluctuations have a small impact on the mean Arctic climate. This was also shown by Tokarska et al. (2020), who analyzed CMIP5 and CMIP6 models and concluded that warming trends across models mostly differ due to differences in model design, even on finer spatial scales. However, NV seems to dominate projections of IAV, as the spread of the M-Mem-E is nearly equally as large as that of the M-Mod-E (Fig. 2b,d,f). This implies that the uncertainty of IAV defined by the M-Mod-E spread is less caused by differences in model design, but primarily linked to internal climate noise. The strong role of internal variability on annual fluctuations of Arctic sea ice and atmospheric variables was also highlighted by other studies (Dorn et al. 2012;Ding et al. 2019;England et al. 2019;Tebaldi et al. 2021;Giorgi 2002;Alexander et al. 2004).
In order to quantify the magnitude of NV for each variable and season, we calculated the relative contribution of the M-Mem-E spread to the M-Mod-E spread for the reference period 1981-2010 (Fig. 3). The relative impact of NV on the mean climate is small amongst all 3 variables and both seasons, with around 25% of uncertainty arising from natural fluctuations (Fig. 3a,c,e). One striking feature is the large spread of EC-Earth3-Veg members (Fig. 3e), indicating a considerable impact of NV. This is most likely a response to the (multi-)decadal climate oscillations that appear in the EC-Earth3 configuration (Döscher et al. 2022). To test this, gridded fields of the inter-member spreads were investigated, showing that the average difference of SIC in EC-Earth3-Veg members is as high as 30% in the Atlantic sector, while that of other models is much smaller ( Figure A3). Higher SIE in the reference period of most EC-Earth3-Veg members explain the more southern position of maximum Intermember versus intermodel spread of Arctic surface temperature (a), precipitation (b), and sea ice area (c) and its interannual variability-(d),(e),(f) respectively-over 1981-2010. Shown are the spreads of 8 multi-member CMIP6 models compared to the spread of first-member models (rightmost bars) for summer (red) and winter (blue) as defined in Sect. 2.2. Pie charts summarize the contribution of model uncertainty (MU) and natural variability (NV) to the total uncertainty spreads compared to other models. However, other models with higher SIE (e.g. CanESM5) show a significantly lower and more distributed spread, while in EC-Earth3-Veg the highest values are in the Atlantic sector, where decadal variability is generally largest (Häkkinen and Geiger 2000). This strong localized spread of members is likely a sign of differences in the strength of the AMOC, which is responsible for anomalies in the northward flow of warmer waters towards the Arctic (Mahajan et al. 2011).
The relative contribution of NV to Arctic climate IAV is significantly larger than the relative contribution of NV to long-term means (Fig. 3). Regarding IAV of all variables, NV dominates over MU in both seasons, except for summer SAT-IAV, where MU and NV are fairly equal. The following paragraphs summarize the impact of NV on interannual SAT, PR and SIA variability.
Both intermember and intermodel differences of SAT-IAV and long-term means are greatest in winter (Fig. 3a,b), when SAT-IAV is also higher (Fig. 7b). Higher winter SAT-IAV values are partly caused by the tropo-stratospheric coupling via planetary waves Jaiser et al. 2016;Labitzke and Kunze 2009;Jakovlev et al. 2021;Matsumura et al. 2021), which can disrupt the polar vortex and cause large SAT anomalies. Hence NV plays a significant role especially in winter, where it is often correlated to NAO/AO anomalies (Jakovlev et al. 2021;Dorn et al. 2003;Wei et al. 2017) (Table A1). Based on the intermember and intermodel spreads (described in section 2.2), we estimate a relative contribution of NV to Arctic SAT-IAV anomalies of 72% and 48% for winter and summer respectively, while the long-term mean appears to mainly result from differences in model design.
The spread of PR-IAV is slightly lower in winter (17.51 mm/yr) than in summer (24.06 mm/yr), with comparable amounts of variability arising from NV (67% in summer and 60% in winter). The difference between summer and winter is in accordance with slightly lower 30-year-mean spreads and absolute PR(-IAV) differences across both seasons.
As for Pan-Arctic sea ice, models and members agree less on summer SIA-IAV than on winter SIA-IAV, in spite of slightly higher mean spreads in winter (Fig. 3e,f). The relative contribution of NV in summer versus in winter may govern this contrast, as winter SIA-IAV can almost entirely be explained by NV according to these spreads. In general, sea ice is the variable whose IAV is mostly affected by NV, supported by various studies linking SIA anomalies to climate modes (Tables A1, A2). For example, Park et al. (2018) found that the Arctic Oscillation alone explained about 22% of September IAV of SIE from 1981 to 2015. We find that IAV of March SIA across CMIP6 models from 1981 to 2010 varies by 110,000 km 2 in summer and 60,000 km 2 in winter, dominated by NV (71% and 93% respectively).

CMIP6 model performances of arctic climate
Out of the 31 investigated CMIP6 models, we find that those that best describe the mean Arctic climate during 1981-2010 are MRI-ESM2-0, EC-Earth3-CC, EC-Earth3-Veg, EC-Earth3-Veg-LR, HadGEM3-GC31-MM, and PI-ESM1-2-LR (Fig. 4). Investigating the relative contribution of PR, SAT, and SIC to the final PI (Eqs. 5 to 7), the 'best' 6 CMIP6 models (B6-CMIP6) are better at simulating SAT and PR than SIC fields, while models with higher PI ('worse models') have relatively larger PR and SAT errors than SIC errors ( Fig. 4; note that the absolute SIC errors of B6-CMIP6 models are still lower). Differences between simulated and observation-based SIC fields can be expected, considering that the 3 reanalyses agree least on SIC compared to SAT and PR (not shown) and that regional SIC can be largely influenced by internal climate variability (Ding et al. 2017;Kay et al. 2011;Notz and Marotzke 2012). We still choose to assess SIC, particularly regional (grid-point) differences in September and March, as several studies have shown that the year at which Arctic SIE in CMIP5 models is projected to reach near ice-free conditions is constrained by its present state (Massonnet et al. 2012;Senftleben et al. 2020;Thackeray and Hall 2019). The relatively large contribution of SIC to the PI of B6-CMIP6 models also indicates that simulating SIC similar to reanalyses remains challenging even when SAT and PR are well-simulated. Meanwhile, if SAT and PR of certain models already compare poorly to reanalyses, their relative SIC error is of similar magnitude to SAT and PR errors.
There is a relatively balanced contribution of the summer and winter seasons to the final PI of the B6-CMIP6 models, indicating that each individual B6-CMIP6 model is not systematically better at simulating the summer or winter season (note that the relative contribution of summer and winter errors for each individual model is equal). This also applies to most models not included in B6-CMIP6, while some have unequal summer/winter error distributions of variables (e.g. HadGEM3-GC31-LL, FIO-ESM-2-0, ACCESS-ESM1-5 or MIROC-ES2L).
In Fig. 6, the difference between the MMM of B6-CMIP6 and the reanalyses (see means in Fig. 5a-f) is compared to the difference between all 31 CMIP6 models (All-CMIP6) and the reanalyses.
Apart from summer PR, the B6-CMIP6 MMM performance of all variables and corresponding season/month is better than compared to that of the All-CMIP6 MMM. The spatial mean biases in B6-CMIP6 are primarily caused by inaccurate representations of SIC (more specifically near the sea ice edge; see Fig. A4). Taking into account all models and variables, we find that different configurations of the same model perform relatively similar (e.g. EC-Earth, CESM, CNRM, or ACCESS configurations). Models with higher horizontal and vertical resolution and top pressure levels generally performed better, although model-characteristic spatial patterns may play a larger role (3 out of the 6 best models are EC-Earth3 configurations, including the low resolution; Table 1). Our model ranking is roughly in line with Cai et al. (2021b), who (only) evaluated present-day Arctic surface temperatures of 22 CMIP6 models. The authors name MPI-ESM1-2-LR, EC-Earth3-Veg, FIO-ESM-2-0, MPI-ESM1-2-HR, CESM2-WACCM, CESM2, AWI-CM-1-1MR and NorESM2-LM as the best performing models, which (if included in our study) are within our best 8 models (Fig. 4). We also find agreement in the worst performing models (e.g. FGOALS-g3, BCC-CSM2-MR and NESM3).
The remainder of this study will thus center around results based on B6-CMIP6, following latest suggestions to put observational constraints on CMIP6 models and evaluate multiple variables at the local scale for a more realistic representation of future climate (Ribes et al. 2021;Tokarska et al. 2020;Liang et al. 2020;Hu et al. 2021).
Surface Air Temperature: Most CMIP6 models have a cold bias over the reference period, indicated by an overestimation of Arctic SAT by 1 K in summer and 2.9 K in winter. In B6-CMIP6, this area-integrated difference is lowered to 0.8 K and 1.1 K respectively. In All-CMIP6, winter SAT   Difference between the multi-model mean (MMM) of 'best' 6 CMIP6 models (B6-CMIP6) and the multi-reanalysis mean, compared to the respective difference using all 31 CMIP6 models (All-CMIP6). Shown are summer (a-f) and winter (g-l) differences of surface temperature, precipitation and sea ice concentration during the 1981-2010 reference period. Red (blue) indicates that models are overestimating (underestimating) mean values. Colour scales only differ across the seasons. Scale ranges indicate exact minimum and maximum value (rounded to nearest integer)  is underestimated almost across the entire Arctic, while in summer there is a warm bias over the Greenland Ice Sheet (GrIS) in All-CMIP6, which stands in contrast to the cold summer bias of B6-CMIP6. Thus the better area-averaged performance of summer SAT in B6-CMIP6 is partly a result of the compensational effect of lower SAT over the GrIS. This is not ideal, but likely a side effect caused by the decision to evaluate multiple variables (i.e. one would expect a closer spatial approximation to reanalysis fields of best models if SAT was the only constraint). Precipitation: Summer PR is slightly overestimated (by 6 mm/yr) in B6-CMIP6 and slightly underestimated (by 5 mm/yr) in All-CMIP6, with similar regions of error origins-notably those with highest mean PR rates (Fig. 5a-f). Winter PR is underestimated in the Atlantic sector in both B6-CMIP6 and All-CMIP6, but this difference is reduced with the B6-CMIP6 selection. PR biases in CMIP6 are also significantly higher in winter than in summer, with spatial mean differences in winter of 32 mm/yr (B6-CMIP6) and 46 mm/yr (All-CMIP6). Note that locally, winter differences are as high as 958 mm/yr (All-CMIP6) over regions where PR is very high, such as Southwest Greenland, while differences outside of the Atlantic domain are fairly low.
Sea Ice Concentration: Despite the cold summer bias, September SIC is similarly underestimated in both B6-CMIP6 and All-CMIP6, with errors up to 86% (All-CMIP6) at the sea ice edge (Fig. 6c, f). The overall underestimation of September SIE is in line with Shu et al. (2020) and Wei et al. (2020). Maximum differences of March SIC are of the same magnitude (up to 86% in All-CMIP6), but of opposite sign, indicating an overestimation of the seasonal cycle in CMIP6. Comparing the performance of March SIC in B6-CMIP6 to that in All-CMIP6 (Fig. 6i,l) particularly shows that models not included in B6-CMIP6 have too much sea ice around the West Pacific and Atlantic sea ice margin, consistent with colder surface air in All-CMIP6 (Fig. 6j). Although CMIP6 models accurately simulate the total SIA (Fig. 2e), their deficiency to properly simulate regional SIC is a known problem (Wei et al. 2020;Shen et al. 2021). The September and March spatial mean SIC bias is slightly reduced in B6-CMIP, but we still note an over-(underestimation) of SIC near the September (March) sea ice margin compared to all three individual reanalysis (Fig. A4). Due to spatial compensations these spatial model errors do not appear in the SIA time series (Fig. 7e).
In conclusion, B6-CMIP6 and All-CMIP6 over-or underestimate the discussed variables in the same regions (apart from summer SAT over Greenland), but the absolute bias is lower in B6-CMIP6. Further, the area-averaged cold (dry) bias of summer and winter SAT (winter PR) does not hold true for all Arctic regions. As for summer PR, the stronger dry bias of B6-CMIP6 in West Greenland, the Russian Far East and Alaska explains why area-averaged summer PR is too high in B6-CMIP6 but too low in All-CMIP6 (despite a close spatial matching of over-and underestimations of PR). This exemplifies the benefit of evaluating gridded fields instead of area averages, as the latter could potentially favor models in which large local biases are compensated.

Future trends in IAV across seasons and regions
This section presents seasonal and regional differences of Arctic SIC-, SAT-and PR-IAV based on B6-CMIP6. Additionally, we investigate EVAP fields as well as area-averaged PMT, mainly to explain PR trends. Area-averaged mean and IAV trends (from 1981 to 2100) of all variables are summarized in Table 2 and Table A3 for the B6-CMIP6 MMM and All-CMIP6 MMM respectively.
We then highlight regional differences of seasonal trends of all variables, and additionally investigate relative PR-IAV and PMT trends. The future IAV (2041-2100) of all variables based on B6-CMIP6, and the difference of B6-CMIP6 and All-CMIP6 in simulating future means and IAV of Arctic climate is also discussed. To show that projected trends are fairly robust across different CMIP6 models and multiple members of individual models, Figure A7 (A8) shows the summer (winter) spread of B6-CMIP6 and All-CMIP6 models, as well as spreads of two B6-CMIP6 models with multiple members. The following subsections begin with a brief summary of mean trends in SIC, SAT, and PR, before highlighting respective spatiotemporal IAV trends in more detail.

Sea ice concentration
While the decline of mean September SIA over 1981-2100 is comparable to that of mean March SIA (572,000 km 2 / decade in September and 528,000 km 2 /decade in March) ( Table 2), there are large differences in both the timing and intensity (Fig. 7e). The B6-CMIP6 MMM predicts a practically sea-ice free Arctic summer between 2040 and 2050, which is in line with most CMIP6 studies (Notz and Community 2020; Ravindran et al. 2021). The decrease of mean SIA in March accelerates after 2050, leading to a SIA minimum of 600,000 km 2 at the end of the century.
The summer sea ice melting causes an initial increase of the IAV of September SIA, as thinner ice can grow or melt faster depending on atmospheric and oceanic variability (Holland et al. 2008;Goosse et al. 2009). In both All-CMIP6 and B6-CMIP6, this initial positive trend is small and subject to large uncertainty. However, similar to results from Goosse et al. (2009), we find that SIA-IAV is highest at a SIA of 3.06 million km 2 . Regionally, September SIC-IAV decreases in all Arctic regions (Fig. 8l), whereas summer EVAP-IAV slightly but significantly increases in the Central Arctic (Fig. 8k) despite negative mean trends (Fig. 8c). March SIA-IAV increases in the last decades of this century, corresponding to a period of rapid decline in mean SIA (Fig. 7e, f). The relatively weak trends of areaaveraged March SIA-IAV are due to regional compensations (Fig. 8p) that are hidden in the area-averaged signal. There is a sharp trend gradient around the Central Arctic, with strong negative SIC-IAV fields (up to − 3.2%/decade) to the south and positive SIC-IAV trend fields (up to 3.3%/decade) to the north, correlating with corresponding increased and decreased winter EVAP-IAV trend fields (Fig. 8o). In winter, Fig. 8 Linear mean (a-h) and IAV (i-p) trends of the 'best' 6 CMIP6 models (B6-CMIP6) for the summer/September and winter/March seasons from 1981-2100. Calculated by averaging over IAV fields of all models, and then calculating linear trends of the multi-model mean (MMM). Hatched areas mark grid points with insignificant trends (p-value > 0.05). Same colour scales for summer and winter to allow for a direct comparison. Scale ranges indicate exact minimum and maximum trend of either season rounded to the nearest integer the starker contrast between the warm ocean waters and cold air aloft allows for a stronger EVAP response compared to summer, both in terms of mean and IAV (Fig. 8g,o).
To summarize, September SIA-IAV is presently at its maximum due to a relatively large area covered by thin sea ice, which can rapidly melt and grow in response to atmospheric and oceanic forcing variability. Between 2030 and 2040, September SIA-IAV abruptly drops until the end of the century related to the eventual permanent absence of sea ice. In March, there are contrasting regional trends: SIC-IAV increases (decreases) over the new (former) sea ice margin.

Surface air temperature
Mean trends in winter SAT are largest over the ocean and on average more than twice as high (1.55 K/decade) compared to summer (0.7 K/decade) (Fig. 8a,e). The stronger SAT increase in winter compared to summer has been linked to sea ice decline in previous studies (Serreze and Barry 2011;Bintanja and Selten 2014;Sellevold et al. 2022). Sellevold et al. (2022) performed simulations of a CMIP6 model with reduced Arctic sea ice and found that, in both winter and summer, SAT increases over open ocean areas, from where winds export these warmer air masses further north. Similar to their results, mean SAT trends during summer shown here are highest in the Atlantic and Northern Eurasia. We also find increased summer SAT over the Central Arctic, whereas in Sellevold et al. (2022)'s sea-ice experiment the Central Arctic was slightly cooling. Although the maximum SIA in Sellevold et al. (2022)'s simulations was higher than those in B6-CMIP6 at the end of this century, this emphasizes the role of other forcings on increased summer SAT, such as atmospheric heat transport, which were held constant in their experiment.
As for SAT-IAV trends, there are trends of opposite sign in summer and winter: summer variability increases and winter variability decreases (Fig. 7b). In the Central Arctic however, there are small but significant positive trends in both seasons of up to 0.1 K/decade (Fig. 8i,m). When averaged over 65 • N-90 • N , the trend signal in summer is dominated by these positive trends, even though some other regions exhibit negative trends. In winter, positive SAT-IAV Fig. 9 End of century IAV of summer (JJA; a-c) and winter (DJF; e-g) surface temperature, precipitation, evaporation, as well as September (d) and March (h) sea ice concentration simulated by the multi-model mean of the 'best' 6 CMIP6 (B6-CMIP6) models. Same colour scales for summer and winter to allow for a direct comparison. Note that instead of computing IAV on just the last 30 years of this century, we defined future IAV based on the last 60 years of this century to reduce low-frequency internal variability trends over the Central Arctic slightly compensate strong decreases in SAT-IAV which are strongest over regions of permanent open ocean areas. This strong decrease in winter SAT-IAV is most pronounced in the Atlantic sector, where in the last decades it was strongest, both in the reanalyses (Fig. 5, lower panel) and in B6-CMIP6 (not shown). Local SAT fluctuates less over permanently sea-ice free regions, as the surface air is in constant contact with the relatively mild temperatures of the ocean (Screen 2014;Landrum and Holland 2020). The magnitude of the strong winter SAT-IAV decrease can also be seen in Fig. 9e, indicating SAT-IAV is lowest in the Barents and Kara seas at the end of this century. The region of maximum winter SAT-IAV shifts further north, together with the marginal ice zone.
Over the newly formed persistent open ocean areas during winter, there are also decreased EVAP-IAV trends (Fig. 8o), further suggesting that the dominant mechanism behind decreased (increased) SAT-IAV are decreased (increased) surface fluxes from the warmer ocean to the colder atmosphere. As this gradient is larger in winter, the mean and IAV response of SAT to sea ice changes are also larger. This is in broad agreement with Reusen et al. (2019), who showed a decrease in annually averaged SAT-IAV in warmer climates in response to reduced SIC, based on EC-Earth simulations.
Here it is shown that this area-averaged trend is dominated by the winter season and only true in distinct regions in the Arctic, in particular the North-Atlantic and the North-Pacific ocean. The gradient between high summer SAT-IAV over  (DJF) land and low summer SAT-IAV over the ocean is smaller at the end of this century (Fig. 9a) compared to the reference period (CMIP6 models resemble the reanalysis fields in Fig. 5g). As for winter, the region where SAT used to fluctuate the most shifts northward, both in the Atlantic and Pacific sector (Fig. 9d). From the reference  to the future (2041-2100) period, area-averaged summer (winter) SAT-IAV increases (decreases) from 0.98 to 1.1 K (from 2.18 to 1.86 K) (not shown).
In summary, SAT-IAV increases in the Central Arctic and decreases in lower latitudes in summer and winter. However, in summer the increase in the Central Arctic dominates, whereas in winter the decrease in the lower latitudes dominates, so that area-averaged trends show different signs.

Precipitation
Arctic-averaged mean PR in B6-CMIP6 increases at a rate of 14.54 mm/decade in summer and a rate of 19.86 mm/ decade in winter (Table 2). Mean PR increases have mainly been attributed to thermodynamic changes, i.e. increased sensible and latent heat fluxes resulting from a larger area of Fig. 11 a Linear trends of relative summer (JJA) precipitation interannual variability IAV (PR-IAV) before nearly sea-ice free conditions . Calculated by dividing grid-point PR-IAV time series by corresponding mean values. Hatched areas mark grid points with insignificant trends (p-value > 0.05) of the 'best' 6 CMIP6 multimodel mean (B6-CMIP6 MMM). b Same as a but after nearly seaice free conditions (2041-2100). c Time series of summer poleward moisture transport interannual variability (PMT-IAV) over 65 • N (lightgreen) versus over 80 • N (darkgreen) of the B6-CMIP6 MMM, defined as the difference between area-averaged Arctic precipitation and evaporation IAV. Shaded areas represent the total range of all B6-CMIP6 models. Dotted lines represent the equivalent MMM of All-CMIP6. Orange graph represents the difference of PMT-IAV across 80 • and 65 • N (based on B6-CMIP6). Linear regression slopes before and after 2040 indicated by blue lines open water in the Arctic (Higgins and Cassano 2009;Bintanja et al. 2020). This effect is particularly strong in winter due to the relatively large temperature gradient between the ocean and the surface air. In line with other studies, we find a stronger increase of area-averaged mean PR in winter than in summer (Fig. 7c) (Pendergrass et al. 2017;Bintanja and Selten 2014;Bogerd et al. 2020). In summer, mean trends are greatest over land, especially over the west coast of Greenland, while winter PR trends are greatest in the Barents, Kara and Chukchi seas, and over the eastern coast of Greenland (Fig. 8b,f). The spatial pattern of mean winter trends strongly resembles those shown by Sellevold et al. (2022), who isolated the effect of reduced sea ice on PR, and are thus very likely caused by the loss of sea ice, which is also in line with Higgins and Cassano (2009) and Bintanja et al. (2020).
PR-IAV significantly increases in summer and winter. Surprisingly, the stronger mean PR increase in winter does not result in a comparable increase in winter PR-IAV. Instead, summer and winter area-averaged PR-IAV increase nearly simultaneously and at a similar rate ( Fig. 7d; Table 2). To further quantify PR-IAV trends independent of long-term mean PR changes, we calculated relative PR-IAV (Rel-PR-IAV) by scaling the PR-IAV time series by the runningmean PR time series. This confirms that relatively to mean trends, Arctic PR-IAV only increases in summer (Fig. 10a). This is in line with Bogerd et al. (2020), who found a relatively strong (weak) increase of summer (winter) PR-IAV compared to mean increases in warmer climates. We find that Rel-PR-IAV trends are less significant over land areas (Fig. 10b,c). Over most oceanic areas, there is a significant increase (decrease) in summer (winter) Rel-PR-IAV, which are discussed in the following.
Negative winter Rel-PR-IAV trends are partly due to presently low mean winter PR compared to high PR-IAV, resulting in higher Rel-PR-IAV in winter than in summer at present (Fig. 10a). This relation is modified by the significant increase of EVAP due to sea ice loss, generating a consistent moisture source and therefore increased winter mean PR. Consequently, the decrease in winter Rel-PR-IAV is strongest over areas of permanent sea ice loss (Fig. 10c). We find that these regions of more consistent winter PR shift northwards over time as the sea ice retreats (not shown), clarifying why there are negative winter Rel-PR-IAV trends even in the Central Arctic. Observation-based studies have confirmed the strong influence of sea ice on the moisture budget in these regions (Zhong et al. 2018;Ford and Frauenfeld 2022). We stress that absolute winter PR-IAV still significantly increases in all regions except in the Barents and Kara Seas (Fig. 8n). This is the area where the IAV of other variables (SAT, EVAP and SIC) also strongly decreases in response to the prospective absence of sea ice (Fig. 8m-p).
To examine the increase in summer Rel-PR-IAV (and whether it is triggered by sea ice-driven EVAP-IAV), we recalculated EVAP-IAV and Rel-PR-IAV trends based on pre-versus post-'ice free summer' conditions (1981-2040 versus 2041-2100). The positive summer EVAP-IAV trends over oceanic areas are of comparable magnitude in both periods, and only slightly stronger over the Central Arctic after 2041 (not shown). Summer Rel-PR-IAV trends however are insignificant or even negative until 2040 (Fig. 11a), but significantly positive over the ocean from 2041 onward (Fig. 11b). This later increase in summer Rel-PR-IAV is thus not mainly caused by increased local EVAP-IAV, but possibly a response to increased PMT-IAV. Indeed, Bintanja et al. (2020) found a stronger (weaker) contribution of PMT to Arctic PR-IAV in summer (winter) in CMIP5 models, in which future mean PR trends in summer are also weaker than in winter (Bintanja and Selten 2014). However, PMT-IAV increases relatively linearly from 1981, both in CMIP5 across 70 • N (Bintanja et al. 2020) and CMIP6 across 65 • N . Thus it remains unclear why summer Rel-PR-IAV trends are only significantly positive from 2041 to 2100, mainly in the Central Arctic (Fig. 11b).
Previous studies have suggested that reduced SIC (especially during winter) let southerly winds reach further north (Valkonen et al. 2021;Rainville and Woodgate 2009), and that the storm tracks shift northward with further warming (Harvey et al. 2020). Thus, to test if the northward retreat (and eventual absence) of summer sea ice triggers an acceleration of PMT-IAV further north, we compared summer PMT-IAV across 65 • N versus 80 • N over time. Indeed, summer PMT-IAV across 80 • N accelerates after 2040, while PMT-IAV across 65 • N increases at a relatively constant rate from 1981 to 2100 (Fig. 11c). This suggests that the sea ice effect on summer PR-IAV is more indirect and thermodynamic: in years with very high PMT, low SIC allow for more moisture to be sourced and transported over sea-ice free open ocean areas. In other words, future Arctic PR-IAV in summer is less affected by exclusively local Arctic SIC-IAV, but more by increased interactions with lower latitudes partly owing to the long term sea ice decline. Note that due to the simplified moisture budget used here (PR minus EVAP), PMT across 80 • N also includes evaporated moisture between 65 • N and 80 • N . Our conclusion (i.e. that summer PMT-IAV and PR-IAV north of 80 • N increases in a sea-ice free Arctic) would thus benefit from further analyses based on other data and identifications of moisture source regions.
To summarize, PR-IAV significantly increases in both seasons, except for the Barents and Kara Seas in winter (where it decreases due to reduced EVAP-IAV caused by sea ice loss). In summer, both PMT-IAV and PR-IAV trends accelerate in the Central Arctic under sea-ice free conditions, further amplifying increased PR-IAV. In winter, the increase in PR-IAV is small compared to the long-term mean PR increase. Still, winter PR-IAV significantly increases with further warming, especially over land.

Comparison of future climate in B6-CMIP6 and All-CMIP6
This section briefly summarizes how SAT, PR and SIC means and IAV in B6-CMIP6 differ from those in All-CMIP6. Over the ocean, mean SAT in B6-CMIP6 are up to 4 • C higher in the future compared to All-CMIP6, in both summer and winter (Fig. A9a,e). Over land, the surface air in B6-CMIP6 is colder in summer, but significantly warmer in winter over Greenland and most other land regions. Compared to All-CMIP6, B6-CMIP6 exhibits higher future SAT-IAV in the Central Arctic and lower SAT-IAV just south of this region with higher SAT-IAV (especially in winter), presumably as a result of the lower future SIE in B6-CMIP6 ( Figure A9h): SIC-IAV and EVAP-IAV are also higher (lower) north (south) of the sea ice edge in B6-CMIP6 (Fig. A9o,p). The sea ice-induced variability of air-sea heat fluxes likely causes this North/South SAT-IAV and EVAP-IAV gradient compared to All-CMIP6, as the area of highest air-sea heat flux variability is located further north in B6-CMIP6. This further stresses the large degree to which local SAT variations are governed by the thermodynamic response to sea ice changes.
We also find moderately higher winter PR-IAV over regions with higher EVAP-IAV in B6-CMIP6, but the local sea-ice effect on PR-IAV is small compared to the EVAP-IAV and SAT-IAV response. Overall, future PR-IAV in B6-CMIP6 and All-CMIP6 are fairly similar over most regions (Fig. A9j,n), but B6-CMIP6 models indicate higher mean winter PR and PR-IAV over distinctive regions such as Southeast Greenland and the Scandinavian coast, where there are also higher mean values (up to 172 mm/yr) in the future compared to All-CMIP6 (Fig. A9j).

Conclusions
In this study we discussed the role of natural variability on IAV in Arctic climate and evaluated CMIP6 models based on gridded fields and multiple variables (SAT, PR and SIC). Over short time scales, IAV of Arctic SAT, PR and SIA is largely influenced by natural variability in both seasons (especially during winter), whereas future trends of Arctic climate IAV are relatively robust.
Based on models that accurately capture present-day mean fields of SAT, PR and SIC, we presented and analyzed respective future trends under a high emission scenario. We found that mean and IAV trends of all variables are susceptible to significant regional and seasonal differences, which strongly emphasizes the need to address and understand local and seasonal responses to Arctic warming. Compared to All-CMIP6, the 'better' models project higher summer and winter SAT over the ocean, more winter PR in Southeast Greenland, and slightly lower maximum and minimum SIE. Regarding future IAV trends, the 'best' models indicate that: 1) Variations of September SIA are presently largest, as there is a relatively large area covered by thin sea ice, which rapidly melts and grows depending on the environment. Between 2030 and 2040, September SIA-IAV abruptly drops until the end of the century caused by the eventual permanent absence of sea ice. In March, there are contrasting regional trends: SIC-IAV increases (decreases) over the new (former) sea ice margin.
2) Future changes in SAT-IAV are strongly linked to local sea ice changes. In summer, SAT-IAV increases in the Central Arctic and decreases in lower latitudes and the North Atlantic. The same applies to winter SAT-IAV, but the SAT-IAV decrease is much stronger compared to summer (due to stronger surface fluxes in winter), causing a significant area-averaged decrease of winter SAT-IAV.
3) Due to large increases in mean PR in both summer and winter, PR-IAV increases in almost all Arctic regions. Although mean PR increases more in winter, PR-IAV increases more in summer, which can be linked with strong increases in PMT-IAV. PR-IAV increases most over land (including coastal Greenland), where in the future it will dominantly fall as rain in the summer months.
Our results highlight the importance of calculating IAV on a grid-point basis and the role of Arctic sea ice loss on future PR-IAV and SAT-IAV trends. To robustly define causes for changes in Arctic climate IAV towards the future, we plan to build on this analysis by investigating large ensemble and long-term climate runs, and include variables such as sea ice thickness, sea level pressure, specific humidity, and wind components. Tables 3, 4 ,5 and Figs. 12,13,14,15,16,17,18,19,20

Fig. 18
End of century spread of summer (JJA) IAV of surface temperature, precipitation, evaporation and September sea ice concentration simulated by two multi-member models of the 'best' 6 CMIP6 models (B6-CMIP6), namely MPI-ESM1-2-LR (first column) and EC-Earth3-Veg (second column). Compared to the spread of B6-CMIP6 models (third column) and all 31 models (fourth column). Same colour scales for each model (member) ensemble set to allow for a direct comparison. Scale ranges indicate exact minimum and maximum trend value rounded to the nearest integer or decimal Author Contributions All authors contributed to the study conception and design. MK analyzed and interpreted the data and discussed the findings with RB and EvdL The manuscript was written by MK and reviewed by both co-authors.
Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Fig. 20
Future (2041-2100) differences of means (a-h) and IAV (i-p) of surface temperature (SAT), precipitation (PR), evaporation (EVAP), and sea ice extent (SIE) or concentration (SIC) between 'best' 6 CMIP6 models (B6-CMIP6) and all 31 CMIP6 models (All-CMIP6). Instead of mean SIC differences, the March and Sep-tember SIE (defined as the sum of grid points where SIC > 15%) of B6-CMIP6 (light turquoise) is compared to those of All-CMIP6 (dark turquoise). Same colour scales for the summer and winter season to allow for a direct comparison