Automated parameter tuning applied to sea ice in a global climate model

This study investigates the hypothesis that a significant portion of spread in climate model projections of sea ice is due to poorly-constrained model parameters. New automated methods for optimization are applied to historical sea ice in a global coupled climate model (HadCM3) in order to calculate the combination of parameters required to reduce the difference between simulation and observations to within the range of model noise. The optimized parameters result in a simulated sea-ice time series which is more consistent with Arctic observations throughout the satellite record (1980-present), particularly in the September minimum, than the standard configuration of HadCM3. Divergence from observed Antarctic trends and mean regional sea ice distribution reflects broader structural uncertainty in the climate model. We also find that the optimized parameters do not cause adverse effects on the model climatology. This simple approach provides evidence for the contribution of parameter uncertainty to spread in sea ice extent trends and could be customized to investigate uncertainties in other climate variables.


Introduction
Lack of consistency between observed sea ice extent trends and those simulated by complex climate models has garnered significant public and scientific attention in recent years. This interest is partly driven by the radiative properties of sea ice which amplify climate change as well as the implications of sea ice change for local communities and ecosystems. Yet simulations from the latest Coupled Model Intercomparison Project (CMIP5) continue to exhibit various biases in their mean state, particularly during winter months (Stroeve et al. 2012;Turner et al. 2013), as well as in the magnitude and even sign of hemispheric trends (Flato et al. 2013).
Such model biases in recent historical sea ice can broadly be attributed to two sources of uncertainty: firstly, internal variability arising due to the chaotic nature of climate, and secondly, structural uncertainty arising from incomplete knowledge of and inability to represent the complete physics of the climate system (Hawkins and Sutton 2009). Parametrizations of complex processes may reduce spatially or temporally evolving variables to a single constant value or approximate a number of processes by some non-physical constant. Uncertainty in parameter values has previously been quantified by constructing large perturbed physics ensembles, where parameters are varied within physically reasonable ranges to construct a range of climatologies (Murphy et al. 2004;Stainforth et al. 2005). Sea ice parameter uncertainty can be larger than that expected from natural variability (Ridley et al. 2007) and Abstract This study investigates the hypothesis that a significant portion of spread in climate model projections of sea ice is due to poorly-constrained model parameters. New automated methods for optimization are applied to historical sea ice in a global coupled climate model (HadCM3) in order to calculate the combination of parameters required to reduce the difference between simulation and observations to within the range of model noise. The optimized parameters result in a simulated sea-ice time series which is more consistent with Arctic observations throughout the satellite record (1980-present), particularly in the September minimum, than the standard configuration of HadCM3. Divergence from observed Antarctic trends and mean regional sea ice distribution reflects broader structural uncertainty in the climate model. We also find that the optimized parameters do not cause adverse effects on the model climatology. This simple approach provides evidence for the contribution of parameter uncertainty to spread in sea ice extent trends and could be customized to investigate uncertainties in other climate variables. from inter-model spread (Hodson et al. 2013), motivating its reduction.
Recent years have seen the development of methods for automated parameter optimization (Severijns and Hazeleger 2005;Tett et al. 2013;Zhang et al. 2015). These are motivated by the time-consuming and subjective nature of manual parameter tuning, whereby selected parameters are perturbed in a number of simulations to find a configuration which improves the specified focus of the modeller (Mauritsen et al. 2012;Voosen 2016;Hourdin et al. 2016). New automatic optimization software has been developed that uses cost function type methods to calculate the combination of parameters required to reduce the difference between model simulation and target observations for userspecified variables, whilst accounting for their non-linear dependence on parameters. Due to the high computational cost involved in many of the methods proposed, automated parameter optimization has only been tested on atmosphere-only [eg. Zhang et al. (2015)] or very low-resolution [eg. Gregoire et al. (2011)] models thus far.
The innovation of this paper is to apply an automated parameter tuning method to a widely-used global, fullycoupled climate model in order to investigate whether targeting sea ice extent can improve its consistency with observations. Changes to the model must use physically allowed parameter values, account for internal variability and not compromise other climate variables. We implement the Tett et al. (2013) method for automated tuning, discussing optimization input in Sect. 2.2 and parameters in Sect. 3 to ensure these criteria are satisfied, and evaluate the results in Sect. 4 by comparison to observations.

Methods
Most data in this study comes from global climate model simulations, which are discussed below. Observational data is from the HadISST set of satellite sea ice concentrations (Rayner et al. 2003) and the Pan-Arctic Ice Ocean Modeling and Assimilation System sea ice volume reanalysis [PIOMAS, Schweiger et al. (2011)]. We show HadISST sea ice extents from 1941, but note that data is derived from hand-drawn charts, with limited Antarctic sources, before the beginning of satellite observations in 1978 (Rayner et al. 2003).

Model
The model used in this study is the Hadley Centre Coupled Model version 3 (HadCM3) (Gordon et al. 2000), a global model with full coupling between the ocean, atmosphere and sea ice components. The ocean component has a horizontal resolution of 1.25° × 1.25° with 20 vertical levels and the atmosphere component has a lower resolution of 3.75° × 2.5° with 19 vertical levels.
The sea ice component is described in Cattle et al. (1995) and we summarize the main features here, noting that it is a simple model. Sea ice is advected with the top ocean current and mixed laterally by unresolved wind and ocean drags, with magnitude determined by coefficient of turbulent diffusion A H . Hence the time evolution of ice thickness g i is given by where the Heaviside step function prevents ice convergence once it has reached a depth of 4 m. Sea ice forms in the ocean when the sea surface temperature reaches a freezing temperature T f = −1.8 • C. For thermodynamic changes, ice and snow are treated as a single layer, though which heat conduction H I is given by where i and s are the thermal conductivities of ice and snow, h i and h s are the ice and snow layer thicknesses, and T S is the temperature of the top surface. The heat flux from the ocean into the bottom of the ice H o is given by where and c w are the density and specific heat capacity of sea water, the first ocean layer has temperature T o and depth d = 10 m and K is the ocean-ice diffusion coefficient. To account for leads, sea ice concentration is not allowed to exceed c NH max = 99.5% in the Arctic and c SH max = 98.0% in the Antarctic. The model assumes constant salinity (0.6%) and a linear temperature profile in the ice. Albedo is linearly parametrized, with a warm ice albedo of w = 0.5 at an ice temperature of 0 °C which increases linearly to a cold ice albedo of c = 0.8 at the defined cold-ice temperature of T c = −10 °C and below. In the ocean, horizontal eddy mixing of tracers such as salinity and temperature is parametrized. In particular, along-isopycnal diffusion is parametrized by a single diffusion constant D.
In contrast to HadCM3, state-of-the-art sea ice models represent sea ice using multiple thickness categories and multiple ice and snow layers (Hunke et al. 2010). They use rheologies such as the non-linear viscous plastic model (Hibler 1979). The sea ice albedo may depend on snow and ice thickness and spectral properties, and melt ponds are calculated explicitly (Hunke et al. 2010). While the representation of sea ice, as well as other processes, in HadCM3 is much more limited, it was included in CMIP5 and is still widely used for research and impact studies. Its low computational cost makes it an ideal choice for this study. (1) In this study, 'control' refers to model experiments under pre-industrial conditions where all external forcings are as observed in 1850. All model runs started from initial conditions taken from a 5505 year-long control run. The first year after the 5505-year control simulation is model year one. For parameter perturbation experiments, the parameter change is implemented after model year twenty. While the adjustment period for HadCM3 can be around 350 years (Banks et al. 2007), this timescale is not feasible for us to gauge parameter sensitivity. We observe that initial drifts in sea ice extent following individual diffusion coefficient perturbations relax after approximately ten years (see Fig. 1). Therefore model years 31-40 are used to select parameters. Model years 31-170 are used to examine long-term model climatology and estimate internal variability. The choice of a 140-year period arose from CO 2 doubling experiments which will be written about elsewhere. We assume that internal variability is unaffected by forcings and calculate it from the standard deviation of relevant decadally-averaged observables from the 140 years.
'Forced' runs include a package of observationallybased forcings including greenhouse gases, aerosols and volcanoes from Schurer et al. (2014). The forced run began in model year zero with forcings starting as in AD800 and was branched into four ensemble members at AD1400 (model year 600). Post 2000, forcings follow the A2 emissions scenario. Any sea ice data from the forced standard model shown is the mean of four ensemble members corresponding to the forcing year.

Optimization
Previous optimization studies [eg. Severijns and Hazeleger (2005); Zhang et al. (2015)] have used the downhill simplex method, which Wright (1996) has suggested is often unreliable. We instead employ a classical Gauss-Newton approach. A simple summary of this optimization technique is provided here and readers are referred to Tett et al. (2013) and Tett et al. (2017) for a fuller description. The aim is to reduce the root mean square error F between a vector of simulated observables and a vector of target observables , with F 2 = || − || 2 . This is a problem in n-dimensional parameter space for n chosen parameters which are allowed to vary within some n-dimensional boundary N. Beginning with the model default parameter values, the algorithm automatically executes the following steps: 1. Runs the model n times in parallel. For each run, one parameter is perturbed individually by +10% of its allowed range away from the starting point. 2. Constructs the error F from each of the n runs, and then calculates a finite-difference Jacobian, which describes how the error evolves under parameter changes. 3. Uses the Gauss-Newton equations to find the sequence in parameter space which converges towards a stationary point for F and defines a search direction ̂. 4. Carries out a line-search at 20, 70 and 100% of the vector ̂ in parameter space and runs the model at these points in parallel. 5. Calculates the error and: (a) Uses the point with the smallest error as the start point for the next iteration, or (b) Terminates if it has been unable to reduce the error from the start point.
The resulting simulation therefore uses the configuration of constant parameter values which minimize the error F within the boundary of allowed parameter values. Computing resources will dictate the number of model runs that can be carried out in parallel and hence limit parameter dimensionality, which in our case is four. The algorithm requires: (1) a vector of target observations; (2) parameters which have an impact on the simulated observables and their allowed range; and (3) a simulation design. Below we outline these choices.
Simulated variables for comparison with observations are calculated from sea ice concentration during the first decade where satellite observations are available: 1981 to 1990. Monthly sea ice concentration is averaged over these ten years and then a 15% ice extent cutoff is applied. Due Fig. 1 The maximum sea ice extent each year in the Southern Hemisphere during the first twenty years after a parameter perturbation is individually perturbed to a high or low value. Colours and linestyles are defined in the legend to the right of the plot and parameters are described in Table 1 to the high scientific interest in sea ice extent extrema, we tune the maximum and minimum of the monthly sea ice extent cycle in each hemisphere, giving four tuning variables. We choose the maximum and minimum regardless of which month they fall in, as HadCM3's Northern Hemisphere maximum does not always fall in the same month (see Fig. 2). Equivalent target observations are calculated from the HadISST1 set of satellite sea ice concentrations (Rayner et al. 2003) (inadvertently) over 1980 to 1990, but differences between the mean over 1981 to 1990 and over 1980 to 1990 are smaller than model internal variability (Fig. 3). Differences between different observational data sets are considerable (Bunzel et al. 2016) but we choose to neglect this for simplicity.
Tuning is only applicable to models where the difference between the standard model and observations is larger than that expected due to model noise (Notz 2015). We find that the differences between our target and observations are indeed larger than internal variability (Fig. 3).
Internal variability will magnify any minute differences between simulations, so multiple experiments are required to confirm that the stationary point located is not due to chance, thus avoiding selection bias. While ten years may be sufficient for fast adjustment of sea ice extent to individual parameter perturbations, as discussed above, variation of multiple parameters may require a longer adjustment period. We therefore trial two approaches: the first uses a spin-up period of ten years, requiring a total model run length of twenty years to calculate the observables in each of the two steps per iteration (case OPT1) and the second uses spin-up period of twenty years, requiring a total model run length of thirty years (case OPT2). Computational expense limits further trials. Model runs include observationally-based forcings from Schurer et al. (2014).
We expect that multiple stationary points exist, with different parameter combinations resulting in the same or similar error functions. Issues of equifinality are explored in Tett et al. (2013). We do not examine these here, and instead choose to start from the default model parameter values and investigate the first stationary point that is encountered.  Murphy et al. (2004), who consulted with HadCM3 experts to identify physically reasonable ranges for K, D, m and T c , but we make some changes as follows. See Table 1 for a summary of extreme values and justifications.

Parameter selection
From four years of field measurements, Perovich and Polashenski (2012) find seven steps in albedo evolution. They find a cold snow albedo of 0.85 which lasts between freeze-up and melt onset and warm snow albedo below 0.5 and as low as 0.32 during melt pond drainage. Hence we extend the albedo range suggested by Murphy et al. (2004) to include a lower warm albedo m of 0.45 and a higher cold albedo c of 0.85. c max is a parameter unique to HadCM3 which is based on a minimum lead fraction of 0.005 in the Arctic and 0.02 in the Antarctic. However, Lindsay and Rothrock (1995) observe a seasonally-varying lead fraction in the central Arctic, between 0.02 in winter and 0.06 in summer, using a radiometer with a resolution of around 1km. Marcq and Weiss (2012) finds that most leads are very small (<20 m) and hence unresolvable in satellite datasets, which may further reduce sea ice concentrations. We use the conservative lower estimate of 0.04, so that c max = 0.96, as our extreme value. We could not find a comparable observed limiting lead fraction in the literature for the Antarctic and thus assume the same figure as for the Arctic. Rae et al. (2014) perturb ice conductivity i in a sensitivity study for a later Hadley Centre model, HadGEM3, on the basis of the experimental range summarized by Pringle et al. (2007). They perturb only to the high end of this range; we additionally perturb to the low end of this range.
The very high uncertainty in the standard value of the turbulent diffusion coefficient A H -with default values ranging from 675 to 2000 m 2 s −1 -motivated us to reduce it to zero. We choose its upper limit according to the study where it was originally proposed, Bryan (1969).
Individually perturbing parameters to the extrema of their physically reasonable ranges in short control experiments provides a first-order estimate of parameter uncertainty. We select the four parameters that result in the largest changes in our simulated observables when individually perturbed, those changes being significantly larger than our estimate of internal variability (Fig. 4), and which span a variety of different physical processes. The four parameters are: the ocean-ice diffusion coefficient K, the alongisopycnal ocean diffusion coefficient D, the cold-ice temperature (which controls variation of sea ice albedo) T c , and the maximum grid cell sea ice concentration c max . Allowed ranges for these four parameters are further tested in long control experiments. Evidence of drift in the top of atmosphere net flux using the extreme values of D led us to refine its allowed range. The four parameters and their final allowed value ranges are shown in Table 2.

Results
Both optimizations converged within four iterations, with four parameter perturbations and three line-searches per iteration plus one standard model run requiring (4 × 7 + 1) × 20 = 580 and (4 × 7 + 1) × 30 = 870 years of parallel model runs for OPT1 and OPT2 respectively. In  Table 1 for a description of parameter perturbations both cases, most of the error reduction occurred within the first iteration, suggesting the optimization could be run for just one iteration where computer resources are limited.
The parameter values which resulted in the simulation with the smallest error are shown in Table 2. In both trials, K increased, c max decreased, D increased and T c remained the same relative to the standard model. The magnitude of changes for K and c max are slightly larger in OPT2, which had a longer spin-up period. The proximity of the final parameter configurations suggests that both experiments have located the same stationary point and provides evidence that the optimization is robust under changes in spinup period.
The ocean-ice heat diffusion parameter K changed the most from the default value relative to its range. Examining the change in error per unit change in parameter for the Gauss-Newton step of the four iterations in OPT1 shows that perturbing K caused the largest change in error (see Table 3). Increasing K in the first iteration decreased sea ice extent in both hemispheres and seasons. This is not desirable for the Northern Hemisphere minimum, where the standard model has too little ice, but standard model difference from observations is largest in winter (Fig. 2) and so winter improvements are prioritized. We also note that in later Hadley Centre models, K is replaced by a function of friction velocity with a minimum value of u * = 0.005 ms −2 [McLaren et al. (2006); McPhee (1992)]. This value would require a minimum value of K = 1.5 × 10 −4 m 2 s −1 in the HadCM3 scheme, hence the higher optimized value of K may have some physical basis. Turning to the other parameters, increasing T c leads to higher-albedo ice and larger sea ice extent in all four tuning variables in the first iteration, increasing the error. The default value of T c lies on the lower bound specified by Murphy et al. (2004) and so it stays constant during the line-search and remains at the default value. The change in error per unit parameter change is smallest for ocean isopycnal diffusion D. It causes decreases in sea ice extent in the Antarctic but increases in the Arctic, which cancel out changes in the error function F. The final value for the maximum allowed sea ice concentration c max is the most different from its default value relative to its allowed range. This is the least physicallybased parameter of the four, designed to cover a range of dynamical and thermodynamical processes involved in sea ice fracture and lead opening. As such, we suggest that this parameter is the most reasonable to manipulate to modeller requirements.
These parameter changes cause large error reductions for both optimizations. The root mean square error F between simulation and target reduces by 89% in OPT1 (from 5.3 million km 2 to 0.5 million km 2 ) and by 64% in OPT2 (to 1.9 million km 2 ). The difference in error between OPT1 and OPT2 is due to both model noise and the small differences in parameter values for K, c max and D. For simplicity, we consider only the lowest-error result, OPT1, in the following.
Firstly, we confirm that the control model climatology using OPT1 parameter values is stable. Over a 140 year model run, we find that the model does not exhibit a drift in the top of atmosphere net flux or in surface temperature (Fig. 5a, b). The differences in climatology between the optimized and standard configurations are consistent with there being significantly less global sea ice extent in the optimized case than the standard case (Fig. 5c). For example, relative to the standard control, the optimized control shows some warming (Fig. 6). Given that HadCM3 is biased slightly cold in Northern Hemisphere sea surface temperatures [Gordon et al. (2000)], this is a slight improvement in the model climate. Surface air temperature and ocean surface potential temperature are higher in the optimized model by up to 0.5 K away from the poles (Fig. 6). In the Arctic, surface air temperatures increase by up to 2.5 K in many areas, but also show a decrease of similar magnitude over Siberia and Northern Canada, which contrasts with a more uniform temperature increase of up to around 2 K in the Antarctic. We attribute the cooling in high-latitude ocean surface temperature to the decrease in ocean isopycnal diffusion, D, with a contribution from the increase in ocean-ice heat flux, K, extracting more heat from the ocean during the melt season. These cooler ocean surface temperatures although this is unlikely to be significant given its large variability (Banks et al. 2007). Precipitation changes are generally smaller than model noise except for small shifts in position of the Intertropical Convergence Zone (ITCZ). Chiang and Bitz (2005) propose that the ITCZ shifts away from an imposed increase in ice cover in a particular hemisphere; in the optimized model both hemispheres have reduced sea ice and there is no clear direction in the ITCZ shift. We conclude that changes in the model climate are reasonable and suggest perturbations to sea-ice relevant parameters may even reduce other model biases.
Three of the four tuning variables in the optimized model (OPT1) are now consistent with observations (Fig. 3). The Southern Hemisphere maximum is still outside of the range of internal variability, but the difference between model and observations has reduced by around 3 million km 2 . Over the full monthly sea ice cycle, the optimized model is much more consistent with observations over 1981 to 1990 than the standard model (despite being optimized to observations for 1980-1990) (Fig. 2). However, we find that the time lag between model and observations noted by Turner et al. (2006) persists. Over the long control experiment, variability is lower in the optimized model than the standard model for nearly all months of the year in both hemispheres (Fig. 7). Zunz et al. (2013) found that HadCM3, Fig. 6 The difference in 140-year averages between a control simulation using the standard parameters and a control simulation using the OPT1 parameters for a surface air temperature, b ocean surface potential temperate and c precipitation. Hatching shows statistically significant differences at the 95% confidence level  Zunz et al. (2013) and most CMIP5 models, overestimated Antarctic sea ice extent variance so this result is encouraging.
Having established the agreement between optimized model and observations in terms of the overall amount of sea ice, the tuned observable, we now consider regional differences in sea ice concentration. During winter in the Northern Hemisphere, default HadCM3 produces too much ice, particularly around the east coasts of Canada and Greenland, and the Barents and Okhotsk seas (Fig. 8a). The optimized model strongly reduces sea ice formation in  [1981][1982][1983][1984][1985][1986][1987][1988][1989][1990]. HadISST sea ice concentration was interpolated onto the HadCM3 grid. Hatching denotes statistically significant differences calculated from a Students t-test over the 10 year period at the 95% confidence level those areas. The smallest improvement is seen in the Barents Sea, while an anomalously sea ice-free region appears northeast of Svalbard (Fig. 8b). We attribute this to the decline in magnitude of the north-south temperature gradient observed in the control optimized model relative to the standard model (Fig. 6), although internal variability may also contribute.
A Student's t-test reveals significant (p < 5%) differences between the standard model and observations in the central Arctic, with the number of grid cells in the Northern Hemisphere in March which have statistically significant differences from observations increasing from 373 for the standard model to 491 in the optimized model (out of 3504 Northern Hemisphere grid cells). This is due to the decrease in the limiting parameter c max , which biases the model towards lower concentrations in the central Arctic where observed concentrations are much closer to one. For example, along the latitude line of 85° N in March (mean 198185° N in March (mean -1990, HadISST sea ice concentrations are all greater than 95% with one third of these also greater than 99%. The value of c max in the optimized model means that it cannot simulate sea ice concentrations greater than 97%. While more recent models do not use this parameter (McLaren et al. 2006), this issue could be resolved for HadCM3 in future work by a different choice of error function which takes into account differences in concentrations, rather than simply sea ice extent. It should be noted again, however, that there is significant uncertainty in satellite observations (Bunzel et al. 2016). True observed concentrations may be lower due to large numbers of unresolvable leads (Marcq and Weiss 2012) and widely-used satellite datasets differ in their representation of high-concentration grid cells due to differing treatment of melt ponds (Notz 2014).
In the Southern Hemisphere, the optimized simulation does significantly reduce the high bias in sea ice concentration in the Ross and Amundsen seas (Fig. 8d). This mainly occurs because of the increase in ocean-ice heat diffusion; the extreme value of this parameter has the largest impact of of all individual parameter perturbations on the September Antarctic sea ice edge in this area ( Fig. 9). However, the ocean-ice heat diffusion increase simultaneously drives a loss of sea ice around King Haakon VII sea (Fig. 9), which degrades the simulation (Fig. 8d). The reduction in the number of grid cells which are significantly different (p < 5%) from observations between the standard and optimized models (from 172 to 137 in March and from 314 to 286 in September, out of 3504 Southern Hemisphere grid cells) shows that there is an overall improvement. However, both models poorly capture the observed distribution of Antarctic sea ice (Fig. 8c, d), a feature common to a number of the CMIP5 models (Turner et al. 2013). None of the individual parameter perturbations for HadCM3 are able to replicate the shape of the observed sea ice edge, in particular the more extensive area around the King Haakon VII Sea (Fig. 9). This points to uncertainties other than simply that in parameter values (see Discussion), although much of the difference from observations on a regional scale may be due to decadal climate variability.
Finally, we ran the model using the optimized parameters from 1940 under observationally-based forcings.
Tuning parameters to conditions in the early satellite record will not guarantee agreement to observations in other years. The longer spin-up period causes these parameters to perform less well against observations in this experiment even during the tuning period (Fig. 2) compared to the direct optimization output, but still significantly better than the standard model. The mean state from this independent run and root mean square error of 2 million km 2 over 1981-1990 is very similar to that of OPT2, which had a longer spin-up period (Fig. 2). This suggests that a longer spin-up period reduces selection bias in results. The differences in simulated tuning variables between OPT1 and the independent run begun in 1940 are statistically significant (p < 5%) only in the Northern Hemisphere. Only the Northern Hemisphere maximum is additionally outside the range of variability calculated from the standard model, with both OPT2 and the independent run being closer to the standard model in the Northern Hemisphere maximum than OPT1. Examining the intermediate steps in the optimization shows that the Northern Hemisphere maximum took the longest to converge. The Northern Hemisphere maximum may be more difficult to optimize because of the limitations due to surrounding landmasses. It is also likely to have a larger inter-annual correlation than the Southern Hemisphere, as it has a larger proportion of multi-year ice. The improvement in the mean sea ice state due to optimized parameters causes HadCM3 to more skilfully predict sea ice extent up to the present day (Fig. 10). Linear trends from 1980-2014 relative to the 1980-2009 average in all four variables are statistically significant (p < 5%) and are displayed in Fig. 11a-d. The optimized model fails to capture the observed positive trend in the Antarctic, although trends are less negative compared to the standard model (for example, Southern Hemisphere maximum: standard model −2.8%/decade, OPT1 −1.8%/decade and HadISST +2.5%/ decade). This is consistent with the hypothesis that there are significant process-related uncertainties in the Antarctic. Trends in the Northern Hemisphere are more negative in the optimized case (OPT1) than in the standard model, with an improvement towards consistency with observations seen in the Arctic minimum (standard model −5.2%/ decade, OPT1 −9.1%/decade and HadISST −9.8%/decade). The absolute trend in the Arctic minimum over 1980-2011 is −0.71 million km 2 /decade. Comparison to Fig. 3 in Stroeve et al. (2012) reveals that optimization moves the Arctic minimum simulated by HadCM3 to within their calculated observational range, where only 7 of the 18 CMIP5 models examined by that paper lie.
We also examine sea volume and compare time series from the standard and optimized model to PIOMAS reanalysis data in the Arctic (Fig. 12). Given the significant uncertainties in observed sea ice volume, we follow Shu et al. (2015) and show a ±15% error around the PIOMAS data. This reveals that the standard model has a high bias in maximum Northern Hemisphere sea ice volume, which is amplified in the optimized case due to cooler sea surface temperatures (Fig. 6). In the Northern Hemisphere minimum, the PIOMAS reanalysis lies in between the standard and optimized models, and given the large uncertainties in this data, suggests that neither simulation has an unreasonable minimum. The Antarctic minimum is very similar in both the standard and optimized models; the Antarctic maximum is lower in the optimized case but there are no available observations which would confirm whether this is a move towards or away from the observed state. The reduction in Antarctic volume is likely due to the increase in K without a similar decrease in sea surface temperatures. Interestingly, although optimization has worsened the high bias in volume in the Arctic winter, trends in Arctic sea ice volume from the optimized case show a larger decline and therefore agree better with the PIOMAS data than the standard model (Fig. 11e, f). This may be because, once thick ice is established, it is better insulated from the cold ocean beneath and will melt more easily from surface heat fluxes than thinner ice (Hodson et al. 2013), resulting in larger losses. Conversely, Antarctic ice is thinner in the optimized case (Fig. 12), resulting in smaller losses compared to the standard model (Fig. 11g, h). Lastly, we note that sea ice volume takes around two decades to adjust to parameter changes (Fig. 12). If future optimizations were to include sea ice volume as a tuning variable, they would require a longer adjustment period.

Discussion
This paper reports on the application of automated parameter tuning to uncertainty in climate model simulations of sea ice. We find that manipulating poorlyconstrained parameter values within physically reasonable ranges can reduce model error by up to an order of magnitude in decadal mean historical sea ice extent. The improved total sea ice extent in the mean state corresponds to more consistency with observations in longerterm trends, most notably in the Arctic minimum. A key achievement is that these parameter values apply to both hemispheres and both sea ice growth and melt seasons.  Parameter tuning has been criticized for artificially improving models beyond their capabilities (Eisenman et al. 2007), but a manual approach is widely used amongst climate modellers (Mauritsen et al. 2012) and, like Massonnet et al. (2014), we argue that we can justifiably alter poorly-known or poorly-defined parameters to fit modelling needs. Using a 140-year control model experiment, we find no adverse effects of the parameter changes on the model climatology. We note an increase in Northern Hemisphere winter sea ice volume, in disagreement with reanalysis data, but this reduces later in the simulation and sea ice volume decreases faster than in the standard model after 1990.
Examining the spatial distribution of sea ice concentration, we note that, while improvements occur in the Arctic, Antarctic improvements are minimal. The inability to improve some regional aspects may be due to missing or poorly parametrized processes relevant to the Antarctic. HadCM3 does not include ice sheet melt, its coarse resolution is not suited to accurate Southern Ocean representation (Zunz et al. 2013), Antarctic topography for HadCM3 is lacking (Turner et al. 2006) and none of the model's sea ice parametrizations were specifically designed for the Antarctic (Turner et al. 2006). Indeed, our literature review showed that the experimental values behind various parameters are often measured in the Arctic only. In the face of these other issues, only limited improvement from parameter optimization using hemispheric sea ice extent error functions can be expected. We would expect to achieve greater consistency with observations using separate parameter values for each hemisphere, which would further point to deficiency in model processes, as the physics should be independent of hemisphere. We conclude that improvements from global parameter optimization are limited when other structural uncertainties dominate.
The high error reduction found here may not be replicated in other models that have better-constrained parameters or more advanced representation of sea ice processes. Yet even state-of-the-art models contain a large number of parameters and other studies have found improvements in uncoupled runs with advanced sea ice models such as CICE (Kim et al. 2006) and LIM3 (Massonnet et al. 2014). While more efficient than manual model tuning, automated parameter optimization is still computationally costly which limits ability to repeat experiments. The near-replication of resulting parameters in the second optimization experiment suggests that the successful error reduction was not due to chance, but future work should test this more robustly.
We stress that this is the first use of such a method targeting only one climate variable in a coupled climate model and as such there are areas that can be developed further. For sea ice, more advanced metrics than sea ice extent that take into account sea ice concentration differences could be investigated. Recent improvements in sea ice thickness observations mean that this could be used as an optimization metric for near present-day conditions. Similarly, more advanced models simulate variables such as meltpond area (Schroder et al. 2014), which can be observed by satellite, and therefore may provide another tuning variable. In our optimization method, a number of tuning metrics may be used without increasing computational cost. Another area for development would be to take into account observational uncertainty and weight the error function accordingly. Nevertheless, we have shown that automated parameter optimization significantly reduces difference from observations in the sea ice simulation in HadCM3. Our approach could be customized for other poorly-simulated climate observables and other models to investigate whether similar uncertainty reductions are possible.