Optimal experiment design for a bottom friction parameter estimation problem

Calibration with respect to a bottom friction parameter is standard practice within numerical coastal ocean modelling. However, when this parameter is assumed to vary spatially, any calibration approach must address the issue of overfitting. In this work, we derive calibration problems in which the control parameters can be directly constrained by available observations, without overfitting. This is achieved by carefully selecting the ‘experiment design’, which in general encompasses both the observation strategy, and the choice of control parameters (i.e. the spatial variation of the friction field). In this work we focus on the latter, utilising existing observations available within our case study regions. We adapt a technique from the optimal experiment design (OED) literature, utilising model sensitivities computed via an adjoint-capable numerical shallow water model, Thetis. The OED method uses the model sensitivity to estimate the covariance of the estimated parameters corresponding to a given experiment design, without solving the corresponding parameter estimation problem. This facilitates the exploration of a large number of such experiment designs, to find the design producing the tightest parameter constraints. We take the Bristol Channel as a primary case study, using tide gauge data to estimate friction parameters corresponding to a piecewise-constant field. We first demonstrate that the OED framework produces reliable estimates of the parameter covariance, by comparison with results from a Bayesian inference algorithm. We subsequently demonstrate that solving an ‘optimal’ calibration problem leads to good model performance against both calibration and validation data, thus avoiding overfitting.


Introduction
Numerical models of shelf seas and estuaries are utilised within a wide range of application areas, including sediment and pollutant transport (e.g. Periáñez et al. 2013;Chen and Liu 2017;Li et al. 2018), ecosystems and fisheries (e.g. Marshall et al. 2017;Whomersley et al. 2018) and marine renewable energy (e.g. Adcock et al. 2015;Neill et al. 2018;Mackie et al. 2020a;Wang and Yang 2020), as well as underpinning the modelling of coastal hazards including storm surges (Flather 2000). Accurate tidal models are therefore of significant value, yet any model output is subject to a variety of sources of uncertainty.
Within coastal ocean modelling, a significant source of uncertainty is the parameterisation of bottom friction. Friction between the ocean and the sea bed arises due to a boundary layer at the interface, and due to form drag induced by bathymetry undulations. The resulting momentum loss is not explicitly captured in coastal ocean models, but is instead typically incorporated via a parameterised drag term in the governing equations, which relates the frictional force to the water velocity. There are many possible choices of drag parameterisation (Döös et al. 2004;Zhang et al. 2011), each of which introduces a drag parameter. Furthermore, this parameter in principle varies spatially, due to variations in sea bed roughness and hydrodynamic conditions, as well as potential dependence on model mesh resolution given that the parameterisation represents subgrid-scale processes. Even when it can be related to land or sea floor classification using well-established tables, this parameter still carries significant uncertainty (Mayo et al. 2014). The bottom friction coefficient (BFC) may also vary over time, e.g. due to morphological changes at the sea bed (Davies and Robins 2017) or seasonal variations in hydrological conditions (Huybrechts et al. 2021).
However, since direct field measurements of bottom friction are not feasible over large spatial scales, this parameter is a source of significant uncertainty within coastal ocean models.
For this reason, it is common within the coastal ocean modelling literature to perform model calibration with respect to the BFC, whereby its value is inferred from observations of hydrodynamic variables. Commonly used observations for friction parameter estimation include timeseries or harmonic analysis data from tide gauges, current measurements (e.g. from ADCPs), or satellite altimetry. Techniques for performing this calibration include Kalman filters (e.g. Mayo et al. 2014), Bayesian inference approaches (e.g. Hall et al. 2011;Sraj et al. 2014b), and gradient-based methods utilising adjoint models (e.g. Maßmann 2010) or other gradient calculation techniques (e.g. Sraj et al. 2014a). The objective of model calibration is to maximise agreement between model outputs and observations, in such a way that confidence in model outputs at new locations, or outside of the time period where the observation data is available, is increased.
Validation data, which is distinct from the data used for calibration, is typically used to verify that performance is indeed improved at new locations. However, in any parameter estimation exercise, there exists a trade-off between (i) estimating a large number of parameters and thus achieving good agreement with calibration data; and (ii) achieving tight constraints on the estimated parameters, and thus achieving good agreement with validation data. Good agreement with calibration data at the expense of poor agreement with validation data occurs due to overfitting, or the estimation of too many control parameters, which also results in high uncertainty in the estimated parameters. The use of too few control parameters avoids these issues, but results in smaller model improvements.
The overfitting problem is addressed in the literature using a variety of methods. A priori knowledge of the BFC parameter value can be incorporated, for example in the form of a regularisation (or penalty) term within adjoint gradient-based approaches (Maßmann 2010), or as a prior within a Bayesian inference approach. This prior acts as an additional constraint, and suppresses spurious BFC values. An alternative approach is to reduce the dimension of the parameter space, so that fewer parameters are estimated. This can be achieved via use of a piecewise-constant BFC with subdomains selected based on land use (Mayo et al. 2014), bathymetry contours (Sraj et al. 2014b), sedimentological data (Guillou and Thiébot 2016;Mackie et al. 2020b;Warder et al. 2021a) or other more arbitrary subdomains (Lardner et al. 1993). Another approach to dimension reduction is to employ the so-called independent points scheme, where the spatially varying BFC is determined by interpolation between a set of 'independent points' (Lu and Zhang 2006). A similar approach is to use a coarse unstructured mesh representation of the BFC field using a small number of nodes (Ullman and Wilson 1998;Heemink et al. 2002). Aside from the benefits in reducing overfitting, the use of low-dimensional parameter spaces also facilitates the use of a broader set of parameter estimation algorithms, whereas high-dimensional problems can only realistically be tackled using adjoint methods (Lu and Zhang 2006). Note however that the reduction of the parameter space is not a sufficient condition for the avoidance of overfitting, and regularisation is often required in conjunction with parameter space reduction, as in Lardner et al. (1993), Ullman and Wilson (1998) and Heemink et al. (2002), where in the latter case the iterative optimisation algorithm was truncated after a small number of iterations, which can be considered a form of regularisation. The approach presented within this paper can be applied to parameter space reduction based on any of the above methods, but avoids the need for regularisation.
As described above, there exists a variety of choices for the selection of the quantity and spatial distribution of control parameters within BFC estimation, which (along with the observation strategy) constitutes the 'experiment design'. The influence of the control parameter aspect of experiment design on parameter estimation has been discussed by a small number of studies. Using an idealised test case, Das and Lardner (1991) found the empirical result that the number of control parameters which can be estimated from tide gauge data is limited to two times the number of tide gauges (since each tide gauge effectively provides two pieces of data, corresponding to amplitude and phase of the fundamental harmonic). Ullman and Wilson (1998) estimate parameter confidence intervals in order to quantify the significance of estimated BFC values, thus justifying their choice to use ten independent parameters to represent the BFC field. Heemink et al. (2002) perform a preliminary sensitivity analysis in order to select their nodes for a coarse representation of the BFC field, although the choice of experiment design is not the focus of their work. Lu and Zhang (2006) compare two schemes for the selection of control points within an 'independent points scheme', finding that independent points (IPs) distributed according to the bathymetry produce improved performance compared with uniformly distributed IPs.
Despite some discussion in the above literature, experiment design or its optimisation are not typically the focus of parameter estimation studies. Instead, model calibration is typically formulated as finding the set of parameters which minimise the misfit between model outputs and observations under a given set of constraints, with the experiment design taken as fixed. While the BFC estimation literature shows that excellent results can be obtained with this approach (i.e. without exploration of the influence of the experiment design), as stated above this typically relies on appropriate regularisation. It is intuitive that when model outputs are more sensitive to the control parameters, the parameters can be more tightly constrained by the corresponding observations. This is the central idea of this work, in which we use model sensitivity information derived from a numerical adjoint model in order to deduce experiment designs which result in tight constraints on the estimated parameters. We further demonstrate how the solution of the resulting BFC estimation problem results in good model performance against both calibration and validation observation datasets, without the need for regularisation.
The structure of the remainder of this paper is as follows. The primary case study region (consisting of the Bristol Channel and Severn Estuary), model setup and observation data are described in Sect. 2. In Sect. 3 we briefly review the optimal experiment design (OED) literature, and the adaptation of the OED technique we employ within this paper is presented in Sect. 3.3. Verification that the parameter constraints predicted by the OED method are reliable is presented in Sect. 4.1, where we utilise a Bayesian inference algorithm to solve the parameter estimation problem corresponding to a selection of experiment designs, and compare the sensitivity-predicted parameter uncertainties with those returned by the Bayesian inference algorithm. In Sect. 5, we then demonstrate the link between parameter estimate uncertainty and model performance against calibration and validation data. Specifically, we demonstrate that parameter estimation experiments which produce the tightest constraints on the unknown parameters also result in the best performance against validation data. As a further demonstration of the OED method, we present an application to a model of the northwest European continental shelf in Sect. 6. The results are discussed in Sect. 7, and we draw conclusions in Sect. 8.

Model and study region
Within this work, we take the Bristol Channel and Severn Estuary as a primary case study region, although we also present the application of the OED method to a larger model domain in Sect. 6. The Bristol Channel is a macrotidal inlet situated toward the south-east of the UK, and is of significant interest for its tidal range energy resource (Angeloudis and Falconer 2017). It is also the site of the Hinkley Point nuclear power station. Furthermore, accurate models are vital for flood risk studies (e.g. Lyddon et al. 2018). As such, the Bristol Channel is a site of particular interest for the development of accurate numerical models. Since the tidal resonance in the region is known to be influenced by the bottom friction coefficient (Gao and Adcock 2016;Warder et al. 2021a), it constitutes an ideal case study for BFC parameter estimation.

Model description
Thetis is a 3D (Kärnä et al. 2018;Pan et al. 2020) and 2D (Vouriot et al. 2019) finite element based coastal ocean model, built using the finite element code generation framework Firedrake (Rathgeber et al. 2016). Within this work, we use Thetis in its two-dimensional depth-averaged mode, solving the nonlinear shallow water equations in the form ∂η ∂t where η is the surface elevation, u is the depth-averaged velocity vector, H = η + h is the total water depth, h is the bathymetry (measured positive downwards), F C is the Coriolis force, g is the acceleration due to gravity, ρ is the density, ν is the viscosity, and τ b is the bottom stress. Within this work, the bottom stress is parameterised via Manning's n formulation where n is the (spatially varying) Manning coefficient. This coefficient is assigned a uniform value of n = 0.025 sm −1/3 outside the Bristol Channel (selected via preliminary model runs, but found to have little influence on the model results inside the Bristol Channel), while its value inside the Channel is to be inferred via parameter estimation methods; see Fig. 1. Wetting and drying of intertidal regions is handled via the bathymetry modification scheme of Kärnä et al. (2011). This scheme introduces a wetting-drying parameter α, which controls the transition from wet to dry regions of the domain. Smaller values of α produce more accurate results, but there exists a minimum stable value which depends on the bathymetry gradient and mesh resolution at the wet-dry interface. In this work, α is taken as 1 m, which was found by experimentation to be close to the minimum stable value for this model setup.
The unstructured mesh used for the primary case study is shown in Fig. 1, and is based on that used in Mackie et al. (2020b). The mesh was generated on a UTM30 coordinate projection, using the Python package qmesh (version 1.0.2) , which interfaces the mesh generator Gmsh (version 3.0.4) (Geuzaine and Remacle 2009). The coastline data is from the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) (Wessel and Smith 1996). The mesh resolution varies between 250 m at the innermost part of the Channel, and 8 km at the ocean boundaries, with an intermediate resolution along other parts of the coastline. This results in a total of 42,862 triangular elements. The bathymetry is taken from Digimap (Digimap 2013), which has a resolution of 6 arcseconds. The ocean boundaries are forced with tidal elevations generated based on the dominant constituents (M2 and S2) from the TPXO database (Egbert and Erofeeva 2002). The next most significant constituent (N2) has less than half the amplitude of these primary constituents. While the choice to force with only two constituents neglects any transfer of energy from other tidal constituents, we assume that this forcing is sufficient for the purposes of this study, which represents a proof of concept for our OED framework. A no-normal-flow boundary condition is applied on the coastal boundaries, and river outflow is neglected. The governing equations (1) are solved using the P DG 1 -P DG 1 finite element pair, with a Crank-Nicolson timestepping method and a timestep size of t = 200 s. All model runs within this work follow a generic spin-up period of 10 days with a default uniform friction parameter of n = 0.025 sm −1/3 . Following this initial spin-up, each friction configuration is run for a further 17.77 days, with harmonic analysis (for the M2 and S2 constituents, plus overtides and compound tides) performed on the final 14.77 days, i.e. allowing an additional 3 days of spin-up with the modified BFC. The analysis period of 14.77 days spans a complete spring-neap cycle, and was chosen to satisfy the Rayleigh criterion for resolving the M2 and S2 constituents.

The adjoint model
Numerical adjoint methods are used to efficiently compute the gradient of a model output with respect to its inputs. The adjoint mode of Thetis is generated algorithmically via pyadjoint (Mitusch et al. 2019;Farrell et al. 2013), which interfaces with Firedrake to efficiently derive the adjoint model. Previous applications of the Thetis adjoint model include storm surge sensitivity analysis (Warder et al. 2019(Warder et al. , 2021b and tidal turbine array optimisation (Goss et al. 2020), while in this work we use the adjoint model to compute model sensitivity with respect to the Manning coefficient n. For the reader who is unfamiliar with adjoint methods, we give an outline here based largely on Funke (2012), to which the reader is referred for further detail.
A system of PDEs such as the governing equations (1) may be written compactly in the general form with F the PDE operator representing Eq. (1), u the model state variable (which for our equations represents η and u) and n could in general represent any model input but here we are specifically interested in the Manning coefficient. We denote a scalarvalued quantity of interest J (u), which for our purposes will represent the amplitude or phase of a tidal constituent at a specific location. The adjoint method is used to efficiently compute the derivative of J , with respect to n, i.e. dJ dn . It is straightforward to show that This can further be expressed as where λ is is the so-called adjoint variable, defined as the solution to the adjoint equation where the asterisk denotes the conjugate transpose, or adjoint operation. Note firstly that this adjoint equation is a PDE, with the adjoint variable λ a space-and timedependent function. The adjoint equation is solved in the reverse time direction, and since the inputs n are not time-varying, it is understood that the instantaneous influence of n is integrated over time, to obtain the overall gradient dJ dn .
Note secondly that the adjoint equation is linear, and that the input parameter n does not appear in the equation. These factors contribute to the low computational cost associated with computing gradients by the adjoint method. In particular, the computational cost does not depend on the number of control parameters. When applied within a numerical model, this means that the vector of control parameters can correspond to the value of a parameter at every mesh node. The adjoint method is thus particularly powerful when used to compute the gradient of a model output with respect to a fully spatially varying model input, and it is for this purpose which we employ the method within this work.
The Thetis adjoint runs within this work require approximately 2.4 times the wallclock time of the forward runs.

Observation data
We utilise data from two sources within this study, from tide gauges whose locations are indicated in Fig. 1. The red circles indicate 15 locations where harmonic analysis data is available (National Oceanography Centre, personal communication 2018). At these locations, we utilise M2 and S2 amplitude and phase data, for the purposes of calibration. The yellow circles in Fig. 1 indicate five further locations where timeseries data is made available from the British Oceanographic Data Centre (BODC). This data is used within this study for model validation, and we utilise M2, S2 and M6 amplitude and phase data at these locations. The M6 constituent is an example of a shallow water overtide, whose amplitude and phase are likely to be particularly sensitive to the bottom friction coefficient, thus making the M6 constituent a valuable source of validation data.
While we could have included more harmonic constituents within the modelobservation comparisons, the M2 and S2 constituents dominate the dynamics within the Bristol Channel with typical amplitudes of around 4 m and 1.5 m respectively, compared with around 0.6 m for the next most significant constituent (N2). The OED methodology incurs a computational cost proportional to the volume of data used for the parameter estimation, so we choose not to consider additional constituents and focus only on the M2 and S2 harmonics.
In order to facilitate our OED method, we assume throughout this work that the model-observation errors are independent and identically distributed variables. We assume that these errors have variances of 25 cm 2 for amplitudes and 6.25 •2 for phases. These variances could be included within the parameter estimation as hyperparameters to be inferred from the observations alongside the unknown BFC parameters, and the above values are in fact taken from a study employing this method (Warder et al. 2021a). However, the OED procedure requires these variances to be estimated in advance, so we instead use these fixed values.

Review of OED methods
Any parameter estimation problem requires two components: firstly, a suitable set of observations from which to infer the unknown model parameters, and secondly, an appropriate parameter space within which the unknown parameters lie (e.g. which/how many model parameters to estimate, their spatial distribution, etc.). These two aspects constitute the 'experiment design', the optimisation of which is known as 'optimal experiment design' (OED), and is the topic of this paper. The optimisation of these two aspects is often considered independently.
The optimisation of the observation strategy amounts to selecting which variables to measure, and when and where to measure them, in order to best infer a given set of unknown model parameters. This aspect of OED is common in the literature, with example applications in a variety of fields (Ucinski 2004;Balsa-Canto et al. 2008;Strigul et al. 2009;Yu et al. 2018). The computation of the model's sensitivities with respect to its unknown inputs is central to the OED method, with the motivation being to achieve tight constraints on the estimated parameters (Söderström and Stoica 1989;Machado et al. 2009;Alaña and Theodoropoulos 2011). Within the field of numerical coastal and estuarine modelling, de Brauwere et al. (2009) performed optimal experiment design for the calibration of a reactive transport model, designing a sampling strategy to best constrain mortality and sedimentation rates of E. coli in the Scheldt Estuary using a given number of water samples. Vandenberghe et al. (2002) utilised a similar method to design an optimal sampling strategy in a water quality model of the River Dender. Graham et al. (2017) take an alternative sensitivity-based approach to optimising experiment design for a storm surge case study, in order to determine an optimal configuration of observation buoys with which to infer a given set of unknown friction parameters. A similar concept to optimal experiment design is that of 'targeted observations' for state estimation (e.g. Langland 2005). However, while formal OED methods have been applied to state estimation, the focus of the present study is on parameter estimation.
The second aspect of experiment design is the selection of the parameters to be estimated. This aspect is the focus of the present study since, as is the case in many similar coastal regions, a reasonable set of observation data already exists, while there is no obvious choice for the control parameters. A similar and commonly approached problem in the OED literature is the selection of a subset of model parameters for estimation; this has often been solved via sensitivity-based OED methods (Machado et al. 2009;Chu and Hahn 2009;Kravaris et al. 2013;Li et al. 2004;Wang et al. 2018).
In this work, we focus on BFC estimation, since this is commonly undertaken in the literature, and is particularly relevant to our chosen case study region of the Bristol Channel, since the tidal resonance in the region is sensitive to the bottom friction (Gao and Adcock 2016;Warder et al. 2021a). We aim to determine a BFC field which is piecewise-constant in space within the Bristol Channel, such that the field is specified by a vector of length m. The objective is to determine the optimal number of parameters m, and the optimal mapping from these parameters onto the model domain, such that tight constraints on the parameters can be achieved via a parameter estimation exercise using the observations described in Sect. 2.2. The choice of this mapping and number of parameters constitutes the experiment design. We emphasise that this problem is distinct from conventional applications of OED to parameter selection, where an experiment design typically consists of a selection of a subset of parameters, with unselected parameters held constant (e.g. Li et al. 2004;Chu and Hahn 2007;Machado et al. 2009).
We first apply an a priori constraint on the friction coefficient, by splitting the model domain into subdomains, within each of which the friction coefficient is assumed to be constant; this is further described in Sect. 3.2. Having established a suitable set of subdomains, the remaining space of experiment designs can be explored to determine the optimal design. Each design corresponds to a grouping of the a priori subdomains into m groups, such that the friction coefficient is specified by m parameters. In order to rank the experiment designs, we compare the parameter covariance matrix which results from the use of each design, utilising an adjoint sensitivity-based method to estimate this covariance matrix without solving the parameter estimation problem. This approach is described in detail in Sect. 3.3.

A priori subdomain parameterisation
We divide the domain into slices as shown in Fig. 2. This division is somewhat arbitrary; the slices were chosen to divide the Bristol Channel fairly evenly, with slightly smaller sections further into the estuary where the density of observation locations is higher. The possible experiment designs are further constrained such that the slices are grouped into contiguous blocks (i.e. region 1 and region 3 cannot be assigned the same friction parameter unless region 2 is also included).
Each possible experiment design corresponds to a grouping of the selected subdomains, according to this a priori constraint. Given that there are eight subdomains, for a parameterisation with m = 2 (i.e. a grouping of the subdomains into two groups), there are seven possible designs. For m = 3, there are 21 possible designs, and for m = 4 there are 35.
The choice to consider eight subdomains is somewhat arbitrary. However, the modest number of possible designs (as enumerated above) facilitates the use of an exhaustive search algorithm to find the optimal design. Note however that the presented method is not restricted to such strong constraints, and may also be applied to other a priori design choices, such as piecewise-linear BFC fields.
We note also that other approaches to dividing up the model domain are taken in the literature, including the use of bathymetry contours (e.g. Sraj et al. 2014b), or sedimentological data (e.g. Guillou and Thiébot 2016;Mackie et al. 2020b;Warder et al. 2021a). In particular, Warder et al. (2021a) utilised sedimentological data within a model of the Bristol Channel and Severn Estuary, although the results suggested at some limitations to the applicability of such data for constraining the friction parameter. The OED method presented here is very general and could be applied to the selection of designs utilising bathymetry contours or sedimentological data. However, our choice to divide the domain by slices as described above makes for a simple expo- sition, and requires minimal assumptions about the spatial variation of the friction coefficient, highlighting the value of the OED approach to applications where prior information may be limited.

The OED method
For reviews of OED methods for parameter selection, the reader is referred to works from other fields, e.g. Machado et al. (2009), Kravaris et al. (2013). However, note that the problem considered here is subtly different to the conventional parameter selection problem, where the objective is typically to select an optimal subset from a large set of parameters, with unselected parameters taking prescribed values. In this work, we seek an experiment design in which the friction parameter is specified everywhere within a given region; each experiment design corresponds to a grouping of model subdomains, not a subset selection. This is perhaps more similar to the clustering of parameters considered by Chu and Hahn (2009), although we take a different approach. The following is a simple exposition of the OED method.
Consider a parameter estimation problem where a vector of unknown parameters θ is to be estimated by minimising a misfit functional given by with respect to θ , where y and f(θ ) are vectors of observed and modelled quantities respectively, and cov e is the observation error covariance matrix. The parameter covariance matrix resulting from the minimisation of equation (7) can be estimated as where J is the Jacobian matrix with elements given by It is common within the OED literature to utilise a finite difference method to compute the Jacobian matrix. In this work we instead use the adjoint mode of the numerical model; this is described in Sect. 3.3.3. Note that the parameter covariance given by Eq. (8) does not depend on the observations y, but only on their error covariance. However, it implicitly depends on the number, location and timing of the observations, and the choice of control parameters θ . For a given experiment design, the parameter covariance can therefore be estimated based only on the model sensitivities and the observation covariance.
As a single scalar measure of the predicted covariance matrix cov p , we define a criterion Σ as det cov p is the volume of the confidence ellipsoid of the estimated parameters. The reason for the inverse power of m is that the criterion Σ then has the units of the parameter variance (s 2 m −2/3 ), independent of the number of parameters being estimated (m). Σ is a measure of the geometric mean estimated parameter variance. Its value can therefore be directly compared between experiment designs with different numbers of parameters. In the OED literature, minimising such a criterion is referred to as D-optimal design.

A note on nonlinearity
The above exposition is only strictly valid when the model is linear. If the model is nonlinear with respect to the unknown parameters (as is the case within this work, since the numerical tidal model is nonlinear), Eq. (8) is not exact, but the right-handside is nevertheless a measure of the maximum achievable parameter precision (de Brauwere et al. 2009). In this case, Eq. (8) is considered to give a lower bound on the parameter covariance (Alaña and Theodoropoulos 2011). That is, where the inequality is understood to mean that (cov p − (J T cov −1 e J ) −1 ) is positive semidefinite (Söderström and Stoica 1989;Emery and Nenarokomov 1998;Machado et al. 2009).
Nonlinearity also means that the Jacobian, and therefore the estimated cov p , are functions of the unknown parameters. The minimisation of the criterion Σ as defined above therefore identifies only a 'locally optimal' experiment design (Ford et al. 1989;Huan and Marzouk 2013). That is, the optimal experiment design is optimal only within a region of the parameter space local to the initial parameter estimate, where the model response is approximately linear. This issue can be overcome with sequential or iterative design (Blanchet et al. 2008;Catania and Paladino 2009), or 'maximin' optimisation (Pepelyshev et al. 2004;Rojas et al. 2007;Sun 2007;Ushijima and Yeh 2015), but such designs are outside the scope of this study. Instead, it is assumed that an initial estimate for the friction coefficient of n = 0.025 sm −1/3 is appropriate, and that the model response to perturbations in the friction coefficient from this value can be assumed to be linear. The results presented in Sect. 4.1 suggest that this is an acceptable assumption, and that the bound provided by Eq. (11) is reasonably tight.

Defining a 'good' experiment design
It is useful to consider the values of the criterion Σ for which the corresponding experiment design produces acceptable constraints on the estimated parameters. This is particularly important when comparing experiment designs with different numbers of unknown parameters. In the context of estimating Manning coefficients, where typical values used in the literature vary between 0.01-0.05 sm −1/3 , the experience of the authors suggests that a 'good' standard deviation in an estimated parameter might be 10 −3 sm −1/3 , i.e. a variance of 10 −6 s 2 m −2/3 . We therefore seek experiment designs satisfying Σ < 10 −6 s 2 m −2/3 .
This threshold value is somewhat arbitrary, although we find in Sect. 5 that this value does appear to be appropriate. Due to the inequality in Eq. (11), an experiment design satisfying Eq. (12) is not guaranteed to achieve the target parameter estimate precision. However, we show in Sect. 4.1 that the bound provided by Eq. (11) appears to be reasonably tight. We emphasise that the value of the OED method is that the Σ criterion can be computed in advance of the parameter estimation procedure, and therefore Eq. (12) provides a useful a priori indicator of experiment design performance.

Construction of the Jacobian via the adjoint model
The crux of the OED method is the construction of the Jacobian (Eq. 9), which consists of the sensitivity of each model output with respect to each unknown parameter. For the Bristol Channel case study, the model outputs used for calibration are the M2 and S2 harmonic amplitudes and phases at 15 locations within the domain, and the model inputs correspond to the set of friction parameters, denoted θ . The difficulty in computing these sensitivities for an arbitrary experiment design arises from the fact that the mapping from the friction parameters onto the model domain is unique to each experiment design. Here, we use the numerical adjoint model to overcome this issue.
An arbitrary spatially varying Manning coefficient field can be represented by a vector n of length N , with the elements of the vector corresponding to the value of the Manning coefficient at each mesh node, and N the number of mesh nodes within the region of variable friction. For a given experiment design, denoted D, we have where θ is the parameter vector of length m. The experiment design D is an N × m matrix, which maps the parameters θ onto the mesh nodes. First, we use the adjoint model to compute the Jacobian with respect to n, given by This requires one adjoint model run for each observation i, i.e. for each row of the above matrix. For the Bristol Channel case study there are 60 observations (M2 and S2 amplitude and phase at 15 locations), so the computation of this Jacobian requires 60 adjoint runs. The Jacobian matrix with respect to the parameter vector θ can then be computed as where the ∂ f i ∂n k term has been computed via the set of adjoint model runs as described above.
Note that, for a given model setup (mesh, etc.), the adjoint model runs are a one-off computational overhead, and the Jacobian with respect to the parameter vector can be computed via Eq. (15) at negligible additional computational cost, for an arbitrary design D. The computational cost of each adjoint model run is on the same order of magnitude as that of a forward model run. Note that the common approach in the literature to construct Jacobians for OED is to use a finite difference method, based on running the forward model with perturbed values for each model input. While such an approach would be feasible within this study due the relatively strong a priori constraints described in Sect. 3.2, it would rapidly become infeasible as a greater space of possible experiment designs is considered. In the most general case, the forward differences approach would require N + 1 forward model runs, where N is O(10, 000) for the mesh used in this work; this is clearly not computationally feasible. The use of an adjoint model is a prerequisite for the extension of this OED method to more complex experiment design spaces, and we therefore take the adjoint approach here.

Optimisation and interpretation
As described in Sect. 3.2, there are only seven possible designs for m = 2, 21 for m = 3 and 35 for m = 4. This facilitates the use of an exhaustive search of the experiment designs, to find the one which produces the optimal value for the criterion Σ, for a given value of m.
The selection of the optimal value for m can be made by comparing the Σ values with the threshold value described in Sect. 3.3.2. The optimal value for m is the largest value which can produce a Σ value below the target variance. This is because a larger m is likely to lead to better model performance as measured against the calibration data, while a set of estimated parameters with sufficiently small covariance (i.e. designs satisfying the Σ criterion threshold) is unlikely to have overfitted, and should therefore also produce the best model performance as measured against validation data.

OED results
In Sect. 4.1, we verify that the Σ criterion, computed via the adjoint-derived sensitivities as described above, provides a reliable indicator of experiment design performance, as measured by the parameter covariance. Section 4.2 then summarises the optimal designs returned by the OED framework, as a function of the number of parameters m.

Verification of the 6 criterion as an estimate of parameter uncertainty
The effect of the inequality in Eq. (11), which represents the nonlinearity of the system, is that the OED approach is expected to underestimate the true covariance achieved using a given experiment design. Here, the aim is to investigate the significance of this inequality, i.e. to verify whether the approach is suitable for characterising experiment design performance within the friction parameter estimation context. We proceed by utilising a Bayesian inference framework to solve the parameter estimation problem corresponding to a selection of experiment designs, computing the resulting parameter covariance matrix for each. We can then compare the Σ criterion with the equivalent measure of this parameter covariance matrix, given by det(cov p ) 1/m .
If the inequality of Eq. (11) were replaced with an equality, the expression given by Eq. (16) would exactly equal the Σ criterion. By comparing Eq. (16) with Σ, we can therefore investigate the significance of the inequality in Eq. (11) in characterising experiment designs. While the OED approach described in this paper is applicable to a variety of parameter estimation procedures, here we choose to use a Bayesian inference approach via a Markov Chain Monte Carlo (MCMC) algorithm to solve selected parameter estimation problems.
We choose this approach because it is a straightforward and well-studied method, and it yields a direct estimate of the parameter covariance matrix, which is required for the present validation of the OED method. Details of the MCMC algorithm, which utilises a Gaussian process emulator as a surrogate for the full numerical model, can be found in "Appendix A", and here we simply summarise the results.
For a selection of experiment designs, Table 1 presents values of both the Σ criterion and Eq. (16) resulting from the MCMC algorithm. As expected due to the inequality in Eq. (11), we find that the det(cov p ) 1/m values are larger than the Σ values in almost all  cases (the exception being the m = 1 case, which we attribute to the other consequences of nonlinearity described in Sect. 3.3.1.). However, the det(cov p ) 1/m values are all reasonably close to the estimated bound provided by Σ, which therefore appears to be a good estimate of the parameter uncertainty obtained with a given experiment design.

Exploration of optimal designs
Having verified in Sect. 4.1 that the Σ criterion constitutes a good metric of experiment design performance, here we explore the full space of possible experiment designs. Figure 3 shows the Σ criterion values returned by all possible experiment designs, as a function of the number of parameters in the design, m. The optimal designs for m = 2, 3 and 4 are shown in Fig. 4. We make the following observations: (i) The minimum achievable values of the Σ criterion increase as the number of unknown parameters m increases. This is expected, since the information provided by the observation data remains constant, but is used to attempt to constrain an increasing number of parameters. (ii) We find that it is possible to achieve the target criterion (10 −6 s 2 m −2/3 ) when grouping the friction subdomains into two or three groups (m = 2 or 3), but no four-parameter designs meet the threshold Σ criterion. For this threshold value, the optimal number of parameters is therefore m = 3. This emphasises the value of an OED procedure, since a poor choice of experiment design can be wasteful with the information provided by the observations, and it may be possible to achieve a tighter constraint on a larger number of parameters by optimising the experiment design. (iv) As shown in Fig. 4, the optimal designs do not always split the domain into regions with either equal area, or enclosing equal numbers of gauge locations, as might have been expected. This implies that the information provided by a given tide gauge about the BFC field is not restricted to the local neighbourhood of the tide gauge, and also that there is some redundancy in the information provided by the set of tide gauges.

Model performance
Here we present results from running the numerical model, after calibration using a variety of experiment designs. For each experiment design, the calibration has been performed using the Bayesian inference approach described in "Appendix A". The objective of this section is to explore the relative performance against the calibration and validation datasets, and how this depends on the experiment design. Four of the experiment designs we use are shown in Fig. 4, corresponding to the optimal designs with m = 2, 3 and 4, and Fig. 5, which is the median m = 3 experiment design (11th best of the 21 possible designs). For comparison, we also show results from a uniform BFC parameter (m = 1), and the m = 8 case (there is only one design for m = 8). The optimal parameters for each design are summarised in Table 2. For the m = 1, 2, 3, 4 designs, the BFC values are in the range of typical values for Manning coefficients.
In the m = 8 case, some of the values are very high or very low, due to the poor constraints provided by the observation data.
The performance of the model with the optimal parameters for each experiment design, measured against the calibration dataset, is summarised in Table 3. For m = 1, 2, 3, 4, the improved performance as m increases is driven by the M2 amplitude, with the other three RMSEs showing little change with each experiment design. Perhaps surprisingly, the M2 amplitude RMSE for the median m = 3 design is lower than for the optimal m = 3 design. The m = 8 case performs best overall, measured against the calibration data. Table 4 shows a similar set of RMSEs, but here the model is compared with the set of validation data (the yellow circles in Fig. 1), and we additionally use the M6 constituent, since this shallow water overtide will be particularly sensitive to the BFC. This validation data was not used within the OED procedure, or the calibration. The result from the optimal m = 3 design performs best overall, since all six of its RMSEs are within 0.2 cm or 0.1 • of the lowest achieved. Note that the median m = 3 design results in greater errors for all but one of the RMSEs compared with the optimal m = 3 design, despite achieving lower RMSEs against the calibration data. This demonstrates the importance of the spatial configuration of the experiment design, even when comparing different approaches with the same number of control parameters.
The m = 8 case, which performed best against the calibration data, performs worst against the validation data, due to overfitting. The optimal m = 4 design also shows signs of overfitting; it outperformed the optimal m = 1, 2, 3 designs on the calibration dataset, but although it performs well for M2 amplitude against the validation data, it exhibits significantly greater S2 amplitude and phase RMSEs than the optimal m = 3 design, and also performs worse for the M6 constituent. This suggests that the threshold value for the Σ criterion selected in Sect. 3.3.2 is well placed; the results of Sect. 4.2 Table 2 Optimal parameters for each experiment design  Table 3 Root mean squared errors (RMSEs) of the modelled M2 and S2 amplitudes (α) and phases (φ), for the optimal BFC field with each experiment design, aggregated across the calibration tide gauges (red circles in Fig. 1) Uniform BFC (m = 1) 5.5 2.6 6.5 3.0 Optimal m = 2 4.8 2.6 6.5 3.1 Optimal m = 3 4.5 2.6 6.5 3.1 Optimal m = 4 3.7 2.6 6.6 3.0 11th best m = 3 3.6 2.6 6.6 3.1 m = 8 3.8 2.4 6.4 2.5 Table 4 Root mean squared errors (RMSEs) of the modelled M2, S2 and M6 amplitudes (α) and phases (φ), for the optimal BFC field with each experiment design, aggregated across the validation tide gauges (yellow circles in Fig. 1) indicate that no designs with m = 4 achieve the target value for the Σ criterion, and here we have found that even the optimal m = 4 design suffers from overfitting.

Application of OED framework to a model of the northwest European continental shelf
In this section, we present the application of the OED framework to another model domain in order to further demonstrate the method. We choose to use a model of the northwest European continental shelf. The model mesh is shown in Fig. 6a, which also indicates 39 locations within the UK Tide Gauge Network, from which we take observation data for this case study. We force the model at its ocean boundaries with the M2, S2, K1 and O1 tidal constituents. Each model run spans approximately 50 days, with the final 14.77 days used for harmonic analysis of model outputs. As described in Sect. 3, the OED framework proceeds as follows: 1. Define the observation dataset. We use the M2 and S2 amplitude and phase observations at each of the 39 tide gauges indicated in Fig. 6, resulting in a total of 156 data points. 2. Select a model setup with some suitable initial estimate for the Manning coefficient (we use a uniform value of n = 0.025 sm −1/3 ). Using the adjoint model, compute the gradient of each model output (corresponding to the 156 observations) with respect to the fully spatially varying bottom friction parameter. This requires a total of 156 adjoint model runs. 3. Propose a decomposition of the model domain into small regions, within which the bottom friction coefficient will always be taken as uniform.
Here we divide our model domain into 14 such subdomains, as indicated in Fig. 6b. We provide a further constraint that groupings of these subdomains must produce contiguous blocks. For a grouping into m = 5 parameters, this results in 10,645 possible experiment designs. 4. For each of the possible experiment designs, use the adjoint-derived gradients to compute the Σ criterion. Find the grouping which minimises Σ. The optimal experiment design, following the above schema for a grouping into m = 5 parameters, is shown in Fig. 7. The value of the Σ criterion for this design is 2.3 × 10 −7 , and this design therefore satisfies the threshold criterion proposed in Sect. 3.3.2. Note the large variation in the size of the parameter regions, and the number of tide gauges included within each. In particular, we find that the Irish Sea (region B) and the Bristol Channel (region D) are assigned their own respective friction subdomains. The southern North Sea also features a subdomain of relatively small area (region C). We note that each of these regions exhibits fairly complex tidal dynamics. The southern North Sea region, for example, includes the Humber and Thames estuaries, and the Wash; these regions have been shown in a recent storm surge model sensitivity study to interact with the southerly-propagating tidal Kelvin wave (Warder et al. 2021b). Meanwhile, the optimal design assigns a single friction subdomain to the entire northern half of the model domain (region A), with this subdomain covering the largest area and containing the greatest number of observation locations. The dynamics in this region are perhaps more strongly dictated by the tidal boundary condition, leading to reduced sensitivity to the friction parameter. Alternatively, the relative lack of complexity in the tidal dynamics in the region may mean that there is a high degree of redundancy in the information provided by each tide gauge in the region, in contrast to the regions of greater complexity where each tide gauge is more likely to provide distinct information to its close neighbours. Consistent with intuition, the OED approach is unlikely to assign BFC subdomains in regions where there is no observation data, and more likely to assign subdomains in regions of more complex dynamics, and/or a greater density of observation data points. The use of further observation data, e.g. from tide gauges on the continental European coastlines, may result in further division of the North Sea. Table 5 summarises the optimal BFC parameters obtained by solving the parameter estimation problem corresponding to the experiment design shown in Fig. 7, using the Bayesian inference approach. The value of det(cov p ) 1/m computed from the Bayesian inference results is 5.9×10 −7 s 2 m −2/3 . As expected, this is greater than, but of similar magnitude to, the Σ criterion for this design. Table 6 summarises the performance of the model, before and after calibration, as measured against amplitude and phase data for the M2, S2, K1 and O1 constituents. Note that only the M2 and S2 data were used for the OED and calibration procedures, and the K1 and O1 constituents are included for model validation. The M2 and S2 RMSEs all decrease substantially after calibration, as would be expected. The K1 constituent is also better represented in the calibrated model, although model performance for the O1 constituent is slightly worse after calibration.
In order to avoid the exaggeration of phase RMSEs for constituents (such as K1 and O1) whose amplitudes are often small, an alternative definition of the RMSE is given for a constituent C by  Table 6 RMSEs against calibration and validation data, with a uniform Manning coefficient n = 0.025 sm −1/3 , and with the optimal parameters from the optimal m = 5 experiment design n = 0.025 sm −1/3 Calibrated using m = 5 design M2 amp ( where N is the number of gauges, A Ci and φ Ci denote the modelled harmonic amplitude and phase of constituent C at gauge i, respectively, andÂ Ci andφ Ci are the corresponding observed amplitude and phase (Cummins and Oey 1997). The values for this aggregated amplitude-phase RMSE are summarised in Table 7 for each constituent, before and after calibration. From these results, we see that the improvement in model performance for the K1 constituent outweighs the degradation of the modelled O1 constituent. Overall, therefore, the model performs well against both calibration and validation data.

Discussion
We emphasise that the OED method is distinct from, and more useful than, the selection of experiment designs motivated only by minimising the misfit between model outputs and observations. Such a design would favour the use of a large number of tuning parameters, which can produce low model-observation misfits while uncertainty in the estimated parameters (and therefore in model outputs at new locations) remains high. We also found in Sect. 5 that the optimal m = 3 design is actually outperformed by the median m = 3 design when measuring against the calibration data, but performs better when measured against the validation dataset.
Our results indicate that the number of friction parameters that can be well constrained by the observations is typically much smaller than the number of observations. This is in contrast to the findings of Das and Lardner (1991), who suggested that the number of parameters which can be estimated is up to two times the number of tide gauges. This is due in part to the ratio between the (assumed) observation error variance and the target parameter estimate covariance (and the relationship between them via the model), but also due to redundancy in the information contained in the observations. This is particularly evident in the application to the northwest European continental shelf. Das and Lardner (1991) also neglected observation uncertainty and the presence of other modelling errors, since they used only twin experiments within an idealised test case. In other words, a large quantity of data does not justify the estimation of a large number of model parameters, and the methods proposed within this work offer a rigorous technique for the selection of an appropriate set of model parameters for estimation from a given set of observations.
Throughout this study we have used separate sets of observation data for calibration and validation. In the Bristol Channel application, we used the same tidal constituents for both datasets, but at different locations. Here, it was implicitly assumed that the calibration and validation gauge locations are sufficiently far from each other that the data sets are independent. Were this not the case, then any evidence of overfitting would occur at a greater value for m than otherwise. The use of further harmonic constituents could be used in future work as further validation. For the application to the continental shelf model, we used additional tidal constituents for validation, at the same locations as the calibration data. Again, future work could use validation datasets from both additional tide gauge locations, and the inclusion of further tidal constituents.
It is useful to compare the computational cost associated with the OED method presented here with the cost of a 'trial and error' approach where the parameter estimation problem is solved for a selection of candidate experiment designs. The 60 pairs of forward and adjoint runs required for the Bristol Channel case study incurred a computational cost of around 200 forward model runs. In contrast, the Bayesian inference algorithm (via a Gaussian process emulator) required 30, 40 and 50 model runs of the full numerical model for two-, three-and four-parameter experiment designs, respectively. The computational cost of the adjoint-based OED framework is therefore similar to that of performing just four or five individual parameter estimation experiments. Note also that this cost scales well with the space of experiment designs, as demonstrated by the application to the continental shelf, where the number of possible experiment designs for m = 5 was 10,645. The adjoint-based OED approach is therefore an efficient method for obtaining the optimal design.
Nevertheless, the question of whether user experience may be sufficient to identify suitable experiment designs without the need for an OED method was not investigated in this study. As an alternative to an OED approach, additional physical information may be valuable in constraining the spatial variation of BFCs, such as land use (Mayo et al. 2014), bathymetry contours (Sraj et al. 2014b), or sedimentological data (Guillou and Thiébot 2016;Mackie et al. 2020b;Warder et al. 2021a). The OED approach is likely to be particularly useful when such data is unavailable, or observation data is particularly scarce and an appropriate number of parameters to estimate is unclear. Regarding the use of sedimentological data, Warder et al. (2021a) employed an identical model domain for the Bristol Channel and Severn Estuary as used here, finding that sedimentological data can provide a useful a priori constraint on the spatial variation of the BFC, but that theoretically derived BFC values based on sediment types were not helpful. This is likely due to the presence of a large number of other sources of modelling error, which could well weaken the connection between the 'optimal' BFC values and the physical processes at the sea bed. In general, the spatial distribution of sediments on the sea bed varies on smaller length scales than the a priori subdomains selected within this study, and meaningful comparisons between the optimal parameters from this study, and the spatial distribution of sediments, are therefore not possible. Future work could consider the incorporation of sedimentological data (or bathymetry contours, land use etc.) within the a priori experiment design space.
We emphasise that a model calibration process, including the optimal selection of the experiment design, is always specific to a particular model configuration, and will always compensate for multiple sources of model error, for example modelling choices (3D or depth-averaged, barotropic or baroclinic), discretisation errors, finite mesh resolution (which relates to both discretisation errors and the representation of coastlines and bathymetry), or imperfect input data (e.g. bathymetry, tidal boundary conditions). The implicit correction for these other sources of model error may explain the fact that, in the Bristol Channel application, performance as measured using the M6 constituent is optimal when using a uniform BFC (m = 1); since this overtide constituent is particularly sensitive to frictional effects, overfitting is particularly likely. However, our modelling choices are appropriate to the spatial scales and geographical regions used as case studies within this paper, and are typical of many studies in the literature, including several utilising the same depth-averaged Thetis model as we use here Harcourt et al. 2019;Baker et al. 2020;Mackie et al. 2020a). We have also neglected any temporal changes in the bottom friction effect. By performing calibration based on modelled and observed harmonic constituents, we have neglected any variation within the spring-neap cycle, and have also assumed that meteorological conditions are calm, i.e. that there is no surge generated either internally or externally to the model domain. Frictional effects may also undergo seasonal variation (Huybrechts et al. 2021). It is left to future work to assess the extent to which the optimal experiment design, or the corresponding optimal BFC values, vary temporally in these ways.
An additional aspect relating to the computational cost of our OED approach is the search algorithm used for finding the optimum within the space of possible experiment designs. Here, the selection of fairly strong a priori constraints, resulting in a fairly modest space of possible designs, ensured that an exhaustive search algorithm was feasible. However, the exploration of larger experiment design spaces will be considered in future work (in order to fully exploit the advantages of the adjoint-based approach as described above) and may require alternative optimisation approaches, such as genetic algorithms (e.g. Chu and Hahn 2007), which have been used more commonly in the related problem of OED with respect to the set of observations (e.g. Catania and Paladino 2009).
The use of the OED framework can also be compared with more traditional approaches in the literature, where low-dimensional parameterisations are selected less rigorously. In such applications, regularisation (via a penalty term added to the misfit functional) is typically required in order to avoid overfitting. The use of the OED framework presented within this paper avoids this issue entirely, and can be considered as an alternative to regularisation. The regularisation approach requires the selection of one or more regularisation parameters, controlling the magnitude of the penalty term in the functional. A cross-validation method for regularisation parameter selection, such as in Ullman and Wilson (1998), not only requires a second observation dataset for the cross-validation (whereas the OED framework in principle does not), but also requires the repeated optimisation of the model with respect to the control parameters, for each possible choice of regularisation parameter, thus increasing the computational cost. The construction and solution of the OED problem within our Bristol Channel case study required 60 adjoint model runs (and note that these model runs can be performed simultaneously, i.e. the solution of the OED problem is "embarrassingly parallel"). The experience of the authors suggests that a typical adjoint gradient-based approach to model calibration might require tens (or even hundreds) of iterations to converge, and must be run several times with different regularisation parameter values, for cross-validation. The OED approach is therefore competitive in terms of computational cost.
There are a number of avenues for the further application of OED methods within coastal ocean model parameter estimation. Firstly, we chose within this work to construct the experiment designs based on a simple piecewise-constant representation of the BFC. As noted in Sect. 3.2, alternative choices for selecting the piecewise-constant subdomains are possible, while other approaches include the selection of 'independent points', between which the parameter is determined by interpolation. The OED framework can be applied to these cases straightforwardly. Secondly, the definition of the Σ criterion within this work is similar to the so-called D-criterion within the OED literature. This approach was found to produce good results, but other design criteria are possible. For example, Machado et al. (2009) combine a normalised Dcriterion with a modified E-criterion which characterises the shape of the parameter estimate confidence region, thus favouring designs producing similar constraints on all unknown parameters (i.e. a more spherical parameter confidence ellipsoid). Thirdly, a complementary application of OED within friction parameter estimation would be to identify an optimal observation strategy for constraining a given set of friction parameters. Integrated approaches to simultaneously optimise the observation strategy and the parameter space are also possible (Chu and Hahn 2008), and may be valuable in coastal ocean applications. Finally, this study has considered only the estimation of uncertain bottom friction parameters. Although this is a common parameter for calibration within the numerical coastal ocean modelling literature, parameter estimation methods can also be applied to other model inputs including bathymetry (e.g. Mourre et al. 2004), boundary conditions (e.g. Chen et al. 2014), or combinations of multiple inputs simultaneously (e.g. Heemink et al. 2002). The OED framework presented here is highly general and can be readily applied to these other sources of uncertainty. Note also that the computational cost of the OED methodology would not increase if additional model inputs were included. A greater variety of observation data (e.g. current observations from ADCPs), would likely increase the number of identifiable parameters for BFC estimation, and would also be particularly important for the estimation of other model inputs such as bathymetry.

Conclusion
In this study, we have adapted a technique from the OED literature, and applied it to a numerical tidal model of the Bristol Channel and Severn Estuary, where we have used harmonic analysis data at 15 tide gauge locations within the domain for the estimation of the bottom friction parameter. We have demonstrated the application of our OED approach to selecting an optimal set of control parameters, specifying a piecewiseconstant Manning coefficient field. The method is based on model sensitivities with respect to the bottom friction coefficient, computed using a numerical adjoint model. The design optimality criterion we proposed was based on achieving the tightest possible constraints on the estimated parameters, given the observation data. Our results show that the constraints predicted by the sensitivity analysis technique are a good approximation to the true constraints achieved via a Bayesian inference algorithm. Utilising a validation dataset from a further five tide gauges excluded from the calibration dataset, we have also demonstrated that the use of optimal designs leads to good performance on both calibration and validation data, and therefore avoids overfitting, without the need for regularisation.
The application of our OED framework to the Bristol Channel case study revealed that only three control parameters may be constrained by the observation data, far fewer than the number of observations. This is due to redundancy in the observation dataset, the presence of other modelling errors, and the assumed uncertainty in the observations themselves. These factors were also evident in our secondary case study using a model of the northwest European continental shelf, where we estimated five BFC parameters using observations from 39 tide gauges around the UK.
By utilising a numerical adjoint model for gradient computation, the presented OED framework constitutes a computationally efficient method for identifying the optimal experiment design, compared with a trial-and-error search. The experience of the authors also suggests that the method is competitive with gradient-based optimisation approaches using regularisation. While user experience is often sufficient to identify suitable experiment designs in practice, our OED method may be particularly useful in case studies with more complex dynamics, or scarce observation data, since in this case it is essential to extract the greatest volume of information from available observations. Such applications will be the subject of future work.

A Bayesian inference
Bayesian inference is a powerful statistical technique for solving inverse problems, and has been applied to bottom friction parameter estimation previously (Hall et al. 2011;Sraj et al. 2014b). Within this work, we use Bayesian inference to solve parameter estimation problems for a selected number of experiment designs. Our Bayesian inference framework proceeds via a Markov Chain Monte Carlo (MCMC) method, using a Gaussian process emulator (GPE) as a surrogate for the full numerical model, since the method relies on a large number of model evaluations. Section A.1 describes Gaussian process emulation, and section A.2 details the MCMC algorithm.

A.1 Gaussian process emulation
This exposition follows (Rasmussen 2003), to which the reader is referred for further detail. The crux of Gaussian process emulation is that, under the assumption that model outputs follow a multivariate Gaussian distribution, a vector of 'test' model outputs f * for model inputs X * can be predicted from 'training' model outputs f computed for model inputs X. The model outputs satisfy the conditional distribution where μ denotes the mean function, Σ the covariance matrix for the training points, Σ * * the covariance matrix for the test points, and Σ * the covariance between the training and test points. The covariance matrices are typically parameterised by Σ = k(X, X); Σ * = k(X, X * ); Σ * * = k(X * , X * ), where k(x, x ) is a covariance function, with a common choice given by where σ 2 is a covariance parameter and C(x, x ) a parameterised correlation function (in this work we use a Matérn 5/2 kernel). Similarly, the mean function μ is typically taken as a simple linear or quadratic function of the model inputs. The parameters introduced by the mean and covariance functions can be determined via maximum likelihood estimation using the training data. Once these parameters are determined, equation (18) can be evaluated at low computational cost, and the mean of the conditional distribution estimates model outputs for unseen values of the model inputs. The generalisation to higher dimensions and multiple model outputs is straightforward.
Within this work, we use the Python package GPy (GPy 2012); the reader is referred to the package documentation for further implementation detail.
For the application to Bayesian inference for the Bristol Channel case study (and similarly for the European continental shelf), the model inputs are the vector of unknown parameters θ for a given experiment design, and the model outputs are the M2 and S2 amplitudes and phases at each of the 15 tide gauge locations (a total of 60 outputs). The training input samples (X in the above notation) are selected using Latin Hypercube Sampling from uniform prior distributions in the range [0.01, 0.05] for each Manning coefficient. According to the '10d' rule (Sobol 2001;Hristov et al. 2017), it is common to train a Gaussian process emulator using at least 10d samples, where d is the number of input parameters. As a cautious approach, within this work we use 10(d + 1) training samples for each selected experiment design to be tested using Bayesian inference. The corresponding output for each training sample is generated from a forward run of the Thetis numerical model comprising a 13-day spin-up period followed by a 14.77-day harmonic analysis period. Harmonic analysis for the M2, S2, M4, S4, MS4, 2SM2, M6 and 2MS6 constituents is performed at each of the 15 tide gauge locations, using the Python package uptide . The M2 and S2 outputs are used as the training data to construct a Gaussian process emulator as described above.
Prior to the use of the GPEs within the Bayesian inference algorithm, we first verified their faithfulness as a surrogate for the full numerical model by comparing their outputs with additional full numerical model runs for random samples of the model parameters. We found that the GPE error covariance calculated via Eq. (18) provides a good estimate of the true emulator error covariance, and that this covariance is small compared with the assumed observation uncertainty. This covariance is therefore neglected when using the GPE within the Bayesian inference algorithm.

A.2 Markov Chain Monte Carlo algorithm
The Bayesian inference framework follows a similar approach to Sraj et al. (2013Sraj et al. ( , 2014b and Warder et al. (2021a). Denoting the vector of tidal harmonic observations y and the vector of model parameters θ , Bayes' theorem gives (θ |y) ∝ L(y|θ)q(θ ), where is the posterior distribution of the parameters θ given the observed data y, L is the likelihood of observing the outputs y given the parameters θ, and q is the prior distribution of the parameter vector θ. The prior distribution on each individual It is assumed that the model-observation discrepancies y − f, where f is the vector of model outputs, are independent and identically distributed variables with zero mean, and a covariance matrix where σ 2 amp and σ 2 phase are the amplitude and phase measurement covariances, respectively, as described in Sect. 2.2, and I denotes an identity matrix of the specified size (we use 30 amplitude observations and 30 phase observations for the Bristol Channel case study). The likelihood L(y|θ ) is therefore given by where K = 60 is the number of observations. Equation (21) gives the probability distribution of the unknown Manning coefficients, given the set of observations y, and its evaluation represents the parameter estimation problem. A technique for sampling this posterior distribution when it cannot be directly calculated is the Markov Chain Monte Carlo (MCMC) method, which has the advantage that the constant of proportionality need not be determined. This work uses an implementation of the Random Walk Metropolis Hastings MCMC algorithm (Hastings 1970), which is given by Algorithm 1. Within the algorithm, we set the proposal distribution covariance matrix to Σ step = 0.001 2 I m×m .
This was found to give satisfactory results, without the need for an adaptive MCMC algorithm.
In the results presented within this paper, the number of MCMC samples is selected as M = 10 6 , with the first 10 5 samples discarded as a burn-in period. The resulting chain of values θ [i] generated by the MCMC algorithm constitute samples from the posterior distribution, from which the covariance of the estimated parameters can be trivially computed.