Simultaneous perturbation stochastic approximation for tidal models
Abstract
The Dutch continental shelf model (DCSM) is a shallow sea model of the entire continental shelf which is used operationally in the Netherlands to forecast storm surges in the North Sea. The forecasts are necessary to support the decision on the timely closure of the movable storm surge barriers that protect the land. In this study, an automated model calibration method, simultaneous perturbation stochastic approximation (SPSA), is implemented for tidal calibration of the DCSM. The method uses objective function evaluations to obtain the gradient approximations. The gradient approximation for the central difference method uses only two objective function evaluations, independent of the number of parameters being optimized. The calibration parameter in this study is the model bathymetry. A number of calibration experiments are performed. The effectiveness of the algorithm is evaluated in terms of the accuracy of the final results as well as the computational cost required to produce these results. In doing so, a comparison is made with a traditional steepest descent method and also with a newly developed proper orthogonal decomposition-based calibration method. The main findings are: (1) The SPSA method gives results comparable to the steepest descent method at little computational cost. (2) The SPSA method, with little computational cost, can be used to estimate a large number of parameters.
Keywords
Numerical tidal modeling, Parameter estimation, Simultaneous perturbation, Stochastic approximation
1 Introduction
Accurate sea water level forecasting is crucial in the Netherlands, mainly because large areas of the land lie below sea level. Forecasts are made to support the storm surge flood warning system. Timely water level forecasts are necessary to support the decision to close the movable storm surge barriers in the Eastern Scheldt and the New Waterway. Moreover, forecasting is also important for harbor management, as some ships have become so large that they can only enter the harbor during the high water period. The storm surge warning service (SVSD), in close cooperation with the Royal Netherlands Meteorological Institute, is responsible for these forecasts. The surge is predicted by using a numerical hydrodynamic model, the Dutch continental shelf model (DCSM) (see Stelling 1984; Verboom et al. 1992). The performance of the DCSM regarding storm surges is influenced by its performance in forecasting the astronomical tides. Using inverse modeling techniques, observed tidal data can be used to improve the model results.
Most efficient optimization algorithms require a gradient of the objective function. This usually requires the implementation of the adjoint code for the computation of the gradient of the objective function. The adjoint method aims at adjusting a number of unknown control parameters on the basis of given data. The control parameters might be model initial conditions or model parameters (Thacker and Long 1988). A sizeable amount of research on adjoint parameter estimation was carried out in the last 30 years in fields such as meteorology, petroleum reservoirs, and oceanography for instance by Seinfeld and Kravaris (1982), Bennet and Mcintosh (1982), Ulman and Wilson (1998), Courtier and Talagrand (1990), Lardner et al. (1993) and Heemink et al. (2002). A detailed description of the application of the adjoint method in atmosphere and ocean problems can be found in Navon (1998).
One of the drawbacks of the adjoint method is the programming effort required for the implementation of the adjoint model. Research has recently been carried out on automatic generation of computer code for the adjoint, and adjoint compilers have now become available (see Kaminski et al. 2003). Even with the use of these adjoint compilers, this is a huge programming effort that hampers new applications of the method. Courtier et al. (1994) proposed an incremental approach, in which the forward solution of the nonlinear model is replaced by a low-resolution approximate model. Reduced-order modeling can also be used to obtain an efficient low-order approximate linear model (Hoteit 2008; Lawless et al. 2008).
This paper focuses on a method referred to as the simultaneous perturbation stochastic approximation (SPSA) method. This method can be easily combined with any numerical model to perform automatic calibration. For the calibration of a numerical tidal model, the SPSA algorithm requires only the water level data predicted by the given model. SPSA is a stochastic offspring of the Kiefer–Wolfowitz algorithm (Kiefer and Wolfowitz 1952), commonly referred to as the finite difference stochastic approximation (FDSA) method. The FDSA algorithm uses objective function evaluations to obtain the gradient approximations. Each individual model parameter is perturbed one at a time, and the partial derivative of the objective function with respect to each parameter is estimated by a divided difference based on the standard Taylor series approximation of a partial derivative. The approximation of each partial derivative in the gradient of the objective function requires at least one new evaluation of the objective function, so this method is not feasible for automated calibration when the number of parameters is large.
The SPSA method uses stochastic simultaneous perturbation of all model parameters to generate a search direction at each iteration. SPSA is based on a highly efficient and easily implemented simultaneous perturbation approximation to the gradient. This gradient approximation for the central difference method uses only two objective function evaluations, independent of the number of parameters being optimized. The SPSA algorithm has gathered a great deal of interest over the last decade and has been used for a variety of applications (Hutchison and Hill 1997; Spall 1998, 2000; Gerencser et al. 2001; Gao and Reynolds 2007). As a result of the stochastic perturbation, the calculated gradient is also stochastic; however, the expectation of the stochastic gradient is the true gradient (Gao and Reynolds 2007). One would therefore expect the performance of the basic SPSA algorithm to be similar to that of steepest descent.
Gradient-based algorithms converge faster than objective function-based gradient approximations such as the SPSA algorithm when speed is measured in terms of the number of iterations. The total cost to achieve effective convergence, however, depends not only on the number of iterations required but also on the cost of performing these iterations, which is typically greater in gradient-based algorithms. This cost may include a greater computational burden and more resources, as well as the additional human effort required for deriving and coding gradients.
Vermeulen and Heemink (2006) proposed a method based on proper orthogonal decomposition (POD) which shifts the minimization into a lower dimensional space and avoids the implementation of the adjoint of the tangent linear approximation of the original nonlinear model. Recently, Altaf et al. (2011) applied this POD-based calibration method to the estimation of depth values and bottom friction coefficients for a very large-scale tidal model. The method has also been applied in petroleum engineering by Kaleta et al. (2011) for history matching problems. One drawback of the POD-based calibration method is its dependence on the number of parameters.
In this paper the SPSA algorithm is applied to the estimation of depth values in the tidal model DCSM of the entire European continental shelf. A number of calibration experiments are performed with both simulated and real data. The effectiveness of the algorithm is evaluated in terms of the accuracy of the final results as well as the computational cost required to produce these results. In doing so, a comparison is made with a traditional steepest descent method and also with a newly developed POD-based calibration method.
The paper is organized as follows. Section 2 describes the SPSA algorithm and briefly discusses the POD-based calibration approach, which is used here for comparison with the SPSA method. Section 3 briefly explains the DCSM model used in this study. Section 4 contains results from experiments with the DCSM to estimate the water depth. The paper concludes in Section 5 by discussing the results.
2 Parameter estimation using SPSA
 1. Define the \(n^p\)-dimensional column vector \(\triangle_{l}\) by
$$ \triangle_{l}=[\triangle_{l,1},\triangle_{l,2},\cdots,\triangle_{l,n^p}]^{T}, $$ (5)
and
$$ \triangle_{l}^{-1}=[\triangle_{l,1}^{-1},\triangle_{l,2}^{-1},\cdots,\triangle_{l,n^p}^{-1}]^{T}, $$ (6)
where \(\triangle_{l,i}, i=1,2,\cdots,n^p\) represents independent samples from the symmetric ±1 Bernoulli distribution. This means that +1 or −1 are the only possible values that can be obtained for each \(\triangle_{l,i}\). It also means that
$$ \triangle_{l,i}^{-1}=\triangle_{l,i}, $$ (7)
and
$$ E[\triangle_{l,i}^{-1}]=E[\triangle_{l,i}]=0, $$ (8)
where E denotes the expectation.
 2. Define a positive coefficient \(c_l\) and obtain two evaluations of the objective function \(J(\gamma)\) based on the simultaneous perturbation around the current \(\gamma^l\): \(J(\gamma^l+c_l\triangle_{l})\) and \(J(\gamma^l-c_l\triangle_{l})\).
 3. A realization of the stochastic gradient is then calculated by using the central difference approximation as
$$ \hat{g_{l}}({\gamma^{l}})=\frac{J(\gamma^l+c_l\triangle_{l})-J(\gamma^l-c_l\triangle_{l})}{2c_l}\triangle_{l}^{-1} $$ (9)
Since \(\triangle_{l}\) is a random vector, \(\hat g_l\) is also a random vector. So by generating a sample of \(\triangle_{l}\), we generate a specific sample of \(\hat g_l\). The FDSA algorithm involves computation of each component of \(\nabla J\) by perturbing one model parameter at a time. If one uses a one-sided approximation for each partial derivative involved in \(\nabla J(\gamma^{l})\), then computation of the gradient requires \(n^p + 1\) evaluations of J for each iteration of the steepest descent algorithm. In contrast, SPSA requires only two evaluations of the objective function, \(J(\gamma^l+c_l\triangle_{l})\) and \(J(\gamma^l-c_l\triangle_{l})\), at each iteration.
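The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the DCSM: the function names and the quadratic `toy_objective` are hypothetical stand-ins, and in the real setting each call to `J` would be one full run of the tidal model.

```python
import random
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def bernoulli_perturbation(n_p: int, rng: random.Random) -> List[float]:
    """Step 1: draw n_p independent samples from the symmetric +/-1 Bernoulli distribution."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n_p)]


def spsa_gradient(J: Objective, gamma: Sequence[float], c_l: float,
                  rng: random.Random) -> List[float]:
    """Steps 2 and 3: one realization of the stochastic gradient of Eq. 9."""
    delta = bernoulli_perturbation(len(gamma), rng)
    gamma_plus = [g + c_l * d for g, d in zip(gamma, delta)]
    gamma_minus = [g - c_l * d for g, d in zip(gamma, delta)]
    # Exactly two objective evaluations, whatever the number of parameters.
    scale = (J(gamma_plus) - J(gamma_minus)) / (2.0 * c_l)
    # For +/-1 entries the elementwise inverse of delta equals delta itself (Eq. 7).
    return [scale * d for d in delta]


def toy_objective(gamma: Sequence[float]) -> float:
    """Hypothetical stand-in for the model misfit: a simple quadratic bowl."""
    return sum(g * g for g in gamma)


# One stochastic gradient sample at an arbitrary point; the seed is fixed
# only to make the illustration repeatable.
g_hat = spsa_gradient(toy_objective, [3.0, -1.0, 0.5], c_l=0.1, rng=random.Random(0))
```

Because every component of the estimate shares the same two objective values, the individual components are noisy; only their expectation matches the true gradient.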
2.1 Choice of \(a_l\) and \(c_l\)
The value of the constant c should be chosen equal to the standard deviation of the noise in the objective function J. If the objective function is noise-free, then c should be chosen as a small positive number.
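A sketch of how such constants are commonly turned into iteration-dependent coefficients is given below. It assumes the decaying gain form popularized by Spall (1998), \(a_l = a/(A+l+1)^{\hat\alpha}\) and \(c_l = c/(l+1)^{\hat\beta}\); the default exponents 0.602 and 0.101 are the practical values usually quoted for that form and are assumptions here, not values taken from this study.

```python
def gain_a(l: int, a: float, A: float, alpha_hat: float = 0.602) -> float:
    """Step-size coefficient a_l for iteration l = 0, 1, 2, ... (assumed Spall form)."""
    return a / (A + l + 1) ** alpha_hat


def gain_c(l: int, c: float, beta_hat: float = 0.101) -> float:
    """Perturbation size c_l for iteration l = 0, 1, 2, ... (assumed Spall form)."""
    return c / (l + 1) ** beta_hat


# Hypothetical constants, chosen only to show how slowly c_l decays
# compared with a_l over the first iterations.
schedule = [(gain_a(l, a=1.0, A=10.0), gain_c(l, c=0.5)) for l in range(20)]
```

The stability constant `A` damps the earliest steps, which helps when a large `a` is wanted for later iterations.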
2.2 Average stochastic gradient
2.3 POD-based calibration method
Vermeulen and Heemink (2006) proposed a method based on POD which shifts the minimization into a lower dimensional space and avoids the implementation of the adjoint of the tangent linear approximation of the original nonlinear model. Due to the linear character of the POD-based reduced model, its adjoint can be implemented easily and the minimization problem is solved completely in the reduced space at very low computational cost.
2.3.1 Collection of the snapshots and POD basis
2.3.2 Approximate objective function and its adjoint
The value of the approximate objective function \(\hat J\) is obtained by correcting the observations \(Y(t_i)\) for the background state \(X^b(t_i)\), which is mapped onto the observational space through a mapping H, and for the reduced model state \(\xi(t_i, \Delta\gamma)\), which is mapped onto the observational space through the mapping \(\hat H\), with \(\hat H = HP\).
Recently, Altaf et al. (2011) applied this POD-based calibration method to the estimation of depth values and bottom friction coefficients for a very large-scale tidal model. The method has also been recently applied in petroleum engineering by Kaleta et al. (2011) for history matching problems. One drawback of the POD-based calibration method is its dependence on the number of parameters.
3 The Dutch Continental Shelf Model
 x, y: Cartesian coordinates in horizontal plane
 t: time coordinate
 u, v: depth-averaged current in x and y direction, respectively
 h: water level above reference plane
 D: water depth below the reference plane
 H: total water depth (D + h)
 f: coefficient for the Coriolis force
 \(C_{2D}\): Chezy coefficient
 \(\tau_x, \tau_y\): wind stress in x and y direction, respectively
 \(\rho_w\): density of sea water
 \(p_a\): atmospheric pressure
 g: acceleration of gravity

 \(h_0\): mean water level
 H: total water depth
 \(f_j H_j\): amplitude of harmonic constituent j
 \(\omega_j\): angular velocity of j
 \(\theta_j\): phase of j
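The last group of symbols can be read as the ingredients of a harmonic description of the tide. The sketch below assumes the usual cosine form, \(h(t) = h_0 + \sum_j f_j H_j \cos(\omega_j t - \theta_j)\), which is the standard way such constituents are combined; the function name and the example amplitude are hypothetical, and only the M2 period of about 12.42 h is a physical value.

```python
import math
from typing import Iterable, Tuple

# (amplitude f_j * H_j, angular velocity omega_j, phase theta_j)
Constituent = Tuple[float, float, float]


def water_level(t: float, h0: float, constituents: Iterable[Constituent]) -> float:
    """Water level at time t as mean level plus a sum of harmonic constituents."""
    return h0 + sum(amp * math.cos(omega * t - theta)
                    for amp, omega, theta in constituents)


# Single-constituent example: M2 with a period of about 12.42 h, time in hours.
# The 1.0 m amplitude and zero phase are illustrative only.
M2_PERIOD_H = 12.42
m2 = (1.0, 2.0 * math.pi / M2_PERIOD_H, 0.0)
level_at_high_water = water_level(0.0, h0=0.1, constituents=[m2])
```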
3.1 Estimation of depth
The bathymetry for a model is usually derived from nautical maps. These maps usually give details of shallow rather than deep-water areas. If we use these maps to prescribe the water depth, it is reasonable to assume that this prescription of the bathymetry is erroneous. Depth can therefore be a parameter on which the model is calibrated. In the early years of the development of the DCSM, changes to the bathymetry were made manually. Later, automated calibration procedures based on variational data assimilation were developed (TenBrummelhuis et al. 1993; Mouthaan et al. 1994). A complete description of the development of these calibration procedures for the DCSM can be found in Verlaan et al. (2005).
4 Numerical experiment
4.1 Experiment 1
The DCSM model used in this experiment covers an area of the northeast European continental shelf, i.e., 12°W to 13°E and 48°N to 62°N, as shown in Fig. 1. The resolution of the spherical grid is 1/8° × 1/12°, which is approximately 8 × 8 km. With this configuration the grid is 201 × 173, with 19,809 computational grid points. The time step is \(\Delta t = 10\) min.
Seven observation points were included in the assimilation, two of which are located along the east coast of the UK, two along the Dutch coast and one at the Belgian coast (see Fig. 1). The truth model was run for a period of 15 days from 13 December 1997 00:00 to 27 December 1997 24:00 with the specification of water depth \(D_{n_{1},n_{2}}^{b}\) as used in the operational DCSM to generate artificial data at the assimilation stations. The first 2 days were used to properly initialize the simulations, and a set of observations Y of computed water levels h was collected for the last 13 days at 10-min intervals in seven selected assimilation grid points, which coincide with the points where data are observed in reality. The observations were assumed to be perfect. This assumption was made to see how close the estimate is to the truth; 5 m was added to \(D_{n_{1},n_{2}}^{b}\) at all the grid points in domain Ω to get the initial adjustments \(\gamma_{k}^{b}\).
For the SPSA optimization algorithm, two methods were applied to calculate the stochastic gradient. In the first method, the stochastic gradient \(\hat{g_{l}}({\gamma^{l}})\) was computed according to Eq. 9. In the second method, referred to as average SPSA, the gradient was computed by Eq. 13, where the expectation is taken over two independent stochastic gradients.
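The second method can be sketched as follows, assuming that the average is the plain arithmetic mean of independent realizations of Eq. 9. The function names and the `n_avg` parameter are hypothetical; the experiment described here corresponds to `n_avg=2`, and each extra realization costs two more objective function evaluations.

```python
import random
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def spsa_gradient(J: Objective, gamma: Sequence[float], c_l: float,
                  rng: random.Random) -> List[float]:
    """One realization of the stochastic gradient (central difference, Eq. 9)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in gamma]
    plus = [g + c_l * d for g, d in zip(gamma, delta)]
    minus = [g - c_l * d for g, d in zip(gamma, delta)]
    scale = (J(plus) - J(minus)) / (2.0 * c_l)
    return [scale * d for d in delta]


def average_spsa_gradient(J: Objective, gamma: Sequence[float], c_l: float,
                          rng: random.Random, n_avg: int = 2) -> List[float]:
    """Mean of n_avg independent stochastic gradients (assumed form of the average)."""
    total = [0.0] * len(gamma)
    for _ in range(n_avg):
        sample = spsa_gradient(J, gamma, c_l, rng)
        total = [t + s for t, s in zip(total, sample)]
    return [t / n_avg for t in total]
```

Averaging reduces the variance of the search direction at the price of a proportionally higher cost per iteration, which is the trade-off examined in the results below.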
The values of a, c, A, \(\hat \alpha\), and \(\hat \beta\) were obtained according to the guidelines given in Section 2.1. The values used were those found to perform best in several forward model simulations. The iteration cycle for the SPSA algorithm was aborted when the value of the objective function J did not change over the last three iterations of the minimization process (Wang et al. 2009).
For all the algorithms, there was a significant improvement in the parameters for regions coinciding with the UK, Dutch and Belgian coasts, but not much improvement in the deep water regions \(\Omega_1\) and \(\Omega_7\). Since subdomains containing deep areas are less sensitive than subdomains containing shallow areas, it is much more difficult to estimate \(\gamma_k\) in regions \(\Omega_1\) and \(\Omega_7\).
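One way to express this stopping rule in code is sketched below. It assumes that "did not change" means the last three recorded values of J agree to within a small tolerance; the function name, the `window` argument and the tolerance value are hypothetical choices.

```python
from typing import Sequence


def has_stalled(history: Sequence[float], window: int = 3, tol: float = 1e-6) -> bool:
    """Return True when the last `window` objective values differ by at most `tol`."""
    if len(history) < window:
        return False  # not enough iterations yet to judge
    recent = history[-window:]
    return max(recent) - min(recent) <= tol


# Illustrative objective histories (made-up numbers, not results from the experiments).
still_improving = [22.8, 9.9, 5.6, 4.1]
flat_tail = [22.8, 9.9, 5.6, 3.55, 3.55, 3.55]
```

With these inputs, `has_stalled(still_improving)` is false and `has_stalled(flat_tail)` is true, so the loop would stop after the sixth recorded value.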
Table 1 Comparison of estimated parameters to true parameters for the twin experiment

| ζ | SPSA (%) | Average SPSA (%) | Steepest descent (%) |
|---|---|---|---|
| All parameters | 35.11 | 29.27 | 21.02 |
| Sensitive parameters | 9.95 | 6.29 | 6.49 |
Table 2 RMSE results for the minimization process after the 5th, 10th, 15th, and 20th iterations

| Iteration | SPSA (cm) | Average SPSA (cm) | Steepest descent (cm) |
|---|---|---|---|
| Initial | 22.80 | 22.80 | 22.80 |
| β = 5 | 9.95 | 8.92 | 6.05 |
| β = 10 | 5.63 | 4.09 | 2.91 |
| β = 15 | 4.10 | 3.27 | – |
| β = 20 | 3.55 | – | – |
The RMSE with SPSA after β = 15 and with average SPSA after β = 10 is similar. At this point the computational costs of SPSA and average SPSA are also comparable. It is also clear from Table 2 that the smallest RMSE value is achieved by the steepest descent method in ten iterations.
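The cost comparison can be checked with a rough count of objective function evaluations. The sketch below assumes that the cost of an iteration is dominated by the gradient: two evaluations per stochastic gradient, two gradients per iteration for average SPSA, and \(n^p + 1\) evaluations for a one-sided finite-difference gradient as stated in Section 2. The function names are hypothetical, and the value of `n_p` in the last line is purely illustrative.

```python
def spsa_model_runs(iterations: int, n_avg: int = 1) -> int:
    """Objective evaluations for SPSA: two per stochastic gradient, n_avg gradients per iteration."""
    return iterations * 2 * n_avg


def fdsa_model_runs(iterations: int, n_p: int) -> int:
    """Objective evaluations for a one-sided finite-difference gradient: n_p + 1 per iteration."""
    return iterations * (n_p + 1)


spsa_cost = spsa_model_runs(15)               # SPSA after 15 iterations
avg_spsa_cost = spsa_model_runs(10, n_avg=2)  # average SPSA after 10 iterations
fd_cost = fdsa_model_runs(10, n_p=7)          # hypothetical parameter count
```

Under these assumptions the two SPSA variants need 30 and 40 evaluations, respectively, to reach a similar RMSE, which is consistent with the costs being of the same order.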
4.2 Experiment 2
 1. water level measurement data from the Dutch DONAR database and
 2. British Oceanographic Data Center offshore water level measurement data.
The POD-based calibration method converged in only two iterations, compared to 14 iterations with SPSA. However, the cost of a single iteration in the POD-based calibration method is much higher and depends on the number of parameters \(n^p\) and the number of POD modes r used to construct the reduced model (Altaf et al. 2009). For this experiment, one iteration of the POD method required 13 initial simulations of the original nonlinear model to obtain the ensemble, and then additional simulations of the original model to construct the POD reduced model in each iteration β of the optimization process. The SPSA method, on the other hand, required only two objective function evaluations to compute the gradient in each iteration β of the optimization procedure. For this application, the POD method is also fast, since full simulations of the original model are not needed for the generation of the ensemble (Altaf et al. 2011). One disadvantage of the POD-based calibration method is that if the number of parameters is large, the ensemble becomes large too, and constructing a good reduced model is usually difficult with a large ensemble. In both experiments the SPSA algorithm converged in a similar number of iterations, although the number of parameters differed. It is therefore expected that the SPSA algorithm will work with even more parameters, as the cost of its gradient approximation is independent of the number of estimated parameters.
5 Conclusions
In the absence of an adjoint model, the gradient is usually approximated from objective function evaluations. Each individual model parameter is perturbed one at a time, and the partial derivative of the objective function with respect to each parameter is estimated. This method is not feasible for automated calibration when a large number of parameters are estimated. The simultaneous perturbation stochastic approximation (SPSA) method uses stochastic simultaneous perturbation of all model parameters to generate a search direction at each iteration. SPSA is based on a highly efficient and easily implemented simultaneous perturbation approximation to the gradient. This gradient approximation for the central difference method uses only two objective function evaluations, independent of the number of parameters being optimized.
The SPSA algorithm is applied to calibrate the DCSM. The DCSM is an operational storm surge model, used in the Netherlands for real-time storm surge prediction in the North Sea. A number of calibration experiments were performed with both simulated and real data. The results from the twin experiment showed that SPSA has a lower convergence rate than the steepest descent and POD-based calibration methods. The steepest descent algorithm converged in ten iterations, compared to 20 and 15 iterations for SPSA and average SPSA, respectively. However, the computational cost of a single iteration in the steepest descent and the POD-based calibration methods is much higher and depends on the number of parameters \(n^p\). Although both the SPSA and steepest descent methods converged to a similar value of the objective function, none of the optimization algorithms achieved the expected reduction in the objective function.
The results from a very large-scale tidal model with real data showed that the SPSA algorithm gives results comparable to the POD-based calibration method. The POD-based calibration method converged in only two iterations, compared to 14 iterations with SPSA. The POD-based calibration method, however, required 13 initial simulations of the original model to obtain the ensemble, and then extra simulations to construct the POD reduced model in each iteration β of the optimization process. The SPSA method, on the other hand, required only two objective function evaluations to compute an approximation of the gradient in each iteration β of the optimization procedure, independent of the number of estimated parameters. Thus, the SPSA algorithm proved to be a promising optimization algorithm for model calibration in cases where adjoint code is not available for computing the gradient of the objective function.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
References
 Altaf MU, Heemink AW, Verlaan M (2009) Inverse shallow-water flow modelling using model reduction. Int J Multiscale Com Eng 7:577–596
 Altaf MU, Verlaan M, Heemink AW (2011) Efficient identification of uncertain parameters in a large scale tidal model of European continental shelf by proper orthogonal decomposition. Int J Numer Methods Fluids. doi: 10.1002/fld.2511
 Bennet AF, Mcintosh PC (1982) Open ocean modeling as an inverse problem: tidal theory. J Phys Oceanogr 12:1004–1018
 Chin DC (1997) Comparative study of stochastic algorithms for system optimization based on gradient approximation. IEEE Trans Syst Man Cybern 27:244–249
 Courtier P, Talagrand O (1990) Variational assimilation of meteorological observations with the direct and adjoint shallow water equations. Tellus 42:531
 Courtier P, Thepaut JN, Hollingsworth A (1994) A strategy for operational implementation of 4dvar, using an incremental approach. Q J R Meteorol Soc 120:1367–1387
 Gao G, Reynolds AC (2007) A stochastic algorithm for automatic history matching. SPE J 12:196–208
 Gerencser L, Hill SD, Vagoo Z (2001) Discrete optimization via spsa. In: Proc. of American control conference, USA
 Heemink AW, Mouthaan EEA, Roest MRT (2002) Inverse 3D shallow water flow modeling of the continental shelf. Cont Shelf Res 22:465–484
 Hoteit I (2008) A reduced-order simulated annealing approach for four-dimensional variational data assimilation in meteorology and oceanography. Int J Numer Methods Fluids 58:1181–1199. doi: 10.1002/fld.1794
 Hutchison DW, Hill SD (1997) Simulation optimization of airline delay with constraints. In: Proc. 36th IEEE conference on decision and control, San Diego, USA
 Kaleta MP, Henea RG, Jansen JD, Heemink AW (2011) Model-reduced gradient-based history matching. Comput Geosci 15:135–153
 Kaminski T, Giering R, Scholze M (2003) An example of an automatic differentiation-based modeling system. Lect Notes Comput Sci 2668:5–104
 Kiefer J, Wolfowitz J (1952) Stochastic estimation of a regression function. Ann Math Statist 23:462–466
 Lardner RW, Al-Rabeh AH, Gunay N (1993) Optimal estimation of parameters for a two dimensional hydrodynamical model of the Arabian Gulf. J Geophys Res Oceans 98:229–242
 Lawless AS, Nichols NC, Boess C, Bunse-Gerstner A (2008) Using model reduction methods within incremental 4dvar. Mon Weather Rev 136:1511–1522
 Leendertse J (1967) Aspects of a computational model for long-period water wave propagation. Ph.D. thesis, Rand Corporation, Memorandum RM-5294-PR, Santa Monica
 Mouthaan EEA, Heemink AW, Robaczewska KB (1994) Assimilation of ERS-1 altimeter data in a tidal model of the continental shelf. Dtsch Hydrogr Z 36(4):285–319
 Navon IM (1998) Practical and theoretical aspects of adjoint parameter estimation and identifiability in meteorology and oceanography. Dyn Atmos Oceans (Special issue in honor of Richard Pfeffer) 27:55–79
 Ray RD (1999) A global ocean tide model from topex/poseidon altimetry: Got99.2. NASA Technical Memorandum 209478
 Seinfeld JH, Kravaris C (1982) Distributed parameter identification in geophysics-petroleum reservoirs and aquifers. In: Tzafestas, SG (ed) Distributed parameter control systems. Pergamon, Oxford. pp 367–390
 Sirovich L (1987) Chaotic dynamics of coherent structures. Physica D 37:126–145
 Spall JC (1998) Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans Aerosp Electron Syst 34:817–823
 Spall JC (2000) Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans Automat Contr 45:1839–1853
 Stelling GS (1984) On the construction of computational methods for shallow water flow problem. PhD thesis, Rijkswaterstaat Communications 35, Rijkswaterstaat
 TenBrummelhuis PGJ (1992) Parameter estimation in tidal flow models with uncertain boundary conditions. Ph.D. thesis, Twente University, The Netherlands
 TenBrummelhuis PGJ, Heemink AW, van den Boogard HFP (1993) Identification of shallow sea models. Int J Numer Methods Fluids 17:637–665
 Thacker WC, Long RB (1988) Fitting models to inadequate data by enforcing spatial and temporal smoothness. J Geophys Res 93:10655–10664
 Ulman DS, Wilson RE (1998) Model parameter estimation for data assimilation modeling: temporal and spatial variability of the bottom drag coefficient. J Geophys Res Oceans 103:5531–5549
 Verboom GK, de Ronde JG, van Dijk RP (1992) A fine grid tidal flow and storm surge model of the North Sea. Cont Shelf Res 12:213–233
 Verlaan M, Mouthaan EEA, Kuijper EVL, Philippart ME (1996) Parameter estimation tools for shallow water flow models. Hydroinformatics 96:341–348
 Verlaan M, Zijderveld A, Vries H, Kroos J (2005) Operational storm surge forecasting in the Netherlands: developments in last decade. Philos Trans R Soc A 363:1441–1453
 Vermeulen PTM, Heemink AW (2006) Model-reduced variational data assimilation. Mon Weather Rev 134:2888–2899
 Wang C, Gaoming L, Reynolds AC (2009) Production optimization in closed-loop reservoir management. SPE J 14:506–523