Surrogate-assisted distributed swarm optimisation for computationally expensive geoscientific models

Evolutionary algorithms provide gradient-free optimisation which is beneficial for models that have difficulty in obtaining gradients; for instance, geoscientific landscape evolution models. However, such models are at times computationally expensive and even distributed swarm-based optimisation with parallel computing struggle. We can incorporate efficient strategies such as surrogate-assisted optimisation to address the challenges; however, implementing inter-process communication for surrogate-based model training is difficult. In this paper, we implement surrogate-based estimation of fitness evaluation in distributed swarm optimisation over a parallel computing architecture. We first test the framework on a set of benchmark optimisation problems and then apply to a geoscientifc model that features landscape evolution model. Our results demonstrate very promising results for benchmark functions and the Badlands landscape evolution model. We obtain a reduction in computationally time while retaining optimisation solution accuracy through the use of surrogates in a parallel computing environment. The major contribution of the paper is in the application of surrogate-based optimisation for geoscientific models which can in the future help in better understanding of paleoclimate and geomorphology.


Introduction
Evolutionary algorithms are loosely motivated by the theory of evolution where species are represented by individuals in a population that compete and collaborate with each other, producing offspring over generations that to improve quality given by fitness measure [24,45].Particle swarm optimisation (PSO), on the other hand, is motivated by the flocking behaviour of birds or swarms represented by a population of particles (individuals) that compete and collaborate over time [1,26,27,35].Evolutionary and swarm optimisation methods have been prominent in a number of areas such as realparameter global optimization, combinatorial optimization and scheduling, and machine learning [47,53,49,1].Research in evolutionary and swarm optimisation has focused on different ways to create new solutions with mechanisms that are heuristic in nature; hence, different variants are available [42,26].A major challenge has been in applying them in large-scale or computationally expensive optimisation problems that require thousands of function evaluations where a single function evaluation can take minutes to hours, or even days [33].An example of an expensive function is a geoscientific model for landscape evolution problem [12], and deep learning models for big data problems [79].Computationally expensive optimisation can be addressed with distributed/parallel computing and swarm optimisation [6,62,3]; however, we need efficient strategies for representing the problem.
Surrogate-assisted optimisation [31,40,52,81,30] provides a remedy for expensive models with the use of statistical and machine learning models to provide a low computational replicate of the actual model.The surrogate model is developed by training from available data generated during optimisation that features a set of inputs (new solutions) and corresponding output (fitness) given by the actual model.The method is also known as Bayesian optimisation where the surrogate model (acquisition function) is typically a Gaussian process model [10]; however, neural networks and other machine learning models have also been used [60].Evolutionary and swarm optimization methods have been used in surrogate-assisted and Bayesian optimization, and have been prominent in fields of engine and aerospace design [51,37,58,31], robotics [10], experimental design [29], and machine learning [63,60].
Although limited studies exist, surrogate-assisted optimisation has been applied to geoscience problems.Wang et al. [71] presented reliability-enhanced surrogate-assisted PSO in landslide displacement prediction where the method was used for feature selection and hyperparameter optimization.The PSObased surrogate model was used to search the hyperparameters and feature sets in the long short-term memory (LSTM) deep learning model for predicting landslide displacement.Gong et al. [28] presented an ensemble-based surrogate-assisted co-operative PSO for water contamination source identification.Zhou et al. [80] used a surrogate-assisted evolutionary algorithm that incorporated multi-objective optimization for oil and gas reservoirs focusing on good placement and hydraulic fracture parameters.The method employed global-local hybridization searching strategy via PSO with low-fidelity surrogate model that used a multilayer perception.Furthermore, Zhand et al. [78] used surrogate-assisted PSO for gas hydrate reservoir development.Wang et al. [72] presented a surrogateassisted model for the optimization of hyperspectral remote sensing images.Chen et al. [22] utilised a surrogate-assisted evolutionary algorithm for heat extraction optimization of enhanced geothermal systems.
Evolutionary algorithms provide gradient-free optimisation which is beneficial for models that do not have gradient information, for instance, landscape evolution models [18,12].Some instances of such models are so expensive that even distributed evolutionary algorithms with the power of parallel computing would struggle.Hence, we need to incorporate efficient strategies such as surrogate-assisted optimisation that further improves their performance, but this becomes a challenge given parallel processing and inter-process communication for implementing surrogate estimation.
Landscape evolution models are geoscientific models that can be used to reconstruct the evolution of the Earth's landscape over tens of thousands millions of years [23,20,8,48,7].These models guide geologists and climate scientists in better understanding Earth's geologic and climate history that can further also help in foreseeing the distant future of the planet [8,64].These models use data from geological observations such as bore-hole data and estimated landscape topography millions of years ago and require climate conditions and geological parameters that are not easily known.Hence, it becomes an optimisation problem to estimate these parameters which has been tackled mostly with Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling in previous studies that used parallel computing [18], and surrogate assisted Bayesian inference [13].Motivated by these studies, we bring the problem of the landscape evolution model to the evolutionary and swarm optimisation community; rather than viewing it as an inference problem, we view it as an optimisation problem.
In this paper, we implement a surrogate-based optimisation framework via swarm optimisation over a parallel computing architecture.We apply the framework for benchmark optimisation functions and a selected landscape evolution model.We investigate performance measures such as the accuracy of surrogate prediction given different types of problems that differ in terms of dimension and fitness landscape.The contribution of the paper is in the application of surrogate-based optimisation for geoscientific models using parallel computing.The estimation of parameters through surrogate-assisted estimation can in the future help in better understanding paleoclimate and geomorphology which can enhance knowledge about climate change.
The rest of the paper is organised as follows.Section 2 provides background and related work, while Section 3 presents the proposed methodology.Section 4 presents experiments and results and Section 5 concludes the paper with a discussion for future research.

Surrogate-assisted optimization
PSO has been continuously becoming prominent in surrogate-assisted optimisation.Yu et al. [76] compared surrogate-assisted hierarchical PSO, standard PSO, and a social learning PSO for selected benchmark functions under a limited computational budget.Li et al. [44] presented a surrogateassisted PSO for computationally expensive problems where two criteria were applied in tandem to select candidates for exact evaluations.These included a performance-based criterion and a distance-based criterion used to enhance exploration that does not consider the fitness landscape of different problems.The results demonstrated better performance over several stateof-the-art algorithms for selected benchmark functions and a propeller design problem.Chen et al. [21] presented a hierarchical surrogate-assisted differential evolution algorithm for high-dimensional expensive optimization problems with RBF network for selected benchmark functions and an oil reservoir production optimization problem.Yi et al. [75] presented an online variable-fidelity surrogate-assisted harmony search algorithm with a multi-level screening strategy that showed promising performance for expensive engineering design optimization problems.
Furthermore, Li et al. [43] presented an ensemble of surrogates assisted PSO of medium-scale expensive problems which used multiple trial positions for each particle and selected the promising positions by using the superiority and uncertainty of the ensemble simultaneously.In order to feature faster convergence and to avoid wrong global attraction of models, the optima of two surrogates that featured polynomial regression and RBF models were evaluated in the convergence state of particles.Liao et al. [46] presented multi-surrogate multi-tasking optimization of expensive problems to accelerate the convergence by regarding the two surrogates as two related tasks.Therefore, two optimal solutions found by the multi-tasking algorithm were evaluated using the real expensive objective function, and both the global and local models were updated until the computational budget was exhausted.The results indicated competitive performance with faster convergence that scaled well with an increase in problem dimension for solving computationally expensive single-objective optimization problems.Dong et al. [25] presented surrogate-assisted black wolf optimization for high-dimensional and computationally expensive black-box problems that featured RBF-assisted meta-heuristic exploration.The RBF featured knowledge mining that includes a global search carried out using the black wolf optimization and a local search strategy combining global and multi-start local exploration.The method obtained superior computation efficiency and robustness demonstrated by comparison tests with benchmark functions.Chen et al. [21] presented efficient hierarchical surrogate-assisted differential evolution for highdimensional expensive optimization using global and local surrogate models featuring RBF network with an application to an oil reservoir production optimization problem.The results show that the method was effective for most benchmark functions and gave a promising performance for a reservoir production optimization problem.Ji et al. [38] presented a dualsurrogate-assisted cooperative PSO for expensive multimodal problems which reported highly competitive optimal solutions at a low computational cost for benchmark problems and a building energy conservation problem.Ji et al. [39] further extended their previous approach using multi-surrogate-assisted multitasking PSO for expensive multimodal problems.Some of the prominent examples of surrogate-assisted approaches in the Earth sciences include modelling water resources [54,5], computational oceanography [69], atmospheric general circulation models [59], carbon-dioxide storage and oil recovery [4], and debris flow models [50].

Landscape evolution models and Bayeslands
Landscape evolution models (LEMs) use different climate and geophysical aspects such as tectonics or climate variability [73,65,55,11,2] and combine empirical data and conceptual methods into a set of mathematical equations that form the basis for driving model simulation.Badlands (basin and landscape dynamics) [55,56] is a LEM that simulates landscape evolution and sediment transport/deposition [34,32] with an initial topography exposed to climate and geological factors over time with given conditions (parameters) such as the precipitation rate and rock erodibility coefficient.The major challenge is in estimating the climate and geological parameters and those that are linked with the initial topography since they can range millions of years in geological timescale depending on the problem.A way is to develop an optimisation framework that utilizes limited data.Since gradient information is not available, Badlands can be seen as a black-box optimisation model where the unknown parameters need to be found via optimisation or Bayesian inference.So far, the problem has been approached with Bayesian inference that employs MCMC sampling for the estimation of these parameters in a framework known as Bayeslands [14].
The Bayeslands framework had limitations due to the computational complexity of the Badlands model; hence, in our earlier works, we extended it using parallel tempering MCMC [19] that featured parallel computing to enhance computational efficiency.Although we used parallel computing with a small-scale synthetic Badlands model, the procedure remained computationally challenging since thousands of samples were drawn and evaluated.Running a single large-scale real-world Badlands model can take several minutes to hours, and even several days depending on the area covered by the landscape evolution model considered, and the span of geological time considered in terms of millions of years.Therefore, parallel tempering Bayeslands was further enhanced through surrogateassisted estimation.We used developed surrogate-assisted parallel tempering MCMC for landscape evolution models where a global-local surrogate framework utilised surrogate training in the main process that managed MCMC replicas running in parallel [13].We obtained promising results, where the pre-diction performance was maintained while lowering the overall computational time.

PSO
PSO is a population-based metaheuristic that improves the population over iterations with a given measure of accuracy known as fitness [41].The population of the candidate solution is known as a swarm while the candidate solutions are known as particles that get updated according to the particle's position and velocity.In the swarm, each particle's movement is typically influenced by its local best-known position which gets updated when better positions are discovered by other particles.Equations 1 and 2 show the velocity and position update of the particle in a swarm, respectively.
where v t x t represent the velocity and position of a particle at time step t, respectively.c 1 and c 2 are the user-defined cognitive and social acceleration coefficients, respectively.γ 1 and γ 2 are random numbers drawn from uniform U[0, 1] distribution, and α is a user-defined inertia weight.There are several variants in the way the particles get updated which have their strengths and limitations for different types of problems [41,61,68,77,74,35].

Surrogate assisted Distributed framework
We update the swarm using the canonical particle update method [41] and execute the swarms in a distributed swarm framework that employs parallel computing.

Inter-process Communication
In our framework, we exchange selected swarm particles after a certain number of generations with inter-process communication.During the exchange, we replace 20 percent of the weaker particles with stronger ones from other swarms.This enhances the exploration and exploitation properties of our framework for the optimization problem.Afterwards, the process continues where local swarms create particles with new positions and velocities as shown in Figure 1.Hence, we feature distributed swarms and parallel processing for better diversity and computational complexity.We execute distinct parallel processes for respective swarms with central processing units (CPUs).Each process features the optimisation function which is either a synthetic problem made to be computationally expensive using time delay, or an application problem.

Surrogate Model
Suppose that the true function (model) is represented as F = g(x); where g() is the function and x is a solution or particle from a given swarm.Our surrogate model outputs pseudofitness F = ĝ(x) that would give an approximation of the true function via F = F + e; where, e represents the difference between the surrogate and true function.The surrogate model gives an estimate using the pseudo-fitness for replacing the true function when required by the framework.
S prob is an important hyperparameter that controls the use of a surrogate in the prediction.We do not want it to be too high in case we are not very confident, i.e. when enough data is not present for our surrogate model.We do not want it to be too low since the entire optimisation process will become time-consuming; hence, we need to tune this parameter.The surrogate model is trained by accumulating the data from all the swarms, i.e. the input x i,s and associated true-fitness F i,s pairs; where s represents the particle and i represents the swarm.In order to benefit from the surrogate model in the optimization process, it's very crucial to manage the surrogate training and surrogate usage.We cannot use a surrogate from the very beginning of the optimization as its predictions will be random, and we also cannot wait too long as the computational efficiency will be affected.Hence, to manage the surrogate training and its use, an important hyperparameter ψ is used which refers to the surrogate interval measured in terms of the number of generations.ψ is the interval from which the collected data is used to train the model.The updated surrogate model is used until the next interval is reached, and knowledge in the surrogate model is refined in the next stage (defined by ψ) -this can be seen as a form of transfer learning.The collected input features (Φ) combined with the true fitness λ, create θ for the surrogate model.
where x i,s represents the given particle from the swarm, s, F i,s is the output from the true fitness, and M is the number of swarms.Therefore, the surrogate training dataset (Θ = [Φ, λ]) is made up of input features (Φ) and response (λ) for the particles that get collected in each surrogate interval (s + ψ).The pseudo-fitness is given by ŷ = f (Θ).

Surrogate-assisted Framework
In Algorithm 1, we present further details about our framework that features surrogate-assisted optimisation using distributed swarms.
We implement the algorithm using distributed computation over CPU cores, as shown in Algorithm 1.The manager process is shown in black where inter-process communication among swarms takes place which exchange parameters at regular intervals (given by φ).Furthermore, the surrogate model is also trained at regular intervals (ψ).The parallel swarms of the distributed framework have been highlighted in pink in Algorithm 1. Stage 0 features the initialization of particles in the swarm.We begin the optimisation process by initialising all the swarms (Stage 0.1) in the ensemble with random real numbers in a range as required by the optimisation function (model).
Once the swarms are initialised, we begin the evolution (optimisation) by first evaluating the particles in the swarms using the fitness (objective) function.Hence, we iterate over surrogate interval (ψ) and evolve each swarm for φ generations, both user-defined parameters.We update the best particle and best fitness for each of the respective swarms in the ensemble afterwards.Once these basic operations are done, we begin the evolution where we create a new set of swarms for the next generation by velocity and position update (Stage 1.1).
The crux of the framework is when we consider whether to evaluate the fitness function (true fitness) or to use the surrogate model of the fitness function (pseudo-fitness) when computing fitness values.Stage 1.2 shows how to update the fitness using either surrogate or true fitness of the particle depending on the interval and S prob .Initially, this is not done until the very first surrogate interval is not reached, where all the fitness evaluations are from the true function.In Stage 1.3, we calculate the moving average of the past three fitness values for a particular particle by F past = mean(F g−1 , F g−2 , F g−3 ) to combine with the surrogate model prediction (Stage 1.4).This is done so that we incorporate the recent history of the true model fitness, with the surrogate-based estimated fitness, which is motivated by the autoregressive moving average (ARMA) model.Note that if present, the surrogate estimation fitness will be also considered as part of the past three fitness values.In the case when surrogate fitness is included, we note that additional errors will be added; however, this will be further averaged with true values as there cannot be two surrogate fitness values.In Stages 1.5 and 1.6, we calculate actual fitness and save the values for future surrogate training.Our swarm particle update depends on x i pbest and x i gbest , and due to poor surrogate performance, some of the weaker particles can get higher fitness scores.In order to avoid this issue, we ensure that x i pbest and x i gbest are from true fitness evaluation.In Stage 2.1, given a regular interval (φ generations), we prepare an exchange of selected particles with neighbouring swarms via elitism, where we replace a given percentage of weak particles (given by fitness values).β is a userdefined parameter that determines how much of the swarm particles need to be exchanged.We have given recommendations for β value in the design of experiments.We select only 20% of elitist values as we do not want the majority of the swarms to be similar and maintain diversity.
We note that before we consider the use of pseudo-fitness, we need to train the surrogate model with the same training data which is created from the true fitness.Hence, we need to collect the training data for the surrogate model from all the swarms in the ensemble.In Stage 3.0, the algorithm uses surrogate training data collected from Stage 1.6 (Θ as shown in Equation 3).In Stage 4, the algorithm trains the surrogate model in the manager process with data from Stage 3. The knowledge from the trained surrogate model is then used in the fitness estimation as shown in Stage 1.4.Stage 5.0 implements the termination condition where the algorithm signals the manager process to decrement the number of swarms alive (active) in order to terminate the swarm process.This is done when the maximum number of fitness evaluations has been reached (T max ) for the particular swarm.We use a neural network-based surrogate model with Adam optimisation [2].In order to validate the performance of the algorithm, we measure the quality of the surrogate estimate using the root mean squared error (RMSE): where F i and Fi are the true and pseudo-fitness values, respectively.N denotes the number of times the surrogate model is used for estimation.Figure 1 provides a visual description of the proposed algorithm where multiple swarms are executed using the manager process that also controls the particle exchange and surrogate model update.

Application: Landscape evolution models
Similar to optimizing mathematical functions, in certain problems, we are required to evaluate a score using some simulation or computationally expensive process, such as geoscientific models [48,67,20].Landscape evolution models (LEMs) are a class of geoscientific models that evolve a given topography over a given time with given geological and climate conditions such as rock erodibility and precipitation [55].LEMs are used to model and understand the landscape and basin evolution back in time over millions of years showing surface processes such as the formation of river systems and erosion/deposition, where there is movement of sediments from source (mountains) to sink (basins) [20].LEMs help geologists and paleoclimate scientists understand the evolution of the planet and climate history over millions to billions of years; however, there are major challenges when it comes to data.LEMs generally require data regarding paleoclimate processes which is typically unavailable; and hence, we need to estimate them with methods such as MCMC sampling [17,15].There is limited work in the literature where optimisation methods have been used to estimate the unknown parameters for LEMs, which is the focus of this study.We note that typically, LEMs are computationally very expensive which is dependent on the resolution of the study area (points/kilometer) and the duration of evolution (simulation) back in time (millions of years).Hence, large scale study areas can take from hours to days to run a single LEM even with parallel computing [55].There is no gradient information in the case of Badlands LEM used for this study, and hence estimating the model parameters is a challenge.
In order to demonstrate the optimisation procedure, we select problems where synthetic initial topography has been created using present day topography and used in our previous work [17,15].The selected LEM features a continental margin (CM) problem that is selected taking into account computational time of a single model run as it takes less than three seconds on a single central processing unit (CPU).The CM problem initial topography is selected from the present-day South Island of New Zealand as shown in Figure 2 which covers 136 by 123 kilometres.We provide a visualization of the initial and final topographies along with an erosion/deposition map for CM problem in Figure 3.The CM features six free parameters (Table 1).The notable feature of all three problems is that they model both elevation and erosion/deposition topography.We use the initial topography (Figure 3) and the true values given in Table 1, and run the Badlands LEM by simulating 1 million years to synthetically generate the ground-truth topography.We then create a fitness function with the ground-truth topography and set experiments so that the proposed optimisation methods can get back the true values.The details about the fitness function are given in the following section.

Fitness function
The Badlands LEM produces a simulation of successive time-dependent topographies; however, only the final topography D T is used for topography fitness since no successive ground-truth data is available.The sedimentation (ero- sion/deposition) data is typically used to ground-truth the timedependent evolution of surface process models that include sediment transportation and deposition [55,14].We adapt the fitness function from the likelihood function used in our previous work that used Bayesian inference via MCMC sampling for parameter estimation in Badlands LEM [17].Ω represents the vector of free parameters, such as precipitation rate and erodibility which are independent and optimised by our proposed algorithms based on PSO.The initial topography is given as a two-dimensional matrix D u,v , where corresponds to the location which is given by the latitude u and longitude v (Figure 2).Hence, our topography fitness function F topo for the topography is given by computing f (.) that represents the final topology (at final time t = T ) by Badlands LEM.
where, ν is the number of observations.Badlands LEM produces a sediment erosion/deposition topography at each time frame.We use a selected vector of locations (Figure 3 -Panel c) at time (z t ) simulated (predicted) by the Badlands LEM for given Ω.The sediment fitness F sed (Ω) is given below The total fitness is the combination of the topography and sediment fitness.Note that the initial topography should not be confused with the initial state of the swarms of the PSO.The initial topography is simulated using pre-day topography of the region and hence not generated via any distribution.[17,15].The x-axis denotes the latitude and the y-axis denotes the longitude from the location given in Figure 2, and the elevation is given in meters which is further shown as a colour bar.

Experiments and Results
In this section, we present the results of our framework on synthetic benchmark functions and Badlands LEM.The experiments consider a wide range of performance measures which includes optimization performance in terms of fitness score, computational time, and accuracy of the surrogate model.

Experiment design
We provide the experimental design and parameter setting for our experiment as follows.We implement distributed swarm optimization using parallel computing and inter-process communication where the swarms can have separate processes and exchange solutions (particles) with Python multi-processing library 1 .
The synthetic benchmark optimisation functions are given in Equation 6(Spherical), Equation 7(Ackley), Equation 8(Rastrigin), and Equation 9 (Rosenbrock).In Equation 7, we use the user-defined parameters, a = 20 and b = 0.2.Similarly in Equation 8, a and b and c are user-defined parameters.Note that these functions are chosen due to different levels of difficulty in optimisation and the nature of their fitness landscape.The Spherical function is considered to be a relatively easier optimisation problem since it does not have interacting variables.Ackley and Rastrigin are known to have many local minimums, while Rosenbrock is known as the valley-shaped function.
We use (M = 8) swarms which run as parallel processes (swarms) that inter-communicate with each other at regular interval (ψ = 1).We use the same values for certain hyperparameters (ψ = 1 and φ = 1) to simplify the algorithm.We exchange of a subset of particles (best 20% from the population) with the neighbouring island (worst 20% of the population).This is implemented by setting β = 0.2 in Algorithm 1.We use the population size of 20 particles per swarm, the inertia weight w = 0.729 with social and cognitive coefficients(c1 = 1.4 and c2 = 1.4).We determined these values for parameters in the trial experiments by taking into account the performance on different problems with a fixed number of evaluations.We use minimum and maximum bounds on the parameters as described in Table 1, respectively.We run experiments for 30 and 50 dimension (30D and 50D) instances of the respective benchmark problems.The dimension of our problem for synthetic functions (eg.Rosenbrock) is much larger than LEM since in the literature, 30D optimization functions are more common.We set the total number of function evaluations to 100,000 and 200,000, respectively.In the case of the Badlands model, we use 10,000 function (model) evaluations for the CM problem.
We used the PyTorch machine learning library2 for implementing the surrogate model that uses a neural network model with Adam learning.The surrogate neural network model architecture is given in Table 2, where the input dimensions are defined, i.e. 30D and 50D instances of the benchmark functions.In a similar way, we extend our approach for optimization in the Badlands model with the CM problem, featuring a 6-dimensional (6D) search space.The first and second hidden layers of the neural network-based surrogate model are shown in Table 2.We note that the output layer in the model for all the problems contains only one unit which provides the estimated fitness score.

Results for synthetic benchmark functions
We first present the results (fitness score) for different problems using serial (canonical) PSO, distributed PSO (D-PSO) and surrogate-assisted distributed swarm optimisation (SD-PSO) as shown in Table 3.We use a fixed surrogate probability (s prob = 0.5) and present results featuring mean, standard deviation (std), and best and worst performance for over 30 independent experimental runs with different random initialisation in swarms as shown in Table 3.Note that lower fitness scores provide better performance.
We find a significant reduction in elapsed time for all the problems in Table 3.Note that we added a 0.05 seconds time delay to the respective problems in order to make them slightly computationally expensive to depict real-world application problems (models).When we consider serial PSO with D-PSO and SD-PSO in terms of optimisation performance given by the fitness score, we find that D-PSO improves the PSO significantly for 30D and 50D cases of Rosenbrock, Spherical and Rastrigin problems.We also see improvement in Ackley's problem, but it's not as large as in previous problems.We find the variation in the results given by the standard deviation is lowered highly with D-PSO which shows it is more robust to initialization and has the ability to provide a more definitive solution.
Moreover, in comparison of SD-PSO with D-PSO, we mostly get better fitness with SD-PSO, with a reduction of computation time due to the use of surrogates.Despite the use of surrogatebased fitness estimation, we observe that the fitness score has not greatly depreciated.It is interesting to note in Ackley 30 and 50D case, addition of surrogates improved the fitness score.We find a 30% reduction in computational time, it is expected this to increase if we had more time delay (rather than 0.05 seconds), which will be shown in the Badlands LEM experiments to follow.
The RMSE of prediction of fitness by the surrogate model is shown in Table 4.We notice large RMSE values for the Rosenbrock problem when compared to the rest.In surrogate prediction accuracy (RMSE) given in Figure 5, we observe a constant reduction in RMSE with intervals (along the x-axis) in Ackley (30D and 50D), and Spherical (30D) model functions.In other problems, the RMSE is lower towards the end, but the trend is not that smooth.We note that in the Rosenbrock function, there exists an interval where our surrogate performs poorly causing a major decrease in accuracy.
Figure 6 and 7 provide a visual description of surrogates prediction quality in the evolution process.We show the bar plots for mean values with a 95% confidence interval (shown by error bars) of the actual fitness and pseudo-fitness at regular intervals for different benchmark problems.We notice that the Rastrigin and Ackley problems have a better surrogate prediction with better confidence intervals when compared to the Spherical problem.Finally, we evaluate the effect of the surrogate probability for Rosenbrock and Rastrigin 30D problems.Figure 4 (Panel a) provides a graphical analysis of the effect of the fitness score and computational time (Panel b) given different surrogate probabilities.We observe that the computational time decreases linearly with an increase in the surrogate probability.On the other hand, the fitness score degrades with an increase of surrogate probability (Rosenbrock 30D); however, with an elbow-shaped curve -a trade-off can exist between time and optimization performance (surrogate probability of 0.6).In the case of the Rastrigin 30D, there is not a large difference in loss of fitness (surrogate probability ≤ 0.5) when compared to the Rosenbrock 30D; we note that these problems have distinctly different fitness landscapes which could explain about the difference in the performance.

Results for the Badlands model
Finally, we present results for the case of the Badlands CM problem which is a 6D problem.The results for CM problem highlighting our methods (PSO, D-PSO and S-DPSO) when compared to previous approaches (PT-Bayeslands and SAPT-Bayeslands) are shown in Table 5.The results show the computational time and prediction performance of the Badlands model in terms of elevation and erosion/deposition RMSE (using Equations 4 and 5, respectively) given the optimised parameters.The results show the mean and standard deviation from 30 experimental runs from independent initial positions.We see a major reduction in computational time using D-PSO when compared to PSO and find consistent performance in terms of elevation and erosion RMSE.The experiments used a surrogate model at an interval of 10 generations with a probability of 0.5.We observe that the SD-PSO further improved the performance by reducing computational time using surrogates.The RMSE of the estimation of fitness function by the surrogate model when compared to the actual Badlands model is shown in Table 4.We note that the RMSE in this case cannot be compared to the synthetic fitness functions (eg.Rosenbrock) since the fitness function is completely different.In synthetic fitness functions, there is no data whereas in Badlands LEM, we use Badlands prediction and ground-truth topography data to compute the fitness.Figure 5 (Panel e) shows surrogate training accuracy (RMSE) for different surrogate intervals where we observe a constant improvement of performance by the surrogate model over time (surrogate intervals).This implies that the surrogate model is improving as it gathers more data over time.
In Figure 8, we show the change in CM topology over selected time-slices simulated by Badlands according to the parameters optimized by S-DPSO.The elevation RMSE in Table 5 considers the difference between ground-truth topography given in Figure 3.We notice that visually the final topography (present-day) in Figure 8 (Panel f) resembles Figure 3 (Panel b).Furthermore, we show in Figure 9 a cross-section (Panel a) for Badlands predicted elevation vs the ground-truth elevation for final or present-day topography.We also show the bar plot (Panel b) of predicted vs ground-truth sediment erosion/deposition at 10 selected locations taken from Figure 3 (Panel c).The cross-section and bar plots show that the Badlands prediction well resembled the ground-truth data, respectively.We obverse that the cross-section (Panel a) uncertainty is higher for certain locations as highlighted.The high uncertainty is in an area of a high slope below sea level which is reasonable given the effect of sediment flow due to precipitation.

Discussion
We note that the surrogate-assisted method (SD-PSO) estimates the fitness and the velocity update in the next generation as done in a standard PSO (Equations 1 and 2).The surrogate fitness does not change the structure of the PSO, it is only used as a way to estimate the fitness and hence reduce the computational time.We note that the performance of an optimisation method depends on the nature of the optimisation benchmark problems [36] due to their fitness landscape modality, i.e., unimodal (Rosenbrock, Sphere) and multimodal (Rastrigin, Ackley), and separability (Rosenbrock is considered nonseparable and Sphere separable).It is easier to construct optimisation methods for separable functions using a divide and conquer approach, which faces challenges in separable problems [57,16].In terms of the results, we find that SD-PSO has done better than PSO and D-PSO (Table 3 and Table 5) for most problems.This could be done to the fitness estimation by the surrogate model, i.e. surrogate -based fitness estimations could have helped the algorithm in escaping local optima and hence      it achieved better fitness.In general, the results show that distributed surrogate-assisted swarm optimisation framework can improve performance by decreasing computation time while retaining optimisation accuracy (fitness).
We highlight that the Badlands model does not provide gradient information regarding the parameters; and hence, only gradient-free optimisation and inference methods can be used.In our previous work, MCMC sampling (PT-Bayeslands and SAPT-Bayeslands [14,19]) has been used, where the parameter inference was implemented via random-walk proposal distribution with MCMC replicas running in parallel.In this paper, the results show that the use of meta-heuristic (evolutionary) search operators from particle swarm optimisation provides better search features.The results motivate the use of the proposed methodology for expansive optimisation models, which can feature other geoscientific models.Further use of surrogates in larger instances of the Badlands LEM can provide a significant reduction in computational time.
A major contribution of the framework is in the implementation using parallel computing, which takes into account interprocess communication when exchanging particles (solutions) during the optimisation process.In our proposed framework, the surrogate training was implemented in the manager process and the trained parameters were transferred to the parallel swarm processes, where the local surrogate model was used to estimate the fitness of the particle when required (Figure 1).This implementation seamlessly updated the local surrogate model at regular intervals set by the user.We note that although less than eight parallel swarm processes were used, in largescale problems, the same implementation can be extended and amended.We note that in the case when the number of parameters in the actual model significantly increases, different ways of training the surrogate model can be explored.We also note that the number of particles per swarm is strongly dependent upon the problem in hand [41,35].
Another major contribution from the optimisation process for the case of the landscape evolution model is the estimation of the parameters, such as precipitation values in the Badlands model.We note that MCMC sampling methods provide inference whereas, optimisation methods provide an estimation of parameters.The major difference is that we represent the parameters using probability distribution in the case of inference, whereas optimisation methods provide single-point estimates.Through optimisation, we can estimate what precipitation values gave rise to the evolution of landscape which resulted in the present-day landscape.The landscape evolution model hence provides a temporal topography map of the geological history of the region under study.These topographical maps, along with the optimised values for geological and climate parameters (such as precipitation and erodibility) can be very useful to geologists and paleoclimate scientists.
Although PSO has been selected as the designated evolutionary optimisation algorithm, the framework has the flexibility to enable the implementation of other evolutionary algorithms depending on the application problem.A wide range of PSO variants exist in the literature with strengths and limitations [66,70,9].In the case when the application involves combinatorial optimisation or a scheduling problem, then an appropriate evolutionary algorithm would be needed.
Our experiments considered a simulation where the presentday topography of a selected region was used as the initial topography.The Badlands model ran depicting a million years in time, to simulate successive topographies.However, in the area of LEMs, the focus is generally to understand the environmental and climate history back in time that led to present-day topography.In such cases, we need to estimate the initial topog-  raphy (eg. a million years back in time), which can be based on present-day topography.The estimation of initial topography is an optimisation problem of its own and joint optimisation of parameters such as precipitation can make the process very complex.However, this can be addressed in future research.

Conclusions and Future Work
We presented a surrogate framework that features parallel swarm optimisation processes and seamlessly integrates surrogate training from the manager process to enable surrogate fitness estimation.Our results indicate that the proposed surrogate-assisted optimisation method significantly reduces the computational time while retaining solution accuracy.In certain cases, it also helps in improving the solution accuracy by escaping from the local minimum via the surrogates.Although we used PSO as the designated algorithm, other optimisation algorithms, such as genetic algorithms, evolution strategies, and differential evolution can also be used.
In future work, the parallel optimisation process could be improved by a combination of different optimisation algorithms which can provide different capabilities in terms of exploration and exploitation of the search space.The proposed framework can also incorporate other benchmark function models, particularly those that feature constraints and also be applied to discrete parameter optimisation problems which are expensive computationally.

Ethical approval
The data used in the manuscript is openly available and does not need any ethical approvals.

Funding details
There are no external funding sources to report.

Conflict of interest
The authors do not have any conflict of interest with the manuscript and publication process.

Availability of data and materials
We provide an open-source implementation of the proposed algorithm in Python along with data and sample results3 .

Figure 1 :Figure 2 :
Figure 1: Surrogate-assisted distributed swarm optimisation features surrogates to estimate the fitness of expensive models or functions.
(a) CM initial topography (b) CM ground-truth topography (c) CM erosion/deposition map where yellow dots show the location (well) for computing sediment fitness (Equation6).

Figure 3 :
Figure 3: The figures show the initial and evolved ground-truth topography and erosion/deposition after one million years for CM problems taken from[17,15].The x-axis denotes the latitude and the y-axis denotes the longitude from the location given in Figure2, and the elevation is given in meters which is further shown as a colour bar.

Figure 4 :
Figure 4: The figure shows the effect of surrogate probability on fitness score and computational time.
(a) Cross-section reconstruction by optimizing Badlands CM model using S-DPSO.The ground truth represents the actual topography, whereas Badlands prediction reports the mean topography predicted by S-DPSO over 30 experimental runs.The 95% confidence interval represents the uncertainty.(b) Sediment prediction using S-DPSO.The ground truth represents the actual topography, whereas the Badlands prediction reports the mean topography predicted using S-DPSO over 30 experimental runs.

Figure 9 :
Figure 9: Prediction cross-section (Panel a) and sediment erosion/deposition (Panel b) with uncertainty given as 95 % confidence interval from 30 experimental runs.

R.
Chandra contributed by writing, coding, experiments and analysis of results.Y. Sharma contributed by coding, experiments, writing and analysis.

Table 2 :
Model architecture for surrogate model

Table 3 :
Optimization results on benchmark functions