Parallelization of AMALGAM algorithm for a multi-objective optimization of a hydrological model

A calibration procedure is essential step to achieve a realistic model simulation particularly in hydrological model which simulates water cycle in the basin. This process is always faced with challenges due to selection of objective function and highly time-consuming. This study aimed to take advantage of parallel processing to accelerate the computations involved with simulation process of hydrologic model linked with the multi-objective optimization algorithm of AMALGAM for multi-site calibration of SWAT hydrologic model parameters. In order to illustrate how meaningful SWAT model calibration trade-off between the four objective functions involved in AMALGAM optimization program, the Pareto solution sets were provided. Furthermore, it is implemented a group of model runs with a number of cores involved (from one to eight) to demonstrate and evaluate the running of parallelized AMALGAM with taking advantages of “spmd” method to decrease the running time of the SWAT model. The results revealed the robustness of the method in reducing computational time of the parameter calibration significantly. This strategy with 4-objective functions focuses on high streamflow (Nash–Sutcliffe coefficient), low streamflow (Box–Cox transformed root–mean–square error), water balance (runoff coefficient error), and flashiness (slope of the flow duration curve error) provided an efficient tool to decide about the best simulation based on the investigated objective functions. This study also provides a strong basis for multi-objective optimization of hydrological and water quality models and its general analytical framework could be applied to other parts of the world.


Introduction
Hydrological and water quality models are used frequently by hydrologists and water resources managers to assess the hydrological variables and understand and manage natural and anthropogenic systems (Zhang et al., 2008;Wu et al. 2013;Afshar and Hassanzadeh 2017;Hemmat Jou et al. 2019) are not only the viable alternatives to field monitoring of watershed scale processes but are critical to aid in evaluating various scenarios of climate, land use, and best management practices.These numerical models are becoming sophisticated and more complex with increasing input data and parameter requirements to represent the field conditions over large areas and thus requiring increasing the computational demand (Sharma et al. 2006;Vaze et al., 2011).As the spatial and temporal precise of a model prediction is enhanced, it essentially needs more computational power which may be due to high-dimensional parameters or model complexity (Beck 1999;Brun et al. 2001;Newman et al. 2017).On the other hand, model calibration and validation procedures need so many of model simulations which lead to more executing time and therefore, reduces speed of model running.
Many studies have attempted to find optimal solutions to overcome the above-mentioned issues, and various concepts and techniques have been introduced (Yapo et al. 1998;Zhang et al., 2008;Wu et al. 2013;Li et al. 2015;Her et al., 2015;Newman et al. 2017, Hemmat Jou et al. 2019).Most of the existing solutions have focused mainly on the use of different single-objective optimization methods.However, practical experiences revealed that majority of single goal functions is not able to take into account all aspects of observed data properly (Yapo et al. 1998;Vrugt et al., 2003;Kollat et al., 2012;Liu et al., 2016).Thus, there is an essential need to take advantage of multi-objective optimization as an efficient calibration procedure to be able extracting the effective information contained in the observed data time series.The advantages of multiple-objective calibration are discussed by Gupta et al. (1998) for the first time, and they proved that such schemes are applicable and desirable for calibrating most of hydrological and water quality models.Subsequently, numerous studies have concentrated on multi-objective frameworks to calibrate rainfall-runoff models (Seibert 2000;Cheng et al. 2002;Madsen et al. 2002;Vrugt et al., 2003;Shafii and Smedt 2009;Zhang et al. 2010;Rajib et al. 2016;Werstuck and Coulibaly 2017;Zhang et al. 2017).
Over past recent years, various algorithms have been developed and evaluated the multi-objective optimization issues in hydrological modeling researches (Vrugt et al., 2003;Zhang et al. 2010;Kollat et al., 2012;Pfannerstill et al. 2017;Kan et al. 2017;Mousavi et al. 2017).Adaptability and flexibility of the evolutionary algorithms (EAs) in locating the global optimum make them to be more popular in different optimization problems (Thyer et al. 1999;Madsen et al. 2002;Her et al., 2015;Hosseini-Moghari et al., 2017;Qu et al. 2018).These algorithms are useful in finding the best solutions near the optimal solutions especially in a high-dimensional parameter space and complicated system, although they usually involved with many trials as an important issue and more time-consuming and computational cost than the deterministic methods (Her et al., 2015).Thus, the improvement in finding the optimal solutions usually is leads to increase the computational time.Since EAs work with huge starting population size, then, the optimization processes face with a major challenging problem especially in distributed or semi-distributed rainfall-runoff models like SWAT with large number of parameters (Rong et al. 2023;Wu et al., 2023;Eckhardt and Arnold 2001;Lin and Radcliffe 2006;Her et al., 2015;Piotrowski 2017).Therefore, the choosing an efficient and effective optimization framework in a complex model (e.g., SWAT) is becoming a critical issue to decrease the running model time.For instance, Hashemi Aslani et al. (2023) established a connection between the SWAT model and MOPSO as a multi-objective optimization algorithm.Their study aimed to minimize various objective functions, such as the number of best management practices and nitrate concentration, while considering different scenarios.
Several researchers have tried to improve the computational efficiency of optimization algorithms by utilizing parallel computing strategies (Waintraub et al. 2009;Tu and Liang 2011;Her et al., 2015;Peng et al. 2017;Kan et al. 2017).Parallel computing as a technique take advantage of simultaneous running of multiple calculations with multi-core processor to run program faster (Li et al. 2011;Wu et al. 2013;Kan et al. 2017).Since the parallelizing the model structure is related to programming, then, it is involved with much knowledge of/familiarity with some modification in model coding.In such a way, separate CPU cores are be able to run each individual simulation.Therefore, as the complexity of hydrologic models, particularly in the context of calibrating distributed watershed hydrologic models, continues to grow, parallel computing emerges as a promising approach to address this challenge (Asghari et al., 2023).
This paper aims to present an application of A Multi-Algorithm, Genetically Adaptive Multi-objective knows as AMALGAM (Vrugt and Robinson 2007) for multiobjective calibration of SWAT (soil and water assessment tool) model, a widely applied hydrological model (Arnold et al. 1998).In most studies, calibration of SWAT model is performed using single-objective optimization algorithms (i.e., SUFI-2, genetic algorithm, particle swarm optimization algorithm) with the parameter estimator software like SWAT-CUP (Abbaspour 2011).The sum of squared (SS) differences between the measured and predicted river discharges is the famous single-objective function to be minimized in most of the optimization algorithms.However, this criterion is used mostly to be biased for high river flows (Shafii and Smedt 2009).A proper alternative could be the use of log-transformed flow values to emphasize low flows; however, this may also result to various optimal parameter sets as well as a dilemma creation for selecting the best parameter set for the user.Multi-objective formulation (here AMALGAM) builds on Van Werkhoven et al. (2009) can consider all aspects of a hydrograph containing of flashiness, water balance, low flows, and peak flows.Adding water balance and flashiness to the objectives provides further useful information about catchments behave and how accurately the model adjusts this behavior.This highly efficient algorithm has found versatile applications within the hydrological community.Notably, Shin et al. (2023) employed it to enhance ensemble prediction accuracy, while Pourreza-Bilondi et al. (2022) and Vargas et al. (2023) achieved success in employing AMALGAM for improving the prediction of destructive floods and enhancing streamflow routing characteristics, respectively.
Then, this study aimed to demonstrate how to parallelize multi-algorithm AMALGAM to make running of SWAT hydrological model faster on a windows platform with multiple CPU cores.The results of this study are expected to provide valuable information for hydrological model practitioner, experts and especially the decision makers for better insights of watershed behavior.

Hydrological model
The hydrological SWAT model, developed by Arnold et al. (1998), is a catchment scale tool for estimating longterm effects of land management on water resources, and agricultural chemical yields in large complex un-gauged basins with different management conditions and varying land uses and soils (Arnold et al. 1998;Onusluel Gul and Rosbjerg, 2010;Khoi and Suetsugi 2012;Leta et al. 2017;Molina-Navarro et al. 2017;Hassanzadeh et al. 2019).It is a conceptual, semi-distributed, continuoustime, and physical-based model which operates on yearly, monthly, and daily time steps (Neitsch et al., 2011) and is used widely for water resources management and hydrological modeling.The water balance formulation in soil profile is the main equation of this hydrology model where all important processes including surface runoff, infiltration, lateral flow, precipitation, evapotranspiration, and percolation are simulated (Bouraoui et al. 2005;Neitsch et al., 2011).All other related eco-hydrological processes (such as vegetation growth, water flow, and land use and water management) are also integrated at sub-basin spatial scale by SWAT.The study area is divided into a multiple sub-basins which are characterized by one or more HRUs (hydrologic response units) where each consists of homogeneous soil characteristics, slope, and land use.During the simulation processes, the vertical flows of water and nutrients are first calculated in HRUs (as the smallest spatial units), and then, it is aggregated and summed for each sub-catchment.Also predicted surface flows, sediments, pesticides, and chemicals for each HRU aggregated and routed to the watershed outlet.Furthermore, two variable storage and Muskingum routing methods can be performed for water flow to be routed through the channel network (Arnold et al. 1998;Neitsch et al., 2011;Khoi and Suetsugi 2012;Kalogeropoulos and Chalkias 2013;Leta et al. 2017;Marhaento et al. 2017).
The main components of water balance equation for each HRU are storage volumes of snow, soil profile (0-2 m), shallow aquifer (typically 2-20 m), and deep aquifer (> 20 m) (Arnold et al. 1998).The peak discharge values are determined with a modified rational method (Chow et al. 1988).Several processes including surface plant uptake, lateral flow, surface runoff, infiltration, percolation, and evaporation involved with to lower layers (Neitsch et al., 2011).Lateral flow which is predicted with kinematics storage routing depends on the several factors, including slope length, degree of slope, and saturated hydraulic conductivity in each soil layer (Arnold et al. 1998).By creating shallow aquifer storage, the proportion of ground water flow to total river flow is accounted (Arnold and Allen 1996).Also, the shallow aquifer recharge is supposed to be equaled with the percolation from the bottom of the root zone.To estimate potential evapotranspiration, three different methods including Hargreaves, Priestley-Taylor, and Penman-Monteith methods can be used (Arnold et al. 1998).

AMALGAM
AMALGAM (Vrugt and Robinson 2007) takes advantage of the attributes of four different evolutionary optimization algorithms including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Adaptive Metropolis Search (AMS), and Differential Evolution (DE) (Storn and Price 1997;Haario et al. 2001;Deb et al. 2002).First step in AMALGAM is to create a random initial parameter set (initial population) P i of size of N which is created by Latin Hypercube Sampling (LHS) method.For each sample P i , fast non-domination sorting operator is applied to assign a rank.A proposal population P i of size N is then generated by carrying each possible algorithm out within AMALGAM to create a pre-speciuding Genetic Algorithm (GA), N = N 1 i , N 2 i , … N k i , from P i .By using the fast nondomination sorting operator, the best N solutions within R i = P i ∪ P i are selected into P i+1 , which is produced by the multi-method search and adaptive offspring creation method repeatedly until stopping criteria is satisfied (i.e., reach to maximum number of iteration).The AMALGAM key procedures are the self-adaptive offspring creation and simultaneous multi-method search.In the current study, NSGA-II, PSO, DE, and AMS algorithms were integrated into the AMALGAM following the study of Vrugt and Robinson (2007).The number of offspring points generated by each algorithm ( (1) as follows: where O k i is the corresponding number in the previous generation, and shows the proportion of new population (Vrugt and Robinson 2007).The minimum value for N k i is 5 and

Multi-objective calibration functions
To provide additional hydrological information about watershed behavior and to investigate how carefully the model simulate this behavior, like Kollat et al. (2012), (1) four different objective functions emphasizing on different aspect of streamflow values were employed to be optimized simultaneously:

Nash-Sutcliffe coefficient (NSC)
The NSC coefficient (Nash and Sutcliffe 1970) (Eq.2) is widely used as a most common calibration objective to emphasize on the peak flow errors and indicates how well the plots of the observed versus the estimated flow values fits 1:1 line.
where Q ot is the measured discharge at time t, Q st shows the predicted discharge at time t, and Q o is the mean observed flow value during the calibration processes.NSC ranges from −∞ to 1 (optimal), where the closer the NSC to 1, the more accurate the model is.

Runoff coefficient percent error (ROCE)
The ROCE objective function is calculated using the following equation: where Q s is the annual mean simulated river discharge, and Q o is the annual mean observed river discharge.The mean annual is calculated using Y years of calibration period data.

Transformed root-mean-square error (TRMSE)
Our third calibration objective function involved the Box-Cox transformation of low discharge errors (Box and Cox 1964), abbreviated as TRMSE (Eq.4): where Qot is the measured discharge transformed by Box-Cox at time step t where λ = 0.3, and Qst is the predicted discharge transformed by Box-Cox at time t, and N shows the number of calibration time steps.The summation is done over time from 1 to N. The Box-Cox transformation also causes to reduce the effects of heteroscedasticity in addition to the emphasizing low-flow periods. (2)

Slope of the flow duration curve (SFDCE)
The SFDCE shows the flashiness of a basin which considers minimizing the slope of flow duration curve (Eq.6): where Q s,67% and Q s,33% are the 67th and 33rd percentile of the simulated flows, respectively.Likewise, Q o,67% and Q o,33% are the 67th and 33rd percentiles of the observed flows, respectively.

Parallel computing
The parallel computing has been used in parameter calibration process of SWAT model (SWAT 2012 as a public domain model) in order to test its effectiveness and efficiency.The primary difference between parallelized and non-parallelized AMALGAM algorithms is how they utilize processing resources.Parallelized versions take advantage of multiple cores or threads to speed up optimization tasks, while non-parallelized versions perform these tasks sequentially with a single processing core.The choice between the two depends on the specific optimization problem, available hardware resources, and the need for computational efficiency.For our study area (the Kashafrood River watershed), 20 hydrologic SWAT parameters (Table 1) were calibrated to monthly streamflow measurements made using the proposed parallel computing method.These parameters were finalized based on sensitivity and uncertainty analysis (Afshar et al., 2018); hence, best model performance and acceptable uncertainty in the SWAT modeling may be provided through obtained parameter combination (Her et al., 2015;Abbaspour et al., 2017;Molina-Navarro et al. 2017).The four objective functions described earlier were used as the objective functions to quantify the agreement between the monthly simulated and observed streamflow.Then, the high-dimensional parameter feasible space was explored by AMALGAM to locate the global optimum parameter value set that maximized and/or minimized the objective functions.Then, the SWAT model is linked with the AMALGAM multi-objective optimization algorithm.The optimization process was continued until the predefined termination criteria were satisfied.In this study, the maximum number of function evaluations generated by AMALGAM (equaled by 6000) was defined as the stopping criteria.Initial population consists of 64 individuals sampled through Latin Hypercube Sampling (LHS) method, and hence, it needs to be generated 100 times.

Study area and model setup
The Kashafrood River is a main branch of the Kashafrood watershed in the northeastern of Iran.The watershed area is about 16,870 km 2 .The Kashafrood River catchment, the topography, the location of gauging and meteorological stations, and stream network are showed in Fig. 1.This basin is highly mountainous with elevations ranging from about 391 to 3234 m.The mean annual precipitation and temperature of the catchment are about 340 mm and 13.85 °C, respectively (Afshar et al., 2017).Predominant land use of the basin is being pasture (50.9%), while the other half consists mainly of agricultural land-generic (28.6%), winter wheat (15.6%), forest, urban and water body (4.9%) areas.Digital elevation model (DEM), with a grid size of 30 m, stream network coverage, land use, soil map, and climate data is enclosed as the basic input data to SWAT.Stream network creation was carried out in the environment of Arc-GIS as a DEM map product.Soil data including sand, silt and clay contents, rock fragment content, organic carbon content, soil electrical conductivity (EC e ), water content, porosity, bulk density, saturated hydraulic conductivity (Ks), and soil hydrologic groups were obtained from the soil map prepared by the State Soil Geographic (STATSGO) (USDA, 1994) at a scale of 1:250,000.Climate data, including daily precipitation and temperature values, were obtained from the precipitation and air temperature stations in the study for a period of 20 years (1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011).Figure 2 shows the input maps (soil and land use) to SWAT.
After inputting the soil, slope, and land use maps into the model, each sub-basin is divided into a series of homogenous hydrological response units (HRU).In this study, the basin was subdivided into 217 sub-basins and 635 HRUs.In this study, the watershed was studied for a period 1998-2011, where the first 3 years were considered as a warm-up.The calibration periods was also consisted of 11 years (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011) for runoff data in four hydrometric stations which is shown in Fig. 1.

Computer configuration
When implementing this calibration algorithm, the simulation speed depends mainly on the computer configuration.In this study, an ordinary PC (personal computer) was used to evaluate the running time of the parallelized and nonparallelized AMALGAM algorithms for the calibration of SWAT model.This computer type is widely used in various regions globally, with the following configuration details: It features an Intel (R) Core i7 duo-2.2GHz CPU, comprising 7 physical cores and 8 logical cores.It is equipped with 8GB of RAM and operates on a 64-bit Windows 10 operating system.The total time needed for completing computational tasks of solving the optimization problems using each parallelized and non-parallelized AMALGAM algorithm was recorded.The calculated speedup ratio was incorporated as a surrogate measurement of the parallelization efficiency.The entire process, which involves setting up a hydrological SWAT model and integrating it with parallel computing through AMALGAM algorithm, is represented as a schematic map in Fig. 3.
It is important to highlight that the coupling of SWAT and AMALGAM, as depicted in Fig. 3, is accomplished through a batch file.This file simultaneously edits model parameter values for all Hydrologic Response Units (HRUs)

AMALGAM Parallelization
To demonstrate and evaluate the running of the parallelized AMALGAM, we equipped several model that runs with a number of cores (worker) involved (from one to eight).The developed parallelized AMALGAM algorithm for the calibration of SWAT model illustrated that less time is needed for a model calibration in comparison with non-parallelized AMALGAM algorithm due to more CPU cores involvement in the calibration processes.In addition, the reaching to a less time for model calibration, more CPU cores should be involved with meaning the degree of speed up diminishes for every increased core.It is observed that an eight-processor run can reduce time by about 6% compared to a sixprocessor run.Nevertheless, parallelization of AMALGAM algorithm can reduce the time required for both manual and automatic calibrations substantially.In a similar study, Wu et al. (2013) reported that in the parallelization of SWAT model by using the message passing interface method, the execution time can be reduced by 42-70% while using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes).Her et al. (2015) assessed the performance of two parallel computing methods, 'spmd' and 'parfor,' for improving consuming time efficiency in solving optimization problems in hydrologic modeling applications.They clearly demonstrate effectiveness of the mentioned methods in reducing computational time of the parameter calibration.In this study, we utilized the benefits of the 'spmd' method to significantly reduce the runtime of the SWAT model by up to 45% when compared to standard execution, which does not involve parallel processing.

AMALGAM pareto front
In order to illustrate how meaningful SWAT model calibration trade-off among the four objective functions involved in our AMALGAM optimization program, the Pareto solution set is provided (Figs. 4,5,6 and 7).In this figures, the horizontal and vertical axes are the TRMSE and ROCE objective functions, respectively.The size and color of the circles specified in these graphs contain the values of the NSE and SFDCE functions, respectively.The size and color of the circles specified in the graphs also contain the NSE and SFDCE function values, respectively.In addition, the two horizontal and vertical axes at all stations were considered similarly for a better comparison.The R programming environment was used to draw these figures to simultaneously visualize the values of the four investigated objective functions, and thus, the presentation of 2-D and 3-D (containing of two or three objectives) plots was discarded.
In results, almost in all station, the two objective functions of NSC and TRMSE will never reach to their optimum values (i.e., NSC = 1.00 and TRMSE = 0.00) while other two investigated objective functions (i.e., ROCE and SFDCE) may achieve their optimal values of zero.The best The high density of the Pareto front points in SARAS-SHA and ZIRBAGOL stations is seen clearly so that all points are within a certain and close range.In the other two stations, however, the Pareto front occupies a wide range of ROCE values.The high conformity of the ROCE values with the NSE objective function values is also noteworthy.The NSE and SFDCE objective functions were in contradiction, and their variations were not in a direction.As it is seen, the increase in NSE (the improvement of this

Multi-objective simulation
Since a multi-objective calibration has been performed in several stations in this research, an efficient tool is available to decide about the best simulation based on the investigated objective functions.The obtained results from the Pareto fronts are summarized in Table 2.
In this table, the average of non-dominated Pareto front values for all four objective functions is shown where the best values of each function also are highlighted.Generally, this table depicts that the model had the best simulation performance slightly at the SARASSHA station where the two objective functions of NSC and ROCE have a significant difference compared to the rest.In addition, the values of other two objective functions, which are assigned to the HESDEHB and ZIRBAGOL stations, have a negligible difference with the SARASSHA station.Since the SARASSHA station is located in upstream of the basin, the river regime is more natural with slightest handicap by humans as well as other factors.For this reason, only this station could be considered in connection with social hydrology, which has recently been taken into account in hydrology community.
Figure 8 compares the simulated and observed discharge values relative to the best values for each target function in all hydrometric stations.The SARASSHA station with the highest NSC value has the best simulations for the relatively high discharge to peak values, which is expected due to the behavioral nature of the NSC function.In fact, this function attempts to eliminate the error by minimizing the differences between the simulated and observed high discharge values.

Conclusions
This paper aimed to take advantage of parallel processing to accelerate the computations involved with simulation process of a semi-distributed hydrologic model, (i.e., SWAT) linked with a multi-objective optimization algorithm (AMALGAM) for multi-site calibration of a mountainous watershed.All parallel processing materials (SPMD method) and linkage between SWAT and AMAL-GAM were coded in MATLAB, and running process was carried out to obtain the results.Also, presentation of 4-D plot containing of four objective functions was achieved through coding in R environment.
Finally, the main finding of this paper can be summarized as follows: 1 Because the parallelized AMALGAM with SWAT model may reduce the execution time of a non-parallelized model run, it can substantially reduce the time required for automatic parameter calibrations.In addition, the parallelized AMALGAM algorithm presented herein for calibration of SWAT model as an example can be useful and simply carried out for calibration of other hydrological and water quality models.2 A four objective optimization strategy focusing on high flows (NS coefficient), water balance measure (involved with runoff coefficient error), low flows (Box-Cox transformed root-mean-square error), and flashiness (slope of the flow duration curve error) was applied to calibrate SWAT model using multi-algorithm optimization AMALGAM.The high conformity of the ROCE values with the NSE objective function values is seen in 4-D pareto front plots.On the other hand, the NSE and SFDCE objective functions were in contradiction and their variations were not in a same direction.3 As a main finding of this research, it could be concluded that best simulation with considering of all four objectives is belonged to SARASSHA station which is located in upstream which the river regime is more natural with slightest handicap by humans as well as other factors.
For this reason, only this station could be considered in connection with social hydrology, which has recently been taken into account in hydrology community.4 It can be deduced that when the optimal values for all four objective functions converge within a specific region, as observed, for instance, at the SARASSHA station, and when the Pareto fronts closely overlap, the outcomes generated by the multi-objective algorithm do not significantly deviate from those produced by the single-objective algorithm.

Fig. 1
Fig. 1 Hydrologic network in the Kashafrood River catchment in northeastern Iran, including topography, the location of gauging and meteorological stations, and stream networks

Fig. 2
Fig. 2 Input maps to the SWAT models (land use, and soil maps)

Fig. 3
Fig. 3 Schematic map representing parallel computing of the SWAT model linked with AMALGAM optimization algorithm

Fig. 4
Fig. 4 Pareto front of four objectives for KARTIAN station

Fig. 6 Fig. 7
Fig. 6 Pareto front of four objectives for HESDEHB station

Table 1
Description of the SWAT input parameters selected for the discharge calibration r _: means the existing parameter value is multiplied by (1 + a given value), and v_: denotes the default parameter is replaced by a given value *

Table 2
Average of non-dominated Pareto fronts values for the four objective functions in all stations NSC Nash-Sutcliffe coefficient, TRMSE transformed root-meansquare error, ROCE runoff coefficient percent error, and SFDCE slope of the flow duration curve