Calibration of conceptual rainfall-runoff models by selected differential evolution and particle swarm optimization variants

The performance of conceptual catchment runoff models may depend strongly on the specific choice of calibration method made by the user. Particle Swarm Optimization (PSO) and Differential Evolution (DE) are two well-known families of Evolutionary Algorithms that are widely used for the calibration of hydrological and environmental models. In the present paper, five DE and five PSO optimization algorithms are compared on the calibration of two conceptual models, namely the Swedish HBV model (Hydrologiska Byrans Vattenavdelning model) and the French GR4J model (modèle du Génie Rural à 4 paramètres Journalier), for the runoff of the Kamienna catchment, located in the central part of Poland. The main goal of the study was to find out whether DE or PSO algorithms are better suited for the calibration of conceptual rainfall-runoff models. In general, four out of five DE algorithms perform better than four out of five PSO methods, at least on the calibration data. However, one DE algorithm consistently performs very poorly, while one PSO algorithm is among the best optimizers. Large differences are observed between the results obtained for the calibration and validation data sets. Differences between optimization algorithms are smaller for the GR4J than for the HBV model, probably because GR4J has fewer parameters to optimize than HBV.


Introduction
Metaheuristics are widely used for optimization in hydrology (Jahandideh-Tehrani et al. 2020; Maier et al. 2014), especially for conceptual catchment runoff models. Among the various kinds of metaheuristics, Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) and Differential Evolution (DE) (Storn and Price 1997) are two landmark examples of Swarm Intelligence and Evolutionary Algorithms (Boussaid et al. 2013). Both were proposed in the mid-1990s and gained widespread popularity in hydrological applications (Jahandideh-Tehrani et al. 2020; Okkan and Kirdemir 2020b; Maier et al. 2014; Kisi et al. 2010; Xu et al. 2022). DE also turned out to be a stepping stone for the development of the Markov Chain Monte Carlo-based differential evolution adaptive Metropolis (DREAM) approach (Vrugt et al. 2009). Both DE and PSO are popular and widely considered to be effective, so they have frequently been hybridized together into a single algorithm (Xin et al. 2012; Parouha and Verma 2022). Such PSO-DE hybrids have been used in water-related applications such as optimal localization of hydrocarbon reservoir wells (Nwankwor et al. 2013), design of water distribution systems in big cities (Sedki and Ouazar 2012), design of hydraulic structures (Singh and Duggal 2015) or suspended sediment load estimation (Mohammadi et al. 2021).
Because the performance of various optimization methods may be highly uneven for a particular application, one may find numerous large-scale comparisons of optimization algorithms in the literature (Kazikova et al. 2021; Tharwat and Schenck 2021; Ezugwu et al. 2020; Price et al. 2019; Bujok et al. 2019; Piotrowski et al. 2017a), including some guidelines on how to organize such comparisons (Swan et al. 2022; LaTorre et al. 2021). One may also find many comparison studies in which various kinds of metaheuristics are applied to different types of catchment runoff models. Among papers published during the last few years, Jahandideh-Tehrani et al. (2021) compared PSO against a Genetic Algorithm, Adnan et al. (2021) tested PSO against the Grey Wolf optimizer, Tikhamarine et al. (2020) compared PSO against the Harris-Hawks optimizer, Okkan and Kirdemir (2020a) proposed a hybrid PSO algorithm and compared it against five metaheuristics (the basic PSO, the basic DE, Genetic Algorithm, Invasive Weed Algorithm, and the Artificial Bee Colony method), Hong et al. (2018) compared DE against a Genetic Algorithm, Piotrowski et al. (2017b) compared a large field of 26 diversified metaheuristics, and Tigkas et al. (2016) compared Shuffled Complex Evolution, a Genetic Algorithm and Evolutionary Annealing. Good reviews of older studies may be found in Meier et al. (2019) and Reddy and Kumar (2020).
There are, however, two main problems with the application of DE and PSO algorithms in hydrology. First, despite the popularity of both PSO and DE in water-related studies, there is no paper that directly compares various variants from the PSO and DE families of methods for catchment runoff modelling. Second, plenty of DE and PSO algorithms have appeared in the recent decade (Das et al. 2016; Bonyadi and Michalewicz 2017; Bilal et al. 2020; Shami et al. 2022), and many of them perform much better than the basic DE and PSO versions (e.g. Tanabe and Fukunaga 2014; Piotrowski et al. 2017a; Bujok et al. 2019). However, in many hydrological applications only the simplest, over 20-year-old versions of either DE or PSO are used. As a result, one cannot find out which kind of algorithms are de facto more efficient in solving hydrological problems, especially in the calibration of rainfall-runoff models.
In the present paper, we aim at a detailed and thorough comparison of DE versus PSO algorithms applied to the calibration of rainfall-runoff models. One may find plenty of other Evolutionary Algorithms applied to this task (Cantoni et al. 2022; Okkan and Kirdemir 2020a, b; Kumar et al. 2019; Dakhlaoui et al. 2012; Gan and Biftu 1996), but the present study is restricted to the comparison solely between DE and PSO variants. Instead of using historical versions of DE and PSO, we test relatively recently proposed variants that may currently be considered the state of the art. For comparison purposes, we have selected five DE and five PSO variants that were proposed between 2012 and 2022. These ten algorithms are applied to the calibration of two conceptual rainfall-runoff models, namely HBV (Hydrologiska Byrans Vattenavdelning model; Bergström 1976; Lindström 1997) and GR4J (modèle du Génie Rural à 4 paramètres Journalier; Perrin et al. 2003). The research is performed on the Kamienna catchment, located in the central part of Poland. We mainly focus on the relative performance of DE and PSO algorithms in the calibration of hydrological models, as we wish to find out which family of methods performs better for this task. Direct comparison between the two hydrological models is considered to be of secondary importance in this paper.

Rainfall-runoff models
We consider two lumped conceptual catchment runoff models that are built of interconnected reservoirs with mathematical transfer functions used to describe the transfer of water between reservoirs and into the river.

HBV
The HBV model with a snow routine, proposed by Bergström and Forsman (1973), has been used in dozens of countries around the world. In the majority of these applications, modified versions of the original HBV model have been used (Bergström 1976; Bergström and Lindström 2015). A block diagram of the particular version of the HBV model applied in this paper is shown in Fig. 1. A detailed description of the HBV model components for the version adopted in this paper is given in Piotrowski et al. (2017b).
The input variables of the model are daily precipitation, average daily air temperature and daily potential evapotranspiration (PET). Precipitation can take the form of rain, snow, or a mixture of snow and rain, which is described using the threshold temperature (TT) and the temperature interval (TTI). At temperatures lower than the lower limit (TT − 0.5 TTI) only snow occurs, and at temperatures higher than the upper limit (TT + 0.5 TTI) only rain falls. In the interval between these limits, precipitation is a mixture of rain and snow, with the snow fraction decreasing linearly from 100% at the lower limit to 0% at the upper limit.

Fig. 1 Structure, conceptual storages and parameters of the HBV model
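The linear rain/snow partitioning described above can be sketched in a few lines of code. This is an illustrative fragment, not part of the HBV implementation used in the study; the function name `snow_fraction` is ours, while `TT` and `TTI` follow the text:

```python
def snow_fraction(temp, TT, TTI):
    """Fraction of precipitation falling as snow at air temperature `temp`.

    Below TT - 0.5*TTI all precipitation is snow; above TT + 0.5*TTI it is
    all rain; in between the snow fraction decreases linearly."""
    lower = TT - 0.5 * TTI
    upper = TT + 0.5 * TTI
    if temp <= lower:
        return 1.0
    if temp >= upper:
        return 0.0
    return (upper - temp) / (upper - lower)

# At the threshold temperature itself, precipitation is half snow, half rain
print(snow_fraction(0.0, 0.0, 2.0))  # 0.5
```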

GR4J
The original GR4J conceptual model is a daily lumped four-parameter catchment runoff model that takes into account changes in soil moisture and can be used for temperatures greater than zero (Perrin et al. 2003). Since our study is concerned with a catchment located in Polish climatic conditions, where snow plays an important role, the original model is extended by adding a snow module (Fig. 2). However, the original name GR4J is retained in this paper. This extended version has seven parameters, three of which (TT, TTI, CFMAX) relate to the snow routine. All GR4J parameters are listed in Table 2 with a brief description. A detailed description of the GR4J model can be found in Perrin et al. (2003).
The input variables to the GR4J model are the same as for the HBV model. Similarly to the HBV model, precipitation may take the form of rainfall, snowfall or a mixture of snowfall and rainfall. Snowmelt is assumed to be directly proportional to the temperature and is computed by means of the degree-day method.
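The degree-day snowmelt computation mentioned above is simple enough to state directly. The sketch below is illustrative only; `CFMAX` is the degree-day melt factor of the snow routine, and the function name is ours:

```python
def degree_day_melt(temp, TT, CFMAX):
    """Potential snowmelt (mm/day): proportional to the excess of the air
    temperature over the threshold temperature TT; no melt below TT."""
    return CFMAX * max(0.0, temp - TT)

print(degree_day_melt(3.0, 0.0, 2.5))   # 7.5 mm/day
print(degree_day_melt(-1.0, 0.0, 2.5))  # 0.0 mm/day
```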

Optimization algorithms
This paper focuses on a direct comparison between two families of optimization algorithms, Particle Swarm Optimization (PSO) and Differential Evolution (DE), for conceptual rainfall-runoff model calibration. After a quarter century of research, hundreds of DE and PSO variants can be found in the literature (Das et al. 2016; Bonyadi and Michalewicz 2017).

Differential evolution and its variants
The classical Differential Evolution algorithm (Storn and Price 1997) defines a movement of a population of NP individuals (solution vectors) in a D-dimensional decision space, where D is the number of parameters to be optimized, in a search for the global optimum. In generation g = 0, the NP individuals x_i,g = (x_i,g^1, …, x_i,g^D), i = 1,…,NP, are initialized at random according to the uniform distribution:

x_i,0^j = L_j + rand_i^j(0,1) · (U_j − L_j), j = 1,…,D  (1)

Here, rand_i^j(0,1) is a random value within the (0,1) interval that is generated separately for each j-th element of the i-th individual. L_j and U_j are the lower and upper bounds that define the search space. After initialization of the population of solutions, in each generation g every individual makes a move across the search space following three operations: mutation, crossover and selection. In the basic DE, the mutation is defined as:

v_i,g = x_r1,g + F · (x_r2,g − x_r3,g)  (2)

and is followed by the crossover:

u_i,g^j = v_i,g^j if rand_i^j(0,1) ≤ CR or j = j_rand,i; otherwise u_i,g^j = x_i,g^j  (3)

In Eq. (2) r1, r2 and r3 are three different (r1 ≠ r2 ≠ r3 ≠ i) integers that are randomly chosen from the range [1, NP]. In Eq. (3) j_rand,i is another integer, randomly selected within [1, D]. Note that the basic DE variant has three control parameters, NP, CR and F, which need to be defined by the user.
Because the search space is often bounded (i.e. the values of the model parameters to be calibrated are restricted to some range), a verification is needed after crossover to check whether the new solution u_i,g is within the bounds (Kononova et al. 2021). If u_i,g turns out to be outside the bounds, it has to be forced back into the search domain (e.g. by one of the methods discussed in Helwig et al. (2013) and Kadavy et al. (2022)). After that, the objective function is called for the bounded solution u_i,g and one obtains its goodness of fit f(u_i,g), which represents the quality of the solution u_i,g. Finally, the selection operation is performed to choose the better of x_i,g and u_i,g to enter the next generation.
After repeating the above procedures for each individual in the population, the NP individuals proceed to the g + 1 generation. The algorithm repeats the same steps in the subsequent generations until some stopping conditions are reached. In the present study, the maximum number of function calls set to 20,000 is defined as the stopping condition.
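The basic DE loop described above, Eqs. (1)-(3) with greedy selection and a fixed budget of function calls, can be sketched as follows. This is a minimal illustration of the classical DE/rand/1/bin scheme with simple clamping for bound handling, not one of the advanced variants compared in the study:

```python
import numpy as np

def de_rand_1_bin(f, lower, upper, NP=30, F=0.7, CR=0.9, max_calls=20000, seed=0):
    """Classical DE (Storn and Price 1997): mutation, binomial crossover,
    greedy selection; stops when the budget of function calls is exhausted."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    D = len(lower)
    # Eq. (1): uniform random initialization within the bounds
    x = lower + rng.random((NP, D)) * (upper - lower)
    fx = np.array([f(xi) for xi in x])
    calls = NP
    while calls + NP <= max_calls:
        for i in range(NP):
            # r1, r2, r3: mutually different indices, all different from i
            r1, r2, r3 = rng.choice([k for k in range(NP) if k != i], 3, replace=False)
            v = x[r1] + F * (x[r2] - x[r3])              # Eq. (2): mutation
            jrand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[jrand] = True                           # at least one element from v
            u = np.where(mask, v, x[i])                  # Eq. (3): crossover
            u = np.clip(u, lower, upper)                 # force back into the bounds
            fu = f(u)
            calls += 1
            if fu <= fx[i]:                              # selection: keep the better
                x[i], fx[i] = u, fu
    best = int(np.argmin(fx))
    return x[best], fx[best]

# Toy usage: minimize a 5-dimensional sphere function
best_x, best_f = de_rand_1_bin(lambda z: float(np.sum(z ** 2)), [-5] * 5, [5] * 5)
print(best_f)  # close to 0
```

In a calibration setting, `f` would be the MSE between observed and simulated runoff for a given parameter vector, and `lower`/`upper` the feasible ranges of the model parameters.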
The majority of modern DE variants are much more complicated than the basic version from 1997 defined above (e.g. see Mohamed et al. 2021). A detailed review of DE variants may be found in Das et al. (2016), Al-Dabbagh et al. (2018) or Opara and Arabas (2018). The modern variants often adaptively modify the control parameters F and CR (Brest et al. 2006; Tanabe and Fukunaga 2014) or the population size (Piotrowski 2018; Caraffini and Neri 2019; Cai et al. 2020). DE is also sometimes hybridized with other metaheuristics (Gong et al. 2010; Xin et al. 2012; Awad et al. 2017). In the present study, we compare five advanced DE variants, which are defined in Table 3. A detailed description of these algorithms may be found in the source papers. The control parameters of the algorithms are the same as suggested in the source papers, and the population sizes used are given in Table 3.

Particle swarm optimization variants
Particle Swarm Optimization (Kennedy and Eberhart 1995) is a very popular stochastic population-based algorithm, inspired by the behavior of swarms of animals. In PSO, the solutions (called particles) move across the D-dimensional search space all the time, but remember the best location they have visited so far. As in DE, the initial positions x_i,0 of the NP PSO particles (i = 1,…,NP) are usually generated randomly within the bounds of the search space (Eq. (1)). However, in PSO each particle has an associated velocity vector. Depending on the specific PSO variant, the initial velocities v_i,0 of each particle are either set to 0 or are generated from some pre-specified interval, which frequently depends on the differences between the upper and lower bounds of the search space. The fitness value f(x_i,0) is evaluated for each newly generated particle. Then in each generation g the particles move through the search space according to the following equations:

v_i,g+1^j = w · v_i,g^j + c_1 · rand1_i,g^j(0,1) · (pbest_i,g^j − x_i,g^j) + c_2 · rand2_i,g^j(0,1) · (gbest_g^j − x_i,g^j)

x_i,g+1^j = x_i,g^j + v_i,g+1^j

where j = 1,…,D, and pbest_i,g and gbest_g are the best position visited during the search by the i-th particle and the best position visited by any particle in the swarm, respectively. rand1_i,g^j(0,1) and rand2_i,g^j(0,1) are two random numbers generated at each generation from the [0,1] interval, separately for each i and j index, and c_1 and c_2 are acceleration coefficients (algorithm parameters to be set by the user). As may be seen, for each i-th particle three vectors are remembered: its current position x_i,g, the best position pbest_i,g visited by the i-th particle since the initialization of the search, and the i-th particle's current velocity v_i,g. The parameter w is the so-called inertia weight, first introduced by Shi and Eberhart (1998). As in the case of DE, modern PSO variants are often much more complicated than the initial version; for a survey readers are referred to Bonyadi and Michalewicz (2017), Cheng et al. (2018), and Shami et al. (2022).
Modern PSO variants use different topologies (under this term we mean the communication possibilities between individuals; Lynn et al. 2018; Xia et al. 2020; Li et al. 2022), theoretically or empirically modify the values of control parameters (Clerc and Kennedy 2002; Harrison et al. 2018; Piotrowski et al. 2020; Cleghorn and Stapleberg 2022; Meng et al. 2022), introduce novel equations for the movement of particles (Santos et al. 2020; Li et al. 2021; Houssein et al. 2021), or bring together several of these ideas. The five PSO variants compared in the present study are also defined in Table 3.
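For comparison with the DE scheme, a minimal inertia-weight PSO following the update equations above might look as follows. This is an illustrative sketch of the basic algorithm (global-best topology, clamping at the bounds), not one of the advanced variants tested in the study; the constants chosen for w, c1 and c2 are common values from the literature:

```python
import numpy as np

def pso(f, lower, upper, NP=30, w=0.729, c1=1.494, c2=1.494, max_calls=20000, seed=0):
    """Basic PSO with inertia weight (Shi and Eberhart 1998): particles always
    move, while pbest/gbest store the best positions visited so far."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    D = len(lower)
    x = lower + rng.random((NP, D)) * (upper - lower)   # initial positions
    v = np.zeros((NP, D))                               # initial velocities set to 0
    fx = np.array([f(xi) for xi in x])
    pbest, fpbest = x.copy(), fx.copy()                 # per-particle memory
    g = int(np.argmin(fpbest))
    gbest, fgbest = pbest[g].copy(), float(fpbest[g])   # swarm-wide memory
    calls = NP
    while calls + NP <= max_calls:
        r1 = rng.random((NP, D))
        r2 = rng.random((NP, D))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                # move, staying in bounds
        fx = np.array([f(xi) for xi in x])
        calls += NP
        improved = fx < fpbest                          # update particle memories
        pbest[improved] = x[improved]
        fpbest[improved] = fx[improved]
        g = int(np.argmin(fpbest))
        if fpbest[g] < fgbest:
            gbest, fgbest = pbest[g].copy(), float(fpbest[g])
    return gbest, fgbest

# Toy usage: minimize a 5-dimensional sphere function
best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), [-5] * 5, [5] * 5)
print(best_f)  # close to 0
```

Note how, in contrast to the DE sketch, every particle always moves to its new position regardless of quality; only `pbest` and `gbest` retain the good locations.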

Major differences between DE and PSO
Technically, PSO is a major algorithm within the family of Swarm Intelligence, which is based on the communal behavior of animals, and DE is a version of Evolutionary Computation, which is based on the evolutionary principles of life. However, such inspiration-focused differences are irrelevant from the optimization point of view. First of all, in DE the particular individual verifies a new solution in each generation, but moves to the new location only if it is not inferior to the solution at which it was located at the beginning of the generation. This means that the DE population may test new locations, but stays in the former ones until some promising region of the search space is found. Because the probability of visiting a particular part of the search space is a function of the current location of the individuals in the DE population, such lack of movement may lead to stagnation (Weber et al. 2009) and hampers proofs of convergence (Hu et al. 2016; Opara and Arabas 2018). However, this feature assures that each individual is always located in the best place it has visited so far, and the location of the whole population is a kind of space-based memory of the high-quality solutions. On the contrary, the PSO particle in each generation moves to and stays in the new location, irrespective of how poor it is. As a result, the particle requires an additional memory in which the best solution it has visited so far is remembered. PSO particles may fly all around, and may have a problem with returning to the promising solutions (Van den Bergh and Engelbrecht 2006). This inspired researchers to determine the relations between the trajectories of PSO and the values of control parameters or topologies in an analytical way (Clerc and Kennedy 2002; Harrison et al. 2018; Cleghorn and Stapleberg 2022).
Another main difference between DE and PSO is the crossover, Eq. (3). In almost all DE variants, the sampled solution is a mix of the former solution and a solution that results from the initial move (which in most DE variants will be an extended version of Eq. (2)). The crossover is useful as it allows keeping some information from the previous solution within the newly tested one. It limits the diversity, but enhances the chance of finding a better solution; without it the number of successful steps would often be very low in DE, and the population could stagnate for a long time in its former location. As PSO performs moves all the time, crossover is not necessary (although it has been tested in some PSO variants, e.g. Engelbrecht 2016; Gong et al. 2016; Molaei et al. 2021).
Finally, the majority of new DE algorithms use adaptive control parameters and population size modification schemes (Al-Dabbagh et al. 2018). Although, as noted above, adaptive and variable population-size PSO variants are also numerous, in contrast to DE they are not clearly superior to the variants with fixed but carefully chosen values (Bonyadi 2020). PSO variants that adaptively modify the acceleration coefficients (c_1, c_2) are relatively rare (for examples see Harrison et al. 2018) and do not improve the performance much.

Description of the study area
The study is performed on the Kamienna River catchment. The Kamienna Catchment is located in the Central Vistula basin in the Polish Upland area and covers 2020 km² (Fig. 3).
The main river of the catchment is the Kamienna (a left tributary of the Vistula River), whose sources are located at the border of the Masovian and Świętokrzyskie provinces above the town of Skarżysko-Kamienna, in a mountainous area. The river is 156 km long and runs from west to east, predominantly through the Świętokrzyskie Province. The catchment elevation varies from about 130 to 600 m a.m.s.l. There are large variations of the longitudinal slope of the channel in the upper part of the Kamienna River (around 10%). This part has a mountainous character up to Skarżysko-Kamienna, from where the slope gradually decreases, reaching about 0.7% near Kunów (Lenar-Matyas et al. 2006). The catchment area is prone to natural and human hazards (FramWat 2019). Human activities have focused on increasing water retention in the catchment by constructing many small artificial reservoirs and two large ones: Wióry and Brody Iłżeckie.
According to the Köppen-Geiger climate classification (as adapted by Peel et al. 2007), the Kamienna Catchment climate is "cold", with no dry season and a warm summer. Annual areal precipitation for the period 1968-2018 varies from 410 to 920 mm, with a long-term annual mean of 600 mm, while the long-term monthly mean varies from about 30 to 90 mm (Senbeta and Romanowicz 2021). The minimum and maximum precipitation occur in winter and summer, respectively. The mean monthly temperature in the watershed over the same observational period varies from −3.1 to 18.3 °C, with the minimum and maximum in January and July, respectively (Senbeta and Romanowicz 2021). The land use structure of the study catchment is dominated by agriculture (46.3%); a significant part of the area is also occupied by forest and semi-natural land (43.3%); the remaining parts are artificial land and water bodies, 10% and 0.4%, respectively.

Dataset
Data used include daily hydrological and climatological variables, namely streamflow, air temperature, precipitation and potential evapotranspiration (PET) in and around the watershed. These data were collected for the historical period 1968-1982, during which the catchment could be considered free from anthropogenic influences. After 1982, artificial reservoirs were constructed in the catchment, which changed the flow regime. The periods 1968-1970, 1971-1976 and 1977-1982 were used for warm-up, calibration and validation, respectively. Hydroclimatic data were obtained from the Institute of Meteorology and Water Management (https://dane.imgw.pl/).
The temperature-based method was used to estimate PET at each meteorological station. As both the HBV and GR4J models are lumped, temperature, precipitation and PET over the catchment were averaged using the Thiessen polygon method.
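The Thiessen-polygon averaging amounts to an area-weighted mean of the station values, with weights equal to the polygon areas falling inside the catchment. A minimal sketch (the station values and areas below are made up for illustration):

```python
def thiessen_average(station_values, polygon_areas):
    """Areal average of a point-measured variable, weighted by the area of
    each station's Thiessen polygon within the catchment."""
    total_area = sum(polygon_areas)
    return sum(v * a for v, a in zip(station_values, polygon_areas)) / total_area

# Hypothetical daily precipitation (mm) at three stations and their polygon
# areas (km^2); the weighted mean becomes the lumped model's areal input
print(thiessen_average([10.0, 6.0, 8.0], [500.0, 900.0, 620.0]))
```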

Comparison criteria
Both the HBV and GR4J models are calibrated using the mean square error (MSE). As a result, we compare the algorithms using exactly the same criterion that was used as the objective function during the search. Each algorithm is run 30 times on every model (HBV and GR4J). The mean, median, best, and worst performances from these 30 runs, obtained for the calibration and testing datasets, are used for comparison. This is a frequently adopted compromise between confidence in the quality of the results (the more runs, the more reliable the results) and applicability (more runs mean more computation time). As shown in Vecek et al. (2017), the number of runs has only a moderate impact on the final conclusions of such research. In addition, we also report the standard deviation of the results obtained over the 30 runs. This allows us to discuss the average performance, the extremes, and the consistency of the solutions found by a particular optimization algorithm.
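The comparison criterion and run statistics described above can be made concrete with a short sketch. The MSE here is the standard per-time-step mean of squared errors; the sample run values and both function names are ours, for illustration only:

```python
import statistics

def mse(observed, simulated):
    """Mean square error between observed and simulated flow series."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)

def summarize_runs(run_mse):
    """Statistics over independent optimization runs of one algorithm:
    mean, median, best, worst and standard deviation of the final MSE."""
    return {
        "mean": statistics.mean(run_mse),
        "median": statistics.median(run_mse),
        "best": min(run_mse),
        "worst": max(run_mse),
        "std": statistics.stdev(run_mse),
    }

# Hypothetical final MSE values from five runs of one algorithm
print(summarize_runs([14.5, 14.0, 16.2, 14.5, 15.1]))
```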

Calibration of the HBV model
Each time a model is calibrated, some data need to be set aside and made unavailable to the calibration algorithm; we call this data set the validation (or testing) data. This validation dataset is important because it reveals potential overfitting of the model. For obvious reasons, we want our model to work correctly not just for the data that were used during calibration (the calibration set), but also for some future, unknown data. Therefore, a validation data set is needed to verify the practical effects of calibration. Thus the discussion of the results may be divided into two parts: the first covers the comparison based on the calibration data, and the second the comparison based on the validation data (see Table 4).
Based on the calibration dataset, two algorithms, namely PPSO and HARD-DE, appear to be the best for the HBV model. When comparing the performance based on the mean or median of 30 runs, the results obtained by PPSO are the best. In particular, the low median obtained by PPSO (14.039) shows that this algorithm often leads to results with high performance. Three algorithms (HARD-DE, MDE_pBX and L-SHADE) achieve an equal median (14.505), indicating that all three frequently converge to a similar, though sub-optimal, solution. However, according to the average values HARD-DE performs better than MDE_pBX and L-SHADE.
In contrast, OLSHADE-CS leads to by far the poorest results, with a median MSE (20.365) that is over 30% higher than that of PPSO and HARD-DE. In terms of the mean and median, the PSO-based PPSO looks like the winner. The mean and median are, however, not the only metrics by which to compare algorithms; many users would simply be interested in the best results found. When one compares the best and the worst solutions obtained during 30 runs, HARD-DE becomes the winner. Indeed, the best solution found by HARD-DE is about 7% better than the best solution found by PPSO (13.047). Moreover, HARD-DE never found a solution worse than the median (14.505), while the worst solution found by PPSO is over 10% poorer (16.161). We may also observe that the ranking of algorithms based on the best solutions found generally differs from the rankings based on the mean or median. One should especially note that TAPSO and EPSO were able to find best solutions with lower MSE than the best solutions found by all DE algorithms except HARD-DE. However, DEPSO and PSO-sono were not able to outperform the DE methods. Hence, according to the ranking based on the best solutions found, DE still outperforms PSO in general, but the relative positions of specific algorithms differ, and the whole picture is more complicated.
The superiority of DE over PSO algorithms on the calibration data set is probably an effect of the behavior of the two families of algorithms. In the recent, efficient DE variants, the control parameters are often made adaptive, hence they are flexibly modified during the search (Das et al. 2016; Al-Dabbagh et al. 2018), whereas the control parameters of PSO are more frequently set fixed throughout the whole search (e.g. Clerc and Kennedy 2002; Harrison et al. 2018). This flexibility of the DE control parameters may give a DE algorithm additional chances to cope with the complicated fitness landscape of each specific problem, whereas the PSO variants with fixed control parameter values are more conservative. Another difference that may partly explain the better performance of DE is the selection operator. DE algorithms reject poorer solutions found during the search, and move only to better locations; hence the current DE population is composed of solutions that are better than all their predecessors. On the contrary, PSO algorithms keep moving all the time, and may end the search in both better and poorer locations. As a result, PSO variants may be considered more chaotic, and less effective in finding the precise location of the optima.
The results obtained for the validation dataset are more than twice as poor as the results obtained for the calibration data set. This is, however, mainly an effect of the data, not of the calibration process. Considering the mean and median measures, the PPSO algorithm is the winner for the testing dataset, as it was for the calibration dataset. However, the quality of the solutions found by HARD-DE, MDE_pBX and L-SHADE is frequently not confirmed on the testing dataset. All three algorithms achieved the second-best median on the calibration dataset, but their median MSE for the validation dataset ranks only 6th-8th, about 10% poorer than the median MSE obtained by PPSO. In contrast to the calibration dataset, the PSO-based algorithms do not perform poorer than the DE-based ones on the validation dataset. This may suggest that finding the exact optimum based on the calibration data set is of moderate importance when the incoming data change significantly (e.g. Beven 2006; Beven et al. 2022). Nonetheless, the overall best solution found by any method for the validation dataset again belongs to HARD-DE and is again about 8% better than the best solution found by PPSO (35.099). This means that, in some sense, DE is still better than PSO on the validation data, as one of the DE variants is able to find a much better result than all competing algorithms. Whether one prefers to look at the mean or at the best results is up to the user's taste.

Calibration of the GR4J model
Contrary to the HBV model, the calibration of the GR4J model seems to be much simpler, and almost all algorithms, apart from OLSHADE-CS, lead to almost the same median and best results. Only the mean MSE values vary slightly, and the DE-based methods (excluding OLSHADE-CS), especially HARD-DE, achieve clearly better mean results than the PSO algorithms (Table 5). This indicates that the algorithms compete in terms of failures rather than in finding the best results. The poor performance of OLSHADE-CS may be due to its very slow convergence, and may be a side-effect of the fact that OLSHADE-CS was initially tested on, and probably fitted to, problems with a very large number of allowed function calls (see Kumar et al. 2022 for the initial tests).
The mean MSE obtained by HARD-DE is about 7% better than the mean MSE obtained by PPSO (16.567). This difference also holds for the validation dataset, where HARD-DE is again about 8% better than PPSO. However, for the validation dataset, surprisingly, the best solution found by PPSO is better than the best solution found by HARD-DE. Moreover, the overall best solution of the GR4J model for the validation dataset is found by another PSO-based method, PSO-sono. This may look like the opposite of the finding noted for the HBV model. Nonetheless, the results obtained for the GR4J model are much less diversified than those for the HBV model. This may be due to the much smaller number of parameters to be optimized: a small number of parameters may lead to smaller differences in performance between algorithms.

Conclusion
The present paper discusses numerous state-of-the-art variants from the PSO and DE families of optimization methods, applied to the calibration of catchment runoff models. In the literature dealing with computational optimization methods, no broader comparison of performance between the PSO and DE families has been presented so far. We have chosen five DE and five PSO variants proposed between 2012 and 2022. These ten algorithms were applied to the calibration of two conceptual rainfall-runoff models, HBV and GR4J, on the Kamienna catchment located in the central part of Poland. We aimed at finding out whether the DE or the PSO algorithms would be better suited for the calibration of rainfall-runoff models. Furthermore, we focused on the relative performance of the algorithms from the two different modern families in the calibration of hydrological models, rather than on comparing the results obtained by the two conceptual rainfall-runoff models.
We show that the results obtained by the different optimizers are roughly similar for the GR4J model, which has very few parameters. For the GR4J model, one may rather point at an inferior algorithm (OLSHADE-CS) than at a winner, as many optimizers performed very similarly. No clear difference between the PSO and DE methods could be found. This is probably because the GR4J model has a low number of parameters that are relatively simple to calibrate. However, among the best results found during many runs, those found by two PSO variants (PPSO and PSO-sono) are better than those found by their DE competitors.
In the case of the HBV model, the results were quite different. OLSHADE-CS again showed the poorest performance, but the results obtained by the other algorithms were diversified. Which method could be declared the winner depends on whether one focuses on the calibration or the validation dataset, and whether one is interested in the mean/median performance or in finding the best possible solution in one out of 30 runs. Overall, two algorithms, the PSO-based PPSO and the DE-based HARD-DE, performed best on the HBV model calibration. Comparison between the two families of methods reveals that, in general, the DE algorithms slightly outperformed the PSO ones. The difference was, however, clearer for the calibration dataset than for the validation dataset. We may recommend using adaptive variants of algorithms for model calibration, especially those that have flexible control parameters (e.g. HARD-DE) or an advanced topology (e.g. PPSO) that may automatically tune the speed of information exchange between individuals within the population managed by the algorithm. DE algorithms seem to be a more appropriate choice for the calibration of rainfall-runoff models than PSO variants, but the difference between their final performances is limited and depends on the measure used to create the ranking of algorithms.