Broadly speaking, the basic idea behind PSO is to exploit the so-called swarm intelligence (see Bonabeau et al. 1999) that drives groups of individuals of the same species when foraging. Every member of the swarm explores the search area, keeping memory of the best position it has reached so far, and exchanges this information with the individuals belonging to a suitable neighborhood in the swarm. Thus, the whole swarm is expected to eventually converge to the best global position reached by some of its members. From a mathematical point of view, every member of the swarm (namely a particle) represents a candidate solution point in the feasible set of the optimization problem under investigation. Every particle is randomly initialized before the PSO procedure is applied, and is also associated with a velocity vector, likewise randomly initialized, which determines its direction of movement.
Now, in order to formalize these ideas, let us denote by P the size of the swarm, and let \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be the function to minimize. For each particle \(l=1,\dots ,P\), let \({\mathbf {x}}_{l}^k \in {\mathbb {R}}^n\) be its position at step \(k{ = 0, 1, \ldots }\) of the PSO procedure. Then, its new position at step \(k+1\) is
$$\begin{aligned} {\mathbf {x}}_{l}^{k+1}={\mathbf {x}}_{l}^k+{\mathbf {v}}_{l}^{k+1}, \qquad l=1,\dots ,P, \end{aligned}$$
(10)
where \({\mathbf {v}}_{l}^{k+1}\) is an n-dimensional real vector, usually referred to in the literature as the velocity, given by
$$\begin{aligned} {\mathbf {v}}_{l}^{k+1}=\chi ^k \left[ w^k{\mathbf {v}}_{l}^{k}+\varvec{\alpha }_{l}^{k}\otimes ({\mathbf {p}}_{l}^k-{\mathbf {x}}_{l}^k)+\varvec{\beta }_{l}^{k}\otimes ({\mathbf {p}}_{g}^k-{\mathbf {x}}_{l}^k)\right] . \end{aligned}$$
(11)
In (11) \(\chi ^k>0\) is a suitable parameter, \({\mathbf {v}}_{l}^{k}\) is the previous vector of velocity, and \({\mathbf {p}}_{l}^k\) and \({\mathbf {p}}_{g}^k\) are, respectively, the best solution so far found by particle l and by the whole swarm, i.e.
$$\begin{aligned} {\mathbf {p}}_{l}^k\in & {} \arg \min _{0\le h \le k} \left\{ f({\mathbf {x}}_{l}^h)\right\} , \qquad l=1,\dots ,P,\end{aligned}$$
(12)
$$\begin{aligned} {\mathbf {p}}_{g}^k\in & {} \arg \min _{1 \le l \le P} \left\{ f({\mathbf {p}}_{l}^k)\right\} . \end{aligned}$$
(13)
Finally, \(\varvec{\alpha }_{l}^{k}\), \(\varvec{\beta }_{l}^{k} \in {\mathbb {R}}^n\) are positive random vectors, with the symbol \(\otimes \) denoting the component-wise product. The most commonly used specifications in the literature for \(\varvec{\alpha }_{l}^{k}\), \(\varvec{\beta }_{l}^{k}\), which we will adhere to, are (see Blackwell et al. 2007)
$$\begin{aligned} \varvec{\alpha }_{l}^{k}= & {} {c_1 {\mathbf {r}}_{l,1}^k,} \end{aligned}$$
(14)
$$\begin{aligned} \varvec{\beta }_{l}^{k}= & {} {c_2 {\mathbf {r}}_{l,2}^k,} \end{aligned}$$
(15)
where \({\mathbf {r}}_{l,1}^k\), \({\mathbf {r}}_{l,2}^k\) are n-dimensional real vectors whose entries are uniformly distributed in [0, 1], and \(c_1\), \(c_2 \in (0, 2.5]\). A full discussion of the choice of the parameters in (11) can be found in Serani et al. (2016).
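To fix ideas, the iteration (10)–(15) can be sketched as follows. The toy objective sphere, the swarm size, the number of iterations and the parameter values below are illustrative assumptions, not the settings adopted in this paper.

```python
import numpy as np

def pso_minimize(f, n, P=20, iters=200, chi=0.729, w=1.0, c1=2.05, c2=2.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(P, n))    # random initial positions x_l^0
    v = rng.uniform(-1.0, 1.0, size=(P, n))    # random initial velocities v_l^0
    p = x.copy()                               # personal bests p_l, eq. (12)
    fp = np.apply_along_axis(f, 1, p)
    g = p[np.argmin(fp)].copy()                # global best p_g, eq. (13)
    for _ in range(iters):
        r1 = rng.uniform(size=(P, n))          # entries of r_{l,1}^k
        r2 = rng.uniform(size=(P, n))          # entries of r_{l,2}^k
        # velocity update (11), with alpha = c1*r1 and beta = c2*r2 as in (14)-(15)
        v = chi * (w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x))
        x = x + v                              # position update (10)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < fp                     # refresh personal bests
        p[improved], fp[improved] = x[improved], fx[improved]
        g = p[np.argmin(fp)].copy()            # refresh global best
    return g

def sphere(x):                                 # toy objective, for illustration only
    return float(np.sum(x ** 2))

best = pso_minimize(sphere, n=5)
```

Note that this sketch uses the plain random initialization; the deterministic initializations discussed later replace the first two `rng.uniform` draws.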
In the remainder of this section, we present our original enhancements to PSO, aimed at improving its effectiveness when tackling the optimization problem. First, in order to handle the constraints of (8), we replace the penalty function approach proposed and used in Corazza et al. (2015b) with a nonlinear reformulation of the optimization problem. This allows us to avoid possible numerical ill-conditioning when assessing the penalty parameters. Then, with reference to the initialization of the particles, we endow PSO with some recently introduced initialization procedures never before used in this research area.
A nonlinear reformulation of the optimization problem (8)
Since PSO was conceived for unconstrained problems, its direct application to (8) cannot prevent the generation of infeasible particle positions when constraints are present. To avoid this problem, different strategies have been proposed in the literature, most of which involve repositioning the particles (see for instance Zhang et al. 2005), introducing external criteria to rearrange the components of the particles (see for instance Cura 2009), or using a penalty function approach (see for instance Corazza et al. 2013, 2015b). By contrast, in this paper we consider a novel approach that relies on a nonlinear reformulation of (8), so that PSO can be kept in its original iteration. First, with reference to the endogenous determination of only the weights \(\{w_j\}\), in place of (8) we can solve the unconstrained optimization problem
$$\begin{aligned} \min _{t_1, \ldots ,t_n} {\mathcal {I}} \left[ w_1({\mathbf {t}}), \ldots ,w_n({\mathbf {t}})\right] , \end{aligned}$$
(16)
where the mapping between the new variables, i.e. \((t_1, \ldots , t_n)\), and the old ones, i.e. \((w_1, \ldots , w_n)\), is given by the following nonlinear transformation
$$\begin{aligned} w_j({\mathbf {t}}) \leftarrow \frac{\psi _j({\mathbf {t}})}{\displaystyle \sum _{i=1}^n \psi _i({\mathbf {t}})}, \qquad j=1, \ldots ,n, \end{aligned}$$
(17)
where we assume each \(\psi _j({\mathbf {t}})\) to be continuous and nonnegative. Observe that any global solution \(\overline{{\mathbf {t}}} \in {\mathbb {R}}^n\) of (16) corresponds to a global solution \(\overline{{\mathbf {w}}} \leftarrow {\mathbf {w}}(\overline{{\mathbf {t}}})\) of (8), where \(\mathbf {w(t)}=(w_1({\mathbf {t}}), \ldots , w_n({\mathbf {t}}))^T\), even though (17) introduces nonlinearities which in principle increase the computational complexity. This drawback aside, we remark that whenever a vector \( {{\bar{\mathbf {t}}}} \ne 0\) solves (16), by (17) the vector \({\mathbf {w}}( {{\bar{\mathbf {t}}}})\) is a feasible point for (8).
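For completeness, the feasibility claim can be checked directly from (17): since each \(\psi _j({\mathbf {t}}) \ge 0\), whenever the denominator in (17) does not vanish we have
$$\begin{aligned} w_j({\mathbf {t}}) = \frac{\psi _j({\mathbf {t}})}{\displaystyle \sum _{i=1}^n \psi _i({\mathbf {t}})} \ge 0, \qquad j=1,\ldots ,n, \qquad \sum _{j=1}^n w_j({\mathbf {t}}) = \frac{\displaystyle \sum _{j=1}^n \psi _j({\mathbf {t}})}{\displaystyle \sum _{i=1}^n \psi _i({\mathbf {t}})} = 1, \end{aligned}$$
so that the constraints of (8) are satisfied by construction.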
Considering that, overall, we are adopting a metaheuristic procedure to approximately solve (8), the following two simple choices for \(\psi _j({\mathbf {t}})\) may be considered, with some care, in our framework, namely
$$\begin{aligned} \psi _j({\mathbf {t}})= & {} t_j^2, \qquad j=1, \ldots ,n, \end{aligned}$$
(18)
$$\begin{aligned} \psi _j({\mathbf {t}})= & {} |t_j|, \qquad j=1, \ldots ,n. \end{aligned}$$
(19)
In particular, we adopted (18) in our numerical experience (although (19) might also have been an appealing choice). With (18), the transformation (17) is continuous everywhere except at \({\mathbf {t}} = 0\), and \({\mathbf {t}} = 0\) cannot correspond to any feasible set of weights in (8). Thus, recalling that PSO is a metaheuristic which does not guarantee convergence to a global minimum, the use of (18) can to a large extent be seen as a reasonable expedient to transform the constrained problem (8) into the unconstrained one (16), without a significant additional computational burden. We remark that in our implementation of PSO for solving (16), we introduced a check to prevent possible division by small values in (17) when (18) is adopted. However, in our numerical experience this safeguard was never triggered, since the algorithm tended to approach a global minimum and to stay away from the point \({\mathbf {t}}=0\).
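The change of variables (17) with the choice (18), together with a small-denominator safeguard of the kind just described, can be sketched as follows; the threshold `eps` is an illustrative value, not the one used in the paper's implementation.

```python
import numpy as np

def weights_from_t(t, eps=1e-12):
    psi = t ** 2                   # psi_j(t) = t_j^2, eq. (18)
    s = psi.sum()
    if s <= eps:                   # safeguard against division by small values in (17)
        raise ValueError("t is numerically zero: no feasible weights correspond to it")
    return psi / s                 # w_j(t), eq. (17)

w = weights_from_t(np.array([1.0, -2.0, 3.0]))
# w is componentwise nonnegative and sums to one, hence feasible for (8)
```

PSO can then be run on the unconstrained variables t, evaluating the objective of (16) as `I(weights_from_t(t))`.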
Regarding the instance in which both the weights and the indifference thresholds have to be endogenously determined, similarly to (17), to prevent the possible generation of infeasible values for the decision variables \(\{q_j\}\) we adopted the quadratic transformation \(q_j \leftarrow u_j^2\), with \(j=1, \ldots ,n\), and solved by PSO an unconstrained optimization problem analogous to (16), for each of the two objective functions (9).
The penalty function approach, for comparison purposes
The (novel) unconstrained reformulation (16) of problem (8), which allows us to keep PSO in its original iteration, is not the only possible one. Indeed, as already mentioned, a penalty function approach was proposed and used in Corazza et al. (2013, 2015b) for reformulating the mathematical programming problems therein involved. In this subsection we briefly present this latter approach; the results of its application are reported for comparison purposes in Subsect. 7.1.
Roughly speaking, for a minimization problem such as (8), the idea behind the penalty function approach is to recast the original constrained optimization problem as an unconstrained one, in which the new objective function is obtained by adding to the original objective function a suitably penalized sum of all the constraint violations. With particular reference to (8), the application of this approach leads to the following unconstrained optimization problem
$$\begin{aligned} \displaystyle \min _{w_1,\dots ,w_n} {\mathcal {I}}(w_1,\dots ,w_n) + \frac{1}{\epsilon } \left( \sum _{j=1}^n \max \left\{ 0,-w_j\right\} + \left| \sum _{j=1}^n w_j - 1 \right| \right) , \end{aligned}$$
(20)
where \(\epsilon \) is the so-called penalty parameter. It is possible to prove that there exists a penalty parameter value \(\epsilon ^*\) such that, under mild assumptions, for any \(\epsilon \in (0, \epsilon ^*)\) the solutions of (20) and of the constrained problem (8) coincide (see for details Corazza et al. 2013, 2015b and the references therein).
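As a minimal illustration of (20), the penalized objective can be sketched as follows; the inner objective `I` and the value of the penalty parameter `eps` below are illustrative assumptions only.

```python
import numpy as np

def penalized_objective(w, I, eps=1e-3):
    # sum of the constraint violations of (8): w_j >= 0 and sum_j w_j = 1
    violation = np.maximum(0.0, -w).sum() + abs(w.sum() - 1.0)
    return I(w) + violation / eps            # eq. (20)

I = lambda w: float(np.sum(w ** 2))          # toy inner objective, for illustration
feasible = np.array([0.5, 0.5])
infeasible = np.array([0.8, 0.5])
# the penalty term vanishes at feasible points and is positive otherwise
```

Unlike the reformulation (16), the minimizer returned by PSO on this objective need not be feasible, which is one of the drawbacks listed below.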
As regards the instance in which both the weights \(\left\{ w_j\right\} \) and the indifference thresholds \(\left\{ q_j\right\} \) have to be endogenously determined, the structure of the new objective function remains basically the same. The only difference consists in including the constraint violations associated with \(q_j \ge 0\), \(j=1, \ldots , n\).
Notice that, by construction, the penalty function approach suffers from some drawbacks which, on the contrary, are not present in the nonlinear reformulation approach proposed and used in this paper. In particular:
- The proposal in Subsect. 5.1 ensures that the (sub)optimal solution found for problem (8) is always feasible; on the contrary, this property does not necessarily hold for the penalty function-based approach;
- The approach in Subsect. 5.1 does not require the burdensome assessment of the penalty parameter \(\epsilon \);
- As is well known from the literature (see, e.g., Corazza et al. 2013), in the penalty function-based approach the use of small values of the penalty parameter \(\epsilon \) may yield ill-conditioning; conversely, large values of \(\epsilon \) might not suffice to guarantee the feasibility of the solution found.
The initialization procedures
As for every evolutionary algorithm, PSO performance depends on the choice of its parameters \(\chi \), w, \(c_1\), \(c_2\) and on the initial positions and velocities of the swarm, that is, \({\mathbf {x}}_{l}^0\), \({\mathbf {v}}_{l}^0 \in {\mathbb {R}}^n\) for \(l=1,\dots ,P\). For the choice of the parameters, we complied with standard settings from the literature. As regards the initialization of the algorithm, we applied and compared three different proposals: the standard random one, mostly adopted in the literature, and two deterministic ones, namely Orthoinit and Orthoinit+, recently proposed in Corazza et al. (2015a) and Diez et al. (2016), respectively. The idea behind these two novel initializations is to scatter the particle trajectories in the search space in the early iterations, so as to better explore the search space initially and to obtain approximate solutions that are not grouped in a small sub-region of the feasible set. In the Appendix, we provide a brief summary of the theoretical results supporting these initializations (see Diez et al. 2016 for a more complete account).
Assuming for simplicity \(P=2n\), and recalling the formulae (29), if we adopt the following initialization (Orthoinit) for the PSO particles
$$\begin{aligned} \begin{pmatrix} {\mathbf {v}}_{i}^0 \\ \\ {\mathbf {x}}_{i}^0\end{pmatrix} =\rho _i z_i(k), \qquad \rho _i \in {\mathbb {R}}\setminus \{0\}, \qquad i=1,\dots ,n, \end{aligned}$$
(21)
and
$$\begin{aligned} \begin{pmatrix} {\mathbf {v}}_{n+i}^0 \\ \\ {\mathbf {x}}_{n+i}^0\end{pmatrix} =\rho _{n+i} z_{n+i}(k), \qquad \rho _{n+i} \in {\mathbb {R}}\setminus \{0\}, \qquad i=1,\dots ,n, \end{aligned}$$
(22)
then the first n entries of the free responses of the particles, i.e. the velocities \({\mathbf {v}}_{i}^0\), \(i=1, \ldots , 2n\), are orthogonal at step k of the deterministic PSO. To some extent, this also tends to impose near orthogonality of the particle trajectories at step k, as well as in some subsequent iterations. While this initialization has the advantage of scattering the particle trajectories better in the search space, unfortunately it tends to yield too sparse approximate solutions (see Diez et al. 2016), i.e. only a few components of the solution vector are nonzero. This is essentially a consequence of the fact that \(z_i(k)\) and \(z_{n+i}(k)\) are very sparse vectors (see the Appendix).
In order to pursue a possibly dense final solution, a modification (namely Orthoinit+) was proposed in Diez et al. (2016). Here, the vectors \(z_i(k)\), \(i=1,\dots , 2n\), in (21) and (22) are replaced by the following ones
$$\begin{aligned} \begin{array}{c} \displaystyle \nu _i(k) = z_i(k)-\alpha \sum _{\begin{array}{c} j=1\\ j\not =i \end{array}}^{n}z_j(k)-\gamma \sum _{j=n+1}^{2n}z_j(k), \qquad i=1,\dots ,n \\ \displaystyle \nu _{n+i}(k) = z_{n+i}(k)-\beta \sum _{\begin{array}{c} j=n+1\\ j\not =n+i \end{array}}^{2n}z_j(k)-\delta \sum _{j=1}^{n}z_j(k), \qquad i=1,\dots ,n,\\ \end{array} \end{aligned}$$
(23)
where \(\alpha \in {\mathbb {R}}\setminus \{-1,\frac{1}{n}\}, \quad \beta = \frac{2}{n-2}, \quad \gamma = 0, \quad \delta \in {\mathbb {R}}\setminus \{0,1\}\). It is possible to prove that the vectors \(\nu _1(k), \dots , \nu _{2n}(k)\) are still well scattered in \({\mathbb {R}}^{2n}\), as well as uniformly linearly independent (see Diez et al. 2016).
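A sketch of the construction (23) follows. The actual vectors \(z_i(k)\) come from the formulae (29) in the Appendix; here they are stood in for by the columns of the identity matrix, purely to illustrate how the combinations are formed and that the resulting vectors remain linearly independent.

```python
import numpy as np

def orthoinit_plus(Z, alpha=0.5, delta=2.0):
    # Z has 2n columns z_1(k), ..., z_{2n}(k); the free parameters must
    # satisfy alpha not in {-1, 1/n} and delta not in {0, 1}, as in the text
    two_n = Z.shape[1]
    n = two_n // 2
    beta, gamma = 2.0 / (n - 2), 0.0         # values prescribed below (23)
    S1 = Z[:, :n].sum(axis=1)                # sum of z_1(k), ..., z_n(k)
    S2 = Z[:, n:].sum(axis=1)                # sum of z_{n+1}(k), ..., z_{2n}(k)
    nu = np.empty_like(Z, dtype=float)
    for i in range(n):
        # first line of (23): nu_i = z_i - alpha * sum_{j != i, j <= n} z_j - gamma * S2
        nu[:, i] = Z[:, i] - alpha * (S1 - Z[:, i]) - gamma * S2
        # second line of (23): nu_{n+i} = z_{n+i} - beta * sum_{j != n+i, j > n} z_j - delta * S1
        nu[:, n + i] = Z[:, n + i] - beta * (S2 - Z[:, n + i]) - delta * S1
    return nu

n = 4
Z = np.eye(2 * n)                            # placeholder for the z_i(k) of (29)
V = orthoinit_plus(Z)
# the columns nu_1(k), ..., nu_{2n}(k) remain linearly independent
```

With the placeholder choice of Z, the matrix V is block triangular with nonsingular diagonal blocks, so its columns span \({\mathbb {R}}^{2n}\), consistently with the linear independence result cited above.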