An improved particle swarm optimization with a new swap operator for team formation problem

Forming effective teams of experts plays a crucial role in successful projects, especially in social networks. In this paper, a new particle swarm optimization (PSO) algorithm is proposed for solving the team formation optimization problem by minimizing the communication cost among experts. The proposed algorithm is called improved particle swarm optimization with a new swap operator (IPSONSO). In IPSONSO, a new swap operator is applied within particle swarm optimization to ensure the consistency of the capabilities and skills required to perform the project. The proposed algorithm is evaluated on ten experiments with different numbers of experts and skills; then, IPSONSO is applied to the DBLP dataset, a benchmark real-life database. Moreover, the proposed algorithm is compared with the standard PSO to verify its efficiency, and the results show the effectiveness and practicality of the proposed algorithm.


Introduction
The team formation (TF) problem plays a crucial role in many real-life applications, ranging from software project development to various participatory tasks in social networks. In such applications, collaboration among experts is required. There is a set of experts, each associated with certain capabilities (i.e., skills), and a collaborative task (i.e., project) that requires a set of skills to be accomplished. The problem is to find an effective team of experts that covers all the skills required for a given task with the least communication cost. This problem is known to be NP-hard (Lappas et al. 2009); hence, it is interesting to develop heuristic search methods to solve it.
It is well known that swarm-based algorithms such as particle swarm optimization (PSO) (Vallade and Nakashima 2013) are capable of reaching solutions quickly and efficiently because they can generate different outputs from the same sample inputs. PSO is a heuristic method based on evaluating alternative solutions over iterations to find the best one. Another adaptive heuristic method is the genetic algorithm (GA) (Holland 1975; Kalita et al. 2017), which is based on the natural law of evolution through natural selection and the exchange of genetic information. Generally speaking, the goal of optimization methods is to find an adequate combination of a set of parameters that achieves the most satisfactory value (e.g., minimum or maximum), depending on the requirements of the problem.
Therefore, the main objective of this research is to form an effective team of experts with minimum communication cost by using an improved PSO hybridized with a new swap operator and the main operator of GA (i.e., the crossover operator). We call the proposed algorithm improved particle swarm optimization with a new swap operator (IPSONSO).
The problem in Karduck and Sienou (2004) is defined as the process preceding the forming stage of group development theory. The key problem is the selection of the best candidates that fulfill the requirement specification for achieving the goal. Most existing team formation work based on approximation algorithms (Anagnostopoulos et al. 2012; Kargar et al. 2013) considers different communication costs, such as the diameter and minimum spanning tree (Lappas et al. 2009) and the sum of distances from the team leader (Kargar and An 2011).
A generalization of the team formation problem is given in Appel et al. (2014), Li and Shan (2010) and Li et al. (2015) by assigning each skill to a specific number of experts. The maximum load of experts across different tasks is considered in Anagnostopoulos et al. (2010), but without taking into account the minimum communication cost for team formation.
On the other side of the team formation problem, relatively little work has been done based on meta-heuristic algorithms such as PSO and GA (Haupt and Haupt 2004). These algorithms have been successfully applied as optimization methods in many real-world applications, as in Blum and Roli (2003), Pashaei et al. (2015) and Sedighizadeh and Masehian (2009).
A group formation method using a genetic algorithm is presented in Zhang and Si (2010), where the members of each group are generated based on the students' programming skill. A genetic algorithm for team formation is used in Nadershahi and Moghaddam (2012) based on the Belbin team role model, which categorizes individuals into nine roles according to their specialty and attitude toward team working.
A team formation problem based on a sociometric matrix is presented in Gutiérrez et al. (2016), where a mathematical programming model is considered for maximizing the efficiency of the relationships among people who share a multidisciplinary work cell. A variable neighborhood local search meta-heuristic is applied in Gutiérrez et al. (2016) to solve the team formation problem and proved the most efficient in almost all cases; in our work, by contrast, a global search meta-heuristic is considered, which finds the team with the least communication cost among all local optima over the whole search.
Team formation based on the available working time and the set of skills of each expert is considered in Huang et al. (2017) in order to build an effective team, where each expert is associated with a skill level indicating his competence in that skill. In our research, all experts that have the ability to perform a skill are candidates to join a collaborative group in order to achieve the goal.
A mathematical framework for dealing with the team formation problem is proposed in Farasat and Nikolaev (2016), explicitly incorporating the social structure among experts, where an LK-TFP heuristic is used to perform variable-depth neighborhood search and the results are compared with a standard genetic algorithm. In our paper, given a pool of individuals, an improved PSO algorithm for the team formation problem is proposed and its results are compared with the standard PSO.
Finally, in Fathian et al. (2017) a mathematical model is proposed to maximize team reliability by considering that unreliable experts may leave the team with some probability and by preparing a backup for each unreliable member. In that case, each team consists of two sets of members, namely main and backup members, which is effective only in some specific situations. In contrast, in our research, among all the available candidates, the most feasible team members are chosen, and they have no incentive to leave the team.
The rest of the paper is organized as follows. Section 2 illustrates the definition of team formation problem. Section 3 introduces the formulation of proposed algorithm and how it works. Section 4 discusses the experimental results of the proposed algorithm. Finally, Sect. 5 concludes the work and highlights the future work.

Team formation problem
The team formation problem in a social network can be formulated as finding a set of experts from a social network graph G(V, E) to accomplish a given task (i.e., project), where there are n experts V = {v_1, v_2, ..., v_n} and a set of m skills S = {s_1, s_2, ..., s_m} representing their abilities for a given task. Each expert v_i is associated with a set of specific skills s(v_i), s(v_i) ⊆ S. The set of experts that have the skill s_k is denoted by C(s_k) (i.e., C(s_k) ⊆ V). A given task T is formed by a set of required skills (i.e., T = {s_i, ..., s_j} ⊆ S) that can be performed by a set of experts forming a team. The set of possible teams that can achieve a given task is denoted by X = {x_1, x_2, ..., x_k}; a team x_k covers the task when T ⊆ ∪_{v_i ∈ x_k} s(v_i). The collaboration cost (i.e., communication cost) between any two experts v_i and v_j is denoted by e_ij ∈ E and can be computed according to Eq. 1.
The goal is to find a team with the least communication cost CC(x_k) among team members, according to Eq. 2.
where |x_k| is the cardinality of team x_k. The team formation problem can be considered as an optimization problem of forming a feasible team x* among the set of possible teams, which covers the required skills for a given task with minimum communication cost among the team's experts. The team x* can be obtained subject to ∀ v_i, v_j: e_ij ∈ [0, 1], i.e., the communication cost between any pair of experts lies in the range 0 to 1, and for each required skill in the given task there exists at least one expert that has the required skill. All the skills of a given task should be covered to obtain a feasible team x*. The notations of the team formation problem are summarized in Table 1.
Remark The set covering problem is one of the traditional problems in complexity theory and computer science. It is regarded as one of the most important discrete optimization problems because it can serve as a model for various real-life problems, e.g., vehicle routing, resource allocation, nurse scheduling, airline crew scheduling and facility location. The name arises from covering the rows of an m-row/n-column zero-one matrix with a subset of the columns at minimal cost (Beasley and Chu 1996). The covering problem can be modeled as follows: Equation (5) is the objective function of the set covering problem, where x_j is a decision variable and c_j denotes the weight or cost of covering column j. Equation (6) is a constraint ensuring that each row is covered by at least one column, where a_ij is a constraint coefficient matrix of size m × n whose elements are either "1" or "0". Equation (7) is the integrality constraint. Despite the fact that it may look like an easy problem from its objective function and constraints, the set covering problem is a combinatorial optimization problem and an NP-complete decision problem (Lappas et al. 2009). As mentioned in the literature, e.g., Kargar and An (2011), the team formation problem is a special instance of the minimum set cover problem.
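To make the connection to set covering concrete, the classic greedy approximation can be sketched as follows. This is an illustrative Python sketch (not the paper's algorithm), and the expert and skill names are hypothetical:

```python
def greedy_cover(required, experts):
    """Greedy set-cover approximation: repeatedly pick the expert covering
    the most still-uncovered required skills. experts maps id -> skill set."""
    uncovered, team = set(required), []
    while uncovered:
        best = max(experts, key=lambda e: len(experts[e] & uncovered))
        if not experts[best] & uncovered:
            raise ValueError("required skills cannot be covered")
        team.append(best)
        uncovered -= experts[best]
    return team
```

The greedy rule gives a logarithmic approximation guarantee for set cover, but it ignores communication cost, which is exactly the gap the team formation objective addresses.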
An example of the team formation problem
We describe an example of the team formation problem in Fig. 1.
In Fig. 1, a network of experts V = {v_1, v_2, v_3, v_4, v_5, v_6} is considered, where each expert has a set of skills S and there is a communication cost between every two adjacent experts v_i, v_j, represented as the weight of the edge (v_i, v_j) (e.g., w(v_1, v_2) = 0.2). The communication cost between non-adjacent experts is given by the shortest path between them.
The aim is to find a team X of experts from V with the required skills S at a minimum communication cost. In Fig. 1, two teams with the required skills, X_1 = {v_1, v_2, v_3, v_4} and X_2 = {v_2, v_4, v_5, v_6}, are obtained.
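As a sketch of how this cost can be evaluated in practice, the following Python fragment computes all-pairs shortest paths (used for non-adjacent experts) and then sums the pairwise distances of a team as in Eq. 2. The graph and its weights here are hypothetical, not the weights of Fig. 1:

```python
from itertools import combinations

def floyd_warshall(n, edges):
    """All-pairs shortest paths; edges maps (i, j) -> weight, undirected."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def communication_cost(team, dist):
    """Sum of pairwise shortest-path distances between team members (Eq. 2)."""
    return sum(dist[i][j] for i, j in combinations(team, 2))
```

For example, with hypothetical edges {(0,1): 0.2, (1,2): 0.3, (0,3): 0.5}, the cost of team {0, 1, 2} is 0.2 + 0.5 + 0.3 = 1.0, where 0.5 is the shortest path 0-1-2.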

The proposed algorithm
In the following subsections, the main processes of the standard particle swarm optimization (PSO), single-point crossover, and the improved swap operator are highlighted and invoking them in the proposed algorithm is described.

Particle swarm optimization
Particle swarm optimization (PSO) is a population-based meta-heuristic method developed by Kennedy and Eberhart in 1995 (Eberhart et al. 2001). The main process of PSO is shown in Fig. 2. The PSO population is called a swarm SW; the swarm contains particles (individuals), and each particle is represented by an n-dimensional vector as shown in Eq. 8. Each particle has a velocity, which is generated randomly as shown in Eq. 9.
The best personal position (p_best) and the global position (g_best) of each particle are assigned according to Eq. 10.
At each iteration, each particle updates its personal position (p_best) and the global position (g_best) among the particles in its neighborhood, as shown in Eqs. 11 and 12, respectively.
where c_1 and c_2 are the cognitive and social parameters, respectively, and r_1 and r_2 are random vectors in [0, 1]. The process is repeated until the termination criteria are satisfied.
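The velocity and position updates of Eqs. 11 and 12 can be sketched for a single continuous particle as follows; the inertia weight and parameter values are illustrative defaults, not the paper's settings:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO update for a single particle:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    new_v = [w * vi
             + c1 * random.random() * (pb - xi)
             + c2 * random.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Note that a particle sitting at both its personal and global best with zero velocity stays put, which is why the swarm converges once all particles agree.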

Single-point crossover
Crossover is one of the most important operators in GA. It creates one or more offspring from the selected parents. Single-point crossover (Goldberg 1989) is one of the most widely used operators in GA. The process starts by selecting a random point k in the parents between the first gene and the last gene. The two parents then swap all the genes between point k and the last gene. The process of single-point crossover is shown in Fig. 3.
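A minimal sketch of single-point crossover, following the description above:

```python
import random

def single_point_crossover(p1, p2, k=None):
    """Swap all genes after a cut point k, producing two offspring.
    If k is not given, it is chosen at random strictly inside the parents."""
    if k is None:
        k = random.randrange(1, len(p1))
    return p1[:k] + p2[k:], p2[:k] + p1[k:]
```

For instance, crossing [1, 2, 3, 4] and [5, 6, 7, 8] at k = 2 yields [1, 2, 7, 8] and [5, 6, 3, 4].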

A new swap operator
A swap operator (SO) in Wang et al. (2003), Wei et al. (2009) and Zhang and Si (2010) consists of two indices, SO(a, b), which is applied to the current solution to produce a new solution. For example, if we have a solution S = (1, 2, 3, 4, 5) and SO = (2, 3), then the new solution is S' = (1, 3, 2, 4, 5). A collection of one or more swap operators applied sequentially, SS = (SO_1, SO_2, ..., SO_n), is called a swap sequence (SS). A swap sequence is applied to a solution by applying all of its swap operators in order to produce the final solution.
In our proposed algorithm, the proposed swap operator NSO(a, b, c) contains three indices: the first argument a is the skill id, and the second and third arguments b and c are the current and new experts' indices, respectively, which are selected randomly among the experts that have the same skill id, where b ≠ c. For example, NSO(2, 1, 3) means that for skill id = 2 there is a swap between the expert with id = 1 and the expert with id = 3.
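The new swap operator can be sketched as follows, assuming (as an illustration) that a particle is encoded as a list indexed by skill id whose entries are expert ids; the candidate lists C(s_a) used below are hypothetical:

```python
def apply_nso(particle, nso, candidates):
    """Apply a single NSO(a, b, c): for skill id a, swap the current expert b
    for expert c, provided both appear in the candidate list C(s_a)."""
    a, b, c = nso
    if particle[a] == b and b in candidates[a] and c in candidates[a]:
        particle = list(particle)
        particle[a] = c
    return particle
```

Restricting b and c to the same candidate list is what keeps every move feasible: the new particle still covers all required skills.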

Improved Particle Swarm Optimization with New Swap Operator (IPSONSO)
In this subsection, the main structure of the proposed IPSONSO is explained and shown in Algorithm 1.
In the following subsections, we explain how the proposed IPSONSO is applied to solve the team formation problem.

Initialization and representation
IPSONSO starts by setting the initial values of its main parameters, such as the population size P and the social and cognitive parameters. Each velocity contains new swap operators NSO(x, y, z), where x is the skill id and y, z are the indices of experts that have the skill from the experts' list C(s_i) = {1, 2, ..., E_i}.

Particle evaluation
The relationship between experts is represented by a social network, where nodes represent experts and edges represent the communication cost (i.e., weight) between two experts. The weight between expert i and expert j is given by Eq. 1, and the communication cost among team members CC(x_k) is computed according to Eq. 2. The particle with the minimum weight among all evaluated particles is considered the global best particle g_best, while the local best of each particle is assigned as its p_best.
Algorithm 1 (outline): initialize the particles and their velocities; set g_best (the best global solution in the swarm) and p_best (the best personal solution of each particle); then repeat the velocity update, crossover, position update and evaluation steps, increasing the iteration counter t, until the termination criteria are satisfied; finally, report the best particle.

Particle velocity update
The initial particles' velocities contain a set of random new swap operators (NSO(s)). Each particle updates its velocity as shown in Eq. 13.
The single-point crossover operator is used to produce new individuals by combining sub-individuals from the current individual and the global best individual (g_best) in the whole population. After applying the crossover operator, two new individuals are obtained with expert assignments mixed from each other. Finally, one of the two team configurations is selected as x_cross^(t).

Particle position update
Particle positions are updated according to Eq. 14 by applying the sequence of new swap operators [NSO(s)] to the current particle in order to obtain a new particle with a new position. All the previous processes are repeated until the maximum number of iterations is reached.
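A sketch of this position update, applying a swap sequence of NSO operators in order; as an illustration, a particle is again assumed to be a list indexed by skill id whose entries are expert ids, and the data are hypothetical:

```python
def update_position(particle, swap_sequence, candidates):
    """Apply each NSO(a, b, c) of the velocity's swap sequence in order:
    for skill a, replace the current expert b with expert c when valid."""
    particle = list(particle)
    for a, b, c in swap_sequence:
        if particle[a] == b and c in candidates[a]:
            particle[a] = c
    return particle
```

Because each swap only replaces one skill's expert with another holder of the same skill, the updated particle remains a feasible team at every step.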
The relationship between experts is represented by a social network where the nodes represent experts and the edges represent the communication cost (i.e., weight) between two experts as shown in Fig. 4.
The weight between experts can be computed as shown in Eq. (1). Some of the teams that have the required skills can be formed, such as T_1 = {a, c}, T_2 = {a, d}, T_3 = {a, e}, T_4 = {b, c}, T_5 = {a, b, c} and T_6 = {a, d, e}. The communication costs of the formed teams are: C(T_1) = 1, C(T_2) = 0.8, C(T_3) = 0.8, C(T_4) = 1, C(T_5) = 0.66 and C(T_6) = 1.6. A particle in the IPSONSO algorithm is an array list of size 1 × 3, where the first needed skill is "Network," the second is "Analysis," and the third is "Algorithm," as shown in Fig. 5. Figure 5 represents the possible values for each index of a particle in the IPSONSO algorithm. For example, for the required skill with id = 1, there are three experts that have this skill, i.e., (a, b, e).
In the following subsection, the main steps of the proposed algorithm are highlighted when it is applied on the random dataset as described in Sect. 3.5 and shown in Figs. 4 and 5.
- Initialization In the IPSONSO algorithm, the initial population (particles) and their velocities are generated randomly. Each velocity is a swap sequence (i.e., a sequence of swap operators), each represented by a tuple ⟨x, y, z⟩, where x is the skill id and y and z are the indices of the current and new experts, respectively. An example of the initialization of two particles A and B is shown in Table 2.
- Particle evaluation The communication cost of each particle is computed as C(A) = 1 and C(B) = 1.55.
- Particle position and velocity update The particle with the minimum weight among all evaluated particles is considered g_best (particle B in our example), and the local best of each particle is assigned as its p_best. In each iteration, the updated velocities and particle positions are computed as shown in Eqs. 13 and 14, respectively.
- Crossover The single-point crossover is applied between g_best and particle A as shown in Fig. 6. The particle with the minimum weight is chosen as the result of the crossover; in our example, C(A_1) = 1.55 and C(A_2) = 0.66. Therefore, the x_cross^(t) particle is A_2 = (a, c, b).
- Velocity update Particle A (a, c, a) is updated to (e, e, a), and its communication cost is C(A) = 0.8. The same processes are applied to particle B. In the next iteration, the p_best of particle A is updated, changing from 1 to 0.8, and g_best can likewise be updated according to the particle that has the minimum communication cost. After a number of iterations, the most feasible team found so far for the required skills is formed (i.e., the global best particle g_best so far).

Numerical experiments
Ten experiments are performed on the random dataset described in Sect. 3.5 with different numbers of skills and experts to evaluate the performance of the proposed algorithm, which focuses on iteratively minimizing the communication cost among team members. The proposed algorithm is compared against the standard PSO (SPSO). Also, the performance of the proposed algorithm is investigated on the real-life DBLP dataset. The experiments are implemented in Eclipse Java EE IDE V-1.2 running on an Intel(R) Core i3 CPU at 2.53 GHz with 8 GB RAM under Windows 7.

Parameter setting
In this subsection, the parameter setting of the proposed algorithm used in the ten experiments on the random dataset is highlighted. The parameters are reported in Table 3.

Random dataset
In this subsection, the performance of the proposed algorithm is investigated on the random dataset described in Sect. 3.5. The proposed algorithm is applied with different numbers of experts and skills. The results of the proposed algorithm are reported in the subsequent subsections.

Comparison between SPSO and IPSONSO on random data
The first test of the proposed algorithm is to compare it against the standard PSO (SPSO) to verify its efficiency.
The results are reported in Table 4. In Table 4, the minimum (min), maximum (max), average (mean) and the standard deviation (SD) of the results are reported over 50 random runs. The best results are reported in bold font. The results in Table 4 and Fig. 7 show that the proposed algorithm is better than the standard PSO.
Also, the performance of the SPSO and the IPSONSO is shown in Fig. 8 by plotting the number of iterations versus the communication costs. The solid line represents the results of the proposed algorithm, while the dotted line represents the results of the standard PSO (SPSO). The results in Fig. 8 show that the proposed algorithm can obtain minimum communication cost faster than the standard PSO.

DBLP: real-life data
In this work, the DBLP dataset is used, which has been extracted from the DBLP XML released in July 2017. The DBLP dataset is one of the most popular sources of open bibliographic information about computer science journals and proceedings, and it can be extracted in the form of an XML document type definition (DTD). The following steps are applied to construct four tables.
1. Author (name, paper_key), 6054672 records. Our attention is focused on papers published in 2017 only (22364 records). Then, the DBLP dataset is restricted to the following five fields of computer science: databases (DB), theory (T), data mining (DM), artificial intelligence (AI) and software engineering (SE), and the DBLP graph is constructed. It is worth mentioning that the papers of the 10 major conferences in computer science (1707 records) are included. Five experiments are conducted, and the average results are taken over 50 runs. The skills are selected randomly from the most shared skills among the authors, with an initial population of 3 and 10 iterations.

Comparison between SPSO and IPSONSO on DBLP dataset
In this subsection, the proposed algorithm is compared against the standard PSO (SPSO) with different numbers of experts and skills on the DBLP dataset by reporting the maximum (max), average (mean) and standard deviation (SD) in Table 5. Also, in Fig. 9, the results of the standard PSO (SPSO) and the proposed IPSONSO are presented by plotting the number of iterations versus the CI of the average communication cost. The solid line represents the results of the proposed IPSONSO, while the dotted line represents the results of SPSO. The results in Fig. 9 show that the performance of the proposed algorithm is better than that of SPSO.

Confidence interval (CI)
A confidence interval (CI) measures the probability that a population parameter falls between two set values (the upper and lower bounds). It is constructed at a confidence level C, such as 95% (i.e., a 95% CI). The 95% confidence interval uses the sample's mean and standard deviation and assumes a normal distribution. The CI can be computed as follows.
(c, SD, sample size), where c depends on the confidence level (i.e., c = 1 - C), SD is the standard deviation of the sample, and the sample size is the size of the population. Avg(SPSO) and Avg(IPSONSO) are the average results obtained from the SPSO and IPSONSO algorithms, respectively.
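Under the normality assumption above, the half-width of the CI is z · SD / √n, with z ≈ 1.96 for a 95% confidence level. A minimal sketch:

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """95% CI for the mean under a normal assumption: mean ± z * sd / sqrt(n).
    z = 1.96 corresponds to a 95% confidence level."""
    half = z * sd / math.sqrt(n)
    return mean - half, mean + half
```

For example, a sample with mean 10, SD 2 and n = 100 gives the interval (9.608, 10.392).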

Confidence interval (CI) for random data
In the following tables, the CI of the average communication cost is presented for the 10 experiments on randomly generated data. The results in Table 6 show the average communication cost for experiments 1 and 2. In Table 6, the results of IPSONSO decrease iteratively with the number of iterations faster than those of SPSO, achieving an improvement ranging from 8% in the second iteration to 19% in the last iteration for experiment 1, while the improvement ranges from 4 to 10% compared with SPSO in experiment 2. The results of experiments 3 and 4 are reported in Table 7. In Table 7, the results of IPSONSO are better and more efficient than those of SPSO: the average communication cost goes down by 5 to 13% during the iterations, and the average communication cost of the proposed IPSONSO is reduced by 2 to 9% at the end of the iterations when compared with the SPSO results.
In Table 8, for experiment 5, the average communication cost of the IPSONSO solution is improved within the range of 2-8% compared with SPSO over the iterations, and the proposed IPSONSO proves its efficiency for team formation with minimum communication cost, being 3 to 7% better than SPSO.
In Table 9, the results of experiment 7 show that IPSONSO achieves better performance than SPSO, by up to 8%, with respect to the average communication cost over the iterations; for experiment 8, the average communication cost of the proposed IPSONSO is reduced by 8% over the 20 iterations when compared with SPSO.
In Table 10, the results of experiment 9 show that the average communication cost of IPSONSO is improved by 3 to 9% iteratively with respect to the number of iterations when compared with the SPSO solution, and the results of experiment 10 show that the average communication cost of IPSONSO is reduced iteratively, achieving 8% better performance than SPSO for a large number of experts and skills. In Fig. 10, the CI of the proposed algorithm is presented against that of the standard PSO for different numbers of skills by plotting the number of iterations against the CI of the average communication cost. The solid line represents the results of the proposed algorithm, while the dotted line represents the standard PSO. The results in Fig. 10 show that the proposed algorithm is better than the standard PSO.

Confidence interval (CI) of SPSO and IPSONSO for DBLP dataset
In this subsection, the CI of SPSO and IPSONSO for the DBLP dataset is reported for different numbers of skills, as shown in Tables 11, 12 and 13. The results in Table 11 show that the average communication cost of IPSONSO is better than that of SPSO over the iterations. The improvement is up to 5% and 8% for 2 and 4 skills, respectively, when compared with SPSO (Fig. 11).
In Table 12, the results of SPSO and IPSONSO are reported for 6 and 8 skills. The results in Table 12 show that IPSONSO obtains better and more efficient results than SPSO: the average communication cost goes down by 3 to 5% during the iterations for 6 skills, while it is up to 6% better than SPSO for 8 skills.
Finally, the IPSONSO algorithm achieves better performance than SPSO, ranging from 2 to 7%, with respect to the average communication cost over the iterations and the number of skills. We can conclude from the results in Tables 11, 12 and 13 and Fig. 11 that the performance of the proposed algorithm is better than that of the standard PSO.

Average processing time of SPSO and IPSONSO on DBLP dataset
The average processing time (in seconds) of SPSO and IPSONSO over 30 runs is reported in Table 14. The time for forming a team using the proposed IPSONSO increases almost linearly with the number of skills, requiring from 8 to 34% more time than SPSO due to additional processing such as the crossover and swap sequence operators.

Conclusion and future work
The team formation problem is the problem of finding a group of team members with the required skills to perform a specific task. In this study, a new particle swarm optimization algorithm with a new swap operator is investigated to solve the team formation problem. The proposed algorithm is called improved particle swarm optimization with a new swap operator (IPSONSO). In the IPSONSO algorithm, a new swap operator NSO(x, y, z) is proposed, where x is the skill id and y, z are the indices of experts that have the skill from the experts' list. Invoking the single-point crossover in the proposed algorithm exploits promising areas of the solution space and accelerates convergence by mating the global best solution with a randomly selected solution. The performance of the proposed algorithm is investigated on ten experiments with different numbers of skills and experts and five experiments on the real-life DBLP dataset. The results show that the proposed algorithm can obtain promising results in reasonable time. In future work, combining the proposed algorithm with other swarm intelligence algorithms will be considered to accelerate its convergence and avoid premature convergence. It is also worthwhile to test the proposed algorithm on various benchmark nonlinear mixed-integer programming problems.