Medoid-based clustering using ant colony optimization
Abstract
The application of ACO-based algorithms in data mining has been growing over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works about unsupervised learning have focused on clustering, showing the potential of ACO-based techniques. However, there are still clustering areas that are almost unexplored using these techniques, such as medoid-based clustering. Medoid-based clustering methods are helpful, compared to classical centroid-based techniques, when centroids cannot be easily defined. This paper proposes two medoid-based ACO clustering algorithms, where the only information needed is the distance between data: one algorithm that uses an ACO procedure to determine an optimal medoid set (METACOC algorithm) and another algorithm that uses an automatic selection of the number of clusters (METACOCK algorithm). The proposed algorithms are compared against classical clustering approaches using synthetic and real-world datasets.
Keywords
Ant colony optimization, Clustering, Data mining, Machine learning, Medoid, Adaptive
1 Introduction
Clustering is one of the most relevant areas in data mining and machine learning (Larose 2005; Witten and Frank 2005). Clustering techniques extract patterns from data without using labels, which is referred to as unsupervised learning. Using these techniques, data analysts are able to extract information from different datasets without human or expert supervision. Clustering groups data by similarity: data instances are assigned to different groups (clusters), and this assignment is optimized in order to obtain the lowest value of a predefined cost function.
Several areas have dealt with clustering problems. One of the most relevant is statistics, where well-known clustering algorithms have been proposed, such as K-means, expectation maximization (EM), hierarchical, spectral and fuzzy clustering, among others. Over the last few years, bio-inspired algorithms have received increasing attention. The success of swarm intelligence and evolutionary algorithms in optimization has made them promising techniques for clustering. This paper explores this potential, specifically focusing on ant colony optimization (ACO; Dorigo and Stützle 2004).
The proposed algorithms address two main problems of centroid-based approaches: they need to know the features of the search space in order to determine the central point, and they are sensitive to noise. Centroid-based clustering algorithms use a multidimensional space to represent the data based on their features in order to find the centroid (central point) position of each cluster. A distance metric (in most cases Euclidean) is used to set a centroid and optimize its position according to the distance between the centroid and the data. As a centroid position is determined by averaging the coordinate values of the data in each cluster, this process does not cope well with outliers. Centroid-based clustering algorithms work well when the data can be represented by features in a multidimensional space, e.g. clustering of houses based on features such as price, square metres, number of bedrooms/bathrooms and distance to public transportation. However, they are not appropriate in cases where the features of the data are not clear, e.g. clustering of face images: while it is straightforward to calculate the similarity of images, it is not easy to define features to represent them in a multidimensional space.
Medoid-based clustering algorithms are usually more robust to noise effects, and data instances do not need to be represented in a multidimensional space. They use a notion of similarity/distance among the data instances, which can be obtained as a Gram matrix of a kernel or a distance measure, and they choose data instances to define cluster centres; the selected instances are called medoids.
This paper proposes two medoid-based ACO clustering algorithms, where the only information needed is the distance among data: one algorithm that uses an ACO procedure to determine an optimal medoid set (METACOC algorithm) and another algorithm that additionally uses an automatic selection of the number of clusters (METACOCK algorithm). These algorithms use a graph-based structure and a search strategy that requires no knowledge about the search space features. As mentioned above, this strategy is different from classical centroid-based approaches, where the position of the centroid is optimized in order to define the different clusters. In order to evaluate the performance of the proposed algorithms, we have compared them against the ACO-based ACOC algorithm (Kao and Cheng 2006) using synthetic and real-world datasets, and also against five well-known clustering algorithms: K-means (MacQueen 1967), partition around medoids (PAM; Kaufman and Rousseeuw 1987), PAMK (Kaufman and Rousseeuw 2009), EMBIC (Fraley and Raftery 2007) and Clues (Wang et al. 2007).
The remainder of this paper is organized as follows. Section 2 presents related work, discussing the clustering problem and previous ACO algorithms for clustering. Section 3 introduces the proposed algorithms. Computational experiments and analysis of the obtained results are presented in Sect. 4. Finally, Sect. 5 presents conclusions and future work.
2 Related work
Data mining and machine learning techniques have been used for several applications. One of the most prominent application areas is the identification of patterns in data, which helps data analysts to extract hidden information from data (Larose 2005). Recent data analysis demands have presented new challenges for machine learning techniques (Cao 2010); for example, the need for creating new scalable and robust methodologies is currently receiving increasing interest. In order to improve the robustness of these analyses, new methodologies based on swarm intelligence have shown promise, since the quality of their results is highly competitive when compared with classical algorithms.
One of the most successful swarm intelligence techniques is ACO (Dorigo and Stützle 2004). ACO algorithms are based on some aspects of the foraging behaviour of ants that collectively can find the shortest path from the nest to a food source. The use of ACO has been extended to several optimization areas, including machine learning. This section provides a general description of the clustering problem, including a discussion of the K-adaptive problem within clustering, and it discusses ACO applications in clustering and the related classification task.
2.1 The clustering problem
There are also several statistical techniques that have been applied to clustering problems, such as EM (Dempster et al. 1977). This approach uses the likelihood of the cluster selection to guide the search, and it is able to apply different statistical estimators depending on the problem. The most frequent estimator for EM is a Gaussian mixture model, where the user defines one Gaussian distribution per cluster and the process optimizes the mean and variance of each distribution in order to generate a good clustering distribution reducing some cost function.

Centroids are determined by averaging the coordinate values of the data in each cluster, while medoids are representative members of the data: centroids are not suitable when the average cannot be defined (e.g. clustering of face images, time series or gene expression data);

Centroids are more sensitive to outliers: an instance that is far away from the rest of the cluster can shift the centroid position considerably. This does not happen with medoids because each medoid is an actual instance of the dataset.
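The effect of an outlier on both kinds of cluster representative can be illustrated with a small sketch. The toy points, the helper names and the definition of the medoid as the instance with the smallest total distance to the others are illustrative assumptions, not taken from the experiments in this paper.

```python
import numpy as np

def centroid(points):
    """Coordinate-wise mean of the points (need not be a data instance)."""
    return points.mean(axis=0)

def pairwise_euclidean(points):
    """Full matrix of Euclidean distances between all pairs of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def medoid_index(dist):
    """Index of the instance with the smallest total distance to all others."""
    return int(np.argmin(dist.sum(axis=1)))

# Hypothetical toy cluster: four corners of a unit square plus its centre.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
with_outlier = np.vstack([cluster, [[50.0, 50.0]]])

c_before = centroid(cluster)        # (0.5, 0.5)
c_after = centroid(with_outlier)    # dragged towards the outlier
m_before = medoid_index(pairwise_euclidean(cluster))
m_after = medoid_index(pairwise_euclidean(with_outlier))
```

In this toy case the centroid moves from (0.5, 0.5) to (8.75, 8.75), a position far from every instance, while the medoid remains the same instance. Note that the medoid computation only needs the distance matrix, not the coordinates.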
One of the main challenges around the clustering problem is how to choose a good number of clusters (Tibshirani et al. 2001). The majority of clustering algorithms require the number of clusters to be specified a priori as a parameter. An alternative to fixing the number of clusters is to use a metric that evaluates the quality of the clusters, allowing an algorithm to test a variable number of clusters. The most relevant metric used in the literature is the silhouette (Rousseeuw 1987; see Sect. 3.2). This metric represents a balance between the number of clusters and the cluster separation, so it can be used to evaluate the trade-off between the number of clusters and their dissimilarity. Different algorithms have been proposed to optimize the silhouette measure. The most relevant are PAMK (Kaufman and Rousseeuw 2009; an extended version of PAM allowing the number of clusters to vary) and Clues (Wang et al. 2007; an iterative algorithm focused on the silhouette optimization).
2.2 Ant colony optimization in clustering
ACO has already been applied to clustering (Jafar and Sivakumar 2010) and classification (Martens et al. 2011). The advantage of applying ACO algorithms to these problems is that ACO performs a global search in the solution space, which is less likely to get trapped in local minima and, thus, has the potential to find more accurate solutions.
The most popular bio-inspired approaches that deal with the clustering problem are focused on evolutionary algorithms (Menéndez et al. 2014). Hruschka et al. (2009) present a survey of clustering algorithms from different evolutionary approaches. In the context of ant-based approaches, researchers have explored mainly two different strategies. There are ant-based approaches that focus on the cooperative self-organization characteristics of ant algorithms. Handl et al. (2006) present an adaptive clustering algorithm, called ATTA, based on the corpse-clustering behaviour of ants. An interesting aspect of ATTA is its ability to adapt the total number of clusters k during the search, although at the same time this is viewed as a limitation, since the algorithm does not allow the specification of k for problems where the number of clusters is known a priori. More examples can be found in Fernandes et al. (2008) and Herrmann and Ultsch (2008). These approaches can also be characterized based on the way data are manipulated by ants: ant-based approaches can be based on a grid, where ants move data to define the clusters mimicking a behaviour observed in nature (e.g. the way ants move their brood or their waste), or based on the association of each data instance to an ant (Hamdi et al. 2010). Other ant-based approaches involve the use of an ACO procedure, where the clustering problem is modelled as an optimization problem and pheromone is used to guide the search towards better solutions. Kao and Cheng (2006) designed a centroid-based ACO clustering algorithm, where ants assign each data instance to one of the available clusters and cluster centroids are adjusted based on this assignment. França et al. (2008) introduce a biclustering algorithm. Ashok and Messinger (2012) focused their work on graph-based clustering of spectral imagery, where the data are represented as a graph and an ACO procedure is used to find long paths through the data.
Several other approaches are discussed in Jafar and Sivakumar (2010).
3 Medoid-based ACO clustering algorithms
This section presents the proposed medoid-based ACO clustering algorithms. Both algorithms employ an ACO procedure to select an optimal medoid set to determine the clusters. The first algorithm, called MEdoid seT ACO Clustering algorithm (METACOC), is similar to the PAM algorithm: its goal is to choose the best k medoids (data instances) based only on distance information, where k is the predefined number of clusters. The second algorithm, called K-adaptive MEdoid seT ACO Clustering algorithm (METACOCK), is an extension of METACOC that automatically adjusts the number of clusters, which is useful for problems where the number of clusters is not known a priori.
3.1 METACOC: a medoid set ACO clustering algorithm
The METACOC algorithm is based on several ants looking for the best path in the construction graph. The construction graph is composed of all data instances. Solutions are generated by choosing medoids (data instances) and assigning the remaining data instances deterministically to them, according to their distance to the selected medoids. The medoid selection is illustrated in Fig. 2. The rationale is that once the medoids are determined, there is a deterministic optimal cluster allocation based on the similarity/dissimilarity values.

Each ant a keeps track of:

a list of visited data instances (\(tb_a\));

a set of chosen medoids \(M_a\), which is initially empty.
 1. Initialize the pheromone matrix \(\tau _0\).
 2. Initialize each ant a: set the chosen medoids \(M_a = \emptyset \) and the visited data instances \(tb_a = \emptyset \).
 3. For each ant, check if all instances have been visited (\(|tb_a| = n\)) or all medoids have been chosen (\(|M_a| = k\)). If not:
    (a) select the next data instance i;
    (b) choose a search strategy;
    (c) if i is selected as a medoid, add it to \(M_a\);
    (d) add i to the list of visited data instances \(tb_a\).
 4. Assign each data instance to its closest medoid and calculate the objective function value for each ant a:
$$\begin{aligned} J^a = \sum _{i=1}^n \min _{j=1,\ldots ,|M_a|} d(x_i,m_j^a), \end{aligned}$$ (5)
    where \(x_i\) represents a data instance and \(m_j^a\) represents a medoid in \(M_a\).
 5. Choose the best solution:
    (a) rank the ants' solutions;
    (b) if an ant has fewer than k medoids, it is eliminated from the ranking;
    (c) choose the best ant \(a^*\) (iteration-best solution);
    (d) compare \(a^*\) with the best-so-far solution \(a^{**}\) and keep the better of the two as \(a^{**}\).
 6. Update the pheromone trails (global updating rule). Only the r best ants add pheromone:
$$\begin{aligned} \tau _{t+1}(i,j) = (1 - \rho )\tau _t(i,j) + \sum _{h=1}^r \varDelta \tau _t(i,j)^h , \quad \varDelta \tau _t(i,j)^h = \frac{1}{J^h}, \end{aligned}$$ (6)
    where \(\rho \) is the pheromone evaporation rate (\(0 < \rho < 1\)), t is the iteration counter, r is the number of elitist ants and \(J^h\) is the quality of the solution created by ant h.
 7. Check termination condition:
    (a) if the number of iterations is greater than the maximum number of iterations, stop and return the best-so-far solution \(a^{**}\);
    (b) otherwise, go to step 2.
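Steps 4 and 6 of the procedure above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the layout of the pheromone matrix (one row per instance, with column 1 for "chosen as medoid" and column 0 for "not chosen") is an assumption based on the binary medoid decision, since the exact indexing of \(\tau (i,j)\) is not spelled out here. For simplicity the sketch also assumes that every instance was visited by each elitist ant.

```python
import numpy as np

def assignment_cost(dist, medoids):
    """Eq. (5): assign every instance to its closest medoid and sum the distances."""
    to_medoids = dist[:, medoids]          # shape (n, |M_a|)
    labels = to_medoids.argmin(axis=1)     # position of the closest medoid
    cost = to_medoids.min(axis=1).sum()    # J^a
    return labels, float(cost)

def update_pheromone(tau, elite_solutions, elite_costs, rho):
    """Eq. (6): evaporate all trails, then each elitist ant deposits 1 / J^h.

    Assumption: tau has shape (n, 2); tau[i, 1] belongs to the decision
    "instance i is a medoid" and tau[i, 0] to "instance i is not a medoid".
    """
    new_tau = (1.0 - rho) * tau
    n = tau.shape[0]
    for medoids, cost in zip(elite_solutions, elite_costs):
        chosen = np.zeros(n, dtype=int)
        chosen[medoids] = 1
        # Deposit on the decision the ant took for every instance.
        new_tau[np.arange(n), chosen] += 1.0 / cost
    return new_tau

# Hypothetical toy data: six points on a line, two obvious groups.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(x[:, None] - x[None, :])
labels, cost = assignment_cost(dist, [1, 4])   # medoids are instances 1 and 4
tau = update_pheromone(np.full((6, 2), 0.5), [[1, 4]], [cost], rho=0.1)
```

With medoids at instances 1 and 4 the cost is 4.0, so after evaporation (0.5 becomes 0.45) the decisions taken by this ant receive an extra 0.25 of pheromone each.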
In terms of computational complexity, we can assume that all data instances are visited during the search process (although in practice this is not frequent), which takes \(O(A n)\), where A is the number of ants and n is the number of data instances. The algorithm also includes a step that assigns each data instance to its closest medoid, which takes \(O(A n k)\), where k is the number of medoids. The evaluation involves calculating the similarity of each data instance to its assigned medoid, which takes \(O(A n)\). Finally, the ranking of solutions takes \(O(A \log A)\) and the pheromone update uses r elitist ants and visits all data instances, which takes \(O(r n)\). Since these steps are repeated for T iterations, the total complexity is \(O(T A n) + O(T A n k) + O(T A \log A) + O(T r n)\). As \(O(T A n k) \ge O(T A n) \ge O(T r n) \ge O(T A \log A)\), the complexity simplifies to \(O(T A n k)\).
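As a rough illustration of the dominant term, consider the settings reported in the experimental setup (1000 ants and 1000 iterations) applied to the Iris dataset (150 instances, three clusters). The figure below is only an upper bound on the number of instance-to-medoid distance lookups, under the assumption that every iteration runs to completion; it is not a measured running time:

$$\begin{aligned} T \cdot A \cdot n \cdot k = 1000 \times 1000 \times 150 \times 3 = 4.5 \times 10^{8}. \end{aligned}$$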
3.2 METACOCK: a k-adaptive extension of METACOC
The proposed METACOC algorithm cannot choose the number of clusters, but requires as input a value for k. This section presents the METACOCK algorithm, which allows the estimation of the number of clusters using METACOC as a starting point.

METACOCK differs from METACOC in two respects:

each ant can have a different number of clusters;

the quality metric is designed to balance between the number of clusters and the cluster assignment cost.
 1. Selection of the number of clusters:
    (a) during the ant initialization (step 2 in METACOC), each ant additionally chooses the number of clusters uniformly at random in the range \([k_{\mathrm{min}}, k_{\mathrm{max}}]\); the solution is then created using the same procedure as in METACOC.
 2. Solution evaluation:
    (a) candidate solutions are evaluated using the average silhouette (Eq. 7), which evaluates the balance between the number of clusters and the cluster assignment cost (step 4 in METACOC).
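The two modifications above can be sketched as follows. The silhouette used here is the standard definition from Rousseeuw (1987): for each instance i, \(s(i) = (b(i) - a(i)) / \max (a(i), b(i))\), where \(a(i)\) is the average distance to the other members of its own cluster and \(b(i)\) is the smallest average distance to the members of any other cluster. The function names, the treatment of singleton clusters (score 0) and the value returned for a single cluster are assumptions of this sketch.

```python
import numpy as np

def sample_k(rng, k_min, k_max):
    """Draw the number of clusters uniformly at random from [k_min, k_max]."""
    return int(rng.integers(k_min, k_max + 1))  # upper bound is exclusive in NumPy

def average_silhouette(dist, labels):
    """Mean silhouette width over all instances, from a pairwise distance matrix."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # assumed convention: not defined for a single cluster
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own_size = own.sum()
        if own_size == 1:
            continue  # singleton cluster: s(i) = 0
        a = dist[i, own].sum() / (own_size - 1)  # d(i, i) = 0 adds nothing
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())

# Hypothetical toy data: four points on a line forming two tight pairs.
x = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(x[:, None] - x[None, :])
good = average_silhouette(dist, [0, 0, 1, 1])   # natural grouping
bad = average_silhouette(dist, [0, 1, 0, 1])    # mixed grouping
k = sample_k(np.random.default_rng(0), 2, 10)   # illustrative bounds
```

Because the silhouette is bounded in [-1, 1] regardless of the number of clusters, solutions built with different values of k can be ranked against each other, which the assignment cost of Eq. 5 alone would not allow (it tends to decrease as more medoids are added).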
4 Computational experiments
This section presents the experiments that were carried out to measure the performance of the proposed algorithms: METACOC and METACOCK. METACOC was compared against K-means, ACOC and PAM as non-adaptive algorithms (i.e. algorithms that require a fixed number of clusters), whereas METACOCK was compared against EMBIC, Clues and PAMK as adaptive algorithms (i.e. algorithms that do not require a fixed number of clusters).
4.1 Datasets

synthetic dataset 1: This dataset corresponds to points in a two-dimensional Euclidean space, where nine clusters of points, each derived from a two-dimensional Gaussian distribution, were generated. There are three Gaussians which are closer than the rest. This dataset has 450 instances, and it is illustrated in the top-left plot in Fig. 4;

synthetic dataset 2: This second dataset is generated analogously to dataset 1 (nine clusters of points), but with additional noisy data in the background. This dataset has 550 instances, and it is illustrated in the top-right plot in Fig. 4;

synthetic dataset 3: This dataset is composed of three two-dimensional Gaussian distributions, which are well separated. This dataset has 150 instances, and it is illustrated in the bottom-centre plot in Fig. 4.
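A dataset with the shape of synthetic dataset 3 can be generated along the following lines. The centres, the standard deviation, the equal split of 50 instances per cluster and the seed are all assumptions chosen for illustration; the actual generating parameters are not given in this section.

```python
import numpy as np

def make_gaussian_clusters(centres, per_cluster, std, seed=0):
    """Sample `per_cluster` two-dimensional points around each centre."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for label, centre in enumerate(centres):
        points.append(rng.normal(loc=centre, scale=std, size=(per_cluster, 2)))
        labels.append(np.full(per_cluster, label))
    return np.vstack(points), np.concatenate(labels)

# Three well-separated Gaussians, 150 instances in total (hypothetical centres).
points, labels = make_gaussian_clusters(
    centres=[(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)], per_cluster=50, std=1.0
)
```

Keeping the generating labels makes it possible to score a clustering against the original groups, as done with the adjusted Rand index in the synthetic experiments.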
Description of the UCI datasets used in the experiments
Name  Attributes  Classes  Instances 

Breast cancer (BC)  9  2  699 
Breast tissue (BT)  9  6  106 
Ecoli (Ec)  7  6  336 
Glass (Gl)  9  6  214 
Haberman (Hb)  3  2  306 
Hayes (Hy)  5  6  132 
Hepatitis (Hp)  19  2  155 
Ionosphere (Io)  34  2  351 
Iris (Ir)  4  3  150 
Lenses (Le)  4  3  24 
Libras (Li)  90  15  360 
Lung cancer (LC)  56  3  32 
Mammographic (Mm)  5  2  961 
Musk (Mu)  166  2  476 
Onehr (Oh)  28  2  1867 
Page blocks (PB)  10  5  5473 
Seeds (Se)  7  3  210 
Sonar (So)  60  2  208 
Vertebral column (VC)  6  3  310 
Wine (Wi)  13  3  178 
Description of the UCR datasets used in the experiments
Name  Attributes  Classes  Instances 

ArrowHead (AH)  251  3  211 
BirdChicken (BC)  512  2  40 
CBF (CB)  128  3  930 
Coffee (Co)  286  2  56 
ECGFive (EF)  136  2  884 
Ham (Ha)  431  2  214 
Herring (He)  512  2  128 
ItalyPowerDemand (IP)  24  2  1106 
Lighting2 (Lt)  637  2  121 
SonyAIBORobot (SA)  70  2  621 
4.2 Experimental setup
This section briefly describes the selected algorithms used for comparison. ACOC (Kao and Cheng 2006) is an ACO clustering algorithm based on centroids. ACOC uses a pheromone matrix to store the relationship between the data instances and the centroid labels, where ants assign each data instance to one of the available clusters and cluster centroids are adjusted based on this assignment. Comparing ACOC to METACOC and METACOCK, both METACOC and METACOCK use a different construction graph, where an ant chooses whether an instance is a medoid or not (i.e. it is always a binary decision regardless of the number of clusters).
K-means (MacQueen 1967) is an iterative algorithm based on centroids, which are randomly selected at the beginning. The goal of the algorithm is to find the best centroid positions. It is executed in two steps: in the first step, it assigns the data to the closest centroid (cluster); in the second step, it calculates the new position of each centroid as the mean of the data that have been assigned to it.
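The two alternating steps can be written down compactly. The sketch below is a plain batch version of the iteration and is not the R implementation used in the experiments; the `init` argument (indices of the instances used as initial centroids) is a hypothetical convenience added so that a run can be reproduced exactly.

```python
import numpy as np

def kmeans(points, k, iterations=100, seed=0, init=None):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    if init is None:
        init = rng.choice(len(points), size=k, replace=False)  # random start
    centroids = points[init].copy()
    for _ in range(iterations):
        # Step 1: assign each instance to its closest centroid.
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned instances.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two groups of three points each.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(pts, k=2, init=[0, 3])
```

The result depends on the starting centroids: even on small, well-separated toy data, an unlucky choice of initial centroids can leave the iteration stuck in a poor local optimum, which is why restarts are commonly used.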
PAM (Kaufman and Rousseeuw 1987) is similar to Kmeans, but it uses medoids instead of centroids. PAM can work with a dissimilarity/similarity matrix, which is used to calculate the overall cost of a cluster. PAMK (Kaufman and Rousseeuw 2009) is an extension of PAM, which calculates the number of clusters using the silhouette as a decision metric.
EMBIC (Fraley and Raftery 2007) combines EM with the Bayesian information criterion (BIC). The EM algorithm tries to optimize the parameters of an estimator (in this case, Gaussian Mixture Models), and BIC adds a penalty to the likelihood based on the number of parameters. This is helpful when the number of clusters needs to be controlled. Finally, Clues (Wang et al. 2007) creates a cluster per data instance and merges the clusters according to the silhouette metric.
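For reference, the penalty mentioned above can be made explicit with the usual textbook form of the criterion; the symbols are introduced here only for illustration:

$$\begin{aligned} \mathrm{BIC} = p \ln n - 2 \ln \hat{L}, \end{aligned}$$

where \(\hat{L}\) is the maximized likelihood of the mixture model, p is the number of free parameters and n is the number of data instances. In this form lower values are better; some implementations report the negated quantity and maximize it instead. Adding a Gaussian component increases p, so an extra cluster is accepted only if it improves the likelihood enough to offset the penalty.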
We used the R standard implementation of K-means, PAM, PAMK, EMBIC and Clues: for each algorithm, the number of iterations was set to 100 and the remaining parameters were used with their default values; the initial centroids for K-means were randomly chosen. The parameters of the ACOC, METACOC and METACOCK algorithms have been set in a similar way as in the original work (Kao and Cheng 2006): the number of ants is 1000, the number of elitist ants is 10, the exploitation probability (\(q_0\)) is 0.0001, the initial pheromone values follow a uniform distribution in [0.7, 0.8], \(\beta = 2.0\) (only used by ACOC), \(\rho = 0.1\) and the maximum number of iterations is 1000.
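The ACO settings listed above can be collected in a small configuration object. The class and function names are hypothetical, and the shape of the pheromone matrix (one row per instance, two columns for the binary medoid decision) is an assumption of this sketch rather than a detail given in the text.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class AcoSettings:
    n_ants: int = 1000
    n_elitist_ants: int = 10
    q0: float = 0.0001                  # exploitation probability
    tau_init_range: tuple = (0.7, 0.8)  # bounds of the initial pheromone values
    beta: float = 2.0                   # only used by ACOC
    rho: float = 0.1                    # pheromone evaporation rate
    max_iterations: int = 1000

def initial_pheromone(n_instances, settings, seed=0):
    """Draw the initial pheromone values uniformly from the configured range."""
    rng = np.random.default_rng(seed)
    low, high = settings.tau_init_range
    return rng.uniform(low, high, size=(n_instances, 2))

settings = AcoSettings()
tau0 = initial_pheromone(5, settings, seed=1)
```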
The evaluation of the experiments has been focused on two different criteria: on the one hand, the synthetic datasets have been evaluated according to the cluster discrimination and the performance of the algorithm to discriminate the original clusters in the noisy case; on the other hand, the real-world datasets have been evaluated using the silhouette metric, which is optimized directly by the PAMK, EMBIC, Clues and METACOCK algorithms, and indirectly by the remaining algorithms (K-means, PAM, ACOC and METACOC) when they optimize the cost function defined by the Euclidean metric.
4.3 Synthetic experiments
Average results of the application of the algorithms to the synthetic datasets in adjusted Rand index terms, calculated over 100 executions (average \(\pm \) SD); no SD is shown for an algorithm when all values are lower than 0.001
K-means  ACOC  PAM  METACOC

Synthetic 1  \(0.812 \pm 0.088\)  \(0.922 \pm 0.017\)  0.975  \(\mathbf{0.992} \pm 0.002\) 
Synthetic 2  \(0.783 \pm 0.080\)  \(0.892 \pm 0.030\)  0.955  \(\mathbf{0.963} \pm 0.005\) 
Synthetic 3  \(0.812 \pm 0.237\)  \(\mathbf{1.0} \pm 0.000\)  1.0  \(\mathbf{1.0} \pm 0.000\) 
EMBIC  Clues  PAMK  METACOCK  

Synthetic 1  0.985 (9)  0.892 (15)  0.975 (9)  \(0.967 \pm 0.041\) (9) 
Synthetic 2  0.667 (9)  0.959 (9)  0.928 (10)  \(0.954 \pm 0.011\) (9) 
Synthetic 3  1.0 (3)  0.293 (12)  1.0 (3)  \(\mathbf{1.0} \pm 0.000\) (3) 
Median value for the adjusted Rand index on the synthetic datasets
K-means  ACOC  PAM  METACOC

Synthetic 1  0.833  0.943  0.975  0.995 
Synthetic 2  0.810  0.914  0.955  0.963 
Synthetic 3  1.0  1.0  1.0  1.0 
EMBIC  Clues  PAMK  METACOCK  

Synthetic 1  0.985  0.892  0.975  0.995 
Synthetic 2  0.667  0.959  0.928  0.969 
Synthetic 3  1.0  0.293  1.0  1.0 
Table 3 shows that METACOC clearly discriminates the data in all three datasets, achieving the highest average adjusted Rand index of all algorithms. METACOCK also performs well overall, although it seems to have more problems discriminating the cluster boundaries on synthetic dataset 1. PAM and PAMK obtain similar performances, but PAMK has problems in identifying the correct number of clusters on synthetic dataset 2. This is also the case for EMBIC, which performs well on synthetic dataset 1 and synthetic dataset 3, but has problems on synthetic dataset 2. Clues is the algorithm that achieved the lowest average on synthetic dataset 3, since it generates many more clusters (12) than exist in the data; it achieves a good performance on the remaining datasets. ACOC performs well overall, with the exception of synthetic dataset 2, where it has problems discriminating the cluster centres. K-means has problems in all three datasets: while it managed to discriminate the clusters in the majority of the runs, it seems to be more sensitive to the initial centroids' positions, as can be noticed by its lower average and higher standard deviation values.
Highest value for the adjusted Rand index on the synthetic datasets
K-means  ACOC  PAM  METACOC

Synthetic 1  0.995  0.995  0.975  1.0 
Synthetic 2  0.955  0.947  0.955  0.972 
Synthetic 3  1.0  1.0  1.0  1.0 
EMBIC  Clues  PAMK  METACOCK  

Synthetic 1  0.985  0.892  0.975  1.0 
Synthetic 2  0.667  0.959  0.928  0.972 
Synthetic 3  1.0  0.293  1.0  1.0 
These results show that the proposed algorithms are able to find good results when compared with classical algorithms using synthetic datasets and in general achieved better results than ACOC.
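For readers unfamiliar with the measure reported in the synthetic experiments, the adjusted Rand index can be computed from the contingency counts of two partitions. The sketch below follows the standard pair-counting definition; the function name and the return value for the degenerate case are choices of this sketch, and it assumes at least two instances.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, predicted_labels):
    """Chance-corrected agreement between two partitions of the same instances."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, predicted_labels))  # contingency table
    rows = Counter(true_labels)
    cols = Counter(predicted_labels)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed
mixed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # groups fully mixed
```

The index equals 1 when the two partitions agree up to a renaming of the clusters and is close to 0 (or negative) when the agreement is no better than chance, so the cluster identifiers produced by an algorithm do not need to match the original ones.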
4.4 Experiments with real-world datasets
Average results of the application of the non-adaptive algorithms to the UCI datasets in silhouette metric terms (average \(\pm \) SD)
K-means  ACOC  PAM  METACOC

BC  \(0.755 \pm 0.000\)  \(\mathbf{0.756} \pm 0.001\)  \(0.754 \pm 0.000\)  \(0.754 \pm 0.001\)
BT  \(0.625 \pm 0.014\)  \(\mathbf{0.676} \pm 0.060\)  \(0.633 \pm 0.009\)  \(0.635 \pm 0.050\)
Ec  \(0.259 \pm 0.038\)  \(0.231 \pm 0.032\)  \(\mathbf{0.263} \pm 0.022\)  \(0.230 \pm 0.028\)
Gl  \(\mathbf{0.537} \pm 0.115\)  \(0.317 \pm 0.132\)  \(0.281 \pm 0.075\)  \(0.250 \pm 0.091\)
Hb  \(0.47 \pm 0.003\)  \(0.471 \pm 0.002\)  \(0.472 \pm 0.000\)  \(\mathbf{0.474} \pm 0.003\)
Hy  \(\mathbf{0.668} \pm 0.000\)  \(0.668 \pm 0.001\)  \(0.665 \pm 0.000\)  \(0.667 \pm 0.001\)
Hp  \(0.536 \pm 0.049\)  \(0.549 \pm 0.003\)  \(\mathbf{0.559} \pm 0.015\)  \(0.550 \pm 0.009\)
Io  \(\mathbf{0.270} \pm 0.000\)  \(0.263 \pm 0.004\)  \(0.266 \pm 0.000\)  \(0.265 \pm 0.008\)
Ir  \(0.562 \pm 0.013\)  \(0.562 \pm 0.004\)  \(\mathbf{0.564} \pm 0.007\)  \(0.562 \pm 0.005\)
Le  \(\mathbf{0.175} \pm 0.033\)  \(\mathbf{0.175} \pm 0.011\)  \(0.124 \pm 0.012\)  \(0.129 \pm 0.031\)
Li  \(\mathbf{0.236} \pm 0.017\)  \(0.115 \pm 0.035\)  \(0.211 \pm 0.007\)  \(0.199 \pm 0.030\)
LC  \(\mathbf{0.081} \pm 0.014\)  \(0.073 \pm 0.020\)  \(0.041 \pm 0.010\)  \(0.037 \pm 0.000\)
Mm  \(\mathbf{0.608} \pm 0.002\)  \(\mathbf{0.608} \pm 0.001\)  \(\mathbf{0.608} \pm 0.001\)  \(\mathbf{0.608} \pm 0.002\)
Mu  \(0.394 \pm 0.037\)  \(0.393 \pm 0.012\)  \(0.382 \pm 0.021\)  \(\mathbf{0.399} \pm 0.022\)
Oh  \(\mathbf{0.372} \pm 0.002\)  \(0.365 \pm 0.026\)  \(0.361 \pm 0.000\)  \(\mathbf{0.372} \pm 0.003\)
PB  \(\mathbf{0.649} \pm 0.030\)  \(0.561 \pm 0.035\)  \(0.551 \pm 0.015\)  \(0.565 \pm 0.021\)
Se  \(0.529 \pm 0.024\)  \(0.53 \pm 0.001\)  \(0.524 \pm 0.008\)  \(\mathbf{0.531} \pm 0.004\)
So  \(0.187 \pm 0.000\)  \(0.194 \pm 0.006\)  \(\mathbf{0.215} \pm 0.000\)  \(\mathbf{0.215} \pm 0.001\)
VC  \(\mathbf{0.398} \pm 0.021\)  \(0.382 \pm 0.006\)  \(0.348 \pm 0.011\)  \(0.377 \pm 0.013\)
Wi  \(0.629 \pm 0.004\)  \(0.636 \pm 0.002\)  \(\mathbf{0.637} \pm 0.002\)  \(0.636 \pm 0.003\)
Average results of the application of the adaptive algorithms to the UCI datasets in silhouette metric terms (average \(\pm \) SD); no standard deviation is shown for an algorithm when all values are lower than 0.001
EMBIC  Clues  PAMK  METACOCK  

BC  0.037 (9)  0.045 (21)  0.754 (2)  \(0.728 \pm 0.010\) (2)
BT  0.855 (2)  0.878 (2)  0.792 (4)  \(\mathbf{0.926} \pm 0.000\) (2)
Ec  0.391 (5)  0.193 (7)  0.444 (3)  \(\mathbf{0.451} \pm 0.028\) (3)
Gl  0.033 (5)  0.129 (5)  0.675 (3)  \(\mathbf{0.697} \pm 0.051\) (\(2.60 \pm 0.49\))
Hb  0.193 (2)  0.173 (6)  0.472 (2)  \(\mathbf{0.609} \pm 0.053\) (2)
Hy  0.523 (1)  0.574 (4)  0.701 (2)  \(\mathbf{0.701} \pm 0.000\) (2)
Hp  0.163 (6)  0.218 (3)  0.549 (2)  \(\mathbf{0.774} \pm 0.000\) (2)
Io  0.138 (9)  0.029 (5)  0.369 (3)  \(0.265 \pm 0.019\) (\(2.08 \pm 0.34\))
Ir  0.707 (2)  0.557 (3)  0.709 (2)  \(\mathbf{0.711} \pm 0.036\) (2)
Le  0.013 (1)  0.112 (1)  0.134 (3)  \(\mathbf{0.311} \pm 0.000\) (2)
Li  0.111 (1)  0.216 (7)  0.268 (10)  \(0.195 \pm 0.034\) (2)
LC  0.011 (2)  0.032 (1)  0.057 (6)  \(\mathbf{0.109} \pm 0.004\) (2)
Mm  0.121 (8)  0.193 (32)  0.608 (2)  \(\mathbf{0.612} \pm 0.012\) (2)
Mu  0.398 (3)  0.111 (14)  0.399 (3)  \(\mathbf{0.403} \pm 0.013\) (\(2.06 \pm 0.23\))
Oh  0.008 (4)  0.050 (38)  0.361 (2)  \(\mathbf{0.548} \pm 0.051\) (2)
PB  0.293 (7)  0.275 (113)  0.842 (2)  \(\mathbf{0.851} \pm 0.032\) (2)
Se  0.325 (5)  0.529 (3)  0.602 (2)  \(\mathbf{0.610} \pm 0.004\) (2)
So  0.033 (1)  0.201 (3)  0.215 (2)  \(0.191 \pm 0.026\) (2)
VC  0.108 (4)  0.194 (5)  0.472 (2)  \(\mathbf{0.567} \pm 0.019\) (2)
Wi  0.553 (4)  0.578 (4)  0.700 (2)  \(\mathbf{0.728} \pm 0.004\) (2)
Highest value for the silhouette metric on the UCI datasets
K-means  ACOC  PAM  METACOC

BC  0.755  0.757  0.754  0.754 
BT  0.705  0.753  0.642  0.720 
Ec  0.362  0.292  0.285  0.335 
Gl  0.558  0.534  0.355  0.539 
Hb  0.477  0.475  0.472  0.477 
Hy  0.668  0.670  0.665  0.669 
Hp  0.774  0.557  0.574  0.550 
Io  0.270  0.274  0.266  0.270 
Ir  0.599  0.566  0.571  0.570 
Le  0.230  0.181  0.136  0.134 
Li  0.279  0.152  0.218  0.250 
LC  0.105  0.112  0.051  0.037 
Mm  0.613  0.610  0.609  0.613 
Mu  0.398  0.393  0.403  0.401 
Oh  0.376  0.368  0.361  0.372 
PB  0.810  0.564  0.566  0.567 
Se  0.530  0.536  0.532  0.541 
So  0.187  0.206  0.215  0.221 
VC  0.487  0.394  0.359  0.396 
Wi  0.643  0.636  0.639  0.640 
EMBIC  Clues  PAMK  METACOCK  

BC  0.037  0.045  0.754  0.743 
BT  0.855  0.878  0.792  0.926 
Ec  0.391  0.193  0.444  0.453 
Gl  0.033  0.129  0.675  0.717 
Hb  0.193  0.173  0.472  0.663 
Hy  0.523  0.574  0.701  0.702 
Hp  0.163  0.218  0.549  0.774 
Io  0.138  0.029  0.369  0.294 
Ir  0.707  0.557  0.709  0.712 
Le  0.013  0.112  0.134  0.311 
Li  0.111  0.216  0.268  0.235 
LC  0.011  0.032  0.057  0.126 
Mm  0.121  0.193  0.608  0.635 
Mu  0.398  0.111  0.399  0.403 
Oh  0.008  0.050  0.361  0.653 
PB  0.293  0.275  0.842  0.862 
Se  0.325  0.529  0.602  0.614 
So  0.033  0.201  0.215  0.282 
VC  0.108  0.194  0.472  0.629 
Wi  0.553  0.578  0.700  0.733 
Average results of the best K-means run computed over 30 restarts, and a single run of METACOC and METACOCK on the UCI datasets in silhouette metric terms (average \(\pm \) SD)
K-means  METACOC  METACOCK

BC  \(0.755 \pm 0.000\)  \(\mathbf{0.758} \pm 0.002\)  \(0.728 \pm 0.010\)
BT  \(0.635 \pm 0.004\)  \(0.639 \pm 0.051\)  \(\mathbf{0.926} \pm 0.000\)
Ec  \(0.303 \pm 0.007\)  \(0.227 \pm 0.022\)  \(\mathbf{0.451} \pm 0.028\)
Gl  \(0.541 \pm 0.002\)  \(0.286 \pm 0.083\)  \(\mathbf{0.697} \pm 0.051\)
Hb  \(0.477 \pm 0.003\)  \(0.488 \pm 0.002\)  \(\mathbf{0.609} \pm 0.053\)
Hy  \(0.668 \pm 0.000\)  \(0.669 \pm 0.002\)  \(\mathbf{0.701} \pm 0.000\)
Hp  \(0.676 \pm 0.007\)  \(0.540 \pm 0.005\)  \(\mathbf{0.774} \pm 0.000\)
Io  \(0.270 \pm 0.000\)  \(\mathbf{0.271} \pm 0.003\)  \(0.265 \pm 0.019\)
Ir  \(0.577 \pm 0.004\)  \(0.557 \pm 0.004\)  \(\mathbf{0.711} \pm 0.036\)
Le  \(0.200 \pm 0.002\)  \(0.117 \pm 0.027\)  \(\mathbf{0.311} \pm 0.000\)
Li  \(\mathbf{0.243} \pm 0.001\)  \(0.194 \pm 0.028\)  \(0.195 \pm 0.034\)
LC  \(0.095 \pm 0.011\)  \(0.047 \pm 0.000\)  \(\mathbf{0.109} \pm 0.004\)
Mm  \(0.613 \pm 0.001\)  \(\mathbf{0.618} \pm 0.003\)  \(0.612 \pm 0.012\)
Mu  \(0.398 \pm 0.002\)  \(0.401 \pm 0.020\)  \(\mathbf{0.403} \pm 0.013\)
Oh  \(0.376 \pm 0.002\)  \(0.379 \pm 0.006\)  \(\mathbf{0.548} \pm 0.051\)
PB  \(0.751 \pm 0.008\)  \(0.575 \pm 0.017\)  \(\mathbf{0.851} \pm 0.032\)
Se  \(0.530 \pm 0.003\)  \(0.541 \pm 0.009\)  \(\mathbf{0.610} \pm 0.004\)
So  \(0.187 \pm 0.000\)  \(\mathbf{0.215} \pm 0.001\)  \(0.191 \pm 0.026\)
VC  \(0.417 \pm 0.007\)  \(0.367 \pm 0.017\)  \(\mathbf{0.567} \pm 0.019\)
Wi  \(0.634 \pm 0.002\)  \(0.646 \pm 0.009\)  \(\mathbf{0.728} \pm 0.004\)
Average results of the application of the adaptive algorithms to the UCR time series datasets in silhouette metric terms (average \(\pm \) SD); no SD is shown for an algorithm when all values are lower than 0.001
Dataset  PAM  METACOC

AH  \(21.88 \pm 0.010\)  \(\mathbf{22.20} \pm 0.018\)
BC  \(\mathbf{34.10} \pm 0.000\)  \(\mathbf{34.10} \pm 0.000\)
CB  \(13.56 \pm 0.025\)  \(\mathbf{14.55} \pm 0.018\)
Co  \(\mathbf{28.78} \pm 0.013\)  \(\mathbf{28.78} \pm 0.000\)
EF  \(\mathbf{40.31} \pm 0.001\)  \(40.30 \pm 0.002\)
Ha  \(\mathbf{10.98} \pm 0.012\)  \(10.33 \pm 0.012\)
He  \(32.44 \pm 0.022\)  \(\mathbf{32.66} \pm 0.004\)
IP  \(63.08 \pm 0.044\)  \(\mathbf{63.25} \pm 0.003\)
Lt  \(\mathbf{21.06} \pm 0.027\)  \(15.02 \pm 0.024\)
SA  \(7.80 \pm 0.021\)  \(\mathbf{10.54} \pm 0.039\)
Dataset  Clues  PAMK  METACOCK

AH  11.53 (5)  46.99 (2)  \(\mathbf{74.58} \pm 0.025\) (2)
BC  0 (1)  35.57 (8)  \(\mathbf{35.58} \pm 0.012\) (2)
CB  8.51 (19)  23.74 (2)  \(\mathbf{27.51} \pm 0.007\) (3)
Co  0 (1)  28.78 (2)  \(\mathbf{32.03} \pm 0.047\) (2)
EF  20.91 (20)  40.31 (2)  \(\mathbf{40.40} \pm 0.007\) (2)
Ha  6.57 (4)  10.98 (2)  \(\mathbf{25.81} \pm 0.065\) (2)
He  33.05 (2)  32.84 (2)  \(32.16 \pm 0.007\) (2)
IP  14.32 (24)  63.08 (2)  \(\mathbf{64.14} \pm 0.002\) (2)
Lt  9.73 (3)  21.06 (2)  \(\mathbf{21.37} \pm 0.017\) (2)
SA  6.16 (13)  15.83 (4)  \(\mathbf{16.56} \pm 0.062\) (2)
Table 7 shows the experimental results for the datasets when the adaptive algorithms are considered. This table shows that METACOCK obtains statistically significantly better results than PAMK in 15 of the 20 datasets, while achieving statistically significantly worse results in only 4. When METACOCK is compared with the rest of the adaptive algorithms, it obtains better results than both EMBIC and Clues—with the exception of the So and Li datasets, where Clues obtains better results.
Average computational time (average \(\pm \) SD) in seconds taken by METACOC and METACOCK on the UCI datasets
Dataset  METACOC  METACOCK

BC  \(\mathbf{10.11} \pm 0.042\)  \(17.52 \pm 0.073\)
BT  \(\mathbf{1.41} \pm 0.001\)  \(1.95 \pm 0.010\)
Ec  \(\mathbf{4.88} \pm 0.018\)  \(11.37 \pm 0.042\)
Gl  \(\mathbf{2.33} \pm 0.005\)  \(2.89 \pm 0.015\)
Hb  \(\mathbf{4.20} \pm 0.012\)  \(9.31 \pm 0.033\)
Hy  \(\mathbf{1.87} \pm 0.001\)  \(1.99 \pm 0.004\)
Hp  \(\mathbf{1.92} \pm 0.001\)  \(3.01 \pm 0.006\)
Io  \(\mathbf{5.06} \pm 0.008\)  \(12.27 \pm 0.062\)
Ir  \(\mathbf{2.11} \pm 0.003\)  \(2.31 \pm 0.007\)
Le  \(\mathbf{0.31} \pm 0.000\)  \(0.45 \pm 0.000\)
Li  \(\mathbf{3.98} \pm 0.009\)  \(8.81 \pm 0.031\)
LC  \(\mathbf{0.40} \pm 0.000\)  \(0.51 \pm 0.000\)
Mm  \(21.20 \pm 0.029\)  \(\mathbf{20.40} \pm 0.068\)
Mu  \(\mathbf{8.53} \pm 0.011\)  \(18.20 \pm 0.082\)
Oh  \(\mathbf{21.70} \pm 0.029\)  \(49.10 \pm 0.101\)
PB  \(\mathbf{45.10} \pm 0.081\)  \(100.40 \pm 0.192\)
Se  \(\mathbf{1.95} \pm 0.002\)  \(3.33 \pm 0.005\)
So  \(\mathbf{2.55} \pm 0.003\)  \(2.72 \pm 0.003\)
VC  \(\mathbf{5.22} \pm 0.009\)  \(7.89 \pm 0.026\)
Wi  \(\mathbf{2.33} \pm 0.001\)  \(2.51 \pm 0.005\)
We also compared the best results of K-means against a single run of METACOC and METACOCK. This comparison balances computational time against performance, given that the proposed algorithms use a more time-consuming ACO procedure in which multiple candidate solutions are evaluated, while K-means employs a faster local search strategy. The results are presented in Table 9. A value in the table corresponds to the average of the best K-means value over 30 executions (where the best value is determined over 30 restarts for each execution) and to a single execution of METACOC and METACOCK. The results show that METACOCK is the best of the ACO-based algorithms, achieving statistically significantly better results than K-means in 14 of the 20 datasets and statistically significantly worse results in only one dataset; in the remaining 5 datasets, no statistically significant differences were detected. In this case, the advantage of the ACO procedure is evident, since it leads to the creation of high-quality solutions. The results obtained by METACOC are mixed: K-means is statistically significantly better than METACOC in 9 datasets and statistically significantly worse in 5 datasets, and the two have similar performance in 4 datasets. Given the stochastic nature of the ACO search, better results might be obtained by multiple executions of METACOC, at the cost of a higher computational time.
Overall, we consider the results presented in Tables 6, 7, 8 and 9 positive. In summary, METACOC shows statistically significant improvements over PAM; METACOCK, the proposed algorithm that can adapt the number of clusters, obtains the highest results of all the algorithms in 17 of the 20 datasets. More importantly, it statistically significantly outperforms PAMK in 15 of the 20 datasets.
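The silhouette metric used in these comparisons (Rousseeuw 1987) needs only pairwise distances, which is why it suits medoid-based methods. The following is a minimal sketch of the standard definition, not the implementation used in the experiments; the function name `silhouette`, the toy points and the convention of returning 0 for a single cluster are assumptions made for illustration.

```python
def silhouette(dist, labels):
    """Mean silhouette width from a precomputed distance matrix.

    dist   -- square list-of-lists with dist[i][j] the distance between i and j
    labels -- cluster label of each instance
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # assumption: treat the undefined one-cluster case as 0

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 for an instance alone in its cluster
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy example (illustrative data only): four points on a line.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette(dist, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(dist, [0, 1, 0, 1])   # clusters mixing distant points
print(round(good, 4), good > bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate instances that sit closer to another cluster than to their own.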
4.5 Time series experiments
In this section, we present a set of experiments focused on a specific domain where medoid-based approaches have been successful: time series analysis (Liao 2005). We have selected ten datasets from the UCR Time Series Classification Archive (Chen et al. 2015). Details of these datasets are presented in Table 2. The similarity matrix derived from the alignment between two time series is generated by applying the Dynamic Time Warping distance (Keogh and Ratanamahatana 2005). Table 10 shows the experimental results for the medoid-based algorithms: PAM, METACOC, Clues, PAMK and METACOCK. The values in this table represent the average and standard deviation (average \(\pm \) SD) over 100 executions; no standard deviation is shown for an algorithm when all values are lower than 0.001 (Clues and PAMK results).
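To make the distance computation concrete, the sketch below builds a pairwise matrix with the textbook dynamic time warping recurrence. It is an illustration under stated assumptions rather than the setup used in the experiments: the absolute difference as local cost, the absence of a warping window, and the names `dtw_distance` and `pairwise_dtw` are all choices made here.

```python
from math import inf


def dtw_distance(a, b):
    """Unconstrained DTW between two numeric sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b element
                cost[i][j - 1],      # b[j-1] also matched to an earlier a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between all pairs of series."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


# Illustrative series only; the second is a time-shifted copy of the first.
series = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [3, 3, 3, 3]]
D = pairwise_dtw(series)
print(D[0][1], D[0][2])
```

Because the series may have different lengths and no feature vector is shared between them, a matrix such as `D` is the only input a medoid-based algorithm needs.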
4.6 Computational time
Table 11 shows the average computational time (average \(\pm \) SD) in seconds taken by METACOC and METACOCK on the UCI datasets over a fixed number of iterations. The algorithms are around 10 times slower than K-means, 6 times slower than PAM and Clues, 4 times slower than PAMK, and similar to EMBIC. Overall, METACOC is faster than METACOCK. We expected a higher computational time for METACOCK, since the algorithm explores solutions with different values of k and uses a more complex evaluation function. In our observations, both METACOC and METACOCK are generally faster than ACOC. We attribute this to their simplified construction process: as soon as the algorithm selects k medoids (where k is the number of clusters), the solution construction process stops, while ACOC must visit all instances of the dataset to create a solution.
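The difference in construction cost can be illustrated with a small sketch. It is a simplification and not the exact METACOC procedure: it assumes one pheromone value per instance and a roulette-wheel choice among the instances not yet selected, and the names `construct_medoids` and `assign_and_cost` are hypothetical. The point it shows is that construction ends after k selections, after which each instance is simply assigned to its nearest medoid.

```python
import random


def construct_medoids(pheromone, k, rng):
    """Pick k distinct instances, each with probability proportional to its
    pheromone value among the instances not selected so far (assumed rule)."""
    candidates = list(range(len(pheromone)))
    medoids = []
    while len(medoids) < k:  # construction stops as soon as k medoids exist
        weights = [pheromone[c] for c in candidates]
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        medoids.append(chosen)
        candidates.remove(chosen)
    return medoids


def assign_and_cost(dist, medoids):
    """Assign every instance to its nearest medoid and sum those distances."""
    labels, total = [], 0.0
    for row in dist:
        nearest = min(range(len(medoids)), key=lambda c: row[medoids[c]])
        labels.append(nearest)
        total += row[medoids[nearest]]
    return labels, total


# Toy example (illustrative data only): four points on a line.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]

rng = random.Random(0)
sampled = construct_medoids([1.0] * len(points), 2, rng)
print(sampled)                         # two distinct instance indices
print(assign_and_cost(dist, [0, 2]))   # ([0, 0, 1, 1], 2.0)
```

In this sketch an ant makes k choices per solution, whereas a procedure that has to visit every instance makes one choice per instance.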
Figure 5 illustrates the convergence of METACOC and METACOCK. Interestingly, METACOCK converges in fewer iterations than METACOC, even though it takes longer to run the same number of iterations. This suggests that the overall computational time of METACOCK can be reduced by using a smaller number of iterations, without a negative impact on its performance.
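One simple way to act on this observation is a stagnation-based stopping rule, sketched below. This is an assumption about how an iteration budget could be chosen, not something evaluated in the paper; the function name, the `patience` parameter and the quality trace are hypothetical.

```python
def iterations_until_stagnation(best_per_iteration, patience):
    """Number of iterations run before `patience` consecutive iterations
    pass without improving the best quality seen so far (higher is better)."""
    best = float("-inf")
    since_improvement = 0
    for iteration, value in enumerate(best_per_iteration, start=1):
        if value > best:
            best = value
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return iteration
    return len(best_per_iteration)  # never stagnated within the trace


# Illustrative quality trace only (not data from the experiments).
trace = [0.40, 0.55, 0.60, 0.60, 0.60, 0.60, 0.61]
print(iterations_until_stagnation(trace, patience=3))  # 6
print(iterations_until_stagnation(trace, patience=5))  # 7
```

A small `patience` saves time but risks stopping before a late improvement, as the last value in the trace shows.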
5 Conclusions and future work
In this paper, we proposed two medoid-based ACO clustering algorithms, METACOC and METACOCK. Medoid-based clustering algorithms only need the distances or similarities among data to find a solution, and they are more robust to outliers. One of their main advantages is that they can be applied directly to problems where the features of the data cannot be easily represented in a multidimensional space. The first algorithm, METACOC, uses an ACO procedure to determine an optimal medoid set. The second algorithm, METACOCK, uses an automatic selection of the number of clusters, which is useful for problems where the number of clusters is not known a priori.
We compared the proposed algorithms against classical clustering algorithms, both centroid- and medoid-based, on synthetic and real-world datasets. METACOC results were positive, statistically significantly outperforming PAM in 8 out of 20 real-world datasets and achieving competitive results against the (centroid-based) K-means and ACOC algorithms, while using only the information about the distance among the data instances. METACOCK results were also positive: it statistically significantly outperformed PAMK in 15 out of the 20 real-world datasets. METACOCK was also the algorithm that consistently achieved the best results on the real-world datasets in the experiments optimizing the silhouette metric. Concerning the time series datasets, METACOC shows better performance than PAM overall, achieving statistically significantly better results in two datasets and statistically significantly worse results in only one dataset; METACOCK achieved statistically significantly better results than Clues in 9 out of 10 datasets and than PAMK in 6 out of 10 datasets, with no statistically significant differences detected in the remaining datasets.
There are several future research directions. Neither METACOC nor METACOCK employs heuristic information during the construction process, so it would be interesting to investigate whether the search can be further improved by such information. Exploring the use of different cluster evaluation measures to improve the selection of the number of clusters in METACOCK is another interesting research direction; this can be evaluated in an automatic configuration setting (López-Ibáñez et al. 2011). At the moment, the selection of the number of clusters is not part of the construction graph, and therefore it is not influenced by pheromone values; adding the selection to the construction graph might improve the search. Finally, the application of the algorithms in large-scale data analysis tasks is also a research direction worth further exploration.
Acknowledgments
The authors would like to thank the anonymous reviewers and the associate editor for their valuable comments and suggestions. This work is supported by the Spanish Ministry of Science and Education under Project Code TIN201456494C44P, Comunidad Autonoma de Madrid under project CIBERDINE S2013/ICE3095, United Kingdom government by the EPSRC project SeMaMatch EP/K032623/1 and Savier—an Airbus Defence & Space project (FUAM076914 and FUAM076915).
References
 Ashok, L., & Messinger, D. W. (2012). A spectral image clustering algorithm based on ant colony optimization. In Proceedings of Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVIII (SPIE 8390) (pp. 1–10). International Society for Optics and Photonics.
 Cao, L. (2010). Domain-driven data mining: Challenges and prospects. IEEE Transactions on Knowledge and Data Engineering, 22(6), 755–769.
 Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., & Batista, G. (2015). The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/.
 Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
 Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
 Dorigo, M., & Gambardella, L. (1997). Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1), 53–66.
 Dorigo, M., & Stützle, T. (2004). Ant colony optimization. Cambridge, MA: MIT Press.
 Fernandes, C., Mora, A., Merelo, J., Ramos, V., Laredo, J., & Rosa, A. (2008). KANTS: Artificial ant system for classification. In M. Dorigo, M. Birattari, C. Blum, M. Clerc, T. Stützle, & A. Winfield (Eds.), Ant colony optimization and swarm intelligence: 6th international conference (ANTS 2008), LNCS (Vol. 5217, pp. 339–346). Springer.
 Fraley, C., & Raftery, A. E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24(2), 155–181.
 França, F., Coelho, G., & Zuben, F. (2008). bicACO: An ant colony inspired biclustering algorithm. In M. Dorigo, M. Birattari, C. Blum, M. Clerc, T. Stützle, & A. Winfield (Eds.), Ant colony optimization and swarm intelligence, LNCS (Vol. 5217, pp. 401–402). Berlin, Heidelberg: Springer.
 Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
 Hamdi, A., Antoine, V., Monmarché, N., Alimi, A., & Slimane, M. (2010). Artificial ants for automatic classification. In N. Monmarché, F. Guinand, & P. Siarry (Eds.), Artificial ants: From collective intelligence to real life optimization and beyond, Chapter 13 (pp. 265–290). London: ISTE-Wiley.
 Handl, J., Knowles, J., & Dorigo, M. (2006). Ant-based clustering and topographic mapping. Artificial Life, 12(1), 35–62.
 Herrmann, L., & Ultsch, A. (2008). The architecture of ant-based clustering to improve topographic mapping. In M. Dorigo, M. Birattari, C. Blum, M. Clerc, T. Stützle, & A. Winfield (Eds.), Ant colony optimization and swarm intelligence: 6th international conference (ANTS 2008), LNCS (Vol. 5217, pp. 379–386). Springer.
 Hruschka, E., Campello, R., Freitas, A., & de Carvalho, A. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 39(2), 133–155.
 Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
 Jafar, O. M., & Sivakumar, R. (2010). Ant-based clustering algorithms: A brief survey. International Journal of Computer Theory and Engineering, 2(5), 787–796.
 Kao, Y., & Cheng, K. (2006). An ACO-based clustering algorithm. In M. Dorigo, L. Gambardella, M. Birattari, A. Martinoli, R. Poli, & T. Stützle (Eds.), Ant colony optimization and swarm intelligence: 5th international conference (ANTS 2006), LNCS (Vol. 4150, pp. 340–347). Springer.
 Kaufman, L., & Rousseeuw, P. (1987). Clustering by means of medoids. No. 87 in Reports of the Faculty of Mathematics and Informatics. Delft University of Technology.
 Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: An introduction to cluster analysis (Vol. 344). New Jersey: Wiley.
 Keogh, E., & Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358–386.
 Larose, D. T. (2005). Discovering knowledge in data. New Jersey: Wiley.
 Liao, T. (2005). Clustering of time series data—A survey. Pattern Recognition, 38(11), 1857–1874.
 López-Ibáñez, M., Dubois-Lacoste, J., Stützle, T., & Birattari, M. (2011). The irace package: Iterated racing for automatic algorithm configuration. Technical Report No. TR/IRIDIA/2011004, IRIDIA, Université Libre de Bruxelles. http://iridia.ulb.ac.be/IridiaTrSeries/IridiaTr2011004.pdf.
 MacQueen, J. B. (1967). Some methods of classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (pp. 281–297). University of California Press.
 Martens, D., Baesens, B., & Fawcett, T. (2011). Editorial survey: Swarm intelligence for data mining. Machine Learning, 82(1), 1–42.
 Menéndez, H., Barrero, D., & Camacho, D. (2014). A coevolutionary multiobjective approach for a K-adaptive graph-based clustering algorithm. In 2014 IEEE congress on evolutionary computation (CEC) (pp. 2724–2731). Piscataway, NJ: IEEE Press.
 Menéndez, H., Bello-Orgaz, G., & Camacho, D. (2013). Extracting behavioural models from 2010 FIFA world cup. Journal of Systems Science and Complexity, 26(1), 43–61.
 Menéndez, H. D., Barrero, D. F., & Camacho, D. (2014). A genetic graph-based approach for partitional clustering. International Journal of Neural Systems, 24(3), 1–19.
 Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.
 Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411–423.
 Wang, X., Qiu, W., & Zamar, R. H. (2007). CLUES: A nonparametric clustering method based on local shrinking. Computational Statistics & Data Analysis, 52(1), 286–298.
 Witten, H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). Morgan Kaufmann.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.