1 Introduction

In the last decades, many research studies have focused their attention on how a group of subjects ranks a list of objects according to personal preference (the rank vector). A rank vector is thus a permutation of the positive integers 1,…,k. More rigorously, given k items to rank, labelled 1,…,k, a ranking π is a mapping from the set of items 1,…,k to the set of ranks 1,…,k, where π(i) is the rank given to item i. If \(\pi (i)< \pi (i^{\prime })\), then item i is said to be preferred to item \(i^{\prime }\). An alternative format for ranking data is the ordering vector, which is simply the inverse function of the ranking π, so that the generic element \(\pi^{-1}(j)\) denotes the item ranked in the j-th position.
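For illustration, the relation between the two formats can be sketched in R (the environment used throughout this work); the example vectors below are hypothetical:

```r
# A ranking over k = 4 items: pi[i] is the rank assigned to item i
pi <- c(3, 1, 4, 2)      # item 2 is the most preferred

# The corresponding ordering vector: ordering[j] is the item ranked j-th
ordering <- order(pi)    # 2 4 1 3, i.e. the inverse permutation pi^{-1}

# Sanity check: the rank of the item in position j is j itself
stopifnot(all(pi[ordering] == seq_along(pi)))
```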

A judge can assign a distinct value to each of the k items, defining a complete ranking, or can rank only m < k items, defining a partial ranking. This is the case where the judge specifies only his most-liked m < k items, leaving the remaining ones undefined. Different is the case of ties, in which a judge assigns the same integer value to two or more items because he evaluates them equally.

Therefore, we could be interested, for example, in ranking candidates in an election or in ranking shopping items, in order to identify the pattern of preferences as well as the most representative permutation. More generally, involving the concept of preference, this kind of data has a broad field of application: from Marketing, Risk and Credit scoring to Psychology, Sociology, Politics, and so on.

Preference analysis literature mainly focuses on modelling the probability of certain preference structures under the assumption of a homogeneous population. The related statistical models can be classified into four classes (Critchlow et al., 1991; Marden, 1995): order statistics models (Thurstone, 1927), rankings induced by paired comparisons (Bradley, 1984; Bradley & Terry, 1952), distance-based models (Smith, 1950; Mallows, 1957), and multistage models (Luce, 1959; Plackett, 1975; Fligner & Verducci, 1988).

In this work, we assume heterogeneity in the population of judges; therefore, the goal is to identify subgroups of judges that are homogeneous in terms of their preferences, following a model-free clustering approach in the fuzzy framework.

We use the Kemeny distance (Kemeny & Snell, 1962) to enhance flexibility and model applicability. Unlike the Kendall distance, it is a metric also in the more general case of tied rankings, whereas the Kendall distance satisfies the triangle inequality only for full rankings (Emond & Mason, 2002). However, as pointed out by Heiser and D’Ambrosio (2013), in the latter case the two distances become equivalent, the Kemeny distance being twice the Kendall distance. Handling preference data under the assumption of a heterogeneous population is only one of the main goals of this work; in this sense, our methodology has a twofold scope: to adapt the fuzzy C-Medoids (FCMd) method to the case of ranking data and to account for noisy data and outliers.

To the best of our knowledge, there is no extensive literature about clustering techniques for preference data using a model-free approach; the most similar approach is the one recently proposed by D’Ambrosio and Heiser (2019), in which a K-Median Cluster Component Analysis based on the Kemeny distance is presented. In the work of Müllensiefen et al. (2018), there is an application of the Partitioning Around Medoids (PAM) algorithm for clustering rankings based on a weighted footrule distance. Other interesting works based on distances between rankings in hierarchical cluster analysis can be found in Brentari et al. (2016) and Bonanomi et al. (2017, 2019). In contrast, most works concern a mixture model-based approach. In the class of distance-based models, the extension to mixture models for heterogeneous populations is due to Murphy and Martin (2003), Meila and Chen (2010), and Lee and Yu (2012). Other mixture model-based clustering algorithms to analyse ranking data have been proposed by Biernacki and Jacques (2013), Jacques and Biernacki (2014), and Mollica and Tardella (2017).

Franczak et al. (2016) clearly demonstrated the advantages of a model-based imputation procedure that simultaneously accounts for heterogeneity while imputing. They proved that their model-based approach is able to recover the group structure and key features of the datasets even in the presence of a large number of outliers.

It is worth noting that, to the best of our knowledge, there is no literature in the field of clustering and fuzzy clustering techniques that proposes a robust metric for rankings. In order to identify homogeneous groups of judges even when some contamination, due to outliers or, more generally, noise, is present in the data, we propose two robust fuzzy clustering techniques, both based on the same suitable exponential transformation of the Kemeny distance. They differ in the specification of the fuzziness.

In detail, the first proposal is an extension of the well-known FCMd method, which uses the “m” exponent to transform the matrix of crisp assignments into that of membership degrees (Krishnapuram et al., 1999; Krishnapuram et al., 2001), while the second one is an extension of the fuzzy c-Means clustering technique with entropy regularization (Li & Mukaidono, 1995, 1999; Miyamoto & Mukaidono, 1997), i.e. the Shannon entropy is introduced in the objective function to obtain a fuzzy partition while avoiding the use of the exponent “m”. Both techniques are extended to deal with rankings (full and tied) and, moreover, with noisy data; indeed, robustness is achieved by defining a suitable exponential transformation of the Kemeny distance. As far as the latter model is concerned, it is worth noting that the extension to the fuzzy medoids-based version is also introduced.

Henceforth, we refer to the first proposal as the “Exponential Kemeny-based FCMd method” and to the second one as the “Exponential Kemeny-based FCMd method with entropy regularization”.

We argue that the proposed clustering methods could be very useful in many contexts, above all in marketing research; we believe they can be a valid instrument for market segmentation, whose goal is to devise different strategies for different customer segments in order to offer them appropriate products and/or services. Moreover, the detection of clustered outliers could help identify niche markets, i.e. customer segments with very particular needs and tastes.

The outline of the article is as follows. After a brief introduction to the literature about fuzzy clustering techniques in Section 2, our methodological proposals are introduced and described in Section 3; simulation results are shown in detail in Section 4, while the application to two real datasets is proposed in Section 5. In the last section (Section 6), we draw some conclusions and outline open research problems.

2 Fuzzy Clustering Techniques

Cluster analysis based on fuzzy theory (Zadeh, 1965) allows units to belong to more than one cluster simultaneously as opposed to the classic clustering approach whose partitions are characterized by non-empty and pairwise disjoint subsets, providing a crisp assignment of the units to the clusters.

In more detail, when an object is almost equally distant from two or more clusters, the fuzzy approach relaxes the requirement of a crisp assignment, replacing it with the notion of degree of membership in a cluster; this is particularly useful in the case of overlapping clusters or when the goal is to group complex objects characterized by an unavoidable and intrinsic “imprecision”.

The interest in fuzzy clustering methods quickly grew over time, and many fuzzy clustering algorithms have been proposed; seminal papers were due to Bellman et al. (1966) and Ruspini (1969, 1970, 1973), even if the most representative one is the fuzzy c-Means (FCM) clustering method (Dunn, 1973; Bezdek, 1974, 1981), widely applied to this day. The “fuzzification” is obtained by raising each unknown degree of membership to an exponent “m”, named the fuzziness coefficient because it controls the extent of the fuzziness of the partition.

Another variant of the c-Means clustering method in a fuzzy perspective was introduced by Li and Mukaidono (1995, 1999) and Miyamoto and Mukaidono (1997) as an alternative to the fuzzy methods based on the “m” exponent; it answers the criticism raised by some researchers towards the role of the fuzziness coefficient, often considered an artificial device, unnatural and with no physical meaning.

To overcome this limit, in the latter class of methods, fuzziness is controlled by including in the objective function an entropy regularization term (Miyamoto & Mukaidono, 1997), i.e. the Shannon entropy which, when applied to the degrees of membership, can be called fuzzy entropy (Coppi & D’Urso, 2006).

Following this approach, the total functional is optimized by maximizing both the internal cohesion and the given measure of entropy, thus maximizing the total amount of information.

Other interesting prototype-based models, which are variants of the FCM, have also been proposed in the literature; we refer to the medoids-based clustering techniques that, in a non-fuzzy framework, were due to Vinod (1969), Rao (1971), Church (1978), Mulvey and Crowder (1979), and Kaufman and Rousseeuw (1987), the latter proposing the well-known Partitioning Around Medoids (PAM) method, while, within a fuzzy context, we can refer to the seminal papers of Krishnapuram et al. (1999, 2001).

The fuzzy C-Medoids (FCMd) clustering techniques group objects around representative prototypes observed in the dataset, i.e. the medoids, which synthesize the structural information of each cluster. Therefore, the medoids are objects whose overall distance from all other objects in the same cluster is minimal: this prevents prototypes from being “virtual”, as in the case of the centroids in the c-Means algorithm. While a first advantage certainly relates to practical applications, for which the identification of representative non-fictitious prototypes can be very interesting and useful for the interpretation of the clusters, the main strength comes from the consideration that the FCMd method is more robust than the FCM when noise or outliers occur in the data, the medoid being less influenced by such extreme values than the mean. Readers interested in a deeper and more detailed dissertation on fuzzy clustering may refer to D’Urso (2015) and references therein.

In the next section, we will move on to the description of the proposed methods, which are methodological extensions of the fuzzy medoids-based methods previously described; the purpose is to adapt them to the case where one is interested in the group structure of particular data, such as preference data, considering the possibility that outliers or noisy data may mask the real composition of the groups themselves, both in terms of their number and in terms of the membership of the units to the groups.

3 Robust FCMd Clustering for Preference Data

Clustering ranking data aims at identifying groups of individuals characterized by similar preferences or choices with respect to a set of items.

In clustering preference data, the features of permutations themselves suggest the use of a fuzzy approach as a natural way to cope with the uncertainty of assigning a judge (i.e. a permutation) to each cluster.

In this context, the FCMd clustering technique seems the most appropriate choice since the identification of an observed permutation acting as the cluster prototype could improve the interpretation of the selected cluster.

As far as the distance matrix is concerned, we argue that the choice of an appropriate metric is one of the most important issues in any clustering method, fuzzy or not.

In this context, we propose the use of the Kemeny distance (Kemeny & Snell, 1962), which can be seen as a suitable extension of the Kendall distance (Kendall, 1938, 1948) to the case of two tied rankings \(\boldsymbol{\pi}\) and \(\boldsymbol{\pi}^{*}\) over the set [k] (Li et al., 2017). Therefore, given two rankings \(\boldsymbol{\pi}\) and \(\boldsymbol{\pi}^{*}\), and representing each ranking by means of a k × k matrix, as proposed by Kendall (1938), with generic element

$$\pi(i,j) = \begin{cases} \quad 1 & \text{if object } i \text{ is ranked before object } j\\ \,-1 & \text{if object } i \text{ is ranked behind object } j\\ \quad 0 & \text{if the objects are tied or } i=j,\\ \end{cases}$$

the Kemeny distance between two rankings \(\boldsymbol{\pi}\) and \(\boldsymbol{\pi}^{*}\) is defined as:

$$d(\boldsymbol{\pi},\boldsymbol{\pi}^{\boldsymbol{*}})=\frac{1}{2}\sum\limits_{i,j=1}^{k}|\pi(i,j)-\pi^{*}(i,j)|,$$

where the maximum value is equal to k × (k − 1). As previously said, the Kemeny distance is equivalent to the Kendall distance for full rankings, i.e. the former is twice the latter (Heiser & D’Ambrosio, 2013), but this relation does not hold in the case of tied rankings. Moreover, while the Kemeny distance is a metric also with tied rankings, the Kendall distance does not satisfy the triangle inequality.
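A direct, non-optimized R implementation of the score matrix π(i,j) and of the Kemeny distance may clarify the definition; the function names are ours:

```r
# Kendall score matrix of a ranking (ties allowed): entry (i, j) equals
# 1 if item i is ranked before item j, -1 if behind, 0 if tied or i = j
score_matrix <- function(r) {
  k <- length(r)
  outer(seq_len(k), seq_len(k), function(i, j) sign(r[j] - r[i]))
}

# Kemeny distance: half the sum of the absolute score differences
kemeny <- function(r1, r2) {
  sum(abs(score_matrix(r1) - score_matrix(r2))) / 2
}

kemeny(c(1, 2, 3, 4, 5), c(5, 4, 3, 2, 1))  # 20, the maximum k * (k - 1)
kemeny(c(1, 2, 2, 3), c(1, 2, 3, 4))        # tied rankings are handled
```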

In this study, we propose a suitable exponential transformation of the above metric to both define a robust version of the standard FCMd and the FCMd with entropy regularization.

Both models have a twofold scope: to handle preference data by grouping individuals according to their personal choices and to improve the robustness of the standard methods when dealing with outlying permutations and noisy data.

In the next paragraph, we will provide the mathematical formalization of the exponential transformation of the Kemeny distance.

3.1 The Exponential Transformation of the Kemeny Distance

As already mentioned, one of the relevant issues in clustering is to neutralize the negative effects of noisy data as well as of outliers, the latter being units that markedly deviate from the rest of the data.

Moreover, as well pointed out by García-Escudero et al. (2008), “the precise detection of the outliers is an important task due to the serious troubles they introduce in standard clustering procedures as well as the appealing interest that outliers could have by themselves after explaining why they depart from general behaviour”.

The outliers could be a group of observations, smaller than the natural clusters, that differs markedly from them (i.e. clustered outliers) or, alternatively, could be represented by isolated points, each forming its own group (i.e. radial outliers) (García-Escudero et al., 2003). In this context, an outlier is identified essentially according to its distance with respect to the centers of the clusters, which embeds both the radial and the clustered types, while a noisy datum is intended as a permutation generated by the uniform distribution, given a ranking generative scheme.

As pointed out by García-Escudero and Gordaliza (2005), the FCMd method is only a “timid” robustification of clustering in presence of outliers and, therefore, we propose two “robust” versions of the FCMd method based on a suitable exponential transformation of the Kemeny distance.

Therefore, as suggested in the literature by Wu and Yang (2002) and D’Urso and De Giovanni (2014), the following exponential transformation, whose values lie in the interval [0,1], has been applied to the Kemeny distance:

$$d^{2}_{exp}(\boldsymbol{\pi}_{l},\boldsymbol{\pi}_{t})=1-\exp \lbrace -\upbeta \cdot d^{2}(\boldsymbol{\pi}_{l}, \boldsymbol{\pi}_{t})\rbrace, \quad l,t=1,\ldots,n, \; l \neq t,$$
(1)

where \(d^{2}(\boldsymbol {\pi }_{l},\boldsymbol {\pi }_{t})\) is the squared Kemeny distance between the ranking representation given by the l-th judge and the ranking representation given by the t-th judge, respectively.

The parameter β is a positive constant, usually chosen as the inverse of a measure of data variability, as pointed out by Wu and Yang (2002). The mathematical definition of β adopted in this work is provided in Section 3.2.1; its effect on the exponential transformation (1) is shown in Fig. 1: the transformed distance approaches 1 more rapidly the higher the value of β.

Fig. 1 The effect of the β parameter on \(d^{2}_{exp}(\boldsymbol {\pi }_{l},\boldsymbol {\pi }_{t})\)
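A minimal R sketch of transformation (1), reusing the kemeny() helper defined above (beta is assumed to be already available; see Section 3.2.1):

```r
# Exponential transformation of the squared Kemeny distance (Eq. 1)
d2_exp <- function(r1, r2, beta) {
  1 - exp(-beta * kemeny(r1, r2)^2)
}

# Effect of beta: the transformed distance approaches 1 more rapidly
# as beta grows (cf. Fig. 1); d2 holds some squared Kemeny distances
d2 <- c(1, 4, 16, 25)
sapply(c(0.05, 0.2, 1), function(b) round(1 - exp(-b * d2), 3))
```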

When using the exponential transformation in fuzzy clustering, the membership degrees associated with the outliers are approximately equal to 1/C (where C is the number of clusters), thus treating outlying units as fuzzy ones.

It is worth noting that, in the case of well-separated groups, i.e. in the presence of low variability, the clustering algorithm based on the exponential transformation tends to assign approximately equal membership degrees to all units far from the medoids. Each unit not close to the medoids becomes a candidate outlier.

On the contrary, in the case of overlapping clusters, or of well-separated clusters with a large number of outlying units, the algorithm tends to assign approximately equal membership degrees to the units that are only slightly separated from the bulk of the data (D’Urso et al., 2018).

3.2 Exponential Kemeny-Based FCMd Method

The exponential transformation of the Kemeny distance (1) is used to develop the so-called Exponential Kemeny-based fuzzy C-Medoids clustering method (Exp-FCMdK); the goal is to find the C prototypes, i.e. the subset of medoids \((\widetilde{\boldsymbol{\pi}}_{1},\ldots,\widetilde{\boldsymbol{\pi}}_{C})\), where C is the number of clusters, and the Un×C matrix of fuzzy coefficients, by minimizing the following objective function:

$$\left \{ \begin{array}{rl} min: \quad &{\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc}^{m} d^{2}_{exp}(\boldsymbol{\pi}_{l},\boldsymbol{\widetilde{\pi}}_{c})=\\ &{\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc}^{m} \left\lbrace 1-exp\left[ -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c}(i,j)| \right)^{2}\right] \right\rbrace \\ s.t. \quad & {\sum}_{c=1}^{C} u_{lc}=1, u_{lc}\geq 0. \end{array} \right.$$
(2)

In detail, \(\left (\frac {1}{2}{\sum }_{i,j=1}^{k}|\pi _{l}(i,j)-\widetilde {\pi }_{c}(i,j)| \right )^{2}\) is the squared Kemeny distance between the l-th unit and the medoid of the c-th cluster.

ulc ∈ [0,1] denotes the membership degree of the l-th unit to the c-th cluster, while \(\widetilde {\boldsymbol {\pi }}_{c}\) is the permutation medoid for cluster c. The “m” parameter (with m > 1) controls the fuzziness of the partition and is therefore usually called the “fuzziness parameter” (for further insight into the role of m, see D’Urso (2015)).

The solution for each ulc is:

$$u_{lc} = \frac{1}{{\sum}_{c^{\prime}=1}^{C} \left[ \frac{ 1-exp\left[ -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c}(i,j)| \right)^{2}\right] }{1-exp\left[ -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c^{\prime}}(i,j)| \right)^{2}\right]}\right]^{\frac{1}{m-1}}}$$
(3)

The proofs of the iterative solutions (3) are provided in Appendix A, while the computational steps of the proposed robust clustering method are described in Algorithm 1.

Algorithm 1 Exponential Kemeny-based FCMd algorithm
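A compact R sketch of the alternating optimization in Algorithm 1 is reported below. It is an illustration under our own naming conventions, not the reference implementation: Dexp is the n × n matrix of exponentially transformed squared Kemeny distances, precomputed once (which is precisely the computational advantage of the medoid-based formulation).

```r
# Sketch of the Exp-FCMdK alternating optimization (Algorithm 1)
# Dexp: n x n matrix with entries 1 - exp(-beta * d^2(pi_l, pi_t))
# C: number of clusters; m: fuzziness exponent (m > 1)
exp_fcmd_k <- function(Dexp, C, m = 1.5, max_iter = 100) {
  n <- nrow(Dexp)
  medoids <- sample(n, C)                          # random initial medoids
  for (iter in seq_len(max_iter)) {
    # Membership update (Eq. 3): u_lc proportional to d_lc^{-1/(m-1)}
    D <- pmax(Dexp[, medoids, drop = FALSE], 1e-12)  # guard zero distances
    U <- (1 / D)^(1 / (m - 1))
    U <- U / rowSums(U)
    # Medoid update: for each cluster, the observed unit minimizing the
    # membership-weighted sum of transformed distances
    new_medoids <- sapply(seq_len(C), function(c)
      which.min(colSums(U[, c]^m * Dexp)))
    if (identical(sort(new_medoids), sort(medoids))) break  # converged
    medoids <- new_medoids
  }
  list(medoids = medoids, U = U)
}
```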

In the next paragraph, we describe the formulation of β used in the above method.

3.2.1 The Choice of β Parameter

Following Wu and Yang (2002) and D’Urso et al. (2018), we choose β as the inverse of a suitable measure of variability. In particular, we propose the following β’s formulation:

$$\upbeta\equiv{\upbeta}_{1}= \left[ \frac{{\sum}_{l=1}^{n}d^{2}(\boldsymbol{\pi}_{l},\tilde{\boldsymbol{\pi}}_{q})}{n} \right]^{- 1}$$
(4)

where \(\tilde{\boldsymbol{\pi}}_{q}\) is such that \(q=\arg \text{Median}_{1 \leq l\leq n}{\sum }_{l^{\prime }=1}^{n}d^{2}(\boldsymbol {\pi }_{l},\boldsymbol {\pi }_{l^{\prime }})\).
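Under our reading of the definition above (q indexes the unit whose total squared distance to all others attains the median), β1 can be computed as follows; D2 denotes the n × n matrix of squared Kemeny distances:

```r
# beta_1 (Eq. 4): inverse of the average squared Kemeny distance from
# the unit q whose total squared distance to all others is the median
beta1 <- function(D2) {
  tot <- rowSums(D2)
  q <- which.min(abs(tot - median(tot)))  # unit attaining the median total
  1 / (sum(D2[, q]) / nrow(D2))
}
```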

3.3 Exponential Kemeny-Based FCMd Method with Entropy Regularization

Here, the exponential transformation of the Kemeny distance (1) is used to develop the so-called Exponential Kemeny-based fuzzy C-Medoids clustering method with entropy regularization (Exp-FCMdKent); the goal is to find the C prototypes, i.e. the subset of medoids \((\widetilde{\boldsymbol{\pi}}_{1},\ldots,\widetilde{\boldsymbol{\pi}}_{C})\), where C is the number of clusters, and the Un×C matrix of fuzzy coefficients, by minimizing the following objective function:

$$\left \{ \begin{array}{rl} min:& \quad {\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc} d^{2}_{exp}(\boldsymbol{\pi}_{l},\boldsymbol{\widetilde{\pi}}_{c})+p {\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc} log(u_{lc})=\\ &{\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc}\left\lbrace 1-exp\left[ -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c}(i,j)| \right)^{2}\right] \right\rbrace +p {\sum}_{l=1}^{n} {\sum}_{c=1}^{C} u_{lc} log(u_{lc}) \\ s.t. \quad & {\sum}_{c=1}^{C} u_{lc}=1, u_{lc}\geq 0, \end{array} \right.$$
(5)

where \(\left (\frac {1}{2}{\sum }_{i,j=1}^{k}|\pi _{l}(i,j)-\widetilde {\pi }_{c}(i,j)| \right )^{2}\) is the squared Kemeny distance between the l-th unit and the medoid of the c-th cluster.

ulc denotes the membership degree of the l-th unit to the c-th cluster, while \(\boldsymbol {\widetilde {\pi }}_{c}\) is the permutation medoid for cluster c. The second addend in the objective function represents the entropy regularization term: the Shannon entropy is weighted by a factor p, called the degree of fuzzy entropy, given that the higher p is, the higher the degree of fuzziness. It controls the contribution of the regularization function to the clustering criterion and acts as the “m” exponent does in the previous fuzzy method, so that ulc ∈ [0,1].

Since the aim is to maximize both the Shannon entropy and the internal cohesion, the total function is optimized by subtracting the weighted entropy from the clustering criterion, leading to (5).

The solution for each ulc is:

$$u_{lc} = \frac{1}{{\sum}_{c^{\prime}=1}^{C} \left[\frac{exp\left(\frac{1}{p}{\left[1-exp\left\lbrace -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c}(i,j)| \right)^{2}\right\rbrace \right]}\right) }{exp\left(\frac{1}{p}{\left[1-exp\left\lbrace -\upbeta \left(\frac{1}{2}{\sum}_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c^{\prime}}(i,j)| \right)^{2}\right\rbrace \right]}\right)}\right]}$$
(6)

The proofs of the iterative solutions (6) are provided in Appendix B, while the computational steps of the proposed robust clustering method are described in Algorithm 2.

Algorithm 2 Exponential Kemeny-based FCMd algorithm with entropy regularization
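The update (6) simplifies to a softmax of the negative transformed distances scaled by p; a minimal sketch, where Dm is assumed to be the n × C matrix of transformed distances between units and the current medoids:

```r
# Membership update for the entropy-regularized methods (Eq. 6):
# u_lc proportional to exp(-d_lc / p)
entropy_memberships <- function(Dm, p) {
  W <- exp(-Dm / p)
  W / rowSums(W)   # rows sum to one, as required by the constraint
}
```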

By substituting in the objective function (5) \(d^{2}_{exp}(\boldsymbol {\pi }_{l},\widetilde {\boldsymbol {\pi }}_{c})\) with its square root, that is:

$$d_{exp}(\boldsymbol{\pi}_{l},\widetilde{\boldsymbol{\pi}}_{c})=\sqrt{1-exp\left[ -\upbeta \left(\frac{1}{2}\sum\limits_{i,j=1}^{k}|\pi_{l}(i,j)-\widetilde{\pi}_{c}(i,j)| \right)^{2}\right]},$$

we also define the model henceforth called Exp-\(FCMdK_{ent_{root}}\).

For these entropy-based methods, β is defined as in D’Urso et al. (2018):

$$\upbeta\equiv{\upbeta}_{2}= \left[ \frac{{\sum}_{l=1}^{n}d^{2}(\boldsymbol{\pi}_{l},\tilde{\boldsymbol{\pi}}_{q})}{n} \right]^{- 1}$$
(7)

where \(\tilde{\boldsymbol{\pi}}_{q}\) is such that \(q=\arg \min _{1 \leq l\leq n}{\sum }_{l^{\prime }=1}^{n}d^{2}(\boldsymbol {\pi }_{l},\boldsymbol {\pi }_{l^{\prime }})\).

Remark 1

By replacing \(d^{2}_{exp}(\boldsymbol {\pi }_{l},\widetilde {\boldsymbol {\pi }}_{c})\) with \(d^{2}(\boldsymbol {\pi }_{l},\widetilde {\boldsymbol {\pi }}_{c})\) in the objective functions (2) and (5), the methods reduce to the FCMd methods based on the squared Kemeny distance, i.e. to their non-robust versions; henceforth, we denote them with FCMdK and FCMdKent, respectively.

3.4 The Choice of Number of Clusters

In general, the partitional clustering algorithms require the researcher to set the number of clusters C to be generated.

In this study, to select the optimal number of clusters C, we adopt the Fuzzy Silhouette index (Campello & Hruschka, 2006), one of the best-known internal cluster validity criteria, whose mathematical definition is given below.

Consider a data object \(j\in \left \lbrace 1, 2, \ldots ,n \right \rbrace\) belonging to cluster \(p\in \left \lbrace 1, \ldots , C\right \rbrace\). Let apj be the average distance of object j to all other objects belonging to the same cluster p and let dqj be the average distance of j to all objects belonging to another cluster q, different from p. Denote with bpj the minimum dqj over q = 1,…,C (q≠p), that is, the dissimilarity of object j with respect to its closest cluster. Campello and Hruschka (2006) defined the silhouette of object j as:

$$s_{j}=\frac{b_{pj}-a_{pj}}{\max(a_{pj},b_{pj})}.$$
(8)

The Fuzzy Silhouette (FS) is, then, defined as:

$$FS=\frac{{\sum}_{j=1}^{N}(\mu_{pj}-\mu_{qj})^{\alpha}s_{j}}{{\sum}_{j=1}^{N}(\mu_{pj}-\mu_{qj})^{\alpha}}.$$
(9)

μpj and μqj correspond to the first and second largest membership degrees of the j-th unit (i.e. the two largest elements of the j-th row of the fuzzy partition matrix U), respectively, while α ≥ 0 is a weighting coefficient.

Therefore, FS is a weighted average of the individual silhouettes sj. A higher value of FS indicates a better assignment of the units to the clusters, which implies that, simultaneously, the intra-cluster distance is minimized and the inter-cluster distance is maximized.
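A sketch of the index in R, under the hypothetical names below: D is the n × n distance matrix, U the n × C membership matrix, and the crisp assignment used for apj and bpj takes the cluster with maximal membership.

```r
# Fuzzy Silhouette (Eqs. 8-9): individual silhouettes weighted by the
# gap between the two largest membership degrees of each unit
fuzzy_silhouette <- function(D, U, alpha = 1) {
  cl <- max.col(U)                               # crisp assignment
  s <- sapply(seq_len(nrow(U)), function(j) {
    same <- cl == cl[j]; same[j] <- FALSE
    a <- mean(D[j, same])                        # within-cluster distance
    b <- min(sapply(setdiff(unique(cl), cl[j]),  # closest other cluster
                    function(q) mean(D[j, cl == q])))
    (b - a) / max(a, b)
  })
  w <- apply(U, 1, function(u) -diff(sort(u, decreasing = TRUE)[1:2]))^alpha
  sum(w * s) / sum(w)
}
```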

3.5 Advantages of the Proposed Clustering Methods

The proposed clustering approaches inherit all the theoretical advantages related to the fuzzy framework, the Partitioning Around Medoids (PAM) technique, the robustness of clustering procedures and the Kemeny distance as briefly described in the following paragraphs.

3.5.1 Advantages Connected to Fuzzy Clustering

Fuzzy clustering is particularly suitable in many real applications since identifying unambiguous boundaries between clusters is, often, very difficult (McBratney & Moore, 1985; Wedel & Kamakura, 1998). Moreover, compared with hard clustering, the membership degrees matrix arising from fuzzy methods also provides a useful indication of the existence of a second best cluster to consider (Everitt & Landau, 2001). In addition, fuzzy clustering is computationally efficient and works well with distribution-free methods (Hwang et al., 2007).

3.5.2 Advantages Connected to the Fuzzy Partitioning Around Medoids (FPAM) Procedure

The FCMd clustering method is the extension of the PAM technique to a fuzzy context. Unlike fuzzy c-Means, it is able to identify an observed prototype rather than a virtual one, which is particularly suitable in the case of permutations. It also reduces the computational complexity compared to the fuzzy c-Means algorithm, because the distance matrix is computed only once during the iterative clustering procedure (D’Urso et al., 2018).

3.5.3 Advantages Connected to Robustness of Clustering Process

The robustness against outliers is achieved by using the exponential transformation (Wu & Yang, 2002) of the Kemeny distance. It ensures that the clustering procedure is able to neutralize the disruptive effect of the outliers, recovering the structure of the natural groups.

3.5.4 Advantages Connected to Kemeny Distance

The Kemeny distance is one of the most commonly chosen distances, especially for rankings with ties; it is equivalent to the Kendall distance in the case of complete rankings but differs from the latter in the way tied rankings are handled. It allows great flexibility, since tied rankings are widely used in most applications.

4 Simulation Study

4.1 Simulation Plan

To assess the performance of the proposed clustering methods Exp-FCMdK, Exp-FCMdKent and Exp-\(FCMdK_{ent_{root}}\), a simulation plan was defined following the scheme below.

4.1.1 Generative Models for Rankings

As the first ranking generator process, we chose the well-known Mallows model (Mallows, 1957), which is an exponential model defined by a central permutation \(\boldsymbol{\pi}_{0}\) and a spread parameter 𝜃. For positive values of 𝜃, \(\boldsymbol{\pi}_{0}\) is the mode of the distribution, the most likely permutation. The probability associated with any other permutation decreases exponentially as a function of its distance from the central permutation and is defined as

$$p(\boldsymbol{\pi})=\frac{\exp(-\theta d(\boldsymbol{\pi},\boldsymbol{\pi}_{0}))}{\psi(\theta)},$$

where ψ(𝜃) is a normalization constant. When 𝜃 = 0, all rankings have a uniform probability of being sampled. The rmm() function of the R package PerMallows (Irurozki et al., 2016) was used.

As the second ranking generator process, we used the Insertion Sorting Rank (ISR) model, proposed by Biernacki and Jacques (2013), for which a ranking is the result of a sorting process of the k objects based on paired comparisons. Given a ranking \(\boldsymbol{\pi}\), a central ranking \(\boldsymbol{\pi}_{0}\) and a parameter υ ∈ [0.5,1]:

$$p(\boldsymbol{\pi},\boldsymbol{\pi}_{\boldsymbol{0}},\upsilon)=\frac{1}{k!} \sum\limits_{\boldsymbol{\sigma} \in \mathcal{P}_{k}} p(\boldsymbol{\pi}|\boldsymbol{\sigma},\boldsymbol{\pi}_{\boldsymbol{0}},\upsilon)=\frac{1}{k!} \sum\limits_{\boldsymbol{\sigma} \in \mathcal{P}_{k}} \upsilon^{G(\boldsymbol{\pi},\boldsymbol{\sigma},\boldsymbol{\pi}_{\boldsymbol{0}})} (1-\upsilon)^{A(\boldsymbol{\pi},\boldsymbol{\sigma})-G(\boldsymbol{\pi},\boldsymbol{\sigma},\boldsymbol{\pi}_{\boldsymbol{0}})},$$

where the sum over \(\boldsymbol {\sigma } \in \mathcal {P}_{k}\) corresponds to all the possible initial presentation orders of the objects to rank, while G(\(\boldsymbol{\pi}\),σ,\(\boldsymbol{\pi}_{0}\)) is the number of good paired comparisons during the sorting process leading to \(\boldsymbol{\pi}\) when the presentation order is σ. A(\(\boldsymbol{\pi}\),σ) denotes the total number of paired comparisons. The closer υ is to one, the more the distribution of rankings is tightened around \(\boldsymbol{\pi}_{0}\), while when υ = 0.5 a uniform distribution is assumed. The simulISR() function of the R package Rankcluster (Jacques et al., 2014) was used.
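For reference, the two samplers can be invoked as follows; this is a hedged usage sketch, with argument names following our reading of the packages' documentation, to be checked against the installed versions:

```r
library(PerMallows)   # Irurozki et al. (2016)
library(Rankcluster)  # Jacques et al. (2014)

# 30 rankings from a Mallows model with Kendall distance and theta = 1.5
mallows_sample <- rmm(n = 30, sigma0 = c(1, 2, 3, 4, 5), theta = 1.5)

# 30 rankings from an ISR model with dispersion upsilon = 0.9
isr_sample <- simulISR(n = 30, pi = 0.9, mu = c(1, 2, 3, 4, 5))
```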

4.1.2 Number of Clusters, Items, and Outliers

For each ranking generative model, we simulated six scenarios with increasing complexity due to the contamination of the natural groups with clustered and radial outliers and then noisy data as summarized in Table 1.

Table 1 The six considered simulated scenarios

We considered 2 and 3 equally sized groups (so that the group size is 30 in the former case and 20 in the latter), generated by the following central permutations:

  • 2 groups: \(^{1}\boldsymbol{\pi}_{0} = \{1,2,3,4,5\}\) and \(^{2}\boldsymbol{\pi}_{0} = \{3,1,2,5,4\}\)

  • 3 groups: \(^{1}\boldsymbol{\pi}_{0} = \{1,2,3,4,5,6,7\}\), \(^{2}\boldsymbol{\pi}_{0} = \{1,2,4,7,6,5,3\}\) and \(^{3}\boldsymbol{\pi}_{0} = \{1,5,2,6,3,4,7\}\).

When the generative model is the Mallows model with Kendall distance, 𝜃 = 1.5 while, when it is the ISR model, υ = 0.9. Outliers have been generated by the following central permutations:

  • 2 groups: \(^{Out}\boldsymbol{\pi}_{0} = \{5,4,3,2,1\}\)

  • 3 groups: \(^{Out}\boldsymbol{\pi}_{0} = \{7,6,5,4,3,2,1\}\)

but with different spread parameters, to simulate the different types of outliers. Therefore, clustered outliers were generated from the above central permutations, setting 𝜃 = 2 for the Mallows model with two groups and 𝜃 = 1.5 with three groups, while we set υ = 0.9 for the ISR model for both two and three groups.

To simulate radial outliers, we generated 100 permutations fixing 𝜃 = 0 and υ = 0.5, respectively, i.e. we considered a uniform distribution over the universe of possible permutations, and then we sampled only the 12 of them for which the probability of being generated by the centers of the clusters was very low. For noisy data, we considered the first 20 permutations simulated from the same uniform distribution as above.
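One way to mimic the radial-outlier step is sketched below: we draw 100 uniform permutations and keep the 12 farthest from the group centres, using the Kemeny distance as a proxy for low generative probability (the selection criterion described above); the kemeny() helper is the one defined in Section 3:

```r
# Radial outliers: sample uniformly, keep the 12 permutations farthest
# (in Kemeny distance) from both cluster centres
centres <- list(c(1, 2, 3, 4, 5), c(3, 1, 2, 5, 4))
pool <- t(replicate(100, sample(5)))          # uniform random permutations
min_dist <- apply(pool, 1, function(r)
  min(sapply(centres, kemeny, r2 = r)))
radial_outliers <- pool[order(min_dist, decreasing = TRUE)[1:12], ]
```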

In Fig. 2, examples of clustered outliers with three well-separated natural groups, radial outliers with two less-separated natural groups, and noisy data with three well-separated natural groups are shown using the Sammon projection in a two-dimensional space.

Fig. 2 An example of clustered and radial outliers and noisy data

For each combination of generative model and number of groups, we simulated 30 artificial datasets, leading to 120 baseline datasets; considering all six scenarios, in which a certain amount of outliers is added, the number of different datasets evaluated increases to 720. The simulated natural groups were kept the same across scenarios for comparative purposes in terms of robustness against outliers and noisy data.

We argue that all modal permutations used to generate outliers were chosen so that their Kemeny distance from each modal permutation of the natural groups is equal or close to the maximum value of the Kemeny distance (e.g. for two permutations of length 5 the maximum distance is equal to 20, while it is 42 for two permutations of length 7). Moreover, we simulated only full rankings, since the R functions used to simulate rankings do not allow generating from central permutations with ties.

The proposed robust methods were compared with the corresponding non-robust baseline methods FCMdK and FCMdKent and with the mixture model-based approach proposed by Jacques and Biernacki (2014), which extends the Insertion Sorting Rank (ISR) model: the heterogeneity of the rank population is modelled by a mixture of ISR distributions. Maximum likelihood estimation is performed through a SEM-Gibbs algorithm, which also allows partial rankings. For the Mixture ISR model (henceforth, M-ISR), the membership degree matrix is represented by the estimated conditional probabilities of the observed ranks belonging to each cluster. The model is implemented in the R package Rankcluster (Jacques et al., 2014).

The additional model used for comparison purposes is the K-Median Cluster Component Analysis (henceforth, CCA) proposed by D’Ambrosio and Heiser (2019), the most similar to our models in that it is a probabilistic-distance clustering model based on the same Kemeny distance. As pointed out by D’Ambrosio and Heiser (2019), the method allows each ranking to be assigned to all C clusters by a membership probability, thus mimicking a fuzzy clustering with exponent m = 2. The model is implemented in the R package ConsRankClass (D’Ambrosio, 2021).

For each simulated setup, the 30 fuzzy partitions related to the 60 permutations (only those belonging to the natural groups, excluding the outliers and the noisy data) are compared with the reference crisp partition using the Adjusted Concordance Index (ACI) proposed by D’Ambrosio et al. (2021); this is an external validation criterion that corrects the normalized degree of concordance (NDC) introduced by Hüllermeier et al. (2011) for the agreement that may be due to chance. Lying in the range [− 1,1], it is equal to 1 in the case of perfect correspondence between the two partitions; thus, the higher the value, the better the agreement between the two partitions.

4.2 Simulation Results

In this section, simulation results are provided for each scenario of interest. We set C according to the number of simulated groups in each configuration; then, we applied the clustering methods varying m ∈{1.3,1.5,2} for the FCMdK and Exp-FCMdK methods and p ∈{0.05,0.1,0.2} for the Exp-FCMdKent and Exp-\(FCMdK_{ent_{root}}\) ones. For the FCMdKent, we considered p ∈{0.01,0.02,0.04}.

We ran all algorithms considering 100 random restarts with an execution time of a few seconds. The associated code has been implemented in R.

For the CCA method, we used the branch-and-bound algorithm to find the median ranking, fixing the number of replications to 10 (as suggested by the authors), while for the M-ISR mixture model, we set maxTry to 10 and run to 2. Simulation results for both ranking generative models and all six scenarios are shown in Figs. C.1–C.8 in Appendix C, in which we plotted the violin plots of the Adjusted Concordance Index for each method. The subscript in the label provides the value of the corresponding m and p parameters, depending on the model specification.

As a first piece of evidence, simulation results are very promising and confirm what is already known in the literature: the FCMd method is only a “timid robustification” of clustering against outliers (García-Escudero & Gordaliza, 2005). It is worth specifying that the methods are not directly comparable in terms of performance as fuzziness increases, since there is no direct correspondence between the value of the “p” parameter in the entropy-based method and that of the “m” exponent in the other fuzzy-type method.

Direct comparability in terms of performance as the fuzziness increases is possible within the methods based on “m”, and between them and the CCA model, as well as between the two robust entropy-based methods.

In general, as expected, the performances of all methods, in terms of ACI values, tend to decrease as the fuzziness increases because we compared a crisp partition with a fuzzy one.

We argue that the choice of the best value of the fuzziness parameter strictly depends on the scaling of the distance used as well as on the degree of separation among groups; thus, in practical applications, we recommend setting it taking all these issues into account.

From the simulation results based on the ISR scheme for 2 and 3 groups, the following considerations arise. While it was reasonable to expect that, under the ISR generative scheme, the mixture ISR clustering model based on the same generative process would perform well, even in the presence of outliers, we cannot fail to emphasize the good performance of our robust clustering methods.

In particular, the Exp-\(FCMdK_{ent_{root}}\) entropy-based robust method has the best performance; in the case of two simulated natural groups (see Figs. C.1 and C.2), its good behaviour, when p ≤ 0.10, becomes evident especially when the amount of radial outliers is substantial or noisy data are present. Compared with the Exp-FCMdKent robust method, the Exp-\(FCMdK_{ent_{root}}\) is indeed more stable, especially when radial outliers contaminate the natural clusters, showing a tighter ACI distribution.

As far as the M-ISR model is concerned, however, it seems less robust against radial outliers and noisy permutations when these increase in number (see Fig. C.2).

The contribution of our clustering methods is particularly evident when we compare their performance with that of the CCA model. Even when we consider the Exp-FCMdK with m = 2, our fuzzy method always outperforms CCA. Looking at the results in more detail, since CCA is a median-based model, it seems, on the one hand, to be able to identify outliers by assigning them equal membership degrees (1/C), like the Exp-FCMdK methods, but on the other hand, it produces, in general, too blurred partitions. This leads to a low performance in terms of ACI values, especially when outliers and noisy data occur. It is then fairly evident how the possibility to tune the level of fuzziness adds considerable flexibility to this class of methods.

When we consider the same sampling scheme but with three natural groups (see Figs. C.3 and C.4), the M-ISR again performs quite well, but only in the case of clustered outliers.

However, looking at its violin plots, its performance decreases in general as the amount of outliers or noisy data increases.

Regarding our robust methods, the best performance is associated with the Exp-\(FCMdK_{ent_{root}}\) if we compare the ACI distributions when a large number of radial outliers or noisy data are present (and p is low).

Looking at the violin plots of the CCA method, although they have more or less the same shape regardless of the type of contamination, they show very low performance due to the tendency of the method to provide partitions that are too blurred.

When a Mallows scheme and two groups are considered (see Figs. C.5 and C.6), the Exp-\(FCMdK_{ent_{root}}\) outperforms the other methods especially if we look at the ACI distribution in the case of radial outliers and noisy data. We point out that a good performance is also associated with the Exp-FCMdK method.

If we consider the Mallows scheme with three groups (see Figs. C.7 and C.8), the very good performances of our robust methods (except for high values of the p and m parameters) are more evident. Moreover, we notice that, in this simulated setup, the M-ISR model has lower performances, followed by CCA.

Summing up, this simulation plan proved the excellent behaviour of the Exp-\(FCMdK_{ent_{root}}\) method in particular, which becomes evident especially in the case of radial outliers and noisy data. It is less sensitive to the value of the fuzziness parameter, showing higher values of the Adjusted Concordance Index as well as tighter distributions of the same index.

The M-ISR is, however, a good competitor, while the CCA method produces too blurred partitions, thus drastically reducing its performance, especially when the Mallows sampling scheme is assumed.

5 Application to Real Data

In this section, we applied the above methods to two real datasets: the Gaming Platforms dataset due to Fok et al. (2012) and available in the R package PLMIX (Mollica and Tardella, 2017) and the University rankings dataset, including tied rankings and available in the R package ConsRankClass (D’Ambrosio, 2021).

5.1 The Gaming Platforms Dataset

The Gaming Platforms dataset contains the results of a survey conducted on a sample of 91 Dutch students who rank, from the most-liked (Rank 1) to the least-liked (Rank 6), 6 gaming platforms: 1 = X-Box, 2 = PlayStation, 3 = PSPortable, 4 = GameCube, 5 = GameBoy and 6 = Personal Computer (PC). The best solution for each robust method has been chosen based on the combination of C and the fuzziness parameter (m or p, depending on the method) that maximizes the Fuzzy Silhouette index.

In particular, we varied \(C\in \left \lbrace 2,3,4,5,6,7,8\right \rbrace\), \(m\in \left \lbrace 1.3,1.5,1.7,2\right \rbrace\) and \(p\in \left \lbrace 0.05, 0.10,0.15\right \rbrace\).

For the M-ISR model, the best solution has been chosen according to the BIC criterion while for the CCA model according to the Fuzzy Silhouette.

All clustering methods identified two groups, except for the Exp-\(FCMdK_{ent_{root}}\) and CCA, which identified three groups. The centers are shown in Table 2, while the membership degree matrices are shown in Table D.4.

As first evidence, it can be seen that, within each method, the centers of the groups differ mainly in the top position, assigned to the Personal Computer or the PlayStation depending on the case; the X-Box is almost always ranked as the second best gaming platform, while the GameBoy and GameCube are the least preferred ones.

In detail, the Exp-FCMdK and Exp-FCMdKent methods differ only in the second medoid: their Kemeny distance is equal to 2, so they are very similar; indeed, the preferences are the same except for the inversion of the ranking between the PSPortable and the Personal Computer. We also note that medoid 2 of the Exp-FCMdK is equal both to medoid 3 of the Exp-\(FCMdK_{ent_{root}}\) and to the third median ranking of the CCA. The M-ISR model shares the second center with the latter, while its center 1 identifies the group of subjects for which the best platform is the X-Box, followed by the PlayStation and the Personal Computer.

The main interesting evidence is that the Exp-\(FCMdK_{ent_{root}}\) and the CCA share the second and third centers, but differ in the first one. Looking at medoid 1 identified by the Exp-\(FCMdK_{ent_{root}}\) algorithm, it can be seen that the preferred platform is again the Personal Computer, but with a rather different ranking of the other gaming platforms. The CCA model does not identify this group; instead, it identifies the group of subjects whose median ranking expresses opposite preferences to those of the other two groups, assigning the PlayStation and the Personal Computer the last positions. We argue that the students belonging to this small group are the same as those to which the Exp-\(FCMdK_{ent_{root}}\) method assigns equal membership degrees in all three clusters, thus those identified by the robust method as outliers (see Table D.4). The presence of the third natural group detected by the Exp-\(FCMdK_{ent_{root}}\) but not by the others could be due to the ability of this method to neutralize the disruptive effect of outliers and discover an additional cluster.

The M-ISR model seems to be the most affected by the presence of noisy data since all scattered permutations have been included in one group.

Table 2 Gaming Platforms dataset: the centers associated with each clustering method

5.2 The University Rankings Dataset

The dataset on university rankings concerns a survey of 303 students attending the Vienna University of Economics. They were asked to indicate preferences for the following 6 universities: London, Paris, Milan, St. Gallen, Barcelona, and Stockholm. This dataset is interesting because it contains tied rankings. We applied the Exp-\(FCMdK_{ent_{root}}\) robust method and, for comparison purposes, the CCA method based on the same distance. The Fuzzy Silhouette index suggested 2 groups for both methods; as far as the Exp-\(FCMdK_{ent_{root}}\) is concerned, we chose a value of p equal to 0.10. Table 3 shows the two centers for each method, while their membership degree matrices are reported in Table D.5. Looking at the centers, one can notice that the two clustering methods lead to the same centers except for the first one, which differs only in that the Exp-\(FCMdK_{ent_{root}}\) assigns rank 2 to Paris and rank 3 to St. Gallen, while the CCA model assigns them the same second best position.

Looking at the membership degree matrices in Table D.5, as expected from the simulations, the main difference between the two methods is that the CCA method provides a fuzzier partition than the Exp-\(FCMdK_{ent_{root}}\). Based on the latter method and on the crisp assignment of the units to the clusters using a cut-off value of 0.7, the number of students per group is 71 and 78, respectively, while the fuzzy units are 63.
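For completeness, the crisp assignment rule with the 0.7 cut-off can be expressed as follows (a sketch; the function name is ours):

```r
# Crisp assignment with a cut-off: units whose maximal membership degree
# falls below the threshold are labelled as fuzzy (NA)
crisp_assign <- function(U, cutoff = 0.7) {
  cl <- max.col(U)
  cl[apply(U, 1, max) < cutoff] <- NA
  cl
}
```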

Table 3 University Rankings dataset: the centers associated with the Exp-\(FCMdK_{ent_{root}}\)and the CCA clustering methods

The medoids associated with the Exp-\(FCMdK_{ent_{root}}\) are representative of one group for which, after London and Paris, the best choice is St. Gallen, followed by the remaining universities in last position, and of another group for which, keeping the first two places unchanged, the third best study location is Barcelona, followed by Milan, while St. Gallen is ranked last together with Stockholm.

This dataset contains eight background binary covariates concerning the following aspects: Stud (main discipline of study with categories commerce-other), Eng (knowledge of English with categories good-poor), French (knowledge of French with categories good-poor), Spanish (knowledge of Spanish with categories good-poor), Italian (knowledge of Italian with categories good-poor), Work (full-time employment while studying with categories no-yes), Degree (intention to take an international degree with categories no-yes) and Gender (sex with categories female-male).

Figure 3 shows the relative frequency distribution of these variables in the groups (considering only the non-fuzzy units) compared with the same frequency distribution in the whole sample of students. The group of those preferring Barcelona to St. Gallen is characterized by a lower percentage of students studying commerce than the second group and the whole sample, but with a better knowledge of Spanish and Italian. This group is also characterized by a higher share of working students and a lower percentage of students who intend to take an international degree. As far as gender is concerned, the first group seems characterized by a slightly higher percentage of female students.

Summing up, language knowledge and the international profile of the degree are the main drivers conditioning the preference between St. Gallen and Barcelona (and Milan).

Fig. 3 University Rankings dataset: the associated covariates

6 Concluding Remarks

This work focuses on the definition of two robust fuzzy clustering techniques with the twofold aim of detecting homogeneous groups of judges, according to their preferences on a set of items, and neutralizing the effects of possible outliers or noisy data during the clustering procedure. The use of the FCMd method according to the two different approaches to deal with fuzziness and based on the exponential transformation of the Kemeny distance accomplishes all these goals simultaneously.

Moreover, the use of the Kemeny distance has the added value of making the method more flexible, extending its field of application thanks to its ability to handle tied rankings too.

Simulation results are very promising and show the good performance of our proposals, especially that of the Exp-\(FCMdK_{ent_{root}}\), which is robust against the most insidious type of outliers, the radial ones, and against the noisy permutations that often occur when handling real data. It is less sensitive to the value of the fuzziness parameter, showing higher values of the Adjusted Concordance Index as well as tighter distributions of the same index.

Our proposals could be seen as valid alternatives to the M-ISR model and, in particular, to the CCA one. As already said, our models have the further advantage of tuning the fuzziness parameter allowing us to obtain the right centers of the clusters and at the same time a much less fuzzy partition, ensuring robustness against noisy data and anomalous permutations.

Furthermore, we also showed that our model-free approach works well under different ranking generative schemes.

As a further development of this work, to increase the flexibility of this class of methods, we will focus on defining a weighted distance-based fuzzy clustering method that allows different weights for different ranks, believing that the weights can significantly improve the performance of these methods (Lee & Yu, 2012).