
Soft Computing, Volume 19, Issue 10, pp 2751–2767

Distributed proximity-based granular clustering: towards a development of global structural relationships in data

  • Witold Pedrycz
  • Rami Al-Hmouz
  • Ali Morfeq
  • Abdullah Saeed Balamash
Open Access
Methodologies and Application

Abstract

The study is focused on the development of a global structure in a family of distributed data on the basis of locally discovered structures. The local structures are revealed by running fuzzy clustering (Fuzzy C-Means), whereas a global view is built by forming global proximity matrices on the basis of the local proximity matrices implied by the partition matrices formed for the individual data sets. To capture the diversity of local structures, the global perspective on the structure of the data is expressed in terms of a granular proximity matrix, which is built by invoking the principle of justifiable granularity with regard to the aggregation of individual proximity matrices. Three main scenarios are investigated: (a) designing a global structure among the data through building a granular proximity matrix, (b) refining a local structure (expressed in the form of a partition matrix) by engaging structural knowledge conveyed at the higher level of the hierarchy and provided in the form of the granular proximity matrix, and (c) forming a consensus-building scheme and updating all local structures with the aid of the proximity dependences available at the upper layer of the hierarchy. While the first scenario delivers a passive approach to the development of the global structure, the other two are active, as they facilitate structural feedback between the local and global levels of the hierarchy of the developed structures. The study is illustrated through a series of experiments carried out for synthetic and publicly available data sets.

Keywords

Fuzzy clustering, Proximity matrix, Global structure, Granular proximity, Granular clustering, Distributed data, Consensus formation

1 Introduction

Clustering is about revealing structure in a single data set. Distributed clustering is concerned with the same problem in situations where there is a family of data sets, each of which is clustered separately. The term distributed clustering is often encountered in the literature, where this type of clustering comes with a remarkable variety of terminology, methods, evaluation measures, extensions (Corsini et al. 2005; Pedrycz and Rai 2008; Pedrycz 2005) and applications (Coppi et al. 2010; Graves et al. 2012; Peters 2011). The results of clustering can be conveniently interpreted as information granules, which allows us to benefit from a wealth of conceptual developments of Granular Computing (Apolloni et al. 2006; Pedrycz 2013, 2007). The distributed nature of the data may imply that objects (patterns) are described in different feature spaces, or that all data sets are in the same feature space (Mali and Mitra 2003; Pedrycz and Rai 2008); combinations of these two alternatives also occur (Pedrycz and Rai 2008).

There are two essential aspects that have to be raised with regard to distributed clustering:
  (i) a way of communicating local findings (viz. the structure of the local data). The locally formed structures of the data (expressed in terms of partition matrices and prototypes) need to be compared. If the data sets consist of different patterns, all of which are expressed in the same feature space, the structural signatures that can be compared across the data sets are the prototypes of the clusters. If the same data (patterns) are described in different feature spaces, the local partition matrices are viable constructs from which we can build more abstract constructs (proximity matrices), which in turn support sharing findings across the data sets. We stress that a direct comparison of partition matrices is not feasible, as there are different numbers of clusters and there is no correspondence among the clusters constructed for the individual data sets.

  (ii) a way of building a global view of the data. There are two variants: depending on whether the locally available results are refined or left intact, we distinguish between active and passive approaches.

In this study, we are concerned with the category of objective function-based clustering, in which the results of clustering come in the form of partition matrices and prototypes (or other numeric representatives). K-Means and Fuzzy C-Means (FCM) (Bezdek 1981) are highly visible clustering methods in this category. To facilitate a sound comparison of local structures, one has to look at them from a more general and abstract point of view than the one conveyed by partition matrices. Proximity matrices (Bezdek 1981; Pedrycz 2005) are a viable alternative here, as they are abstracted from the number of clusters. Their dimensionality is \(N \times N\), where \(N\) is the number of data, so all locally formed proximity matrices can be compared (matched), as their dimensionality does not depend on the number of clusters. The use of proximity matrices in clustering problems has been reported in so-called proximity-based clustering (Graves et al. 2012; Pedrycz et al. 2004; Pedrycz 2004).

The main objective of this study is to develop a general concept of distributed clustering based on the principles of Granular Computing (Pedrycz et al. 1998) and its constructs (Apolloni et al. 2006; Pedrycz et al. 2004). Here our focus is on data described in different feature spaces, which implies that the communication vehicle is established in terms of proximity matrices. Based on this form of interaction, we discuss three main conceptual settings. The first is passive: it concentrates on a granular characterization of the proximity-based structure by invoking granular proximity matrices of a global character. The other two are active: they invoke structural feedback to refine local structures on the basis of the global result (viz. a granular proximity matrix).

Our investigations come with several well-articulated aspects of originality. The formulation of the problem is original: although some facets of collaborative clustering have been investigated in the literature, those approaches focus on the passive mode, meaning that the results of clustering are aggregated, whereas the active facet is not considered at all; no mechanisms have been developed for adjusting local clustering findings given some global findings. Recall that the passive mode implies that the locally available clustering results are provided and some aggregation mechanism is invoked, which gives rise to a general (global) view of the results. In this process, irrespective of the result obtained at the global level, the local structures (clusters) are not modified. In contrast, in the active mode a feedback loop is formed: in an iterative fashion, the local results give rise to some global results, these are contrasted with the results available at the lower level, and the local results are modified following a certain adjustment strategy so that in the next iteration some improvement is observed at the level of the global results.

It is worth noting that there have been some interesting earlier studies on interval-valued clustering, cf. (Souza and Carvalho 2004; Gacek and Pedrycz 2013; Hathaway et al. 1996; Pedrycz et al. 1998; Hwang and Rhee 2007; Mali and Mitra 2003; Wong and Hu 2013; Zhang et al. 2014). There is, however, an important difference between the undertaken research and this previous line of investigation. Here we are concerned with numeric data, while information granules arise as a result of reconciling the local results and reflect the diversity of the local findings. The previous studies were focused on granular clustering, more specifically, on the clustering of interval-valued data.

The paper is structured as follows. In Sect. 2, we highlight the essence of the problem and identify the role played by information granularity in this setting. Next, we briefly outline the main classes of problems (Sect. 3). In Sect. 4, all associated optimization problems are formulated and solved. More specifically, we discuss a way of forming granular proximity matrices through the use of the principle of justifiable granularity (Pedrycz 2013), and look at the techniques of refining local partition matrices based on gradient-based optimization and particle swarm optimization (PSO), as well as a hybrid of these two techniques. Two ways of characterizing granular proximity matrices are discussed. Numeric studies are covered in Sect. 5.

In the study, we adhere to the standard notation encountered in pattern recognition, clustering and system modeling. To emphasize the origin of the locally available data and the resulting constructs, we use indexes placed in square brackets, say \(c\)[ii], \(U\)[ii], \(u_{ik}\)[ii], etc.

2 The essence of the problem and underlying facet of information granularity

Let us consider a collection of \(p\) data sets \(\mathbf{D}_{1}, \mathbf{D}_{2}, \ldots, \mathbf{D}_{p}\) originating from a certain problem. For instance, these sets could describe a certain system, with individual, local views formed on the basis of the local data. The data originating from different collections may be described in different feature spaces \(\mathbf{F}_{1}, \mathbf{F}_{2}, \ldots, \mathbf{F}_{p}\). In general, we also assume that some data points are shared among the data sets, meaning that their intersection is nonempty, namely \(\mathbf{D} = \mathbf{D}_{1} \cap \mathbf{D}_{2} \cap \cdots \cap \mathbf{D}_{p}\) with \(\hbox{card}(\mathbf{D}) = N\). Formally speaking, a selected object \(o_{k}\) belonging to the intersection of \(\mathbf{D}_{1}, \mathbf{D}_{2}, \ldots, \mathbf{D}_{p}\) comes with its own feature vectors \(\mathbf{x}_{k}[1], \mathbf{x}_{k}[2], \ldots, \mathbf{x}_{k}[p]\) defined in the corresponding feature spaces. The data \(\{\mathbf{x}_{k}[ii]\}\), \(k = 1, 2, \ldots, N\), forming the data set \(\mathbf{D}_{ii}\) are clustered in the corresponding feature space, resulting in the partition matrix \(U[ii]\). Clustering is completed for the other data sets in the same way, giving rise to the partition matrices \(U[1], U[2], \ldots, U[p]\). In general, the numbers of clusters associated with these partition matrices, namely \(c[1], c[2], \ldots, c[p]\), can vary from one data set to another. The partition matrices formed in this way exhibit a local character, viz. they are concerned with findings confined to the given feature space and produced for the particular locally available data set. The key objective of this study is to discover a global structure in the data by reconciling the local views conveyed by the already constructed partition matrices.

There are two essential and general observations to be made here in the context of the problem under study:
  (a) It is apparent that a direct aggregation of the partition matrices is not feasible, because the number of clusters could vary from one partition matrix to another. Partition matrices therefore cannot be compared directly; the comparison is realized instead through the proximity matrices induced by the corresponding partition matrices.

  (b) As the resulting proximity matrices exhibit an evident diversity, we may use an aggregation mechanism that fully reflects and quantifies this diversity. This, in turn, brings in the concept of granular proximity matrices as constructs capturing the existing variety among the local proximity matrices.

In the study, when developing a structure of a global nature, we rely on the fundamental concept of proximity matrices implied by partition matrices. Recall that for any partition matrix \(U[ii] = [u_{ik}[ii]]\), \(i = 1, 2, \ldots, c[ii]\), \(k = 1, 2, \ldots, N\), the corresponding proximity matrix \(P[ii] = [p_{kl}[ii]]\) comes in the following form
$$\begin{aligned} p_{kl} \left[ {ii} \right] =\sum \limits _{i=1}^{c[ii]} {\hbox {min}(u_{ik} [ii],u_{il} [ii])} \end{aligned}$$
(1)
\(k, l = 1, 2, \ldots, N\).
It is instructive to provide a brief example to highlight the essence of the approach, motivate its origin, and shed light on the multistep processing involved in forming proximity matrices. Five data points, described in three different two-dimensional feature spaces, are shown in Fig. 1.
Fig. 1

Example data positioned in three subspaces

It is apparent that the structures vary from one feature space to another. Assume that the partition matrices \(U[1]\), \(U[2]\), and \(U[3]\) have the following entries, which reflect the distribution of the data. As the visualization suggests, the number of clusters varies from 2 to 3.
$$\begin{aligned} U\left[ 1 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {0.9} &{}0.8&{}0.3&{}0.1&{}0 \\ {0.1} &{}0.2&{}0.7&{}0.9&{}1 \\ \end{array} }} \right] \end{aligned}$$
$$\begin{aligned} U\left[ 2 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} 1.0 &{}0.1&{}0.9&{}0.0&{}0.0 \\ {0.0} &{}0.8&{}0.1&{}0.0&{}0.0 \\ {0.0} &{}0.1&{}0.0&{}1.0&{}1.0 \\ \end{array} }} \right] \end{aligned}$$
$$\begin{aligned} U\left[ 3 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {1.0} &{}0.1&{}0.2&{}0.3&{}1 \\ {0.0} &{}0.9&{}0.8&{}0.7&{}0 \\ \end{array} }} \right] \end{aligned}$$
In light of the varying dimensionality of these matrices, they cannot be compared directly. Instead, we have to consider some constructs built at the higher level of generality whose representation does not explicitly involve the number of clusters. Here, the corresponding proximity matrices are formed. These proximity matrices are determined following (1) and come in the form
$$\begin{aligned} P\left[ 1 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {1.0} &{}{0.9}&{}0.4&{}0.2&{}0.1\\ &{}{1.0}&{}{0.5}&{}0.3&{}0.2 \\ &{}&{}{1.0}&{}0.8&{}0.7 \\ &{}&{}&{}{1.0}&{}0.9 \\ &{}&{}&{}&{}{1.0} \\ \end{array} }} \right] \end{aligned}$$
$$\begin{aligned} P\left[ 2 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {1.0} &{}{0.1}&{}0.9&{}0.0&{}0.0\\ &{}{1.0}&{}{0.2}&{}0.1&{}0.1 \\ &{}&{}{1.0}&{}0.0&{}0.0 \\ &{}&{}&{}{1.0}&{}1.0 \\ &{}&{}&{}&{}{1.0} \\ \end{array} }} \right] \end{aligned}$$
$$\begin{aligned} P\left[ 3 \right] =\left[ {{\begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {1.0} &{}{0.1}&{}0.2&{}0.3&{}1.0\\ &{}{1.0}&{}{0.9}&{}0.8&{}0.1 \\ &{}&{}{1.0}&{}0.9&{}0.2 \\ &{}&{}&{}{1.0}&{}0.3 \\ &{}&{}&{}&{}{1.0} \\ \end{array} }} \right] \end{aligned}$$
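Formula (1) is straightforward to evaluate. The following minimal Python sketch computes the proximity matrices induced by \(U[1]\), \(U[2]\), and \(U[3]\) and prints their upper triangles in the same layout as the matrices above. The function name `proximity_matrix` is our own, and partition matrices are assumed to be stored as lists of rows, one row per cluster.

```python
def proximity_matrix(U):
    """Proximity matrix induced by a c x N partition matrix U (rows: clusters,
    columns: data), following Eq. (1): p_kl = sum_i min(u_ik, u_il)."""
    c, n = len(U), len(U[0])
    return [[sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(n)]
            for k in range(n)]


# Partition matrices of the five-point example
U1 = [[0.9, 0.8, 0.3, 0.1, 0.0],
      [0.1, 0.2, 0.7, 0.9, 1.0]]
U2 = [[1.0, 0.1, 0.9, 0.0, 0.0],
      [0.0, 0.8, 0.1, 0.0, 0.0],
      [0.0, 0.1, 0.0, 1.0, 1.0]]
U3 = [[1.0, 0.1, 0.2, 0.3, 1.0],
      [0.0, 0.9, 0.8, 0.7, 0.0]]

for name, U in (("P[1]", U1), ("P[2]", U2), ("P[3]", U3)):
    P = proximity_matrix(U)
    print(name)
    # Upper triangle only, shifted right so that the columns stay aligned
    for k, row in enumerate(P):
        print("     " * k + " ".join(f"{v:4.1f}" for v in row[k:]))
```

Since each column of a fuzzy partition matrix sums to 1, \(p_{kk} = \sum_i u_{ik} = 1\) for every \(k\), and (1) is symmetric in \(k\) and \(l\); this is why only the upper triangles need to be displayed.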
The proximity matrices exhibit different entries; summarizing them produces a construct at the next level of generality, namely a granular proximity matrix.

The granular proximity matrix reveals the emergence of groups of data that stay close to each other: a jump in the values of the entries points at the occurrence of well-formed clusters.

3 Main classes of problems

We can distinguish three general categories of problems where the fundamental ideas outlined so far can be fully exploited. The essence of these problems is visualized through a series of figures that help contrast different tasks being studied here.

3.1 Formation of a general description of data

The underlying objective of this problem is to describe the data at the global level by aggregating individual findings conveyed by the locally available proximity matrices, see Fig. 2.
Fig. 2

Building a general description of structure—from local proximity matrices \(P_{1}, P_{2},{\ldots }, P_{p}\) to granular proximity matrix of global character \(G(P)\)

The granular character of the proximity matrix formed at the upper level of the hierarchy quantifies the diversity of the local structures. The entries of the granular proximity matrix \(G(P) =[g_{kl}^{-}, g_{kl}^{+}], k, l =1, 2, \ldots, N\), are intervals whose lower and upper bounds \(g_{kl}^{-}\) and \(g_{kl}^{+}\) quantify the strength of linkage between a pair of data, say \(k\) and \(l\).

3.2 Refinement of the locally discovered structure of data

In this situation, we focus on a certain locally available data set and attempt to reconcile its findings by augmenting the already constructed partition matrix with an additional source of structural knowledge formed at the upper level of the hierarchy, see Fig. 3. In light of the structure of this auxiliary knowledge, the development falls under the umbrella of so-called knowledge-based clustering.
Fig. 3

Updating partition matrix \(U_{i}\) by invoking a feedback loop involving a global proximity matrix at the upper level of the hierarchy

The optimization mechanism is the one discussed in depth in Sect. 4.2.

3.3 Building consensus

In this scenario, we are concerned with a reconciliation of the structural findings obtained locally. The consensus building is an iterative process as illustrated in Fig. 4.
Fig. 4

Building consensus on a basis of a collection of local proximity matrices: observe a feedback loop involving all partition matrices and the global granular proximity matrix

The result formed at the global level is used to adjust the individual partition matrices \(U_{1}, U_{2}, \ldots, U_{p}\). The optimization is realized following the scheme outlined in the previous section. The updated partition matrices are used to build proximity matrices, and those give rise to a new granular proximity matrix. In turn, the new granular proximity matrix guides the optimization of the local partition matrices, and the iterations continue. The convergence of the process becomes critical, and for this purpose a suitable index needs to be established.

4 Associated optimization problems, their solutions and characterizations

The three categories of problems outlined in the previous section call for formulating the ensuing optimization problems, constructing their solutions, and finally characterizing the quality of the obtained solutions. We present the design procedures and show how they are arranged to solve the three categories of problems formulated above.

4.1 Development of granular proximity matrix

The formation of a granular proximity matrix is a common task encountered in the three classes of problems discussed above. Given a collection of proximity matrices \(P[1], P[2], \ldots, P[p]\), we build a granular proximity matrix \(G(P)\) so that the inherent granularity of this construct is captured and quantified. Proximity matrices exhibit an inherent diversity. Being cognizant of this fact, we assume that the aggregation result is a granular proximity matrix \(G(P)\) whose entries are intervals contained in [0,1]. In other words, the granular proximity matrix comes with interval-valued entries, \(G(P)= [g_{kl}^{-}, g_{kl}^{+}], k, l=1, 2, \ldots, N\). The interval-valued character of the construct reflects the variability of the local findings.

The granular proximity matrix \(G(P)\) is constructed entry by entry by invoking the principle of justifiable granularity (Pedrycz 2013). In a nutshell, this principle states that when numeric experimental evidence is aggregated in the face of its diversity, the result is an information granule (instead of another numeric outcome) that is sufficiently supported by the experimental data while demonstrating sufficient specificity, and thus comes with well-conveyed semantics. The principle is applied to the individual entries of the proximity matrices. Let us consider the \((k,l)\)-th entry of the proximity matrices \(P[1], P[2], \ldots, P[p]\), namely the set \(\mathbf{P} = \{p_{kl}[1], p_{kl}[2], \ldots, p_{kl}[p]\}\). We also assume that some initial numeric representative of \(\mathbf{P}\) (say, its mean or median) is provided; for the \((k,l)\)-th entry we denote it by \(m_{kl}\).

In the simplest scenario, the principle of justifiable granularity gives rise to an interval representation [\(g_{kl}^{-}\), \(g_{kl}^{+}\)] of the data through the maximization of the coverage of the data (the requirement of sufficient experimental evidence) and the specificity of the information granule (semantics constraint) where these two fundamental requirements are expressed as follows (below we are concerned with the upper bound of the interval; the determination of the lower bound is realized in the same manner).
$$\begin{aligned} \begin{array}{l@{\quad }l} \hbox {experimental evidence}&{}\\ \quad ({\hbox {coverage of data}})&{} f_1 ( {g_{kl}^+})=\hbox {card}\left\{ {p_{kl}[ii] \,\vert \, m_{kl} <p_{kl}[ii] \le g_{kl}^+} \right\} \\ \hbox {specificity requirement}&{}f_2 ( {g_{kl}^+})=\hbox {exp}( {-\alpha \left| {m_{kl} -g_{kl}^+} \right| }),\quad \alpha \ge 0 \\ \end{array} \end{aligned}$$
(2)
These two requirements are in conflict: moving the upper bound away from \(m_{kl}\) can only increase the coverage \(f_1\), while it decreases the specificity \(f_2\). A compromise is reached when the product \(V=f_{1} f_{2}\) attains its maximal value. In other words, the optimal upper bound of the interval, denoted below by \(b_{opt}\) (that is, \(g_{kl}^{+}\)), is obtained as the solution to the following optimization problem
$$\begin{aligned} b_{\hbox {opt}} =\hbox { arg Max }V \end{aligned}$$
(3)
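The maximization in (2) and (3) can be solved by enumeration: between two consecutive data points the coverage stays constant while the specificity decreases, so the optimum is attained at one of the data points themselves. The sketch below does this for a single entry, with the median as \(m_{kl}\) by default. The function names `optimal_bound` and `justifiable_interval` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import median


def optimal_bound(values, m, alpha, upper=True):
    """Maximize V(b) = f1(b) * f2(b) of Eqs. (2)-(3) over candidate bounds b.

    f1 counts the values lying between m and b (coverage) and
    f2 = exp(-alpha * |m - b|) (specificity). Only data points on the chosen
    side of m need to be tried as candidates."""
    side = [v for v in values if (v > m if upper else v < m)]
    best_b, best_v = m, 0.0  # no value on this side -> degenerate bound at m
    for b in side:
        coverage = sum(1 for v in side if (v <= b if upper else v >= b))
        score = coverage * math.exp(-alpha * abs(m - b))
        if score > best_v:
            best_b, best_v = b, score
    return best_b


def justifiable_interval(values, alpha, m=None):
    """Interval [lower, upper] for one (k, l) entry; m defaults to the median."""
    if m is None:
        m = median(values)
    return (optimal_bound(values, m, alpha, upper=False),
            optimal_bound(values, m, alpha, upper=True))


# Hypothetical values p_kl[1], ..., p_kl[5] of a single entry
entry = [0.1, 0.3, 0.35, 0.4, 0.9]
for alpha in (0.0, 2.0, 5.0):
    print(alpha, justifiable_interval(entry, alpha))
```

For these values the interval shrinks from [0.1, 0.9] at \(\alpha = 0\) to [0.1, 0.4] at \(\alpha = 2\) and [0.3, 0.4] at \(\alpha = 5\), in line with the role of \(\alpha \) as a weight on specificity.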
The range of possible values of \(\alpha \) requires some clarification. The smallest value of \(\alpha \) is 0; in this case, the optimal bounds \(a\) and \(b\) are the lowest and highest values of the data \(z_{1}, z_{2}, \ldots, z_{N}\) being aggregated, namely \(a_\mathrm{opt} = \min \{z_{1}, z_{2}, \ldots, z_{N}\}\) and \(b_\mathrm{opt} = \max \{z_{1}, z_{2}, \ldots, z_{N}\}\). The higher the value of \(\alpha \), the more specific the resulting interval; the maximal value of \(\alpha \) is the one that results in the shortest interval. To make the computations explicit, consider the subset of the original data \(\{z_{1}, z_{2}, \ldots, z_{N}\}\) whose elements are greater than the numeric representative \(m\), and arrange them in increasing order, which yields the set \(\{r_{1}, r_{2}, \ldots, r_{M}\}\), where \(r_{1} < r_{2} < \cdots < r_{M}\) and \(M<N\). The associated weights are \(w_{1}, w_{2}, \ldots, w_{M}\) (for the unweighted coverage in (2), all weights equal 1). The maximal value of \(\alpha \), \(\alpha _\mathrm{max}\), is then the value beyond which all the inequalities listed below are satisfied.
$$\begin{aligned} \begin{array}{l} w_1 *\hbox {exp}(-\alpha \vert m-r_1 \vert )>(w_1 +w_2 )*\hbox {exp}(-\alpha \vert m-r_2 \vert ) \\ w_1 *\hbox {exp}(-\alpha \vert m-r_1 \vert )>(w_1 +w_2 +w_3 )*\hbox {exp}(-\alpha \vert m-r_3 \vert ) \\ \ldots \\ w_1 *\hbox {exp}(-\alpha \vert m-r_1 \vert )>(w_1 +w_2 +w_3 +\ldots +w_M )\\ \quad *\hbox {exp}(-\alpha \vert m-r_M \vert ) \\ \end{array} \end{aligned}$$
(4)
The same process is carried out for the lower bound; the resulting value of \(\alpha \) is denoted by \(\alpha _\mathrm{max }\)’. We can now normalize \(\alpha \) to a unified [0,1] range, so that a single value of \(\alpha \) determines both bounds of the interval: a normalized value \(\alpha \) is translated into \(\alpha *\alpha _\mathrm{max }\) for the upper bound and \(\alpha *\alpha _\mathrm{max }\)’ for the lower bound. It is worth noting that the intervals indexed by successive values of \(\alpha \) are \(\alpha \)-cuts of a certain fuzzy set. In other words, the result of the principle of justifiable granularity becomes a fuzzy set.
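Taking logarithms in (4) gives each inequality the form \(\alpha > \ln (W_j/w_1)/(\vert m-r_j\vert - \vert m-r_1\vert )\) with \(W_j = w_1 + \cdots + w_j\), so \(\alpha _\mathrm{max}\) is the largest of these thresholds. The sketch below computes it under two stated assumptions: unit weights (the plain count of (2)) and distinct values \(r_j\). The function name `alpha_max` and the sample values are hypothetical.

```python
import math


def alpha_max(values, m, upper=True):
    """Threshold beyond which every inequality of Eq. (4) holds, so the optimal
    bound collapses to r_1, the data point closest to m on the chosen side.

    With unit weights, W_j = j and w_1 = 1, and the j-th inequality
    exp(-a*d_1) > j*exp(-a*d_j), where d_j = |m - r_j|, rearranges to
    a > ln(j) / (d_j - d_1). Assumes distinct values (no zero denominators)."""
    side = sorted((v for v in values if (v > m if upper else v < m)),
                  key=lambda v: abs(m - v))
    d = [abs(m - r) for r in side]
    # Fewer than two candidates: no inequality to satisfy, any alpha works
    return max((math.log(j) / (d[j - 1] - d[0]) for j in range(2, len(d) + 1)),
               default=0.0)


# Hypothetical values of one entry (k, l) across five local proximity matrices
entry = [0.1, 0.3, 0.35, 0.4, 0.9]
m = 0.35  # median, used as the numeric representative
a_up = alpha_max(entry, m, upper=True)
a_low = alpha_max(entry, m, upper=False)
# A normalized alpha in [0, 1] is scaled separately for the two bounds
for alpha in (0.0, 0.5, 1.0):
    print(alpha, alpha * a_low, alpha * a_up)
```

For these values \(\alpha _\mathrm{max} = \ln 2/0.5 \approx 1.39\) for the upper bound and \(\ln 2/0.2 \approx 3.47\) for the lower bound, so the two sides of the interval reach their most specific position at different raw values of \(\alpha \); this is what the normalization to [0,1] compensates for.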

This procedure is directly applicable to the construction of the granular proximity matrix \(G(P)\).

The three main steps are envisioned here:
  (i) formation of clusters and partition matrices for \(\mathbf{D}_{1}, \mathbf{D}_{2}, \ldots, \mathbf{D}_{p}\), viz. \(U[1], U[2], \ldots, U[p]\).

  (ii) building the proximity matrices \(P(U[1]), P(U[2]), \ldots, P(U[p])\). Note that they are produced for all pairs of data belonging to D.

  (iii) use of the principle of justifiable granularity to form the granular construct \(G(P)\). The interval-valued proximity matrix is built for a certain predetermined value of \(\alpha \).
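Steps (ii) and (iii) can be chained as in the following sketch. Step (i), the FCM clustering itself, is assumed to have been carried out already, so the function starts from given partition matrices; the median serves as \(m_{kl}\), and all function names (`proximity_matrix`, `bound`, `granular_proximity`) are hypothetical. With \(\alpha = 0\), every interval spans the smallest and largest local proximity values of its entry.

```python
import math
from statistics import median


def proximity_matrix(U):
    # Eq. (1): p_kl = sum_i min(u_ik, u_il)
    c, n = len(U), len(U[0])
    return [[sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(n)]
            for k in range(n)]


def bound(values, m, alpha, upper):
    # Principle of justifiable granularity on one side of m, Eqs. (2)-(3)
    side = [v for v in values if (v > m if upper else v < m)]
    best, best_score = m, 0.0
    for b in side:
        cov = sum(1 for v in side if (v <= b if upper else v >= b))
        score = cov * math.exp(-alpha * abs(m - b))
        if score > best_score:
            best, best_score = b, score
    return best


def granular_proximity(partitions, alpha):
    """Steps (ii)-(iii): proximity matrices of all U[ii], then one interval
    [g_kl^-, g_kl^+] per entry, with the median as numeric representative."""
    Ps = [proximity_matrix(U) for U in partitions]
    n = len(Ps[0])
    G_lo = [[0.0] * n for _ in range(n)]
    G_hi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            vals = [P[k][l] for P in Ps]
            m = median(vals)
            G_lo[k][l] = bound(vals, m, alpha, upper=False)
            G_hi[k][l] = bound(vals, m, alpha, upper=True)
    return G_lo, G_hi


# Partition matrices of the five-point example of Sect. 2
U1 = [[0.9, 0.8, 0.3, 0.1, 0.0], [0.1, 0.2, 0.7, 0.9, 1.0]]
U2 = [[1.0, 0.1, 0.9, 0.0, 0.0], [0.0, 0.8, 0.1, 0.0, 0.0],
      [0.0, 0.1, 0.0, 1.0, 1.0]]
U3 = [[1.0, 0.1, 0.2, 0.3, 1.0], [0.0, 0.9, 0.8, 0.7, 0.0]]

G_lo, G_hi = granular_proximity([U1, U2, U3], alpha=0.0)
for k in range(5):
    print([f"[{G_lo[k][l]:.1f}, {G_hi[k][l]:.1f}]" for l in range(5)])
```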

     

4.2 Refinement of local partition matrix

The crux of this scenario has been captured in Fig. 3. From the optimization perspective, we first form a granular proximity matrix \(G(P)\) and afterwards use it in the refinement of some locally constructed partition matrix.

For a given partition matrix \(U[ii]\), we proceed with its modifications (adjustments) in such a way that \(P(U[ii])\) is “contained” in \(G(P)\) to the highest extent. The adjustments are made possible by engaging the idea of optimal allocation of information granularity. The underlying idea is to adjust the entries of \(U[ii], ii=1, 2, \ldots, p\), in such a way that the modified partition matrix produces a proximity matrix whose values are included in the interval-valued entries of \(G(P)\).

In what follows, we elaborate on a detailed algorithm. As the method is the same for any local partition matrix, we omit the index [ii] and use the simplified notation \(U =[u_{ik}], i=1, 2, \ldots, c, k=1, 2, \ldots, N\) (as noted earlier, we are concerned only with the data belonging to the intersection of the local data, namely D). The entries of \(G(P)\) are intervals denoted as \([g_{kl}^{-}, g_{kl}^{+}], k, l=1, 2, \ldots, N\). To express the requirement of inclusion, we introduce the following criterion
$$\begin{aligned} V(U)=\vert \vert P( U) \in G( P)\vert \vert \end{aligned}$$
(5)
Here \(\vert \vert a\in A\vert \vert \) stands for the degree of inclusion of a numeric value \(a\) in the interval \(A\); for matrices, it is determined entrywise and aggregated over all pairs \((k, l)\). One could use a Boolean (0-1) predicate here; however, its multivalued counterpart is more suitable for optimization purposes, as it is continuous and assumes truth values ranging from 0 to 1.
Let us rewrite (5) using a multivalued inclusion predicate (\(\varphi \))
$$\begin{aligned} V( U)=\sum \limits _{k,l=1}^N {(p_{kl} \varphi } g_{kl}^+ )(g_{kl}^- \varphi p_{kl}) \end{aligned}$$
(6)
where \(p_{kl}\) is the entry of the proximity matrix for the pair \((k, l)\) of data. The inclusion predicate used above is defined as follows
$$\begin{aligned} a\varphi b=\left\{ {{\begin{array}{l@{\quad }l@{\quad }l} 1 &{} \hbox {if}&{} a\le b \\ b/a, &{} \hbox {if}&{} a>b \\ \end{array}}} \right. \end{aligned}$$
(7)
\(a, b \in [0,1]\).
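The predicate (7) and the criterion (6) translate directly into code. In this minimal sketch (helper names are ours), \(G(P)\) is passed as two matrices holding the lower and upper bounds. Each term of (6) is at most 1, so \(V(U)\) attains its maximum \(N^2\) exactly when every proximity value lies inside its interval.

```python
def proximity_matrix(U):
    # Eq. (1): p_kl = sum_i min(u_ik, u_il)
    c, n = len(U), len(U[0])
    return [[sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(n)]
            for k in range(n)]


def phi(a, b):
    """Multivalued inclusion predicate of Eq. (7): 1 if a <= b, else b / a.
    a > b >= 0 implies a > 0, so the division is safe."""
    return 1.0 if a <= b else b / a


def inclusion_degree(U, G_lo, G_hi):
    """V(U) of Eq. (6): sum over all pairs (k, l) of
    (p_kl phi g_kl^+) * (g_kl^- phi p_kl)."""
    P = proximity_matrix(U)
    n = len(P)
    return sum(phi(P[k][l], G_hi[k][l]) * phi(G_lo[k][l], P[k][l])
               for k in range(n) for l in range(n))


U1 = [[0.9, 0.8, 0.3, 0.1, 0.0],
      [0.1, 0.2, 0.7, 0.9, 1.0]]
P1 = proximity_matrix(U1)
# Degenerate intervals [p_kl, p_kl]: every entry is fully included
print(inclusion_degree(U1, P1, P1))
# Hypothetical bounds: [1, 1] on the diagonal, [0, 0.5] elsewhere
G_lo = [[1.0 if k == l else 0.0 for l in range(5)] for k in range(5)]
G_hi = [[1.0 if k == l else 0.5 for l in range(5)] for k in range(5)]
print(round(inclusion_degree(U1, G_lo, G_hi), 3))
```

In the second call, the off-diagonal proximities 0.9, 0.8 and 0.7 exceed the upper bound 0.5 and are only partially included, which pulls \(V(U)\) below its maximum of 25.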
The adjustment of the entries of the partition matrix is done iteratively following a gradient-based scheme; since \(V(U)\) is to be maximized, the entries are moved in the direction of the gradient
$$\begin{aligned} u_{st} ( \mathrm{iter +1})=u_{st} (\mathrm{iter })+\xi \frac{\partial }{\partial u_{st} }V( U) \end{aligned}$$
(8)
where \(\xi > 0\) is a learning rate controlling the intensity of the learning process.
The gradient is computed as follows
$$\begin{aligned} \frac{\partial V}{\partial u_{st} }=\sum \nolimits _{k,l} {\left[ \frac{\partial (p_{kl} \varphi g_{kl}^+ )}{\partial u_{st} }(g_{kl}^- \varphi p_{kl} )+\frac{\partial (g_{\hbox {kl}}^- \varphi p_{kl} )}{\partial u_{st} }(p_{kl} \varphi g_{kl}^+ )\right] }\nonumber \\ \end{aligned}$$
(9)
where \(s=1, 2, \ldots, c\), \(t=1, 2, \ldots, N\), and \((k, l)\) pertains to a pair of data in D. By the chain rule, \(\frac{\partial (p_{kl} \varphi g_{kl}^+ )}{\partial u_{st} }=\frac{\partial (p_{kl} \varphi g_{kl}^+ )}{\partial p_{kl} }\frac{\partial p_{kl} }{\partial u_{st} }\). This yields
$$\begin{aligned} \frac{\partial (p_{kl} \varphi g_{kl}^+ )}{\partial p_{kl} }=\frac{\partial }{\partial \hbox {p}_{\hbox {kl}} }\left\{ {{\begin{array}{l@{\quad }l} 1,&{} \hbox {if }p_{kl} \le g_{kl}^+ \\ \frac{g_{kl}^+ }{p_{kl} },&{}\hbox {if } p_{kl} >g_{kl}^+ \\ \end{array} }} \right. \!=\!\left\{ {{\begin{array}{l@{\quad }l} 0,&{} \hbox {if }p_{kl} \le g_{kl}^+ \\ \frac{-g_{kl}^+ }{p_{_{kl} }^2 }, &{} \hbox {if }\, p_{kl} >g_{kl}^+ \\ \end{array} }} \right. \nonumber \\ \end{aligned}$$
(10)
$$\begin{aligned} \frac{\partial p_{kl} }{\partial u_{st} }=\frac{\partial }{\partial u_{st} }\sum \limits _{w\hbox {=1}}^c {\hbox {min(}u_{wk} \hbox {,}u_{wl} } \hbox {)}=\sum \limits _{w\hbox {=1}}^c {\frac{\partial }{\partial u_{st} }\hbox {min(}u_{wk} \hbox {,}u_{wl} } \hbox {)}\nonumber \\ \end{aligned}$$
(11)
and in the sequel we have
$$\begin{aligned}&\frac{\partial }{\partial u_{st} }\min (u_{wk}, u_{wl} )\nonumber \\&\quad =\left\{ \begin{array}{l@{\quad }l} 1, &{} \hbox {if } (u_{wk} \le u_{wl},\ w=s,\ k=t) \hbox { or } (u_{wl} \le u_{wk},\ w=s,\ l=t) \\ 0, &{} \hbox {otherwise} \\ \end{array} \right. \nonumber \\ \end{aligned}$$
(12)
For the second term of (9), the derivative with respect to \(p_{kl}\) reads as follows (it is combined with (11) and (12) through the chain rule in the same way)
$$\begin{aligned} \frac{\partial (g_{kl}^- \varphi p_{kl} )}{\partial p_{kl} }\!=\!\frac{\partial }{\partial \hbox {p}_{\hbox {kl}} }\left\{ \begin{array}{l@{\quad }l} \hbox {1} &{} \hbox {if } g_{kl}^- \le p_{kl} \\ \frac{p_{kl} }{g_{_{kl} }^- }, &{} \hbox {if }g_{kl}^- >p_{kl} \\ \end{array} \right. \!=\!\left\{ \begin{array}{l@{\quad }l} \hbox {0}, &{} \hbox {if } g_{kl}^- \le p_{kl} \\ \frac{1}{g_{kl}^- }, &{} \hbox {if }g_{kl}^- >p_{kl} \\ \end{array} \right. \nonumber \\ \end{aligned}$$
(13)
Note that to retain the values of the partition matrix in the unit interval, a clipping operation is invoked if required, so that at each iteration the values of \(u_{st}(\mathrm{iter}+1)\) are kept within the [0,1] interval. Furthermore, we complete an additional normalization operation to keep the sum of the values of \(u_{st}\) (summed over \(s\) for any \(t\)) equal to 1. The initial point of the iteration scheme is the original partition matrix locally available for each data set.
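The sketch below assembles the gradient from (9) to (13) and performs one update followed by clipping and column normalization. It is a minimal illustration under stated assumptions: the step moves along the positive gradient because \(V(U)\) is maximized, a tie in \(\min \) is resolved in favor of the first argument, a column zeroed out by clipping is reset to uniform memberships (a hypothetical safeguard), and the learning rate and toy matrices are illustrative only. All function names are ours.

```python
def proximity_matrix(U):
    # Eq. (1)
    c, n = len(U), len(U[0])
    return [[sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(n)]
            for k in range(n)]


def phi(a, b):
    # Eq. (7)
    return 1.0 if a <= b else b / a


def inclusion_degree(U, G_lo, G_hi):
    # Eq. (6)
    P = proximity_matrix(U)
    n = len(P)
    return sum(phi(P[k][l], G_hi[k][l]) * phi(G_lo[k][l], P[k][l])
               for k in range(n) for l in range(n))


def gradient(U, G_lo, G_hi):
    """dV/du_st assembled from Eqs. (9)-(13)."""
    c, n = len(U), len(U[0])
    P = proximity_matrix(U)
    grad = [[0.0] * n for _ in range(c)]
    for k in range(n):
        for l in range(n):
            p, lo, hi = P[k][l], G_lo[k][l], G_hi[k][l]
            d_up = -hi / p ** 2 if p > hi else 0.0  # Eq. (10)
            d_low = 1.0 / lo if lo > p else 0.0     # Eq. (13)
            dV_dp = d_up * phi(lo, p) + d_low * phi(p, hi)  # product rule, Eq. (9)
            if dV_dp == 0.0:
                continue
            for w in range(c):
                # Eqs. (11)-(12): min(u_wk, u_wl) moves with its smaller argument
                t = k if U[w][k] <= U[w][l] else l
                grad[w][t] += dV_dp
    return grad


def refine_step(U, G_lo, G_hi, xi=0.1):
    """One gradient-ascent update, then clipping to [0, 1] and column normalization."""
    c, n = len(U), len(U[0])
    g = gradient(U, G_lo, G_hi)
    new = [[min(1.0, max(0.0, U[s][t] + xi * g[s][t])) for t in range(n)]
           for s in range(c)]
    for t in range(n):
        total = sum(new[s][t] for s in range(c))
        for s in range(c):
            new[s][t] = new[s][t] / total if total > 0 else 1.0 / c
    return new


# Toy example: the off-diagonal proximity 0.9 exceeds the upper bound 0.5
U = [[0.9, 0.8], [0.1, 0.2]]
G_lo = [[1.0, 0.2], [0.2, 1.0]]
G_hi = [[1.0, 0.5], [0.5, 1.0]]
before = inclusion_degree(U, G_lo, G_hi)
U_new = refine_step(U, G_lo, G_hi, xi=0.1)
after = inclusion_degree(U_new, G_lo, G_hi)
print(round(before, 3), round(after, 3))
```

In this toy case a single step lowers the off-diagonal proximity from 0.9 toward the interval and raises \(V(U)\); in practice the step would be repeated until \(V(U)\) stops improving.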

The gradient-based mechanism can be used as a stand-alone optimization scheme or in conjunction with more advanced population-based optimization such as Particle Swarm Optimization (PSO), establishing a hybrid optimization scheme in which both optimization mechanisms are arranged in a certain sequence with the ultimate intent of avoiding local optima.

With regard to the optimization, several hybridizations of the generic optimization mechanisms are worth investigating, namely PSO followed by the gradient method and the gradient method followed by PSO, in which we capitalize on the key properties of these techniques. PSO, as a population-based technique, is beneficial in realizing a global-oriented search, whereas the gradient-based method offers a detailed local search but is prone to getting stuck in local optima. A hybrid scheme in which PSO is followed by the gradient-based technique comes as a sound alternative that combines the advantages of the contributing methods.

4.3 A general scheme of consensus building

In contrast to the two previously outlined processes in which granular proximity matrices are involved, consensus building is an iterative process whose dynamics come into play. We proceed iteratively by forming a granular proximity matrix \(G(P)\) on the basis of the locally formed partition matrices (proximity matrices) and then updating each of \(U[1], U[2], \ldots, U[p]\) as discussed in Sect. 4.2. The updated partition matrices lead to new proximity matrices, from which a new granular proximity matrix is produced, and this complete loop is repeated. The process is monitored with respect to its convergence. Some parameters of the method, especially the value of \(\alpha \), can affect the convergence process; their impact can be assessed experimentally.

4.4 Characterization of granular proximity matrices

Several indicators can serve as sound descriptors of the produced granular proximity matrix; they also help assess the convergence of the consensus-building process.

4.4.1 Linkage analysis

As the matrix \(G(P)\) is granular, its content is quantified as an interval-valued strength of linkage or a fuzzy set of linkage. Moving to a more synthetic level of description, we average the off-diagonal elements of the \(k\)-th row of the matrix, namely
$$\begin{aligned} \left[ {a_k ,b_k } \right]&= \frac{1}{N-1}\sum \limits _{\begin{array}{c} l=1 \\ l\ne k \\ \end{array}}^N {\left[ {g_{kl} ^-,g_{kl} ^+} \right] }\nonumber \\&= \left[ \frac{1}{N-1}\sum \limits _{\begin{array}{c} l=1 \\ l\ne k \\ \end{array}}^N {g_{kl}^- } ,\frac{1}{N-1}\sum \limits _{\begin{array}{c} l=1 \\ l\ne k \\ \end{array}}^N {g_{kl}^+ } \right] \end{aligned}$$
(14)
we can identify potential outliers, namely those data points for which the interval [\(a_{k}\), \(b_{k}\)] lies close to zero. Furthermore, the length of the interval, \(\vert b_{k}-a_{k}\vert \), reflects the diversity of the proximity values of the \(k\)-th data point delivered by the local structures. This allows us to rank the data in terms of their associations with other data and to tag outliers.
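Eq. (14) and the outlier rule can be sketched as follows, assuming the lower and upper bounds \(g_{kl}^-\) and \(g_{kl}^+\) are stored as two N × N arrays. The threshold that decides when an interval is “close to zero” is a hypothetical parameter; the text leaves this judgment open.

```python
import numpy as np


def linkage_intervals(G_minus, G_plus):
    """Eq. (14): per-row averages of the off-diagonal lower and upper bounds."""
    G_minus, G_plus = np.asarray(G_minus, float), np.asarray(G_plus, float)
    N = G_minus.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    a = np.where(off_diag, G_minus, 0.0).sum(axis=1) / (N - 1)
    b = np.where(off_diag, G_plus, 0.0).sum(axis=1) / (N - 1)
    return a, b


def flag_outliers(b, threshold):
    """Hypothetical rule: [a_k, b_k] counts as 'close to zero' when b_k < threshold."""
    return np.flatnonzero(np.asarray(b) < threshold)


# Toy 3 x 3 granular proximity matrix; data point 2 is weakly linked.
G_minus = np.array([[1.0, 0.6, 0.05], [0.6, 1.0, 0.1], [0.05, 0.1, 1.0]])
G_plus = np.array([[1.0, 0.8, 0.15], [0.8, 1.0, 0.2], [0.15, 0.2, 1.0]])
a, b = linkage_intervals(G_minus, G_plus)
ranking = np.argsort(b)      # weakest-linked data first
diversity = b - a            # interval length |b_k - a_k|
outliers = flag_outliers(b, threshold=0.2)
```

Ranking by the upper bound \(b_k\) is one reasonable choice; ranking by the interval midpoint would be an equally defensible alternative.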

4.4.2 Overall granularity of granular proximity matrix

This index is useful when building consensus in an iterative process. The updated partition matrices are used to build proximity matrices, which give rise to the granular proximity matrix. In turn, the new granular proximity matrix guides the optimization of the local partition matrices, and the iterations continue. The convergence of the process can be monitored by the overall level of granularity of the granular proximity matrix, denoted by Gran, computed in successive iteration steps,
$$\begin{aligned} \mathrm{Gran} =\frac{1}{N^2-N}\sum \limits _{\begin{array}{c} k,l=1 \\ k\ne l \\ \end{array}}^N \left( g_{kl}^+ -g_{kl}^- \right) \end{aligned}$$
(15)
Decreasing values of this index indicate reduced diversity of the locally formed structures and, in this way, point at increasing agreement among the different locally produced views.
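Eq. (15) takes only a few lines once the bounds are stored as arrays. The stall test below (stop when Gran decreases by less than `tol` between two iterations) is an assumed convergence criterion, not one stated in the text.

```python
import numpy as np


def overall_granularity(G_minus, G_plus):
    """Eq. (15): mean interval length g_kl^+ - g_kl^- over the N^2 - N off-diagonal pairs."""
    G_minus, G_plus = np.asarray(G_minus, float), np.asarray(G_plus, float)
    N = G_minus.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    return float((G_plus - G_minus)[off_diag].sum() / (N**2 - N))


def granularity_stalled(history, tol=1e-4):
    """Assumed stopping rule: Gran decreased by less than tol in the last iteration."""
    return len(history) >= 2 and history[-2] - history[-1] < tol
```

Recording `overall_granularity` after each consensus iteration produces exactly the kind of curve reported later in the numeric studies.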

5 Numeric studies

In this section, a series of experiments involving both synthetic and real-world data illustrates how the schemes discussed above operate and what form their results take. In all experiments, we use the Fuzzy C-Means (FCM) algorithm (Bezdek 1981) with the fuzzification coefficient set to \(m=\) 2.
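For readers who want to reproduce the local step, a compact NumPy sketch of standard FCM with \(m=2\) follows, together with one common way of turning a partition matrix into a proximity matrix, \(p_{kl}=\sum _i \min (u_{ik}, u_{il})\), as used in proximity-based FCM (Pedrycz et al. 2004). We assume this proximity definition here; the random initialization, tolerance and stopping rule are also our own choices.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means. X is N x n; returns U (c x N) and prototypes (c x n)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)            # columns sum to 1
    for _ in range(max_iter):
        Um = U ** m
        protos = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances between every prototype and every data point (c x N).
        d = np.linalg.norm(X[None, :, :] - protos[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, protos


def proximity(U):
    """Proximity induced by U: p_kl = sum_i min(u_ik, u_il) (assumed, P-FCM style)."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)
```

Because every column of U sums to 1, the induced proximity matrix is symmetric with unit diagonal, which is a quick sanity check on any implementation.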

5.1 Synthetic data

Here, we consider 15 synthetic data sets, both two- and three-dimensional, generated as mixtures of Gaussian distributions N(\(m\), \(\Sigma \)) with mean vectors \(m\) and covariance matrices \(\Sigma \). The statistical characteristics of the data are summarized in Table 1, while Fig. 5 displays their distribution in the corresponding feature spaces. Each set consists of 50 data points. As we assume knowledge of the structure of the data, the number of clusters for each data set D\(_{p}\) is set to the number of Gaussian components generating it, \(c[p]\), as listed in Table 1.
Table 1

Statistical characteristics of synthetic two- and three-dimensional data

Each entry lists the set, its number of clusters \(c[p]\) and its Gaussian components N(\(m\), \(\Sigma \)); all covariance matrices are diagonal and are written through their diagonal entries as \(\Sigma =\mathrm{diag}(\cdot )\).

D \(_{1}\) (\(c[p]=2\)): \(m=[-3\ \ 9]\), \(\Sigma =\mathrm{diag}(2.2, 0.6)\); \(m=[-2\ \ 4]\), \(\Sigma =\mathrm{diag}(2.4, 0.8)\)

D \(_{2}\) (\(c[p]=2\)): \(m=[2\ \ -9\ \ 1]\), \(\Sigma =\mathrm{diag}(2.1, 1.7, 0.4)\); \(m=[8\ \ 6\ \ 4]\), \(\Sigma =\mathrm{diag}(2.2, 1.7, 0.7)\)

D \(_{3}\) (\(c[p]=3\)): \(m=[9\ \ 1]\), \(\Sigma =\mathrm{diag}(1.3, 1.9)\); \(m=[-10\ \ -6]\), \(\Sigma =\mathrm{diag}(1.0, 0.1)\); \(m=[-1\ \ -6]\), \(\Sigma =\mathrm{diag}(2.8, 1)\)

D \(_{4}\) (\(c[p]=3\)): \(m=[2\ \ 4]\), \(\Sigma =\mathrm{diag}(2.2, 0.1)\); \(m=[-2\ \ -3]\), \(\Sigma =\mathrm{diag}(1.8, 0.9)\); \(m=[10\ \ 2]\), \(\Sigma =\mathrm{diag}(1.6, 0.5)\)

D \(_{5}\) (\(c[p]=4\)): \(m=[8\ \ 9]\), \(\Sigma =\mathrm{diag}(1.0, 0.5)\); \(m=[6\ \ -3]\), \(\Sigma =\mathrm{diag}(2.4, 0.6)\); \(m=[2\ \ 2]\), \(\Sigma =\mathrm{diag}(0.9, 0.1)\); \(m=[-3\ \ -1]\), \(\Sigma =\mathrm{diag}(1.9, 1.9)\)

D \(_{6}\) (\(c[p]=2\)): \(m=[3\ \ 0\ \ -10]\), \(\Sigma =\mathrm{diag}(1.9, 0.4, 0.1)\); \(m=[-10\ \ 1\ \ 5]\), \(\Sigma =\mathrm{diag}(2.6, 1.8, 0.2)\)

D \(_{7}\) (\(c[p]=4\)): \(m=[10\ \ -4]\), \(\Sigma =\mathrm{diag}(1.0, 1.2)\); \(m=[-9\ \ -9]\), \(\Sigma =\mathrm{diag}(2.5, 1.8)\); \(m=[-9\ \ 7]\), \(\Sigma =\mathrm{diag}(2.4, 1.6)\); \(m=[-2\ \ -4]\), \(\Sigma =\mathrm{diag}(0.1, 0.4)\)

D \(_{8}\) (\(c[p]=2\)): \(m=[6\ \ 10\ \ -5]\), \(\Sigma =\mathrm{diag}(0.5, 0.3, 0.9)\); \(m=[1\ \ 8\ \ -5]\), \(\Sigma =\mathrm{diag}(2.9, 0.3, 0.6)\)

D \(_{9}\) (\(c[p]=2\)): \(m=[-8\ \ 1\ \ 0]\), \(\Sigma =\mathrm{diag}(2.1, 0.1, 0.8)\); \(m=[8\ \ -7\ \ 0]\), \(\Sigma =\mathrm{diag}(1.7, 1.7, 0.1)\)

D \(_{10}\) (\(c[p]=2\)): \(m=[1\ \ -4]\), \(\Sigma =\mathrm{diag}(2.2, 0.1)\); \(m=[-7\ \ 8]\), \(\Sigma =\mathrm{diag}(1.7, 0.6)\)

D \(_{11}\) (\(c[p]=2\)): \(m=[-8\ \ 7]\), \(\Sigma =\mathrm{diag}(1.8, 1.7)\); \(m=[-6\ \ -1]\), \(\Sigma =\mathrm{diag}(2.1, 0.4)\)

D \(_{12}\) (\(c[p]=4\)): \(m=[-5\ \ 5]\), \(\Sigma =\mathrm{diag}(2.9, 0.6)\); \(m=[-5\ \ 6]\), \(\Sigma =\mathrm{diag}(0.6, 0.4)\); \(m=[8\ \ -5]\), \(\Sigma =\mathrm{diag}(0.5, 1.7)\); \(m=[-9\ \ 10]\), \(\Sigma =\mathrm{diag}(1.2, 1.6)\)

D \(_{13}\) (\(c[p]=2\)): \(m=[6\ \ 1\ \ 8]\), \(\Sigma =\mathrm{diag}(0.9, 1.8, 0.5)\); \(m=[-3\ \ 8\ \ 8]\), \(\Sigma =\mathrm{diag}(2.9, 1.3, 0.8)\)

D \(_{14}\) (\(c[p]=4\)): \(m=[6\ \ 10\ \ -5]\), \(\Sigma =\mathrm{diag}(0.5, 0.3, 0.9)\); \(m=[1\ \ 8\ \ -5]\), \(\Sigma =\mathrm{diag}(2.9, 0.3, 0.6)\); \(m=[-9\ \ -6\ \ 5]\), \(\Sigma =\mathrm{diag}(2.0, 0.1, 0.2)\); \(m=[-9\ \ 1\ \ 8]\), \(\Sigma =\mathrm{diag}(1.7, 1.9, 0.9)\)

D \(_{15}\) (\(c[p]=2\)): \(m=[2\ \ 6\ \ -6]\), \(\Sigma =\mathrm{diag}(1.5, 0.2, 0.7)\); \(m=[-3\ \ -7\ \ 9]\), \(\Sigma =\mathrm{diag}(2.9, 2.0, 0.3)\)
Fig. 5

Plots of synthetic data sets
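As an illustration, a data set such as D \(_{1}\) can be regenerated from its Table 1 entry as sketched below. Several details are assumptions rather than statements from the text: the diagonal entries of \(\Sigma \) are treated as variances, the 30/20 split between the two components is inferred from the caption of Fig. 6, and the random seed is arbitrary, so the sample will not coincide with the authors' data.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed (assumption)

# D_1 from Table 1; (mean, covariance, number of points) per Gaussian component.
components = [
    (np.array([-3.0, 9.0]), np.diag([2.2, 0.6]), 30),  # first 30 points (cf. Fig. 6 caption)
    (np.array([-2.0, 4.0]), np.diag([2.4, 0.8]), 20),  # remaining 20 points
]

D1 = np.vstack([rng.multivariate_normal(mean, cov, size=n) for mean, cov, n in components])
labels = np.repeat(np.arange(len(components)), [n for _, _, n in components])
```

Keeping the points ordered by component reproduces the block structure that makes the two clusters visible in the proximity matrices.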

In the sequel, we elaborate on the three models of building global structures of data and of reconciling local structural characteristics.

Formation of granular proximity matrix

The objective is to build a granular proximity matrix of a general nature positioned at the upper level of the hierarchy. As discussed, we first form a collection of proximity matrices and afterwards form their granular generalization. For illustration, Fig. 6 visualizes a proximity matrix obtained for D \(_{1}\).
Fig. 6

Granular proximity matrix for D \(_1\). A sudden jump in the levels of brightness (resemblance values) occurring at the 30th data point is a result of the occurrence of two well-formed clusters, see Fig. 8. The first one involves the first 30 data points, while the rest of the data (from 31 to 50) form the second cluster


The granular proximity matrix reveals the emergence of groups of data that are close to each other. One can observe a jump in the values of the entries, pointing at the occurrence of well-formed clusters.

The formation of the granular proximity matrix was considered for several values of \(\alpha \). Proceeding with the linkage analysis, we obtain the results shown in Fig. 7. The linkage analysis helps visualize the points that are weakly linked with the rest of the data and highlights those for which the length of the interval is excessively large.
Fig. 7

Bounds of the linkage levels obtained for the granular proximity matrix for (a) \(\alpha \) = 0.3, (b) \(\alpha \) = 0.6 and (c) \(\alpha \) = 1

Fig. 8

Performance index \(V\) in the course of optimization (reported in successive generations of PSO or iterations of the gradient-based optimization; \(\alpha =\) 0.3) using (a) a gradient-based method, (b) PSO, (c) gradient-PSO and (d) PSO-gradient

These results are reported for several values of \(\alpha \) (namely, 0.3, 0.6, and 1). Outliers are revealed more distinctly as the binding of the locally available structures is made stronger. For instance, as shown in Fig. 7c, some collections of data (those indexed 1–5 and 35–50) clearly differ from the rest with regard to the locally present structures.

Involvement of global granular results in the enhancement of the local partition matrix

Here, as discussed earlier, the optimization strategy involves the plain gradient-based method and PSO, as well as hybrid approaches combining these generic methods.

The results are reported in terms of the performance index \(V\); the initial learning rate was set to \(\xi \) = 0.01.

The hybrid PSO-gradient method, Fig. 8d, outperforms the other optimization schemes. The partition matrix \(U\) produced by PSO, Fig. 8b, serves as a sound initial condition for the gradient-based method, which then fine-tunes the entries of \(U\). For the other hybrid method (gradient-PSO), Fig. 8c, PSO does not further improve \(V\), as it eventually gets stuck in a local maximum.

Note that the learning rate \(\xi \) was set quite low to keep the process stable during the fine-tuning of the entries of \(U\). Furthermore, we adopted a dynamic adjustment of the learning rate: if the value of \(V\) decreases in a certain iteration, this iteration is ignored and \(\xi \) is reduced, for example to 0.75\(\xi \). Learning then continues with this new value of \(\xi \) until \(V\) decreases again, at which point \(\xi \) is reduced once more.
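The step-rejection rule can be sketched as follows, assuming \(V\) is maximized (consistent with rejecting any step that lowers it). `V`, `grad_V` and the optional `project` hook are hypothetical placeholders; `project` could, for instance, be the clipping-and-normalization step described earlier. The shrink factor 0.75 follows the text, and \(\xi \) = 0.01 is the initial value reported above.

```python
import numpy as np


def adaptive_gradient_ascent(V, grad_V, U0, xi=0.01, shrink=0.75, iters=200, project=None):
    """Gradient ascent on V with step rejection and learning-rate decay.

    A step that lowers V is discarded and xi is multiplied by `shrink`;
    accepted steps keep the current xi. `project` (optional) maps a candidate
    back onto the feasible set of partition matrices.
    """
    U = np.array(U0, dtype=float)
    best = V(U)
    for _ in range(iters):
        cand = U + xi * grad_V(U)
        if project is not None:
            cand = project(cand)
        val = V(cand)
        if val < best:
            xi *= shrink  # ignore this iteration and retry with a smaller step
            continue
        U, best = cand, val
    return U, best, xi
```

Returning the final \(\xi \) makes it easy to check how often steps were rejected; a rate that keeps shrinking toward zero usually signals that the iterate is already at a (local) maximum.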

The original partition matrix (formed locally) and its refinement produced by the involvement of the granular proximity matrix are shown in Fig. 9.
Fig. 9

Partition matrix for D \(_{1}\): (a) obtained locally, (b) refinement realized when \(\alpha =0.3\) and (c) refinement realized when \(\alpha =0.6\)

While the local structure is quite apparent, the refinements guided by the globally produced structure change it to some extent. This is expected, since a global structure is now taken into account and its impact becomes visible.

The results obtained after the detailed refinement of the partition matrix are visualized in Fig. 10.
Fig. 10

Proximity matrix of the refined partition matrix (\(\alpha =0.3\)) obtained for D \(_{1}\)

Figure 9 reveals some interesting relationships. When only local data are considered, there is a well-delineated structure pointing at two clusters. As \(\alpha \) increases (\(\alpha \) = 0.3 and 0.6), the global structure exerts an increasingly strong influence, so the clusters are not as distinct as in the first case. This is expected, as we now accommodate a global view (structure), which might not be in full agreement with the local topology of the data. Furthermore, the partition matrices displayed in this figure identify the data points that are most affected by the global structure. This provides useful insight into the nature of individual data and helps pinpoint the elements that are least compatible with the global structure revealed at the higher level of the hierarchy.

Consensus building

The most essential aspect of this process is the iterative formation of consensus and its convergence. The results obtained when running the hybrid method (PSO-gradient option) are included in Fig. 11.
Fig. 11

Overall granularity obtained in successive iterations of the consensus-building process; results are shown for two selected values of \(\alpha \) = 0.3 and 0.8

The results of the linkage analysis obtained after consensus formation are visualized in Fig. 12.
Fig. 12

Bounds of the linkage levels of the granular proximity matrix produced as a result of consensus formation with \(\alpha \) = 0.3

It is also instructive to show how the partition and proximity matrices changed once the consensus-building process has been completed; the results, in the form of gray-scale images, are reported in Figs. 13 and 14.
Fig. 13

Proximity matrix for D \(_{1}\) as a result of the consensus-building process (\(\alpha \) = 0.3)

The plots shown in Fig. 14 are useful for flagging the elements whose membership grades change significantly (in comparison with other data points). They could be candidates for further examination, as their changed cluster membership may indicate characteristics that are revealed only in the presence of external sources of knowledge.
Fig. 14

Partition matrix obtained for D \(_{1}\) as a result of consensus-building process (\(\alpha \) = 0.3)

Experiments with real-world data

We study here one of the publicly available real-world data sets (https://archive.ics.uci.edu/ml/datasets/Breast+Tissue), namely Breast Tissue. This data set is concerned with electrical impedance measurements of freshly excised breast tissue samples. The features are the following:
  1. I0: impedivity (ohm) at zero frequency
  2. PA500: phase angle at 500 kHz
  3. HFS: high-frequency slope of phase angle
  4. DA: impedance distance between spectral ends
  5. AREA: area under spectrum
  6. A/DA: area normalized by DA
  7. MAX IP: maximum of the spectrum
  8. DR: distance between I0 and real part of the maximum frequency point
  9. P: length of the spectral curve

     
For the purpose of this experiment, we split the data into four data sets by choosing subsets of features that are naturally related; each set is composed of the same 106 samples, described by the following subsets of features:
  • D \(_{1}\): 1, 7

  • D \(_{2}\): 4, 8

  • D \(_{3}\): 2, 3, 9

  • D \(_{4}\): 5, 6
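This vertical split can be sketched as follows. The sketch assumes the 106 × 9 feature matrix `X` has already been loaded with its columns in the order of the feature list (I0 first, P last); reading the UCI file is not shown, and the dictionary names are hypothetical.

```python
import numpy as np

# 1-based feature numbers, as in the numbered feature list above.
SUBSETS = {"D1": [1, 7], "D2": [4, 8], "D3": [2, 3, 9], "D4": [5, 6]}


def split_features(X, subsets=SUBSETS):
    """Vertical split: every local data set keeps all samples but only its own columns."""
    X = np.asarray(X)
    return {name: X[:, [i - 1 for i in cols]] for name, cols in subsets.items()}
```

Since all four local data sets share the same samples, their partition matrices are defined over the same 106 data points, which is what allows their proximity matrices to be aggregated entry by entry.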

Proceeding with fuzzy clustering realized by the FCM algorithm, we determine the local structures. The number of clusters is determined by inspecting the minimized objective function as a function of “\(c\)” and locating a “knee” point of the curve; refer to Fig. 15.
Fig. 15

Objective functions produced when clustering individual locally available data

By inspecting the plots of the objective function, we choose as the number of clusters the value at which a knee point of the relationship is visible. Adhering to this visual criterion, we select the numbers of clusters reported in Table 2.
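The knee is chosen visually in this study. For readers who prefer an automated rule, a simple maximum-distance-to-chord heuristic is sketched below; it is a substitute we introduce for illustration, not the authors' procedure, and it may pick a different \(c\) than visual inspection on noisy curves.

```python
import numpy as np


def knee_point(cs, Q):
    """Return the c whose normalized point lies farthest from the chord joining the curve's ends."""
    cs = np.asarray(cs, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Normalize both axes to [0, 1] so that the distance is scale-free.
    x = (cs - cs[0]) / (cs[-1] - cs[0])
    y = (Q - Q.min()) / (Q.max() - Q.min())
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance of every point to the chord.
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(cs[np.argmax(dist)])


# Toy objective-function values for c = 2, ..., 8 (illustrative, not from Fig. 15).
cs = [2, 3, 4, 5, 6, 7, 8]
Q = [100, 60, 30, 25, 22, 20, 19]
c_knee = knee_point(cs, Q)
```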
Table 2

Number of clusters based on the inspection of the objective function

 

D \(_{1}\): \(c\) = 5

D \(_{2}\): \(c\) = 7

D \(_{3}\): \(c\) = 5

D \(_{4}\): \(c\) = 4

The partition and proximity matrices are visualized in Figs. 16 and 17.
Fig. 16

Partition matrices obtained for the local data sets

Fig. 17

Proximity matrices formed for the locally formed partition matrices

Considering \(\alpha \) = 0.4 when running the principle of justifiable granularity, we obtain the results shown in Fig. 18.
Fig. 18

Bounds of the linkage levels of the granular proximity matrix for \(\alpha \) = 0.4 (Breast Tissue data)

The results of consensus formation, reported in terms of overall granularity, partition matrices, proximity matrices and linkage associated with the granular proximity matrix, are shown in Figs. 19, 20, 21 and 22.
Fig. 19

Overall granularity reported in successive iterations; \(\alpha \) = 0.4

Fig. 20

Partition matrices as a result of consensus formation

Fig. 21

Proximity matrices for the partition matrices in Fig. 20, as a result of consensus formation

Fig. 22

Bounds of the linkage levels of the granular proximity matrix for \(\alpha \) = 0.4 (Breast Tissue data), as a result of consensus formation

6 Conclusions

In this study, we have conceptualized granular proximity matrices, developed their algorithmic setting, and experimented with them. It has been demonstrated that the granularity of these matrices plays an important role in the realization of collaborative processes of forming views of global structures: it not only facilitates this process but also quantifies the diversity of locally available structures through the level of granularity of the information granules in the granular proximity matrix. The guidance offered by global granular proximity matrices is an example of a structural feedback loop that augments the clustering processes with auxiliary sources of knowledge.

Two open directions are worth further investigation:

Formation of structures exhibiting a higher type of granularity. Higher-level structures such as granular\(^{2}\) proximity matrices (arising when a hierarchy with three levels is present) can be considered.

Exploration of various formal realizations of granular proximity matrices. While this study is concerned with interval-valued proximity matrices (chosen for illustrative purposes), further work could involve other formalisms such as fuzzy sets, rough sets and shadowed sets.

Notes

Acknowledgments

This study was funded by King Abdulaziz University (KAU), under Grant No. (4-135-1434/HiCi). The authors, therefore, acknowledge technical and financial support of KAU.

References

  1. Apolloni B, Brega A, Malchiodi D, Palmas G, Zanaboni AM (2006) Learning rule representations from data. IEEE Trans Syst, Man Cybern, Part A: Syst Humans 36(5):1010–1028
  2. Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
  3. Coppi R, D’Urso P, Giordani P (2010) A fuzzy clustering model for multivariate spatial time series. J Classif 27:54–88
  4. Corsini P, Lazzerini B, Marcelloni F (2005) A new fuzzy relational clustering algorithm based on the fuzzy C-means algorithm. Soft Comput 9:439–447
  5. de Souza RMCR, de Carvalho F (2004) Clustering of interval data based on city-block distances. Pattern Recognit Lett 25(3):353–365
  6. Gacek A, Pedrycz W (2013) Description, analysis, and classification of biomedical signals: a computational intelligence approach. Soft Comput 17:1659–1671
  7. Graves D, Noppen J, Pedrycz W (2012) Clustering with proximity knowledge and relational knowledge. Pattern Recognit 45(7):2633–2644
  8. Hathaway R, Bezdek JC, Pedrycz W (1996) A parametric model for fusing heterogeneous fuzzy data. IEEE Trans Fuzzy Syst 4:270–281
  9. Hwang C, Rhee FC-H (2007) Uncertain fuzzy clustering: interval type-2 fuzzy approach to C-Means. IEEE Trans Fuzzy Syst 15(1):107–120
  10. Mali K, Mitra S (2003) Clustering and its validation in a symbolic framework. Pattern Recognit Lett 24(14):2367–2376
  11. Pedrycz W, Bezdek JC, Hathaway RJ, Rogers GW (1998) A non-parametric model for fusing heterogeneous data. IEEE Trans Fuzzy Syst 6:411–425
  12. Pedrycz W (2013) Granular computing: analysis and design of intelligent systems. CRC Press/Taylor & Francis, Boca Raton
  13. Pedrycz W, Rai P (2008) Collaborative clustering with the use of Fuzzy C-Means and its quantification. Fuzzy Sets Syst 15:2399–2427
  14. Pedrycz W (2007) Granular computing—the emerging paradigm. J Uncertain Syst 1(1):38–61
  15. Pedrycz W, Loia V, Senatore S (2004) P-FCM: a proximity-based fuzzy clustering. Fuzzy Sets Syst 148(1):21–41
  16. Pedrycz W (2005) Knowledge-based fuzzy clustering. John Wiley, New York
  17. Pedrycz W (2004) Fuzzy clustering with a knowledge-based guidance. Pattern Recognit Lett 25(4):469–480
  18. Peters G (2011) Granular box regression. IEEE Trans Fuzzy Syst 19(6):1141–1152
  19. Wong H, Hu BQ (2013) Application of interval clustering approach to water quality evaluation. J Hydrol 491:1–12
  20. Zhang L, Pedrycz W, Lu W, Liu X, Zhang L (2014) An interval weighed fuzzy c-means clustering by genetically guided alternating optimization. Expert Syst Appl 41(13):5960–5971

Copyright information

© The Author(s) 2014

Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  • Witold Pedrycz (1, 2, 3), corresponding author
  • Rami Al-Hmouz (2)
  • Ali Morfeq (2)
  • Abdullah Saeed Balamash (2)
  1. Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
  2. Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
  3. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
