1 Introduction

Cluster analysis belongs to the field of multivariate statistical analysis and is an important branch of unsupervised pattern classification in statistical pattern recognition. Its task is to divide an unlabeled sample set into several subsets according to certain criteria, with the goal of grouping similar samples into the same class and dissimilar samples into different classes. This analysis method can quantitatively determine the relationships between research objects so as to achieve a reasonable classification. Researchers have proposed many clustering algorithms from different perspectives, such as partition-based methods, hierarchy-based methods, and grid-based methods [1,2,3]. As a powerful auxiliary tool, cluster analysis has played an important role in scientific research, social services, and other fields [4,5,6].

Given a dataset, different clustering algorithms, or even the same algorithm with different initializations and parameters, can lead to different clustering results. However, without prior knowledge, it is difficult to decide which algorithm is suitable for a given clustering task, and even when an algorithm is chosen, it is still difficult to find suitable parameters for it [7, 8]. The different clusterings produced by different algorithms may reflect different views of the data. By exploiting the rich and complementary information in multiple clusterings, clustering ensemble techniques have been widely used in data clustering and have attracted increasing attention in recent years [9,10,11,12,13,14,15,16,17,18,19]. Strehl and Ghosh [9] first proposed the clustering ensemble method, whose purpose is to combine multiple clusterings to obtain a potentially better and more robust clustering result. Such methods perform better at discovering unusual clusters, dealing with noise, and integrating clustering solutions from multiple distributed sources.

The process of clustering ensemble is shown in Fig. 1. First, M clustering results are obtained by running M clustering algorithms; each clustering result is regarded as a clustering member, or a partition, of the ensemble. The set of all clustering results is denoted \(P=\left\{ P^{(1)},\ldots , P^{(M)}\right\} \); this step is called clustering member generation. Then P is taken as input, the members are combined, and the final clustering result is output; this step is called clustering combination/ensemble/fusion, also known as consensus function design. For clustering member generation, there are usually three strategies: (1) use the same clustering algorithm — since randomly initialized k-means yields a different clustering result on each run, clustering members can be generated through multiple runs; (2) cluster different data subsets, e.g., via random projection, projection onto different subspaces, different sampling techniques, or selection of different feature subsets; (3) adopt different numbers of clusters, for example by setting several different values of k or randomly selecting k in a specified interval. In the design of the consensus function, since object labels are unknown during cluster analysis, there is no explicit correspondence between the cluster labels produced by different clustering members. In addition, clustering members may contain different numbers of clusters, which makes the cluster-label mapping problem very challenging. According to whether the cluster-label correspondence problem is solved explicitly, clustering ensemble methods can be divided into two categories: (1) the pair-wise approach, which introduces the hypergraph adjacency matrix H to represent the pair-wise relationships between objects, effectively avoiding the cluster-label correspondence problem; (2) the re-labeling approach, including cumulative voting, alternative voting, etc.
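As a concrete illustration of strategies (1) and (3) above, the following minimal sketch generates M clustering members by running randomly initialized k-means with a randomly chosen number of clusters; the dataset `X`, the range of k, and the helper name `generate_members` are assumptions of this illustration rather than part of any specific method.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_members(X, M=100, k_range=(2, 10), seed=0):
    """Generate M clustering members by running k-means with a random k each time."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(M):
        k = int(rng.integers(k_range[0], k_range[1] + 1))  # randomly pick k in the interval
        labels = KMeans(n_clusters=k, init="random", n_init=1,
                        random_state=int(rng.integers(0, 10**6))).fit_predict(X)
        members.append(labels)
    return np.array(members)  # shape (M, n): row m holds the labels of partition P^(m)
```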

Fig. 1 The general framework for clustering ensemble

The quality of the base clusterings plays a crucial role in the consensus process. To improve consensus performance, researchers have attempted to evaluate the base clusterings and assign them different weights. These partition-weighting methods do improve the accuracy of clustering ensembles to a certain extent [20,21,22,23, 28]. However, they are developed under the implicit assumption that all clusters in the same base clustering have the same reliability, which does not reflect the realistic situations of real-world datasets: each base clustering is treated as a whole and assigned a global weight, regardless of the diversity of the clusters inside it. To address this deficiency, references [29,30,31,32, 34] further consider the local diversity of ensembles and deal with the different reliability of clusters. Moreover, recent studies [36, 37] have shown that a point can change its neighbors across different partitions, and that different points have different relationship stability. This difference shows that points may contribute differently to the detection of the underlying data structure. Generally, a partition is made up of one or more clusters and a cluster is made up of one or more points, so the quality of the clusters directly affects the quality of the partition, and the quality of the points directly affects the quality of the clusters. There is therefore a need to comprehensively consider the importance of points, clusters, and partitions. However, very few existing studies address the problem of how to weight these three layers together in a unified way.

Aiming at the above problem, this paper proposes a three-layer weighted clustering ensemble method named PCPA based on three factors, i.e., point, cluster, and partition. Firstly, PCPA adopts the k-means algorithm to generate the base clusterings by running it M times. Secondly, the base clustering matrix is transformed into a hypergraph adjacency matrix H, and the CA matrix is refined by three-layer weighting of points, clusters, and partitions. Finally, the average-link (AL) method is used to obtain the consensus clustering.

For clarity, the contributions of this paper can be summarized as follows.

  • A clustering ensemble method, PCPA, based on point-cluster-partition three-layer weighting is proposed. To the best of our knowledge, PCPA is the first three-layer weighted architecture that comprehensively considers these three factors.

  • The weighting effect of PCPA is compared with the following seven conditions: unweighted, point weighted, cluster weighted, partition weighted, point-cluster weighted, point-partition weighted, and cluster-partition weighted. The results show that, compared with the other seven cases, three-layer weighting is always better than no weighting and its ranking fluctuates little, which demonstrates the good stability of three-layer weighting.

  • PCPA is compared with authoritative and popular weighting methods from recent years. The experimental results show that the accuracy of the proposed PCPA method is higher than that of seven other methods, ranking in the top two under the three commonly used evaluation metrics.

2 Related Work

Currently, different types of clustering ensemble methods have been proposed for different applications. Representative methods include pairwise similarity methods, graph-based methods, relabeling-based methods, and feature-based methods [17]. However, a common limitation of most existing methods is that they treat all clusters and all base clusterings equally in the ensemble, even though lower-quality clusters or lower-quality base clusterings may be present.

In order to avoid the influence of low-quality base clusterings, researchers have carried out some work. Yu et al. [19] focused on the selection of base clusterings and, according to an evaluation index, selected only some of them from the base clustering set for integration. Yang et al. [20] combined several clustering evaluation indicators to assign a weight to each base clustering and obtain a better result. Huang et al. [21] first used normalized mutual information (NMI) [9] to measure the similarity between partitions, then used these similarities as weights for the partitions and obtained a weighted CA matrix. Finally, three weighted evidence accumulation clustering (WEAC) methods based on hierarchical linkage were proposed: WEAC-AL, WEAC-CL, and WEAC-SL. Rouba et al. [22] used the Rand index to measure the similarity between partitions and designed weights accordingly; then, three clustering algorithms (hierarchical clustering, k-means, and k-medoids) were used to obtain the consensus results. Bai et al. [23] calculated the similarity between classes by using information entropy and used it as the weight for weighting the base clusterings. Song et al. [24] proposed a weighted ensemble sparse latent representation (subtype-WESLR) to detect cancer subtypes in heterogeneous omics data. This method used a weighted ensemble strategy to fuse the base clusterings obtained by different methods as prior knowledge, and the weights could be adapted to different base clusterings. Huang et al. [25] proposed a novel multidiversified ensemble clustering approach, which used an entropy-based cluster validity strategy to evaluate and weight each base clustering by considering the distribution of clusters in the entire ensemble. Wan et al. [26] proposed a short-text clustering ensemble method based on convolutional neural networks, in which the Gini coefficient was used to measure the reliability of the base clusterings and weight them; finally, hierarchical clustering was used for integration. Banerjee et al. [27] devised a polynomial heuristic that judiciously selected a subset of clusterings from the ensemble that contributed positively to forming the consensus, yielding a high-quality consensus clustering.

However, most of these methods treat each base clustering as a whole and assign a weight to it without considering the diversity of its internal clusters. Iam-On et al. [28] presented a new link-based approach to refine the conventional matrix, using cluster similarities estimated from a link network model of the ensemble, and proposed three new link-based algorithms for the underlying similarity assessment. The final clustering result was generated from the refined matrix using two different consensus functions based on feature-based and graph-based partitioning. Experimental results show that the weighted connected-triple (WCT) variant is more effective than the others. Huang et al. [29] introduced the concept of information entropy to measure the uncertainty of each cluster. By calculating the uncertainty of each cluster over all base partitions, an ensemble-driven cluster index (ECI) was constructed; ECI was then used as the weight to weight the original CA matrix before integration. Vo and Nguyen [30] first calculated, for each point, the ratio of its distance to the cluster center to the maximum within-cluster distance, and used this as the weight to obtain a weighted object-cluster association-based (WOCA) matrix; the final clustering was derived by executing k-means on the WOCA matrix. Rashidi et al. [31] evaluated cluster undependability based on an information-theoretic measure and proposed two approaches: cluster-wise weighted evidence accumulation and cluster-wise weighted graph partitioning. Najafi et al. [32] obtained each cluster's dependability by calculating the entropy and an exponential transform, which represents the extent to which the cluster is distributed across the clusters of a partition in a reference set. Two weight calculation methods were proposed, the first dependability measure (FDM) and the second dependability measure (SDM); the cluster-weighted CA matrix based on FDM or SDM was then obtained, and AL was used to get the consensus clustering. They called these methods ALFDM and ALSDM, respectively, and experimental results show that ALSDM is superior to ALFDM. Shen et al. [33] proposed a text clustering ensemble method based on an entropy criterion to evaluate the uncertainty of clusters; two indexes were proposed according to this uncertainty, and high-quality base clusterings were then selected for integration.

In recent years, some scholars have tried to combine cluster weighting and partition weighting. Banerjee et al. [34] derived a cluster-level weight from the principles of both agreement and disagreement of clusters to define reliable clusters in the ensemble, and then computed the weight of a partition by accumulating the cluster-level weights of its constituent clusters. Zhang et al. [35] proposed a two-stage semi-supervised clustering ensemble framework that considered both ensemble member selection and the weighting of clusters. Experimental results on various datasets showed that this framework outperforms most clustering algorithms. However, these methods still have a drawback: all samples in the same cluster are given the same weight, i.e., each sample in the same cluster is treated equally, so the contribution of each sample cannot be accurately evaluated. Zhong et al. [36] first weight the points: the weight of a point in a cluster is related to its Euclidean distance and to the distance between the two farthest points in the cluster. Then the normalized stability of each cluster is obtained as the average weight of all points in the cluster. Finally, the elements of the CA matrix are defined as the product of the point weights and the normalized stability of their clusters. Ren et al. [37] obtained a CA matrix from the base clusterings and used it to describe how difficult it is to cluster the neighborhood of each sample, assigning corresponding weights; following the idea of Boosting, samples that are difficult to divide receive greater weights. They then presented three algorithms: the weighted-object meta-clustering algorithm (WOMC), the weighted-object similarity partitioning algorithm (WOSP), and the weighted-object hybrid bipartite graph partitioning algorithm (WOHB). Li et al. [38] determined cluster centers by calculating the stability of samples, assigned sample points to the clusters with the highest similarity, and finally used the single-link algorithm for integration, verifying its effectiveness on multiple datasets including text datasets. Niu et al. [39] proposed a novel multi-view ensemble clustering approach using a joint affinity matrix, in which basic partitions were described with sample-level weights and the influence of incorrectly partitioned data objects was decreased, so that data objects could be effectively assigned to the correct partition.

Although the importance of weighting has been demonstrated from various aspects, most studies focus on weighting a single aspect of the clustering ensemble process, such as points or clusters; a three-layer combination of points, clusters, and partitions is still lacking. In this paper, we propose a weighting method based on the three layers of point, cluster, and partition. Extensive experiments on various datasets show that our method (PCPA) has significant advantages in clustering accuracy and stability.

3 PCPA Weighted Method

The proposed approach first transforms the base clusterings into a hypergraph adjacency matrix \({{\varvec{H}}}\). The matrix \({{\varvec{H}}}\in \{0,1\}^{n\times k_l}\) summarizes the cluster-point relations occurring in the ensemble [40], in which n and \(k_l\) denote the number of points and the number of clusters in the ensemble, respectively, and \(k_l=k_1+k_2+\cdots +k_m+\cdots +k_M\), where \(k_m\) is the number of clusters in the m-th partition and M is the total number of partitions. Denote by \(x_i\) \((1\le i\le n)\) the i-th data point in the tested dataset and by \(C_l\) \((1\le l\le k_l)\) the l-th cluster in the ensemble. If \(x_i\) belongs to cluster \(C_l\), then \({{\varvec{H}}}(i,l)=1\); otherwise \({{\varvec{H}}}(i,l)=0\). The point-cluster-partition three-layer weighting and the layer-by-layer weight transfer are all carried out on the basis of \({{\varvec{H}}}\). Then, the weighted hypergraph adjacency matrix is transformed into the CA matrix. Finally, the consensus clustering is obtained by AL. The remainder of this section presents the weight setting scheme, the layer-by-layer weight transfer scheme, the method that converts the weighted \({{\varvec{H}}}\) into a weighted CA matrix, the overall PCPA algorithm, and a complexity analysis.
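As a minimal sketch of this construction, assuming the base clusterings are stored as an (M × n) label matrix (e.g., as produced by the member-generation sketch in Sect. 1), the hypergraph adjacency matrix \({{\varvec{H}}}\) can be built as follows; the function name is an illustrative assumption.

```python
import numpy as np

def build_hypergraph(members):
    """Build the binary hypergraph adjacency matrix H (n x k_l) from M base clusterings.

    `members` is an (M, n) array of cluster labels; the columns of H are grouped
    partition by partition, so that k_l = k_1 + k_2 + ... + k_M.
    """
    blocks = []
    for labels in members:
        clusters = np.unique(labels)
        # one column per cluster of this partition: H(i, l) = 1 iff x_i belongs to C_l
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)  # shape (n, k_l)
```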

3.1 Weights Setting Scheme

3.1.1 Point-Layer Weights Setting

The basic idea of the Boosting technique [37] is to focus on the sample points that are difficult to divide in the process of clustering ensemble. Motivated by the successful applications of this technique in classification and clustering, this paper assigns higher weights to such points so that they have a higher priority in the subsequent clustering integration. The weights are calculated from a CA matrix constructed from the base clustering results. The detailed implementation is given by Eqs. (1)–(7).

First, construct the CA matrix A:

$$\begin{aligned} A=\left( a_{ij}\right) _{n\times n}, \end{aligned}$$
(1)

where \(a_{ij}\) can be calculated by

$$\begin{aligned} a_{ij}=\frac{1}{M}\sum _{m=1}^{M}\delta _{ij}^m, \end{aligned}$$
(2)

in which \(\delta _{ij}^m\) is shown in Eq. (3).

$$\begin{aligned} \delta _{ij}^m=\left\{ \begin{aligned} 1,&\quad \text{ if }\quad P^{(m)}(x_i)=P^{(m)}(x_j) \\ 0,&\quad \text{ otherwise } \\ \end{aligned} \right. , \end{aligned}$$
(3)

where \(P^{(m)}\left( x_i\right) \) denotes that, in the m-th \((1\le m\le M)\) partition, the cluster that \(x_i\) belongs to.

When \(a_{ij}=0\) or \(a_{ij}=1\), the base clustering results reach a high consensus on the division of \(x_{i}\) and \(x_{j}\). On the contrary, when \(a_{ij}=0.5\), it is difficult to divide them. In this paper, a quadratic function \(y(x) = x(1-x)\), \(x \in [0, 1]\), is used to quantify this difficulty. The uncertainty of \(x_{i}\) and \(x_{j}\) can be defined as:

$$\begin{aligned} confusion\left( x_i,x_j\right) =a_{ij}\left( 1-a_{ij}\right) . \end{aligned}$$
(4)

When \(a_{ij}= 0.5\), the confusion index reaches its maximum value 0.25. When \(a_{ij}= 0\) or \(a_{ij}= 1\), the confusion index reaches its minimum value 0, so \(confusion(x_i,x_j)\in [0, 0.25]\). The larger \(confusion(x_i,x_j)\) is, the more difficult it is to divide samples \(x_i\) and \(x_j\). Then, the confusion is used to calculate the weight of each point:

$$\begin{aligned} w_i^{\prime \prime }=\frac{4}{n}{\sum _{j=1}^{n}} confusion\left( x_i,x_j\right) , \end{aligned}$$
(5)

where \(\frac{4}{n}\) is a normalization factor to ensure \(w_i^{\prime \prime }\in \)[0, 1].

We can see that the value range of \(w_i^{\prime \prime }\) contains 0. In a probabilistic model, a weight often represents a probability, which should not be exactly zero: even a very unlikely event should have a non-zero probability to preserve the completeness of the model. Moreover, in this algorithm, a weight of 0 would completely ignore some samples, which is not the intent of the design, and in subsequent normalization calculations a zero weight may lead to a zero denominator, producing an infinite (Inf) or not-a-number (NaN) result. In view of these considerations, we add a smoothing term:

$$\begin{aligned} w_i^\prime =\frac{w_i^{\prime \prime }+e}{1+e}, \end{aligned}$$
(6)

where e is a small positive number (according to [37], \(e=0.01\)).

After normalization, we have:

$$\begin{aligned} w_i=w_i^\prime /\sum _{i=1}^{n}w_i^\prime . \end{aligned}$$
(7)
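A minimal implementation sketch of the point-layer weighting of Eqs. (1)–(7), assuming the base clusterings are stored as the (M × n) label matrix `members` used in the earlier sketches:

```python
import numpy as np

def point_weights(members, e=0.01):
    """Point-layer weights w_i computed from the CA matrix, Eqs. (1)-(7)."""
    M, n = members.shape
    A = np.zeros((n, n))
    for labels in members:                         # Eq. (2): fraction of partitions that
        A += (labels[:, None] == labels[None, :])  # group x_i and x_j together
    A /= M
    confusion = A * (1.0 - A)                      # Eq. (4), values in [0, 0.25]
    w = (4.0 / n) * confusion.sum(axis=1)          # Eq. (5), w'' in [0, 1]
    w = (w + e) / (1.0 + e)                        # Eq. (6), smoothing with e = 0.01 [37]
    return w / w.sum()                             # Eq. (7), normalization
```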

3.1.2 Cluster-Layer Weights Setting

Following [29], this paper uses the ECI as the weight of each cluster.

Given the ensemble \(\Pi \), the uncertainty of the l-th cluster \(C_l\) w.r.t. the base clustering \(P^{(m)} \in \Pi \) can be computed as

$$\begin{aligned} H^m\left( C_l\right) =-\sum _{s=1}^{k_m}{p\left( C_l,C_s^m\right) \log _2{p\left( C_l,C_s^m\right) }} \end{aligned}$$
(8)

with

$$\begin{aligned} p\left( C_l,C_s^m\right) =\frac{|C_l\cap C_s^m|}{|C_l |}, \end{aligned}$$
(9)

where \(C_s^m\) is the s-th cluster in the m-th partition, \( 1\le m \le M\), \(1\le s \le k_m\), \(\cap \) computes the intersection of two sets (or clusters), and \(|C_l |\) outputs the number of objects in \(C_l\).

Thus, the uncertainty of cluster \(C_l\) w.r.t. the entire ensemble \(\Pi \) can be given by

$$\begin{aligned} H^\Pi \left( C_l\right) =\sum _{j=1}^{M}{H^j\left( C_l\right) }. \end{aligned}$$
(10)

Given an ensemble \(\Pi \) with M base clusterings, the ensemble-driven cluster index (ECI) for a cluster \(C_l\) can be defined as

$$\begin{aligned} ECI\left( C_l\right) =e^{-\frac{H^\Pi \left( C_l\right) }{\theta \cdot M}}. \end{aligned}$$
(11)

where \(\theta >0\) is a parameter that adjusts the influence of the cluster uncertainty on the ECI [29]. After normalization, we get:

$$\begin{aligned} u_l=ECI\left( C_l\right) /\sum _{l=1}^{k_l}{ECI\left( C_l\right) }. \end{aligned}$$
(12)
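A minimal sketch of Eqs. (8)–(12), again assuming the label matrix `members` from the earlier sketches; the value of \(\theta \) here is only an assumption for illustration (see [29]).

```python
import numpy as np

def cluster_weights(members, theta=0.4):
    """Cluster-layer weights u_l via the ensemble-driven cluster index, Eqs. (8)-(12)."""
    M, n = members.shape
    # enumerate all clusters C_l of the ensemble as boolean membership masks,
    # partition by partition (the same column order as in H)
    masks = [members[m] == c for m in range(M) for c in np.unique(members[m])]
    eci = np.empty(len(masks))
    for l, mask in enumerate(masks):
        h = 0.0
        for m in range(M):                                    # Eq. (10): sum over base clusterings
            for c in np.unique(members[m]):
                p = (mask & (members[m] == c)).sum() / mask.sum()   # Eq. (9)
                if p > 0:
                    h -= p * np.log2(p)                       # Eq. (8)
        eci[l] = np.exp(-h / (theta * M))                     # Eq. (11)
    return eci / eci.sum()                                    # Eq. (12)
```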

3.1.3 Partition-Layer Weights Setting

The NMI value can effectively measure the degree of similarity between clustering members; clearly, the higher the similarity of \(P^{(m)}\) to the other members, the greater its weight should be. Therefore, this paper sets the weight \({v_m}^\prime \) of partition \(P^{(m)}\) to be proportional to the sum of the NMI values between \(P^{(m)}\) and the other partitions. Considering that a partition is composed of multiple clusters, this paper further sets \({v_m}^\prime \) to be proportional to the sum of the weights of all clusters contained in \(P^{(m)}\). That is, the weight of the m-th partition is:

$$\begin{aligned} {v_m}^\prime =\left( \sum _{l} u_l^m\right) \times \sum _{q\ne m} NMI\left( P^{\left( m\right) },P^{\left( q\right) }\right) . \end{aligned}$$
(13)

where \(u_l^m\) denotes the weight (from Eq. (12)) of the l-th cluster in partition \(P^{(m)}\). After normalization, we get:

$$\begin{aligned} v_m={v_m}^\prime /\sum _{m=1}^{M}{v_m}^\prime . \end{aligned}$$
(14)
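A minimal sketch of Eqs. (13)–(14); it assumes the label matrix `members` and the normalized cluster weights `u` (ordered partition by partition, as in the cluster-layer sketch), and uses scikit-learn's NMI implementation.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def partition_weights(members, u):
    """Partition-layer weights v_m, Eqs. (13)-(14)."""
    M = len(members)
    sizes = [len(np.unique(labels)) for labels in members]    # k_1, ..., k_M
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    v = np.empty(M)
    for m in range(M):
        u_sum = u[offsets[m]:offsets[m + 1]].sum()            # sum of cluster weights in P^(m)
        nmi_sum = sum(nmi(members[m], members[q]) for q in range(M) if q != m)
        v[m] = u_sum * nmi_sum                                # Eq. (13)
    return v / v.sum()                                        # Eq. (14)
```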

3.2 The Method of Transferring Weights Layer by Layer

The point-cluster-partition three-layer weighting (PCPTLW) architecture and the layer-by-layer weighting process are shown in Fig. 2.

Fig. 2 PCPTLW architecture and layer-by-layer weighting process

3.2.1 Weight the Point-Layer

Multiply the i-th row \((1 \le i \le n)\) of \({{\varvec{H}}}\) by the weight \(w_i\) of the point \(x_i\) to get the point layer weighted matrix:

$$\begin{aligned} H_{pt}=W\times H \end{aligned}$$
(15)

with

$$\begin{aligned} W=diag\left( w_1,\ldots ,w_n\right) . \end{aligned}$$
(16)

3.2.2 Weight the Cluster-Layer

Multiply the column corresponding to the l-th cluster \(C_l\) in \({{{\varvec{H}}}}_{{{\varvec{pt}}}}\) by its weight \(u_l(1\le l\le k_l)\) to get the cluster layer weighted matrix:

$$\begin{aligned} H_{cr}=H_{pt}\times U \end{aligned}$$
(17)

with

$$\begin{aligned} U=diag\left( u_1,\ldots ,u_{k_l}\right) . \end{aligned}$$
(18)

3.2.3 Weight the Partition-Layer

Multiply the submatrix corresponding to the m-th partition \(P^{(m)}\) in \({{{\varvec{H}}}}_{{{\varvec{cr}}}}\) by its weight \(v_m\) to obtain the partition layer weighted matrix:

$$\begin{aligned} H_{pn}=H_{cr}\times V \end{aligned}$$
(19)

with

$$\begin{aligned} V=diag\left( v_1,\ldots ,v_1,\ldots {,v}_m,\ldots {,v}_m,\ldots ,v_M,\ldots {,v}_M\right) , \end{aligned}$$
(20)

where, in Eq. (20), each weight \(v_m\) \((1\le m\le M)\) is repeated \(k_m\) times, i.e., \(v_1\) appears \(k_1\) times, \(v_m\) appears \(k_m\) times, and \(v_M\) appears \(k_M\) times.
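In implementation terms, the three diagonal matrices of Eqs. (16), (18) and (20) amount to row and column scalings; a minimal sketch, assuming the weight vectors from the previous sketches, is:

```python
import numpy as np

def weight_hypergraph(H, w, u, v, sizes):
    """Apply the point, cluster and partition weights to H, Eqs. (15)-(20).

    `sizes` lists k_1, ..., k_M so that each v_m is expanded to the k_m columns
    of its partition, exactly as in the diagonal matrix V of Eq. (20).
    """
    H_pt = w[:, None] * H                        # Eq. (15): scale row i by w_i
    H_cr = H_pt * u[None, :]                     # Eq. (17): scale column l by u_l
    H_pn = H_cr * np.repeat(v, sizes)[None, :]   # Eqs. (19)-(20)
    return H_pn
```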

3.3 Convert \({{{\varvec{H}}}}_{{{\varvec{pn}}}}\) to a CA Matrix

After obtaining the \({{{\varvec{H}}}}_{{{\varvec{pn}}}}\) matrix, we transform it into a weighted point-cluster-partition CA (PCPCA) matrix. The calculation of PCPCA is as follows

$$\begin{aligned} PCPCA=\frac{H_{pn}\times {H_{pn}}^T}{M}, \end{aligned}$$
(21)

where \({H_{pn}}^T\) represents the transposed matrix of \(H_{pn}\).

3.4 The Algorithm PCPA Process

For clarity, the overall algorithm of PCPA is summarized as follows.

The algorithm of PCPA:

Input: Dataset \(D=\left\{ x_1,x_2,\ldots ,x_n\right\} \), the true labels of the sample points, and the number of true categories \(k^*\).

  1. Generate M base clusterings:

    • for i = 1 : M (in this paper, M = 100)

    • run the k-means algorithm with the cluster number \(k\in [2,\ 2k^*]\)

    • end

  2. Convert the base clustering matrix into a hypergraph adjacency matrix \({{\varvec{H}}}\).

  3. (a) Compute the point weights according to Eqs. (1)–(7);

    (b) Compute the cluster weights according to Eqs. (8)–(12);

    (c) Compute the partition weights according to Eqs. (13) and (14).

  4. (a) Weight the point layer according to Eqs. (15) and (16);

    (b) Weight the cluster layer according to Eqs. (17) and (18);

    (c) Weight the partition layer according to Eqs. (19) and (20).

  5. Convert the weighted hypergraph adjacency matrix \({{{\varvec{H}}}}_{{{{\varvec{pn}}}}}\) into a CA matrix according to Eq. (21).

  6. Run AL to obtain the consensus partition \(\pi ^*\) with cluster number \(k^*\).

Output: The consensus clustering \(\pi ^*\).
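A minimal end-to-end sketch of this procedure is given below; it reuses the helper functions defined in the earlier sketches (their names, as well as the value of \(\theta \) and the particular similarity-to-dissimilarity conversion, are assumptions of the illustration) and realizes the AL step with SciPy's average-linkage hierarchical clustering on a dissimilarity derived from the PCPCA matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pcpa_consensus(members, k_star, theta=0.4):
    """End-to-end PCPA sketch: weights -> weighted H -> PCPCA -> average-link consensus."""
    M, n = members.shape
    H = build_hypergraph(members)                        # Sect. 3, matrix H
    w = point_weights(members)                           # Eqs. (1)-(7)
    u = cluster_weights(members, theta)                  # Eqs. (8)-(12)
    v = partition_weights(members, u)                    # Eqs. (13)-(14)
    sizes = [len(np.unique(labels)) for labels in members]
    H_pn = weight_hypergraph(H, w, u, v, sizes)          # Eqs. (15)-(20)
    pcpca = H_pn @ H_pn.T / M                            # Eq. (21)
    # average-link (AL) consensus on the dissimilarity 1 - normalized similarity
    D = 1.0 - pcpca / pcpca.max()
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k_star, criterion="maxclust")   # consensus partition with k* clusters
```

Cutting the resulting dendrogram at \(k^*\) clusters yields the consensus partition \(\pi ^*\).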

3.5 Complexity Analysis

In the above algorithm flow, the first step runs the k-means algorithm M times, and its time complexity is O(Mkdn), where k is the number of clusters, d is the dimension of the samples, and n is the number of samples. In step 2, the time complexity of generating the hypergraph adjacency matrix H is O(Mkn). In steps 3 and 4, the time complexity of weighting each layer is O(\(nC_K\)), where \(C_K\) is the total number of clusters in the M partitions, namely the number of edges of the hypergraph. In step 5, the time complexity of transforming \({{{\varvec{H}}}}_{{{{\varvec{pn}}}}}\) into the CA matrix is O(\(C_K\times n^2\)). In step 6, the time complexity of the AL algorithm is O(\(\log _2n\)). That is, steps 1–4 of the algorithm are linear, step 5 is quadratic, and step 6 is logarithmic. In addition, step 5 constructs the similarity matrix; this yields better clustering results but has higher complexity, so the method is best suited to small and medium-sized datasets. For massive datasets, MLRAA, DLRSE, k-means and other algorithms can be run directly on \({{{\varvec{H}}}}_{{{{\varvec{pn}}}}}\) to further improve efficiency.

4 Experiments

The experimental platform is an AMD Ryzen 7 5800H processor (8 cores, 16 threads, 3.20 GHz) with 16.00 GB of memory and an NVIDIA GeForce RTX 3060 Laptop GPU; the programs run under MATLAB R2020a.

4.1 Datasets and Evaluation Methods

In this section, we conduct experiments on 11 datasets, namely, Ecoli, Libras Movement Data Set (LM), semeion, satimage, splice, optdigits (ODR), zoo, tr11, tr12, tr31, and tr45. The first seven are from the UCI machine learning repository (http://archive.ics.uci.edu/ml), and the last four are from the Text REtrieval Conference (TREC, http://trec.nist.gov) collection. The details of the datasets are shown in Table 1.

Table 1 Description of the datasets

In accordance with some popular methods [21, 29, 32], this paper adopts the following three performance metrics to evaluate the proposed PCPA algorithm, namely normalized mutual information (NMI), adjusted Rand index (ARI), and F measure (F).

The NMI measure provides a sound indication of the shared information between two clusterings. Let \(P^\prime \) be the test clustering and \(P^G\) the ground-truth clustering. The NMI score of \(P^\prime \) w.r.t. \(P^G\) is defined as follows:

$$\begin{aligned} NMI\left( P^\prime ,\ P^G\right) =\frac{\sum _{i=1}^{n^\prime }\sum _{j=1}^{n^G}{n_{ij}\log \frac{n_{ij}\,n}{n_i^\prime n_j^G}}}{\sqrt{\left( \sum _{i=1}^{n^\prime }n_i^\prime \log \frac{n_i^\prime }{n}\right) \left( \sum _{j=1}^{n^G}n_j^G\log \frac{n_j^G}{n}\right) }},\end{aligned}$$
(22)

where \(n^\prime \) is the number of clusters in \(P^\prime \), \(n^G\) is the number of clusters in \(P^G\), \(n_i^\prime \) is the number of objects in the i-th cluster of \(P^\prime \), \(n_j^G\) is the number of objects in the j-th cluster of \(P^G\), and \(n_{ij}\) is the number of common objects shared by cluster i in \(P^\prime \) and cluster j in \(P^G\). It can be seen from the formula that the higher the NMI, the better the effect.

The ARI is a generalization of the rand index (RI), which is computed by considering the number of pairs of objects on which two clusterings agree or disagree. Specifically, the ARI score of \(P^\prime \) w.r.t. \(P^G\) is computed as follows:

$$\begin{aligned} ARI\left( P^\prime ,P^G\right) =\frac{2\left( N_{00}N_{11}-N_{01}N_{10}\right) }{\left( N_{00}+N_{01}\right) \left( N_{01}+N_{11}\right) +\left( N_{00}+N_{10}\right) \left( N_{10}+N_{11}\right) }, \end{aligned}$$
(23)

where \(N_{11}\) is the number of object pairs that appear in the same cluster in both \(P^\prime \) and \(P^G\), \(N_{00}\) is the number of object pairs that appear in different clusters in both \(P^\prime \) and \(P^G\), \(N_{10}\) is the number of object pairs that appear in the same cluster in \(P^\prime \) but in different clusters in \(P^G\), and \(N_{01}\) is the number of object pairs that appear in different clusters in \(P^\prime \) but in the same cluster in \(P^G\). The value range of ARI is [-1, 1], and the larger the value, the better the integration effect.

The F measure is an indicator used in statistics to measure the accuracy of a binary classification model; it takes both the precision and the recall of the model into account. Its calculation formula is as follows:

$$\begin{aligned} F=2\cdot \frac{precision\cdot recall}{precision+recall}, \end{aligned}$$
(24)

where precision refers to the proportion of samples with a predicted value of 1 and a true value of 1 among all samples with a predicted value of 1, and recall refers to the proportion of samples with a predicted value of 1 and a true value of 1 among all samples with a true value of 1. The maximum value of F is 1, the minimum value is 0, and the larger the value, the better.
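For reference, the following sketch evaluates a consensus clustering with these three metrics; NMI and ARI use scikit-learn, and the F measure is computed here as a pairwise F over object pairs, which is one common instantiation for comparing clusterings (whether it matches the exact variant used in the experiments is an assumption).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def pairwise_f(labels_true, labels_pred):
    """Pairwise F measure: precision/recall over pairs of objects grouped together."""
    C = contingency_matrix(labels_true, labels_pred).astype(float)
    comb2 = lambda x: (x * (x - 1) / 2.0).sum()
    n11 = comb2(C)                     # pairs together in both clusterings
    true_pairs = comb2(C.sum(axis=1))  # pairs together in the ground truth
    pred_pairs = comb2(C.sum(axis=0))  # pairs together in the prediction
    precision, recall = n11 / pred_pairs, n11 / true_pairs
    return 2 * precision * recall / (precision + recall)

# usage, assuming y_true and y_pred are label vectors:
# nmi = normalized_mutual_info_score(y_true, y_pred)
# ari = adjusted_rand_score(y_true, y_pred)
# f   = pairwise_f(y_true, y_pred)
```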

In the experiments, the number of base clusterings M is set to 100, and each base clustering is generated by a randomly initialized run of k-means. In order to make the base clusterings more diverse, the number of clusters k is chosen randomly in the range \([2,\ 2k^*]\), where \(k^*\) is the true number of clusters of the dataset.

4.2 Comparison of Before and After Three-Layer Weighted

In order to clearly understand the weighting effect, we divide the weighting process into the following seven cases: point weighted, cluster weighted, partition weighted, point-cluster weighted, point-partition weighted, cluster-partition weighted, and point-cluster-partition (PCP) three-layer weighted (see Table 2). In order to ensure fairness, all experiments are conducted on the same base clusterings, and all reported results are averaged over 10 runs. Bold values indicate that the result is improved after three-layer weighting, and underlined values indicate the highest score among the seven cases. The last column, percentage increase, indicates the percentage improvement of PCP over Unweighted, calculated as the difference between the PCP and Unweighted values divided by the Unweighted value.

Table 2 Comparison of before and after three-layer weighted

It can be seen from Table 2 that after three-layer weighting, the evaluation indicators of all datasets have increased. Among them, the ARI index of dataset tr41 shows the most significant improvement, which is 140%, while dataset splice shows the smallest improvement, 0.58% in the F index. From the last column, the average improvement over all evaluation indexes is 22.07%. Table 2 also shows that if only one or two of the point, cluster, and partition aspects are weighted, the weighting effect depends strongly on the dataset itself, and different datasets favor different weighting schemes. But if the three aspects are weighted simultaneously, the aspects that weight well compensate, to a certain extent, for the deficiencies of the others, and the final result is more stable. Therefore, it can be judged that three-layer weighting is better than no weighting, and its stability is higher than that of weighting a single aspect or two aspects. The comparison before and after three-layer weighting is shown in Fig. 3. From Fig. 3, we can see that except for LM, splice, and tr31, the clustering effect of the other datasets is significantly improved after three-layer weighting. ARI, F, and NMI improve to different degrees on the same dataset, and the same evaluation index improves to different degrees on different datasets, which is related to both the dataset itself and the evaluation index. This also reflects the soundness of using multiple indexes to comprehensively evaluate the experimental results in this paper.

Fig. 3 The comparison before and after three-layer weighting

4.3 Compare with Other Weighted Methods

The proposed method (PCPA) is compared with the following seven weighted algorithms: ALSDM [32], WOMC [37], WOHB [37], WCT_KM [28], WEAC_AL [21], WEAC_CL [21], and LWEA [29]. The comparison results are shown in Table 3, where bold values indicate the highest evaluation index among the eight methods. Figure 4 shows more clearly the performance of each weighted method under the three evaluation metrics on each dataset. Tables 4, 5 and 6 are the ranking tables of each weighted method under ARI, F, and NMI, respectively. For each evaluation index, we also use the Friedman test and the Nemenyi test to verify whether the proposed method is significantly different from the other methods.

Table 3 Average performances over 10 runs by different weighted clustering ensemble methods
Fig. 4 Results compared with other weighted methods

Table 4 The ranking (w.r.t. ARI) of different weighted clustering ensemble methods
Table 5 The ranking (w.r.t. F) of different weighted clustering ensemble methods
Table 6 The ranking (w.r.t. NMI) of different weighted clustering ensemble methods

From Table 3, we can see that except for the ARI index on LM, zoo, and tr45, the F index on splice, and the NMI index on tr11, PCPA always ranks first; that is, PCPA ranks first in 28 out of 33 cases. Among all datasets, the PCPA algorithm has the most significant advantage on Ecoli, semeion, and tr12. For tr12, the ARI value of PCPA is 0.3094, whereas the highest value among the other seven weighted algorithms is 0.2248 (an improvement of 37.63%) and the lowest is 0.1017 (an improvement of 204.23%). When PCPA does not rank first, the first-ranked method varies with the dataset. In terms of complexity, the time complexity of ALSDM, WOMC, WOHB and LWEA is \(O(n^2)\), the time complexity of WEAC_AL and WEAC_CL is \(O(M^{2}n^{2})\), and the time complexity of WCT_KM is \(O(n^3)\), where M is the number of base clusterings and n is the number of points. Except for the WCT_KM algorithm, the time complexity of our algorithm and the other algorithms is quadratic, so the proposed method improves the clustering accuracy without sacrificing speed. Moreover, as described in Sect. 3.5, for large datasets the three-layer weighted framework can run k-means and other algorithms directly on \(H_{pn}\) to further improve speed.

Next, we perform the Friedman test on the rankings of the algorithms in terms of ARI. The Friedman test checks whether the measured average ranks are significantly different from the mean rank \(R_j=4.5\) expected under the null hypothesis: \(\chi _F^2=\frac{12\times 11}{8\times 9}({1.32}^2+{4.73}^2+{6.09}^2+{6.55}^2+{5.18}^2+{4.36}^2+{4.68}^2+{3.09}^2-\frac{8\times 9^2}{4})=35.5637,\) and \(F_F=\frac{10\times 35.5637}{11\times 7-35.5637}=8.5827\).

With 8 algorithms and 11 datasets, \(F_F\) is distributed according to the F distribution with \(8{-}1=7\) and \((8{-}1)\times (11{-}1)=70\) degrees of freedom. The critical value of F (7, 70) for \(\alpha =0.05\) is 2.143, so we reject the null-hypothesis.

Then, we use the Nemenyi test for pairwise comparisons. The critical value \(q_{0.05}\) with 8 algorithms is 3.031 and the corresponding CD is \(3.031\times \sqrt{\frac{8\times 9}{6\times 11}}=3.1658\). Therefore, we can identify that PCPA is significantly different from ALSDM, WOMC, WOHB, WCT_KM, and WEAC_CL; we cannot tell which group WEAC_AL and LWEA belong to. At \(\alpha =0.10\), the critical value \(q_{0.10}\) is 2.780 and the corresponding CD is \(2.780\times \sqrt{\frac{8\times 9}{6\times 11}}=2.9036\). Therefore, we can identify that PCPA is significantly different from ALSDM, WOMC, WOHB, WCT_KM, WEAC_AL, and WEAC_CL. We still cannot tell which group LWEA belongs to, because the experimental data are not enough to draw any conclusions.
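For reference, the Friedman statistic, its \(F_F\) variant, and the Nemenyi critical difference can be reproduced directly from the average ranks; a minimal sketch, using the ARI ranks quoted above, is:

```python
import numpy as np

def friedman_nemenyi(avg_ranks, N, q_alpha=3.031):
    """Friedman chi^2, its F_F variant, and the Nemenyi critical difference."""
    k = len(avg_ranks)                                   # number of algorithms
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(np.square(avg_ranks)) - k * (k + 1) ** 2 / 4.0)
    f_f = (N - 1) * chi2 / (N * (k - 1) - chi2)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))
    return chi2, f_f, cd

# average ARI ranks of the 8 algorithms over the 11 datasets, as quoted above
ari_ranks = [1.32, 4.73, 6.09, 6.55, 5.18, 4.36, 4.68, 3.09]
print(friedman_nemenyi(ari_ranks, N=11))   # approx (35.56, 8.58, 3.17)
```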

We also use the ranks in terms of the F index to compute the Friedman test of the algorithms, and we reject the null hypothesis because the Friedman statistic is \(\chi _F^2=\frac{12\times 11}{8\times 9}({1.09}^2+{4.50}^2+{5.91}^2+{7.00}^2+{5.27}^2+{3.82}^2+{4.91}^2+{3.50}^2-\frac{8\times 9^2}{4})=40.4976,\) and \(F_F=\frac{10\times 40.4976}{11\times 7-40.4976}=11.0945\).

At \(\alpha =0.05\), the critical value \(q_{0.05}\) is 3.031 and the corresponding CD is 3.1658. We can identify that PCPA is significantly different from ALSDM, WOMC, WOHB, WCT_KM, and WEAC_CL; we cannot tell which group WEAC_AL and LWEA belong to. At \(\alpha =0.10\), the critical value \(q_{0.10}\) is 2.780 and the corresponding CD is 2.9036. However, we still cannot tell which group WEAC_AL and LWEA belong to, because the experimental data are not enough to draw any conclusions.

Finally, we use the ranks in terms of the NMI index to compute the Friedman test of the algorithms, and we reject the null hypothesis because the Friedman statistic is \(\chi _F^2=\frac{12\times 11}{8\times 9}({1.09}^2+{5.18}^2+{6.27}^2+{7.36}^2+{4.55}^2+{3.55}^2+{4.64}^2+{3.36}^2-\frac{8\times {9}^2}{4})=46.9832\), and \(F_F=\frac{10\times 46.9832}{11\times 7-46.9832}=15.6523\).

At \(\alpha =0.05\), the critical value \(q_{0.05}\) is 3.031 and the corresponding CD is 3.1658. We can identify that PCPA is significantly different from ALSDM, WOMC, WOHB, WCT_KM, and WEAC_CL; we cannot tell which group WEAC_AL and LWEA belong to. At \(\alpha =0.10\), the critical value \(q_{0.10}\) is 2.780 and the corresponding CD is 2.9036. However, we still cannot tell which group WEAC_AL and LWEA belong to, because the experimental data are not enough to draw any conclusions.

In general, PCPA is significantly different from ALSDM, WOMC, WOHB, WCT_KM, and WEAC_CL. From Tables 4, 5 and 6, we can also see that PCPA ranks ahead of LWEA and WEAC_AL. Different algorithms perform differently on different datasets, but overall PCPA ranks the highest, indicating that three-layer weighting can significantly improve the accuracy of clustering ensembles compared with other weighting methods, and that this improvement is not obtained at the expense of time complexity. Therefore, we conclude that PCPA has obvious advantages over the other weighted algorithms.

5 Conclusion

This paper proposes a three-layer weighting strategy, the Point-Cluster-Partition Architecture (PCPA), for weighted clustering ensemble, which weights points, clusters, and partitions in the integration process. The experiments show that:

  • The integration effect with three-layer weighting is better than without weighting, and three-layer weighting is more stable than weighting a single layer or two layers;

  • Compared with other weighting methods, PCPA is superior in terms of accuracy and stability.

    The results of this paper also show that the effect of weighting each layer differs across datasets. In recent years, many scholars have studied the problem of clustering imbalanced samples [41,42,43,44]. In future work, we will further improve the weight design scheme based on this research, so that the weights can be determined adaptively according to the internal structure of each dataset, and we hope the three-layer weighted clustering ensemble algorithm will perform well on as many types of datasets as possible.