1 Introduction

Multi-view data reflect the properties of an object from different views and typically come from multiple domains or multiple feature extractors. Compared with single-view data, multi-view data describe the attributes of objects more comprehensively, which helps achieve better clustering performance. For example, the facial image features, voice features, and fingerprint features of different people can constitute multi-view data for more secure identification. The Caltech 101-20 dataset provides multi-view data by extracting HOG and other features from images (Fei-Fei et al. 2004), and Chen et al. (2022b) used deep networks to extract deep features as multi-view data. Multi-view clustering algorithms aim to reduce intra-class differences and enlarge inter-class differences by mining the consistency and complementarity information among different views. Technological progress has made multi-view data easier to acquire, resulting in significant growth in its scale. Some traditional multi-view clustering methods, such as multi-view spectral clustering (Gao et al. 2015), are no longer applicable due to their cubic computational complexity with respect to the number of samples. For example, in medicine, when multi-view clustering algorithms are used to analyze large-scale flow cytometry data, the large sample size and the urgent need for doctors to provide diagnostic opinions based on the analysis results demand that the time complexity be as low as possible. Large-Scale Multi-View Clustering (LSMVC) has therefore been proposed to exploit the consistency and complementarity information of large-scale multi-view data with linear (or even lower) computational complexity while obtaining comparable accuracy. Compared with traditional multi-view clustering algorithms that require cubic or higher computational complexity, LSMVC focuses on reducing the computational complexity to be linear in the number of samples or even lower, thus solving large-scale multi-view clustering tasks with the available computing resources.

Numerous researchers have proposed solutions to the multi-view clustering problem, and several reviews have summarized these algorithms from different perspectives. According to the clustering principles, Xu et al. (2013) classified multi-view clustering methods into three types: co-training, multiple kernel learning, and subspace learning. Building on Xu et al. (2013), Yang and Wang (2018) incorporated the multi-view clustering methods available as of 2018, added new summaries of multi-view graph clustering and multi-task multi-view clustering, and further provided classifications of multi-view graph and subspace clustering. Since multi-view clustering is essentially about mining the consistency and complementarity of each view, Zhang et al. (2019) summarized feature selection and fusion algorithms and related them to multi-view clustering. Wang et al. (2019a) distilled the general paradigm of graph-based multi-view clustering, namely constructing and fusing graphs and obtaining the final clustering result from the fused graph; in addition, a new multi-view graph-based clustering method was proposed. Fu et al. (2020) introduced several graph and subspace clustering algorithms; unlike previous reviews, a binary-code-learning-based multi-view clustering algorithm was also covered. Chen et al. (2022a) divided multi-view clustering into representation-based and non-representation-based learning from the perspective of representation learning, where graph-based and subspace-based multi-view clustering were categorized as shallow representation learning and deep-learning-based multi-view clustering as deep representation learning. Wang et al. (2020) summarized large-scale machine learning in terms of model simplification, optimization approximation, and computation parallelism; for example, model simplification was divided into kernel-based, graph-based, deep-based, and tree-based models, but large-scale multi-view clustering was not explicitly summarized. Song et al. (2022) outlined large-scale graph-based semi-supervised classification algorithms in terms of graph construction, graph regularization, and graph embedding methods. As can be seen, Xu et al. (2013), Yang and Wang (2018), Zhang et al. (2019), Wang et al. (2019a), Fu et al. (2020) and Chen et al. (2022a) mainly summarized multi-view clustering according to different principles, such as graph-based, subspace-based, and multi-kernel-based methods; however, a summary of large-scale multi-view clustering methods is lacking. Some large-scale data learning methods summarized in Wang et al. (2020) and Song et al. (2022), such as anchors-based methods, remain applicable to large-scale multi-view data learning; however, since these reviews were not specifically devoted to large-scale multi-view clustering (LSMVC), they are not comprehensive in this respect.

Reducing the computational complexity of LSMVC algorithms is essential for applying multi-view clustering in practice (Che et al. 2022; Sheikh Hassani and Green 2019; Labroski 2018; He et al. 2020; Wang et al. 2019b; Yang et al. 2022d). Although many algorithms have emerged to reduce the computational complexity of LSMVC, a few core methods are used repeatedly across these algorithms, and the current reviews do not summarize them comprehensively. Therefore, unlike previous studies, this paper starts from the principles of the methods for reducing the computational complexity of LSMVC algorithms, first describing each method in detail and then introducing the specific clustering algorithms. Specifically, the contributions of this paper are as follows.

  • From the perspective of reducing the computational complexity of LSMVC algorithms, this paper summarizes four LSMVC methods that have been widely used in many works in recent years and provides a detailed introduction to the principles of these methods.

  • The representative algorithms of each method are selected, and the performances of these algorithms in multiple public datasets are compared and analyzed through experiments.

  • Based on the shortcomings of current LSMVC algorithms, future directions for improvement are discussed in this paper.

The remainder of this paper is organized as follows. Section 2 explains the principles of the various methods for reducing the computational complexity of LSMVC algorithms and introduces the corresponding algorithms. In Sect. 3, representative algorithms are compared and analyzed through experiments. In Sect. 4, we discuss the pros and cons of current LSMVC algorithms. Future research directions are given in Sect. 5. Finally, the paper is summarized in Sect. 6.

2 Methods

In this section, we describe in detail the principles of four frequently used methods for reducing the computational complexity of LSMVC algorithms: third-order tensor t-SVD, anchors-based graph construction, matrix blocking, and matrix factorization. In addition, to help the reader better understand how each class of methods is applied in specific clustering algorithms, this paper introduces a variety of current state-of-the-art algorithms for each type of method. The summarized methods and corresponding algorithms are shown in Fig. 1. Unless otherwise stated, the main notations used are listed in Table 1.

Fig. 1 The summarized methods and corresponding algorithms

Table 1 The main notations used in this paper

2.1 Third-order tensor t-SVD based LSMVC

Since Kilmer et al. (2013) proposed applying the third-order tensor t-SVD to image processing in 2013, the third-order tensor t-SVD has also been widely used in multi-view clustering. In 2018, combining it with Low-Rank Representation (LRR) (Liu et al. 2012), Xie et al. (2018) proposed the third-order tensor t-SVD-based Multi-view Subspace Clustering (T-SVD-MSC) for multi-view data clustering. The T-SVD-MSC algorithm explores the low-rank tensor subspace and high-order correlations by stacking multi-view features into a third-order tensor. Meanwhile, with the low-rank tensor constraint provided by LRR, T-SVD-MSC can better mine the consensus and complementary information among all views than traditional multi-view fusion schemes such as addition and concatenation (Song et al. 2022). In addition, rotating the constructed third-order tensor is a clever way to reduce the computational complexity of LSMVC algorithms. Consequently, many multi-view clustering algorithms combined with third-order tensor decomposition have been proposed in recent years (Li et al. 2021; Wu et al. 2019, 2020; Jiang and Gao 2022; Chen et al. 2019, 2021, 2022c; Zhang et al. 2021; Xu et al. 2020; Fu et al. 2022; Yang et al. 2022c; Xia et al. 2021, 2022a, b; Sun et al. 2020; Tang et al. 2021; Ma et al. 2022; Dou et al. 2021; Zheng et al. 2020, 2023; Hao et al. 2022; Gao et al. 2020). This section first introduces the third-order tensor t-SVD (Algorithm 1), then theoretically analyzes the reduction in computational complexity brought by rotating the third-order tensor, and finally introduces some clustering algorithms related to third-order tensor decomposition.

2.1.1 Principle of third-order tensor-based t-SVD

Definition 1

The t-SVD of the third-order tensor \(X_{3rd}\in R^{n_{1}\times n_{2}\times n_{3}}\) is defined as:

$$\begin{aligned} X_{3rd} = U_{3rd} \times S_{3rd} \times V_{3rd} ^ T, \end{aligned}$$
(1)

where \(U_{3rd}\in R^{n_{1}\times n_{1}\times n_{3}}\) and \(V_{3rd}\in R^{n_{2}\times n_{2}\times n_{3}}\) are orthogonal tensors, \(S_{3rd}\in R^{n_{1}\times n_{2}\times n_{3}}\) is an f-diagonal tensor, and \(\times\) denotes the t-product.

Algorithm 1 Third-order tensor-based t-SVD

As shown in Fig. 2, the time complexity of the third-order tensor t-SVD of P is \(O(N^3 V)\), where V is the number of views of the dataset X. For large-scale data, this cost is unacceptable. A simple but effective solution is to rotate P into \({\widetilde{P}} \in R^{N \times V \times N}\), which reduces the time complexity of the third-order tensor t-SVD from \(O(N^3 V)\) to \(O(N^2 V^2)\) \((V\ll N)\). In addition, each frontal slice then holds information from every view instead of a single view, thus enabling more effective extraction of deep information from multi-view data (Kilmer et al. 2013); a concrete sketch of this rotation is given after Fig. 2. The superiority of rotation for the third-order tensor t-SVD is further demonstrated by comparing the performance metrics with and without rotation in the literature (Kilmer et al. 2013).

Fig. 2 Construction of the third-order tensors based on each view's Markov chain transition probability matrix (Wu et al. 2019). \(X^i (i=1,2,\ldots,V)\) represents the single-view data obtained by feature extraction from the source dataset X, where the sample size of X is N. \(W^i\in R ^ {N \times N}\) is the adjacency matrix of each view, \(P^i\in R^{N \times N}\) is the Markov chain transition probability matrix of each view, \(P \in R^{N \times N \times V}\) is the third-order transition probability tensor, and \({\widetilde{P}} \in R^{N \times V \times N}\) is the rotation of P
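To make the construction in Fig. 2 and the rotation trick concrete, the sketch below builds a transition probability tensor from per-view adjacency matrices and computes its t-SVD via the standard FFT-based procedure of Kilmer et al. (2013). The function and variable names, and the row-normalization choice for \(P^i\), are illustrative assumptions, not the implementation of any cited algorithm.

```python
import numpy as np

def t_svd(T):
    """t-SVD of an n1 x n2 x n3 tensor via the FFT-based procedure of
    Kilmer et al. (2013): DFT along mode 3, ordinary SVD of each frontal
    slice in the Fourier domain, inverse DFT. A minimal illustrative sketch."""
    n1, n2, n3 = T.shape
    m = min(n1, n2)
    Tf = np.fft.fft(T, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3):
        u, s, vh = np.linalg.svd(Tf[:, :, k])      # cost O(n1 * n2 * min(n1, n2))
        Uf[:, :, k], Vf[:, :, k] = u, vh.conj().T
        Sf[np.arange(m), np.arange(m), k] = s
    ireal = lambda A: np.real(np.fft.ifft(A, axis=2))  # imaginary residue ~ 0
    return ireal(Uf), ireal(Sf), ireal(Vf)

# Build the transition tensor P of Fig. 2 from per-view adjacency matrices
# (row normalization, P^i = D^{-1} W^i, is one common choice), then rotate.
# Rotation shrinks each frontal slice from N x N to N x V, so the per-slice
# SVD cost drops from O(N^3) to O(N V^2): O(N^3 V) -> O(N^2 V^2) overall.
N, V = 300, 3
W_list = [np.random.rand(N, N) for _ in range(V)]
P = np.stack([W / W.sum(axis=1, keepdims=True) for W in W_list], axis=2)
P_rot = np.transpose(P, (0, 2, 1))                 # rotated: N x V x N
U, S, Vt = t_svd(P_rot)
```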

2.1.2 Algorithms based on third-order tensor t-SVD

Rotation has recently been widely used in multi-view clustering algorithms based on the third-order tensor t-SVD (Xie et al. 2018; Li et al. 2021; Wu et al. 2019, 2020; Jiang and Gao 2022; Chen et al. 2019, 2021, 2022c; Zhang et al. 2021; Xu et al. 2020; Fu et al. 2022; Yang et al. 2022c; Xia et al. 2021, 2022a, b; Sun et al. 2020; Tang et al. 2021; Ma et al. 2022; Dou et al. 2021; Zheng et al. 2020, 2023; Hao et al. 2022; Gao et al. 2020). This paper classifies these algorithms into three categories according to their time complexity. Generally, since the power of the number of samples N dominates the time complexity of an algorithm, only the power of N is considered here. The three categories are \(O(N^3)\) (Xie et al. 2018; Xia et al. 2021, 2022a; Li et al. 2021; Wu et al. 2019, 2020; Chen et al. 2019, 2021; Zhang et al. 2021; Sun et al. 2020; Ma et al. 2022; Dou et al. 2021; Zheng et al. 2020, 2023; Hao et al. 2022; Gao et al. 2020), \(O(N^2)\) (Xu et al. 2020; Fu et al. 2022; Chen et al. 2022c), and O(N) (Jiang and Gao 2022; Yang et al. 2022c; Tang et al. 2021; Xia et al. 2022b). Each category is introduced separately below.

As shown in the previous section, the third-order tensor rotation can effectively reduce the time complexity of the tensor SVD from \(O(N^3)\) to \(O(N^2)\). However, when more factors have to be considered in an algorithm, its overall time complexity may still be \(O(N^3)\). Xie et al. (2018), Zhang et al. (2021), Chen et al. (2019), Xia et al. (2021), Sun et al. (2020), Dou et al. (2021), Zheng et al. (2020, 2023), Hao et al. (2022) and Gao et al. (2020) learned the representation matrices of each view from the original multi-view data based on the subspace representation and stacked them into a third-order tensor to learn the high-dimensional correlations. Among them, the time complexity of Xie et al. (2018), Xia et al. (2021), Zhang et al. (2021), Chen et al. (2019), Sun et al. (2020), Dou et al. (2021) and Zheng et al. (2020) is \(O(N^3)\) due to the spectral clustering applied to the similarity matrix obtained after optimization. In addition, Zhang et al. (2021) added SMR (SMooth Representation) (Hu et al. 2014) to maintain the local geometric structure; however, the data structure obtained directly from raw data may contain noise or redundant information. Xia et al. (2021) used a common indicator matrix to preserve the uniformity of the clustering structure across views; the proposed weighted tensor Schatten p-norm treats each view differentially, which is more reasonable than treating them equally. Zheng et al. (2023) introduced a Degeneration Mapping Model and a low-rank tensor constraint to thoroughly learn the degeneration mapping and multi-view information; while obtaining more comprehensive information, this is more time-consuming than other algorithms, as several optimization steps have a time complexity of \(O(N^3)\). Hao et al. (2022) explored consensus and complementary information using a common matrix and HSIC (Hilbert Schmidt Independence Criterion) (Gretton et al. 2005), respectively; however, it remains to be seen whether the two components can work together to improve performance. The above factors also make the time complexity of these algorithms \(O(N^3)\). In addition, Li et al. (2021), Wu et al. (2019), Chen et al. (2021), Ma et al. (2022) and Wu et al. (2020) also required spectral clustering to obtain the final clustering results, so the time complexity of these algorithms is likewise \(O(N^3)\). Xia et al. (2022a) introduced a rank constraint on the consistent indicator matrix to ensure the similarity matrix has a k-connected structure; however, the time complexity of updating the indicator matrix is \(O(N^3)\).

The algorithms based on the third-order tensor rotation with time complexity \(O(N^2)\) are summarized below. Fu et al. (2022) constructed the third-order tensor from the subspace representation matrices of each view and learned the indicator matrix containing consensus information by weighted spectral embedding; however, the affinity matrix and the representation tensor are solved independently, which cannot effectively exploit the high correlation between them. Xu et al. (2020) optimized the indicator matrix containing the consensus information of each view by minimizing the weighted tensor nuclear norm; despite its good performance, its uniform treatment of all views lacks practicality. Chen et al. (2022c) explored the high-order correlations of each view through a weighted t-SVD-based tensor nuclear norm constructed from the similarity matrix of each view, and used a rank constraint to find the consensus graph with k connected components, which enables a comprehensive exploration of the relationships between sample features and sample structure. As can be seen, Xu et al. (2020), Fu et al. (2022) and Chen et al. (2022c) obtained the clustering results by directly optimizing the objective function, omitting the spectral clustering step with its \(O(N^3)\) time complexity, so the \(O(N^2)\) complexity obtained through the third-order tensor rotation is maintained.

The algorithms with time complexities of \(O(N^3)\) and \(O(N^2)\) using the third-order tensor rotation are summarized above. The t-SVD time complexity of a third-order tensor with dimension \(R^{N \times N \times V}\) can only be reduced to \(O(N^2)\) by the rotation operation alone. Therefore, the t-SVD complexity can be further reduced only by first reducing the size of the similarity matrix, which in turn reduces the algorithm's time complexity to O(N). To this end, Jiang and Gao (2022), Yang et al. (2022c) and Xia et al. (2022b) introduced the idea of anchors to construct a similarity matrix of dimension \(R^{N \times M}\) instead of \(R^{N \times N}\) using M anchors \((M \ll N )\), which in turn forms a third-order tensor of dimension \(R^{N \times M \times V}\). Specifically, Yang et al. (2022c) obtained the weighted similarity matrix with k connected components by tuning hidden parameters; despite its ability to handle large-scale multi-view data, it overlooks the spatial structure and fails to leverage the complementary information inherent in multi-view data. Jiang and Gao (2022) learned bipartite graphs with k connected components through rank constraints, from which the clustering result can be obtained directly. Xia et al. (2022b) used the \(\ell _{1,2}\)-norm of the similarity matrix of each view as a penalty term and weighted them after optimization to obtain the consensus graph with k connected components; in addition, considering the importance of anchors for graph construction, an effective anchor selection method based on Principal Component Analysis (PCA) was proposed. Tang et al. (2021) proposed constructing the subspace representation matrix using a low-dimensional basis matrix, similar to anchors, instead of using the original multi-view data as the dictionary; the clustering results are obtained directly by learning the latent clustering structure. The time complexity and the corresponding codes (if available) of the above algorithms are listed in Table 2.

Table 2 The time complexity and the corresponding codes (if available) of algorithms based on third-order tensor t-SVD

In short, constructing the third-order tensor can effectively mine the higher-order information of each view, and the tensor rotation reduces the time complexity of the third-order tensor t-SVD from \(O(N^3)\) to \(O(N^2)\). To the best of our knowledge, the time complexities of multi-view clustering algorithms based on the third-order tensor fall mainly into \(O(N^3)\), \(O(N^2)\) and O(N). The \(O(N^3)\) complexity is mainly caused by spectral clustering; in addition, incorporating extra components to improve clustering performance can also raise the time complexity. The \(O(N^2)\) algorithms mainly learn the matrix containing the clustering result directly from the objective function, thus avoiding the \(O(N^3)\) cost of spectral clustering. The O(N) algorithms mainly reduce the size of the similarity matrix through anchors, which reduces the time complexity of the third-order tensor t-SVD to O(N), and then learn the clustering results directly so that the overall complexity is O(N).

2.2 Anchors-based graph construction based LSMVC

Since anchors can substantially reduce the time and space complexity of graph construction, they have been widely used in graph-based large-scale data processing since Liu et al. (2010) proposed them (Liu et al. 2010; Guo and Ye 2019; Deng et al. 2016; Yang et al. 2020; Chen and Cai 2011; Cai and Chen 2014; Li et al. 2015, 2016; Hong et al. 2023; Han et al. 2017). Especially after the improved anchors-based methods proposed later by Nie et al. (2016b) and Wang et al. (2016), more and more large-scale multi-view clustering algorithms based on anchors have been proposed (Shen et al. 2022; He et al. 2020, 2019; Shi et al. 2021a; Zhang and Sun 2022; Sun and Zhang 2022; Affeldt et al. 2020; Qiang et al. 2021; Zhang et al. 2020a; Yang et al. 2022a, b; Wang et al. 2019b; Shu et al. 2022). To facilitate understanding, this section first presents the concept of graph construction, then introduces the anchors-based method and its improvement in view of the problems in graph construction, and finally gives a brief description of the anchors-based algorithms. Note that the anchors-based approach in this section focuses on reducing the complexity of graph construction and thus the computational complexity of the overall algorithm, unlike matrix factorization, where the factor matrix is regarded as the anchors.

2.2.1 Methods of anchors-based graph construction

The computational complexity of constructing the similarity matrix is \(O(N^2)\), and a time complexity of \(O(N^3)\) is required for its eigendecomposition, which is not conducive to large-scale multi-view clustering (Wang et al. 2021b). The Nadaraya-Watson kernel regression (Hastie et al. 2009) is used in Liu et al. (2010) and Chen and Cai (2011) to construct a low-dimensional sparse representation matrix from the original samples and the anchors instead of a full-size similarity graph. Specifically, the low-dimensional sparse representation matrix \(Z \in R^{N \times M}\) is constructed from the given dataset \(X = \left\{ x_{1}, x_{2}, \ldots , x_{N} \right\} \in R^{d \times N}\) and the anchors \(\left\{ a_{1}, a_{2}, \ldots , a_{M} \right\} \in R^{d \times M}\).

$$\begin{aligned} z_{ij} = {\left\{ \begin{array}{ll} \frac{K_{\sigma }{({x_{i},a_{j}})}}{\sum _{j^{\prime } \in {\langle i\rangle }}K_{\sigma }{({x_{i},a_{j^{\prime }}})}},&{} \quad {\text {if}}\; j \in \left\langle i \right\rangle \\ {0,}&{}\quad {\text {otherwise}}, \end{array}\right. } \end{aligned}$$
(2)

where \(\left\langle i \right\rangle\) denotes the index set of the nearest anchors of \(x_i\). Generally, the Gaussian kernel \(K_{\sigma }{({x_{i},a_{j}})}\) with bandwidth \(\sigma\) is used.

Let \({\hat{Z}}=Z\Sigma ^{-1/2}\), where \(\Sigma \in R^{M \times M}\) is a diagonal matrix with \(\Sigma _{jj} = {\textstyle \sum _{i=1}^{N}}Z_{ij}\). According to Chen and Cai (2011), the similarity matrix of the graph can be defined as:

$$\begin{aligned} W = {\hat{Z}}{{\hat{Z}}}^{T} \end{aligned}$$
(3)

Assuming the SVD decomposition of \({\hat{Z}}\) is \({\hat{Z}} = U_{svd}\Lambda V_{svd}^{T}\), it can be shown that:

$$\begin{aligned} \begin{aligned} W = {\hat{Z}}{{\hat{Z}}}^{T}&= \left( {U_{svd}\Lambda V_{svd}^{T}} \right) \left( {U_{svd}\Lambda V_{svd}^{T}} \right) ^{T} \\&= U_{svd}\Lambda \left( {V_{svd}^{T}V_{svd}} \right) \Lambda U_{svd}^{T} \\&= U_{svd}\Lambda ^{2}U_{svd}^{T} \end{aligned} \end{aligned}$$
(4)

Therefore, the eigenvectors of the similarity matrix W can be obtained from the left singular vectors of \({\hat{Z}}\). The time complexity of the SVD of \({\hat{Z}}\) is \(O(M^2 N)\), which is much smaller than the \(O(N^3)\) of the eigendecomposition of the similarity matrix since \(M \ll N\). Moreover, the computational complexity of constructing the graph is reduced from \(O(N^2)\) to O(MN). Generally, random selection or k-means clustering can be used to select the anchors.
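As an illustration of this shortcut, the following sketch computes the spectral embedding directly from an anchor graph Z without ever forming the N x N matrix W; the function name is our own, and Z is assumed nonnegative with nonzero column sums.

```python
import numpy as np

def spectral_embedding_from_anchors(Z, k):
    """Top-k spectral embedding of W = Z_hat Z_hat^T obtained from the
    N x M anchor graph alone, following Eqs. (3)-(4): an O(M^2 N) SVD of
    Z_hat replaces the O(N^3) eigendecomposition of W. Illustrative sketch."""
    sigma = Z.sum(axis=0)                     # Sigma_jj = sum_i Z_ij (M values)
    Z_hat = Z / np.sqrt(sigma)                # Z Sigma^{-1/2}
    U, s, _ = np.linalg.svd(Z_hat, full_matrices=False)
    # left singular vectors of Z_hat are eigenvectors of W, eigenvalues s^2
    return U[:, :k], s[:k] ** 2
```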

However, when solving the above low-dimensional nearest-neighbor matrix \(Z \in R^{N \times M}\) by Nadaraya-Watson kernel regression, the data distribution needs to be known in advance to construct a suitable Z with the hyperparameter \(\sigma\). Therefore, Nie et al. (2016b) proposed a strategy for calculating Z with a more easily determined hyperparameter and higher computational efficiency. It is based on the principle that the closer a sample point is to an anchor, i.e., the smaller \(\left\| x_{i}-a_{j} \right\| ^2\) is, the larger the corresponding \(z_{ij}\) in the similarity matrix should be. Each row of Z is solved as follows,

$$\begin{aligned} \min _{{\textbf{z}}_{i}}{\sum _{j = 1}^{M}\left( {\left\| {x_{i} - a_{j}} \right\| _{2}^{2}z_{ij}} + \gamma z_{ij}^{2}\right) }\;\text{s.t.}\;{{\textbf{z}}_{{\textbf{i}}}}^{T}1 = 1,~z_{ij} \ge 0, \end{aligned}$$
(5)

where \({z_i}^T\) denotes the ith row of Z. Let \(d_{i,j} =\left\| {x_{i} - a_{j}} \right\| _{2}^{2}\), sorted in ascending order for each i, and let \(\gamma =\frac{l}{2}d_{i,l + 1} - \frac{1}{2}{\sum _{j = 1}^{l}d_{i,j}}\); the solution of the above problem is

$$\begin{aligned} z_{ij} = {\left\{ \begin{array}{ll} \frac{d_{i,l + 1} - d_{i,j}}{ld_{i,l + 1} - {\sum _{j = 1}^{l}d_{i,j}}}, &{}\quad \text {if}\;j \le l \\ {0,} &{} \quad \text{otherwise} \\ \end{array}\right. } \end{aligned}$$
(6)

where l denotes the number of nearest anchors kept for each sample.
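The closed form in Eq. (6) is simple to implement; below is an illustrative NumPy sketch (function and variable names are our own), where the distances are sorted per row before the formula is applied.

```python
import numpy as np

def nearest_anchor_graph(X, A, l=5):
    """Parameter-free anchor graph of Nie et al. (2016b), Eq. (6).
    X: d x N samples, A: d x M anchors (M > l), l: nearest anchors kept.
    Returns an N x M row-stochastic, l-sparse matrix Z. Illustrative sketch."""
    # squared Euclidean distances d_{i,j} between samples and anchors (N x M)
    d2 = (X ** 2).sum(0)[:, None] + (A ** 2).sum(0)[None, :] - 2 * X.T @ A
    N, M = d2.shape
    Z = np.zeros((N, M))
    order = np.argsort(d2, axis=1)             # anchors sorted by distance
    for i in range(N):
        nn = order[i, :l]                      # indices of the l nearest anchors
        d_next = d2[i, order[i, l]]            # (l+1)-th smallest distance
        denom = l * d_next - d2[i, nn].sum()   # assumes ties keep this nonzero
        Z[i, nn] = (d_next - d2[i, nn]) / denom
    return Z
```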

2.2.2 Algorithms based on anchors-based graph construction

The methods introduced above can significantly reduce the computational complexity of graph construction as well as that of the subsequent SVD. Therefore, they are widely used in graph-based clustering algorithms (Guo and Ye 2019; Deng et al. 2016; Yang et al. 2020, 2022a, b, 2023a; Li et al. 2015, 2016; Hong et al. 2023; Han et al. 2017; Shen et al. 2022; He et al. 2019, 2020; Shi et al. 2021a; Zhang and Sun 2022; Sun and Zhang 2022; Affeldt et al. 2020; Wang et al. 2019b, 2021b, 2023; Qiang et al. 2021; Han et al. 2019; Yu et al. 2018; Zhang et al. 2020a). The algorithms related to graph construction based on the above two methods are described below.

Guo and Ye (2019), Deng et al. (2016), Yang et al. (2020), Li et al. (2015), Hong et al. (2023), Han et al. (2017), Li et al. (2016), Shen et al. (2022), Wang et al. (2023) and Affeldt et al. (2020) constructed graphs by applying the method of Liu et al. (2010), Chen and Cai (2011) and Cai and Chen (2014) to reduce the scale of the graph as well as the computational complexity. Among them, Deng et al. (2016), Yang et al. (2020), Li et al. (2015), Hong et al. (2023), Han et al. (2017), Shen et al. (2022) and Affeldt et al. (2020) proposed algorithms for clustering complete large-scale data. Specifically, Deng et al. (2016) proposed using the landmarks of the LSC algorithm (Chen and Cai 2011) to represent the sample classes, reducing the computational complexity by computing only the distances between the test samples and the landmarks instead of all samples in the KNN. Yang et al. (2020) measured the difference between the weighted graph of each view and its non-negative orthogonal decomposition using the F-norm, where the graph of each view is constructed by anchors and the non-negative orthogonal decomposition contains the matrix with category information; however, the decoupling of anchor selection and clustering may lead to suboptimal results. Li et al. (2015) proposed a new method for multi-view spectral clustering based on constructing bipartite graphs with salient points, i.e., anchors, which reduces the time complexity from \(O(N^2)\) to O(N); unfortunately, the representation ability of bipartite graphs is limited, which leads to poor clustering performance. Hong et al. (2023) proposed a two-stage spectral clustering algorithm: first, the affinity matrix is constructed based on anchors; then, a low-rank affinity matrix is learned from it by probability density estimators. However, the choice of anchors in the first stage directly affects the clustering results of the second stage. In Han et al. (2017), the diagonal block structure of the graph was optimized by reconstruction terms while using the indicator matrix to approximate the label matrix; benefiting from the sparse adjacency matrix obtained from the anchors-based graph construction, the time complexity of the algorithm is only linear in the sample size. Shen et al. (2022) proposed a compact hash coding method that can directly learn from multi-view distributed data, in which spectral clustering is used to reveal the underlying structure among the nodes of multi-view data and its computational complexity is further reduced based on anchors. Affeldt et al. (2020) proposed spectral clustering based on deep autoencoders: the affinity matrices are first generated based on the anchors and deep autoencoders with different hyperparameters, and the clustering results are then obtained by spectral clustering after concatenating these affinity matrices; however, it does not perform well on highly noisy or highly similar data. Furthermore, Guo and Ye (2019) used the common instances of partial multi-view data as anchors to bridge all the inter-view instances. Li et al. (2016) proposed a method for sequentially processing each block of a dataset, which significantly reduces the one-time memory consumption during clustering, with anchors used to reduce the graph's size and the time complexity. Wang et al. (2023) first constructed the similarity matrix based on anchors and obtained the latent partition representation by SVD; then, the learning of the consensus matrix and the discrete indicator matrix is unified. By mining consistency information at the partition-matrix level instead of the similarity-matrix level, redundant information and noise interference are reduced.

He et al. (2019, 2020), Shi et al. (2021a), Zhang and Sun (2022), Sun and Zhang (2022), Wang et al. (2019b, 2021b), Qiang et al. (2021), Han et al. (2019), Yu et al. (2018), Zhang et al. (2020a) and Yang et al. (2022a, b, 2023a) applied the graph construction method of Nie et al. (2016b), based on minimizing the distance between anchors and samples. Shi et al. (2021a) and Wang et al. (2021b) considered spectral embedding and rotation (Huang et al. 2013) to construct an objective function containing the clustering information, where the adaptive weighting of different views in Shi et al. (2021a) reflects the importance of each view. Based on the consistency of the relaxed objective functions of Ncut and Rcut (Shi and Malik 2000), Qiang et al. (2021) learned the clustering results directly from a weighted fusion graph, and Han et al. (2019) reconstructed the graph structure under orthogonal and non-negative constraints. Zhang and Sun (2022) first constructed the similarity matrix of each view of incomplete multi-view data based on anchors and then learned the non-negative consensus embedding and the view-specific orthogonal spectral embedding of each view based on non-negative orthogonal decomposition; however, this focuses solely on the similarity among the incomplete graphs of the available instances within each view, overlooking inter-view similarity, and thus fails to effectively leverage the complementary information inherent in multiple views. Building on Zhang and Sun (2022), Sun and Zhang (2022) proposed that consensus non-negative embeddings and a common graph should also be learned from the complete graph of each view; nonetheless, the initially available instances are not retained. He et al. (2019) and Zhang et al. (2020a) proposed anchors-based semi-supervised learning methods to solve the problem of clustering large amounts of unlabeled data. He et al. (2019) classified unlabeled samples based on labeled samples and similarity graphs constructed from anchors. Zhang et al. (2020a) first constructed the similarity matrix of each view based on anchors and then learned the consensus graph and the label matrix based on the adaptively weighted graphs and the currently available label information; however, the consensus relation between instances and their associated anchors is ignored. He et al. (2020) applied the method of He et al. (2019) to hyperspectral image clustering. In addition, Wang et al. (2019b) clustered large-scale hyperspectral images with the Augmented Lagrangian Multiplier (ALM) algorithm based on anchors-based graph construction and a non-negative relaxation term; the algorithm is efficient because the clustering results can be obtained directly from the non-negative relaxation term without post-processing. Yu et al. (2018) first explored the anchors by hierarchical k-means and constructed the graph, then solved an objective function constrained by regression residuals and regularization terms. Yang et al. (2022a, b) proposed using anchors to solve the high computational complexity of existing correntropy-based methods. Yang et al. (2023a) proposed first applying concept factorization to the similarity matrix constructed from anchors, measuring the similarity before and after factorization by correntropy, and then imposing orthogonal constraints on the factor matrix, where the anchors reduce the computational complexity and correntropy yields more robust results. The time complexity and the corresponding codes (if available) of the above algorithms are listed in Table 3.

Table 3 The time complexity and the corresponding codes (if available) of algorithms based on anchors-based graph construction

An overview of recent algorithms using the two graph construction methods is given above. For the method proposed by Liu et al. (2010), Chen and Cai (2011) and Cai and Chen (2014), the sample distribution must be known in order to set an appropriate nearest-neighbor distance for graph construction; however, real-world sample distributions are mostly unpredictable, so it is difficult for this method to construct a suitable graph. In contrast, Nie et al. (2016b) constructs the graph based on the distances between samples and anchors, where the number of nearest neighbors is easier to determine than the nearest-neighbor distance, and it has therefore been widely used recently. However, both manual graph construction methods cannot fully reflect the data structure, especially for multi-view data, so further research is needed.

2.3 Matrix blocking based LSMVC

Matrix blocking here refers to reducing the computational complexity by partitioning a matrix into blocks to speed up the computation. Dhillon (2001) and Nie et al. (2017) proposed applying matrix blocking to co-clustering, thus simplifying the eigenvalue decomposition of the Laplacian matrix in Ncut (Shi and Malik 2000) and directly solving for the indicator matrix containing the final classification results without post-processing, which makes it capable of reducing the complexity of clustering large-scale multi-view data. In recent years, matrix blocking has been widely used in various clustering algorithms due to the need to cluster large-scale multi-view data (Du et al. 2023; Yang et al. 2022d, 2023b; Yuan and Wang 2022; Li and He 2020; Fang et al. 2023; Hu et al. 2021; Liu et al. 2020; Chang et al. 2019; Zhang et al. 2022a; Nie et al. 2019, 2021, 2017; Zhou et al. 2022, 2023; Zhang and Ma 2022; Li et al. 2020; Lu and Feng 2023; Kang et al. 2021; Ren et al. 2019). This section first introduces the research background of matrix blocking and then briefly introduces some excellent algorithms that use matrix blocking to improve computational efficiency.

2.3.1 Foundation of matrix blocking

For the normalized Laplacian matrix \({\tilde{L}} = I - D^{- 1/2}SD^{- 1/2}\), where D is the degree matrix and \(Z \in R^{N \times M}\) is the anchors-based adjacency matrix, the similarity graph \(S \in R^{(N+M) \times (N+M)}\) can be expressed as the augmented graph of Z

$$S = \left[ \begin{array}{cc} {\textbf{0}} &{} Z \\ Z^{T} &{} {\textbf{0}} \end{array}\right]$$
(7)

Theorem 1

The multiplicity of the eigenvalue zero of the normalized Laplacian matrix \({\tilde{L}}\) is equal to the number of connected components in the graph associated with S.

Since the number of non-zero eigenvalues of a real symmetric matrix is equal to its rank, combined with Theorem 1, the graph can be guaranteed to have k connected components by adding the rank constraint \(rank({\tilde{L}}) = (N + M) - k\) to the objective function, where N is the number of samples and k is the number of categories. Let \(\sigma _{i}( {\tilde{L}} ) \ge 0\) be the ith smallest eigenvalue of \({\tilde{L}}\); according to Ky Fan's theorem (Fan 1949):

$$\begin{aligned} \sum _{i = 1}^{k}{\sigma _{i}( {\tilde{L}} )} = \min _{F{\in R}^{(N + M) \times k},\,F^{T}F = I}{Tr( {F^{T}{\tilde{L}}F} )} \end{aligned}$$
(8)

Note that solving for the matrix F in Eq. (8) requires the eigendecomposition of \({\tilde{L}}{\in R}^{(N + M) \times (N + M)}\), whose time complexity is \(O( (N + M)^{3} )\) and which is therefore unsuitable for large-scale data. Since \({\tilde{L}} = I - D^{- 1/2}SD^{- 1/2}\), Eq. (8) can be rewritten as:

$$\begin{aligned} \min _{F{\in R}^{(N + M) \times k},\,F^{T}F = I}{Tr( {F^{T}( I - D^{- 1/2}SD^{- 1/2} )F} )} \end{aligned}$$
(9)

The above equation is equivalent to:

$$\max _{F{\in R}^{(N + M) \times k},\,F^{T}F = I}{Tr( {F^{T}D^{- \frac{1}{2}}SD^{- \frac{1}{2}}F} )}$$
(10)

\(F{\in R}^{(N + M) \times k}\) and \({D \in R}^{(N + M) \times (N + M)}\) can be re-represented using block matrices: \(F = \left[ \begin{array}{c} U_{block} \\ V_{block} \\ \end{array}\right]\), \(D = \left[ \begin{array}{cc} D_{Su} &{} \\ &{} D_{Sv} \\ \end{array}\right]\), where \(U_{block} \in R^{N \times k}\), \(V_{block} \in R^{M \times k}\), \(D_{Su} \in R^{N \times N}\), \(D_{Sv} \in R^{M \times M}\); then Eq. (10) is equivalent to:

$$\max _{U_{block}^{T}U_{block} + V_{block}^{T}V_{block} = I}{Tr( {U_{block}^{T}{D_{Su}}^{- \frac{1}{2}}Z{D_{Sv}}^{- \frac{1}{2}}V_{block}} )}$$
(11)

Theorem 2

Given \(M \in R^{n_{1} \times n_{2}}\), the optimal solutions of \(X \in R^{n_{1} \times k}, Y \in R^{n_{2} \times k}\) to problem

$$\max _{X^{T}X + Y^{T}Y = I}{Tr( {X^{T}MY})}$$
(12)

are \(X = \frac{\sqrt{2}}{2}U_{1}, Y = \frac{\sqrt{2}}{2}V_{1}\), where \(U_1\) and \(V_1\) are respectively the left and right singular vectors of M corresponding to its k largest singular values.

According to Eq. (12) and Theorem 2, \(U_{block}\) and \(V_{block}\) can be obtained by performing the SVD of \({D_{Su}}^{- \frac{1}{2}}Z{D_{Sv}}^{- \frac{1}{2}} \in R^{N \times M}\), and F can then be rebuilt from \(U_{block}\) and \(V_{block}\). At this point, the time complexity of the SVD of \({D_{Su}}^{- \frac{1}{2}}Z{D_{Sv}}^{- \frac{1}{2}}\) is \(O\left( {M^{2}N + M^{3}} \right) \ll O\left( (N + M)^{3} \right)\).
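The following sketch makes this explicit: it assumes a nonnegative anchor graph Z in which every node has positive degree, computes the blocked degree matrices, and recovers F via the small SVD of Theorem 2 (names are illustrative).

```python
import numpy as np

def blocked_embedding(Z, k):
    """Solve Eq. (11) via Theorem 2: an SVD of the N x M matrix
    D_Su^{-1/2} Z D_Sv^{-1/2}, costing O(M^2 N + M^3), replaces the
    eigendecomposition of the (N+M) x (N+M) Laplacian, which would
    cost O((N+M)^3). Illustrative sketch."""
    du = Z.sum(axis=1)                            # degrees of the N sample nodes
    dv = Z.sum(axis=0)                            # degrees of the M anchor nodes
    B = Z / np.sqrt(du)[:, None] / np.sqrt(dv)[None, :]
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    U_block = U[:, :k] / np.sqrt(2)               # X = (sqrt(2)/2) U_1
    V_block = Vt[:k].T / np.sqrt(2)               # Y = (sqrt(2)/2) V_1
    return np.vstack([U_block, V_block])          # F in R^{(N+M) x k}
```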

2.3.2 Algorithms based on matrix blocking

The above example of solving the rank constraint shows theoretically that matrix blocking can effectively reduce the computational complexity of the solution. In recent years, large-scale multi-view data have placed higher demands on the computational efficiency of clustering algorithms. Therefore, matrix blocking, which can effectively reduce the computational complexity, is widely used in many clustering algorithms (Du et al. 2023; Yang et al. 2022d, 2023b; Yuan and Wang 2022; Li and He 2020; Fang et al. 2023; Hu et al. 2021; Liu et al. 2020; Chang et al. 2019; Zhang et al. 2022a; Nie et al. 2017, 2019, 2021; Zhou et al. 2022, 2023; Zhang and Ma 2022; Li et al. 2020; Lu and Feng 2023; Kang et al. 2021; Ren et al. 2019). A selection of representative algorithms is outlined below.

By imposing a rank constraint on the bipartite graph and solving it based on matrix blocking, a bipartite graph with a clear clustering structure can be obtained while the computational complexity is reduced. Li and He (2020), Fang et al. (2023), Nie et al. (2021), Zhou et al. (2022, 2023) and Lu and Feng (2023) further constructed objective functions to fully use the complementary and consensus information among different views. Specifically, Li and He (2020) and Fang et al. (2023) proposed objective functions based on the mutual optimization of the per-view graphs, the consistent bipartite graph, and the anchors, where the rank constraint makes the learned bipartite graph have k connected components. Unlike the two-stage approach, in which the sparse adjacency matrix is first constructed based on anchors and the consensus information is then mined, these two approaches unify the learning of the adjacency matrix and the mining of consensus information. Nie et al. (2021) learned graphs with a clear block-diagonal structure and k connected components based on subspace representation and rank constraints; however, it remains limited in its capacity to integrate single-view and unified bipartite graph learning within one joint learning model. Zhou et al. (2022) explored the spatial low-rank structure embedded within views using nuclear norm regularization on the graph of each view, applied the Schatten p-norm to the tensor composed of the per-view graphs, and finally used rank constraints so that the learned hidden graph has k connected components; the third-order tensor, consisting of adjacency matrices constructed from anchors, not only mines the complementary information between views but also further reduces the tensor size. Zhou et al. (2023) used self-paced learning to optimize initialized bipartite graphs that lack a clear clustering structure, gradually mining consistent information from more trusted to less trusted data, which reduces the side effects of untrustworthy data. Lu and Feng (2023) learned a unified graph with k connected components by stitching together the matrices formed by the bipartite graph of each view, combined with rank constraints; although this takes the diversity of each view's anchor graph into account, it reintroduces the problem of high computational complexity.

Some works consider that multi-step optimization makes it difficult to obtain optimal clustering results and therefore focus on constructing a unified optimization framework (Yang et al. 2023b; Yuan and Wang 2022; Chang et al. 2019; Nie et al. 2017; Zhang and Ma 2022; Ren et al. 2019). Yang et al. (2023b) first constructed a low-dimensional representation of the original data based on anchors and incorporated it into an adaptive bipartite graph learning framework to learn bipartite graphs with k connected components under rank constraints. Yuan and Wang (2022) proposed unifying anchor selection and bipartite graph learning based on fuzzy k-means and KL (Kullback-Leibler) divergence, and simplifying the complexity of bipartite graph clustering based on matrix blocking; however, the discretization by spectral rotation may lose important information. Chang et al. (2019) constructed an objective function to learn similarity graphs based on a low-rank representation and a regularization term on the error matrix; moreover, the added rank constraint gives the bipartite graph constructed from the optimized similarity graph k connected components, from which the clustering results can be obtained directly. In Nie et al. (2017), the clustering results are obtained directly by learning the similarity matrix that is most similar to the given affinity matrix and possesses k connected components, with matrix blocking used to speed up the solution of the rank constraint. Zhang and Ma (2022) first learned the embedding representation of each view and then learned the similarity matrix with k connected components in combination with the rank constraints. Ren et al. (2019) learned low-dimensional subspace mappings of the original data without compromising its structure based on FME (Nie et al. 2010), while incorporating rank constraints to learn the graph with k connected components directly from the original data.

In addition, Du et al. (2023) proposed first solving the optimal neighborhood graphs of each view based on rank constraints and then constructing the third-order tensor for clustering; however, this multi-stage approach has a higher time complexity and is more likely to yield sub-optimal results than end-to-end approaches. Nie et al. (2019) proposed a multi-prototype k-means clustering method based on bipartite graph partitioning to address the shortcomings of single-prototype k-means, where the rank constraint makes the learned bipartite graph have k connected components. Hu et al. (2021) extended the method of Nie et al. (2019) to multi-view clustering. Li et al. (2020) first constructed bipartite graphs for each view based on anchors and then learned a bipartite graph with k connected components by directly weighting the per-view bipartite graphs jointly with rank constraints. Unlike constructing similarity graphs from anchors directly, Liu et al. (2020) constructed the similarity graph by selecting anchors from the graph embedding of the original data and then optimized the similarity graphs in combination with rank constraints. Zhang et al. (2022a) proposed learning the similarity matrix between samples and anchors based on a transformation matrix; the transformation matrix also forms trace differences to separate samples of different classes, and matrix blocking is used to solve the rank constraint. Kang et al. (2021) learned a unified graph from the original multi-view data and made it possess k connected components based on rank constraints; however, the clustering results show that the method is rather sensitive to the choice of anchors. Yang et al. (2022d) first learned the spectral embedding representations of samples and anchors separately based on anchors, and then solved for the bipartite graph of hyperspectral images with k connected components based on the two embeddings and rank constraints. The time complexity and the corresponding codes (if available) of the above algorithms are listed in Table 4.

Table 4 The time complexity and the corresponding codes (if available) of algorithms based on matrix blocking

It can be seen that the rank constraint can be applied in various clustering algorithms to solve for graphs with a fixed number of connected components. However, its use is limited by the significant computational complexity of the eigendecomposition of the Laplacian matrix. Through matrix blocking, this eigendecomposition can be replaced by the SVD of a much smaller matrix, which significantly improves the computational efficiency of the clustering algorithm. Therefore, it has been widely used in recent years.

2.4 Matrix factorization based LSMVC

Subspace clustering assumes that high-dimensional samples are not uniformly distributed in the high-dimensional space but rather lie in its subspaces (Elhamifar and Vidal 2013; Liu et al. 2012). Thus, the memberships of the original data to the subspaces can be solved for and used as the adjacency matrix for spectral clustering. Since subspace clustering theory provides a new and effective clustering idea, it has received considerable attention since being proposed (Gao et al. 2015; Huang et al. 2022; Kang et al. 2020a). However, its heavy computational complexity makes subspace clustering unsuitable for large-scale datasets. Subspace clustering based on matrix factorization has a solid theoretical foundation (Lee and Seung 2000, 1999; Cai et al. 2010; Wang et al. 2011; Ding et al. 2006) and can effectively reduce the computational complexity. Moreover, the idea of anchors gives matrix factorization better interpretability (Zhang and Ma 2022). This section first introduces subspace clustering and matrix factorization for reducing its computational complexity, and then introduces some excellent algorithms based on matrix factorization.

2.4.1 Multi-view subspace clustering based on matrix factorization

For multi-view data \(\left\{ X^{1};X^{2};\ldots ;X^{V} \right\}\), subspace clustering solves the adjacency matrix of each view by minimizing the following objective function.

$$\begin{aligned} {\min {\sum \limits _{i = 1}^{V}\left( \left\| {X^{i} - X^{i}\Omega ^{i}} \right\| _{F}^{2} + \alpha f\left( \Omega ^{i} \right) \right) }}\;\text{s.t.}\;\Omega ^{i} \ge 0, \quad \Omega ^{i}1 = 1 \end{aligned}$$
(13)

where \(\Omega ^{i}{\in R}^{N \times N}\) is the subspace coefficient matrix of \(X^i\) and \(\alpha f\left( \Omega ^{i} \right)\) is the regularization term. However, for large-scale multi-view data, the computational complexity of optimizing Eq. (13) to solve for \(\Omega ^i\) is heavy, which limits the use of subspace clustering on such data. NMF (Non-negative Matrix Factorization) can approximate a large-scale matrix by the product of two small-scale non-negative matrices (Cai et al. 2010)

$$\begin{aligned} X^{i}{\approx U}^{i}Z^{i} \end{aligned}$$
(14)

Combined with the concept of anchors, if M samples are selected at random as anchors from the original data (they can also be obtained as the cluster centroids of k-means), then \(U^{i}\) in Eq. (14) can be regarded as the anchors of the ith view \(X^i\), and \(Z^{i}\) is the coefficient matrix of the ith view. The original samples \(X^i\) are thus approximately characterized by the linear combination of \(U^i\) and \(Z^i\) in Eq. (14), and Eq. (13) can be rewritten as:

$${\min {\sum \limits _{i = 1}^{V}\left\| {X^{i} - U^{i}Z^{i}} \right\| _{F}^{2}}} + \alpha f\left( Z^{i} \right) ~s.t.~Z^{i} \ge 0, \quad U^{i} \ge 0$$
(15)

According to Lee and Seung (2000) and Huang et al. (2018), Eq. (15) is a convex optimization problem in \(U^i\) or \(Z^i\) when the other variable is fixed. Generally, it can be solved by alternating minimization. For large-scale data, since \(M \ll N\), the matrix factorization can effectively compress the original data and reduce the computational complexity.
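As a concrete illustration, the following sketch runs the classical multiplicative updates of Lee and Seung (2000) on one view of Eq. (15), with the regularizer f omitted for brevity; all names are our own, and X is assumed nonnegative.

```python
import numpy as np

def nmf_anchor_factorization(X, M, n_iter=200, eps=1e-10, seed=0):
    """Alternating multiplicative updates for min ||X - U Z||_F^2 with
    U >= 0, Z >= 0 (Lee and Seung 2000): one view of Eq. (15) without
    the regularizer f(Z). X: nonnegative d x N data, M: number of anchors.
    Illustrative sketch; each iteration costs O(d M N), linear in N."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((d, M))                         # anchor (basis) matrix U^i
    Z = rng.random((M, N))                         # coefficient matrix Z^i
    for _ in range(n_iter):
        Z *= (U.T @ X) / (U.T @ U @ Z + eps)       # fix U, update Z
        U *= (X @ Z.T) / (U @ Z @ Z.T + eps)       # fix Z, update U
    return U, Z
```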

In addition, based on ONMTF (Orthogonal Non-negative Matrix Tri-Factorization) proposed in Ding et al. (2006), the original matrix can also be approximated by factorizing it into the product of three matrices.

$$\begin{aligned} X^{i}{\approx C}^{i}A^{i}Z^{i} \end{aligned}$$
(16)

To learn the consensus information of multi-view data, Eq. (16) can be re-expressed as

$$\begin{aligned} X^{i}{\approx C}^{i}AZ \end{aligned}$$
(17)

Therefore, Eq. (13) can be re-expressed as (Sun et al. 2021):

$$\begin{aligned} {\min {\sum \limits _{i = 1}^{V}\left\| {X^{i} - C^{i}AZ} \right\| _{F}^{2}}} + \alpha f(Z)\;\text{s.t.}\;Z \ge 0,C^{i}{C^{i}}^{T} = {\textbf{I}}, \quad AA^{T} = {\textbf{I}}, \quad Z^{T}1 = {\textbf{1}} \end{aligned}$$
(18)

where A denotes the base anchors, \(C^i\) is the projection of the base anchors into each view, and Z is the consensus matrix. It can be seen that Sun et al. (2021) gives the ONMTF decomposition better interpretability in multi-view clustering.

2.4.2 Algorithms based on matrix factorization

Matrix factorization-based subspace clustering can effectively reduce the computational complexity of clustering and can thus be used for large-scale data clustering. At the same time, anchors make the matrix factorization more interpretable. Therefore, many clustering algorithms based on matrix factorization have been proposed in recent years. In this section, some representative algorithms are reviewed.

Zong et al. (2017) proposed learning the consensus manifold and the consensus coefficient matrix based on non-negative matrix factorization and constructing the objective function in conjunction with manifold regularization; however, the exploration of the variability between views is lacking. Huang et al. (2018) proposed combining the data space and the feature space to learn a probabilistic neighborhood matrix and constructing a regularization term to smooth the objective function based on the factors of ONMTF; this avoids the suboptimal clustering results that K-Nearest Neighbors (KNN) based graph construction may yield and shows good performance on document clustering tasks. Zhang et al. (2020c) explored intra-view information based on NMF and self-representation, then integrated multiple sources of information into a complete graph; the adaptive weighting in the proposed method can account for the specificity of incomplete views. Zhang et al. (2020b) proposed learning multiple representation matrices with k connected components through subspace representation and the rank constraint while learning the consensus indicator matrix through the multiple indicator matrices and using spectral rotation to obtain the clustering results directly; however, the adjacency matrices of each view learned by matrix factorization are relatively independent, which is not conducive to mining consensus information at the adjacency-matrix level. Fan et al. (2020) proposed splitting the shared low-dimensional representation matrix into the sum of a row-sparse and a column-sparse matrix to enhance the effectiveness of the subspace representation obtained. Shi et al. (2021b) proposed combining consensus similarity matrix learning and Symmetric Non-negative Matrix Factorization to solve for the clustering results directly and reduce the computational complexity; however, the performance is highly dependent on the per-view adjacency matrices constructed in the first stage using hyperparameters. Liang et al. (2020) improved the clustering performance by imposing orthogonal constraints on the learned basis matrix and representation matrices in the non-negative orthogonal factorization; however, the method focuses mainly on the diversity of each view and pays little attention to the consensus information between views. Zhou et al. (2021) and Zhang et al. (2022b) proposed using kernel k-means to nonlinearly map the original data and then learning the consensus matrix by minimizing kernel differences through kernel polarization (Wang et al. 2009). The above works combine matrix factorization and constraints to improve the clustering performance while reducing the computational complexity; however, an intuitive interpretation of the matrix factorization is lacking. Guo et al. (2023) proposed an anchors-based multi-view clustering method without post-processing, in which the original multi-view data are factorized to learn a unified discrete indicator matrix containing the clustering results. Wan et al. (2023b) and Wan et al. (2023a) considered that the dimensions of different views may be inconsistent and that a fixed-dimension consensus matrix learned by matrix factorization would hinder the exploration of complementary information; they proposed first learning embedding matrices of multiple dimensions for each view based on matrix factorization. Wan et al. (2023b) then obtained the final consensus matrix through a projection matrix, with the final result obtained by running k-means on the consensus matrix, whereas Wan et al. (2023a) unified a new k-means (Pei et al. 2022) into the optimization objective to obtain the clustering results directly.

Kang et al. (2020b) proposed learning the graph of each view based on fixed anchors and then concatenating them; however, this makes it difficult to fully exploit the consensus information. Sun et al. (2021) and Wang et al. (2021a) proposed learning the consensus graph and the anchors jointly from multi-view data, which mines the consensus information well. Building on this, Liu et al. (2022b) and Chen et al. (2022b) respectively added rank constraints and orthogonal decomposition terms to learn the clustering results directly, and Li et al. (2022), Wang et al. (2022) and Liu et al. (2022a) applied it to clustering incomplete multi-view datasets. Li et al. (2023) observed that noisy features in multi-view data lead to anchor shift during optimization and therefore proposed remedying the anchor-shift phenomenon by discovering important features; the proposed method reduces the blindness of learning the anchor matrix, thus increasing the robustness of the learned consensus matrix and improving clustering performance. Su et al. (2023) proposed learning a normalized consensus matrix from multiple views while using the distances between samples and anchors to constrain its learning; the learned consensus matrix contains not only consensus information across views but also localized information between samples and anchors. Dai et al. (2023) proposed factorizing the projected multi-view data based on anchors and constructing a third-order tensor from each view's subspace matrix as a regularization term; unlike methods that learn a consensus matrix across views, this mines the complementary information of views through the view-specific subspace matrices. The above literature gives an intuitive interpretability to anchors-based matrix factorization; however, the uncertainty in the number and dimension of anchors brings further challenges to improving clustering performance. To avoid the extra time needed to tune the number of anchors as a hyperparameter in the matrix factorization, Zhang et al. (2023) proposed learning different subspace matrices for each view using a predefined set of anchor numbers and then fusing them by concatenation. The time complexity and the corresponding codes (if available) of the above algorithms are listed in Table 5.

Table 5 The time complexity and the corresponding codes (if available) of algorithms based on matrix factorization

It can be seen that current clustering algorithms based on matrix factorization can effectively reduce the computational complexity, and that anchors give matrix factorization a more intuitive interpretation. These advantages have attracted more and more researchers to this line of work. However, for multi-view clustering, choosing an appropriate uniform anchor dimension remains a challenge because the dimensions of the views differ, and the choice of the number of anchors is likewise an open question.

3 Experiment

In the previous section, this paper introduced four methods for reducing the complexity of LSMVC algorithms and summarized the relevant literature for each method. To further elucidate their advantages and disadvantages, nine representative algorithms are selected in this section, and their clustering results are compared on seven datasets (as shown in Table 6). The relationship between the algorithms TBGL (Hao et al. 2022), LMVSC (Kang et al. 2020b), FPMVC (Wang et al. 2021a), SFMC (Li et al. 2020), FMGL-MVC (Jiang and Gao 2022), EOMSC (Liu et al. 2022b) and the summarized methods is given in Table 7. In addition, three algorithms, CGL (Li et al. 2021), AMGL (Nie et al. 2016a), and DiMSC (Cao et al. 2015), are selected for comparison with the above six to show that the summarized methods can be effectively used for large-scale clustering. The nine multi-view clustering algorithms are described in detail below.

CGL points out that learning the similarity matrix directly from the raw multi-view data is inappropriate because the data may contain noise and redundant information. The paper proposes discovering the similarity graph in the spectral embedding space. Specifically, it first learns the original graph of each view through adaptive neighbor graph learning and obtains the corresponding spectral embedding matrix through spectral embedding. Then, the regularized spectral embedding matrix of each view is multiplied with its own transpose, and the results are stacked into a corrupted tensor, from which a low-dimensional embedding tensor is learned. The spectral embedding and low-dimensional embedding are unified in a single objective function and co-optimized. Finally, the similarity graph is obtained from the optimized spectral embedding matrix. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned} \min \sum _{i=1}^{V}-\lambda {\text {Tr}}\left( {H^{i}}^{T} A^{i} H^{i}\right) +\frac{1}{2}\Vert \mathcal {B}-\mathcal {T}\Vert _{F}^{2}+\Vert \mathcal {T}\Vert _{w,*}\; \text {s.t.}\; {H^{i}}^{T} H^{i}=I_{c}, \end{aligned}$$
(19)

where \(H^i\) is the ith view spectral embedding matrix and \(A^i\) is the symmetric normalized view-specific similarity graph. \(\mathcal {B}\) is the corrupted tensor and \(\mathcal {T}\) is the low-rank tensor learned from \(\mathcal {B}\).
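As a point of reference, the H-step above is a standard trace maximization: with \(A^i\) fixed, the constrained maximizer of \({\text {Tr}}({H^{i}}^{T} A^{i} H^{i})\) is given by the leading eigenvectors of \(A^i\). A minimal Python sketch follows; the function name and the dense-matrix assumption are illustrative, not CGL's implementation.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(A, c):
    """H-step of Eq. (19): with the symmetric normalized graph A fixed,
    max Tr(H^T A H) s.t. H^T H = I_c is solved by the c eigenvectors of A
    with the largest eigenvalues."""
    n = A.shape[0]
    _, H = eigh(A, subset_by_index=[n - c, n - 1])  # top-c eigenvectors
    return H
```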

AMGL observes that the differences between views are often ignored in multi-view clustering, and that methods for learning the weight of each view often need to introduce additional hyperparameters. To solve these problems, the paper proposes parameter-free auto-weighted multiple graph learning, which avoids introducing hyperparameters by rethinking the standard spectral learning model and solving for each view-specific weight implicitly. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned} \min \sum _{i=1}^{V} \alpha ^{i} {\text {Tr}}\left( F^{T} L^{i} F\right) , \quad \alpha ^{i}=1 /\left( 2 \sqrt{{\text {Tr}}\left( F^{T} L^{i} F\right) }\right) , \end{aligned}$$
(20)

where F is the cluster indicator matrix and \({L}^i\) is the view-specific Laplacian matrix. Since the per-view coefficients \({\alpha }^i\) are solved implicitly, the entire objective is parameter-free.
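A minimal sketch of this alternation, assuming dense Laplacians and using illustrative names (it is not the authors' implementation), is given below.

```python
import numpy as np
from scipy.linalg import eigh

def amgl(laplacians, c, n_iter=20, eps=1e-12):
    """Sketch of the AMGL alternation of Eq. (20): fix the weights and
    solve F from the weighted sum of Laplacians, then refresh each weight
    implicitly from Tr(F^T L^i F)."""
    alpha = np.ones(len(laplacians))
    for _ in range(n_iter):
        L = sum(a * Li for a, Li in zip(alpha, laplacians))
        # F-step: the c eigenvectors with smallest eigenvalues minimize
        # Tr(F^T L F) under F^T F = I.
        _, F = eigh(L, subset_by_index=[0, c - 1])
        # alpha-step: the parameter-free implicit weight update,
        # guarded against a vanishing trace.
        alpha = np.array(
            [1.0 / (2.0 * np.sqrt(max(np.trace(F.T @ Li @ F), eps)))
             for Li in laplacians])
    return F, alpha
```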

DiMSC argues that exploring the complementary information of multi-view data helps describe the data more accurately and comprehensively, yet previous methods treat each view independently and cannot fully exploit this complementarity. To address this, the paper adopts HSIC to explore the complementary information among views. Specifically, the objective function consists of three parts. First, the subspace matrix of each view is learned through self-representation and smooth representation, which the paper calls naive multi-view subspace clustering. Then, HSIC measures the difference between the subspace matrices of each pair of views and serves as a diversity regularization term. Finally, the similarity matrix is obtained from the optimized subspace matrices, and the clustering results are obtained by traditional spectral clustering. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned} O\left( Z^{1}, \ldots , Z^{V}\right)&= \sum _{i=1}^{V}\left\| X^{i}-X^{i} Z^{i}\right\| _{F}^{2}+\lambda _{S} \sum _{i=1}^{V} {\text {Tr}}\left( Z^{i} L^{i} {Z^{i}}^{T}\right) \\&\quad +\lambda _{V} \sum _{i \ne w} {\text {HSIC}}\left( Z^{i}, Z^{w}\right) , \end{aligned}$$
(21)

where \({X}^i\) is the ith view data, \({Z}^i\) is the subspace representation of the ith view, and \({L}^i\) is the view-specific Laplacian matrix.
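For reference, with an inner-product kernel the empirical HSIC used in the diversity term reduces to a short computation; the Python sketch below makes this concrete, though DiMSC's exact kernel choice may differ.

```python
import numpy as np

def hsic(Zi, Zw):
    """Empirical HSIC between two n x n subspace representations (columns
    index samples), with inner-product kernels assumed for simplicity:
    HSIC = Tr(Ki H Kw H) / (n-1)^2, H the centering matrix."""
    n = Zi.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Ki = Zi.T @ Zi                        # kernel over sample columns
    Kw = Zw.T @ Zw
    return np.trace(Ki @ H @ Kw @ H) / (n - 1) ** 2
```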

FMGL-MVC points out that existing multi-view clustering methods suffer from the high time complexity of graph construction and matrix decomposition, which prevents their application to large-scale multi-view clustering, and that they mine the complementary information among views insufficiently. To solve these problems, the paper first proposes using anchors to reduce the size of the graph matrix, which in turn reduces the time complexity of matrix decomposition; second, a tensor Schatten p-norm regularizer is used to fully mine the complementary information among views. Specifically, for multi-view data with sample size N, the authors first use M anchors (\(M \ll N\)) to construct an \(N\times M\) graph matrix per view. The view-specific graph matrices are then stacked into a third-order tensor, which is rotated to reduce the time complexity. Finally, a rank constraint ensures that the learned shared graph has k connected components, so the clustering results can be obtained directly without post-processing. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned} \begin{aligned}&\min \sum _{v=1}^{V}\left( \sqrt{\sum _{i=1}^{N} \sum _{j=1}^{M}\left\| x_{i}^{(v)}-a_{j}^{(v)}\right\| _{2}^{2} Z_{i j}^{(v)}+\alpha \left\| Z^{(v)}\right\| _{F}^{2}}\right) +\lambda \Vert \mathcal {Z}\Vert _{Sp}^{p} \\&\quad +\beta {\text {Tr}}\left( F^{T} \widetilde{L_{B}} F\right) \quad \text {s.t.}\; \forall v,\; Z^{(v)} {\textbf{1}}={\textbf{1}},\; Z_{i j}^{(v)} \ge 0,\; F^{T} F=I, \end{aligned} \end{aligned}$$
(22)

where \(x_{i}^{(v)}\) is the ith sample of the vth view and \(a_{j}^{(v)}\) is the jth anchor point. \(Z^{(v)}\) is the view-specific bipartite graph. \(\Vert \mathcal {Z}\Vert _{Sp}^{p}\) is the Schatten p-norm regularizer of the third-order tensor constructed from the \(Z^{(v)}\). \({\text {Tr}}\left( F^{T} \widetilde{L_{B}} F\right)\) derives from the rank constraint.
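The anchor-based graph construction can be made concrete with the closed-form adaptive-neighbor weights of Nie et al. (2016b); the Python sketch below is illustrative rather than FMGL-MVC's exact construction.

```python
import numpy as np

def anchor_graph(X, A, k=5):
    """Build an N x M bipartite graph between samples X (N x d) and
    anchors A (M x d): each row keeps its k nearest anchors with
    closed-form adaptive-neighbor weights; rows sum to 1 and Z >= 0."""
    d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(-1)  # squared distances
    N, M = d2.shape
    Z = np.zeros((N, M))
    for i in range(N):
        idx = np.argsort(d2[i])[:k + 1]   # k+1 nearest anchors, ascending
        dk = d2[i, idx]
        # Weights for the k nearest; the (k+1)-th distance zeroes them out.
        Z[i, idx[:k]] = (dk[k] - dk[:k]) / (k * dk[k] - dk[:k].sum() + 1e-12)
    return Z
```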

TBGL points out that mining the consensus and complementary information of multi-view data is the core problem of multi-view clustering, and that existing methods overlook the consensus information among views, so the consensus and complementary information cannot be thoroughly mined. To solve this, the paper first learns each view's low-rank representation matrix and stacks them into a third-order tensor; the rank of the rotated tensor is then constrained using the t-SVD tensor nuclear norm. Finally, the low-rank representation matrix of each view is partitioned into a consensus matrix and a view-specificity matrix, which mine the consensus and complementary information between views, respectively. The consensus matrix is learned through graph regularization, and HSIC measures the differences between views. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned}&\min \sum _{i=1}^{V}\left\| E^{i}\right\| _{2,1}+\lambda _{1}\Vert \mathcal {Z}\Vert _{\circledast }+\lambda _{2} \sum _{i=1}^{V} {\text {Tr}}\left( C L^{i} C^{T}\right) +\lambda _{3} \sum _{i \ne w} {\text {HSIC}}\left( D^{i}, D^{w}\right) \\&\quad \text {s.t.}\; X^{i}=X^{i} Z^{i}+E^{i},\; Z^{i}=C+D^{i}, \end{aligned}$$
(23)

where \({E}^i\) is the error term of the self-representation. \({Z}^i\) is the view-specific self-representation matrix, partitioned into the consensus matrix C and the view-specificity matrix \({D}^i\). \(\Vert \mathcal {Z}\Vert _{\circledast }\) is the tensor nuclear norm constraint on the third-order tensor constructed from the \({Z}^i\). \({L}^i\) denotes the view-specific Laplacian matrix.

Noting that the performance of most current multi-view clustering methods is limited by the hyperparameters attached to the multiple regularization terms in their objective functions, SFMC proposes a scalable and parameter-free multi-view clustering method. Specifically, the paper first constructs the bipartite graph of each view based on anchors, then fuses the bipartite graphs with learnable coefficients to obtain a joint affinity matrix. A rank constraint in the objective function forces the optimized joint affinity matrix to have k connected components, so the clustering results can be obtained directly without post-processing. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned}&\min \left\| \sum _{i=1}^{V} \alpha ^{i} B^{i}-P\right\| _{F}^{2}\; \text {s.t.}\; P{\textbf{1}}={\textbf{1}},\; P \ge 0,\; \alpha \ge 0,\; \alpha ^{T} {\textbf{1}}={\textbf{1}}, \\&\quad {\text {rank}}\left( {\tilde{L}}_{S}\right) =n+m-k, \end{aligned}$$
(24)

where \({B}^i\) is the view-specific bipartite graph constructed using anchor points and P is the joint affinity matrix. rank(·) denotes the rank constraint, where n is the number of instances, m is the number of anchor points, and k is the number of clusters.
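A simplified sketch of the fusion step, omitting the rank constraint, alternates a row-wise simplex projection for P with a projected-gradient update for \(\alpha\); the step size and iteration count below are illustrative only.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    (the sort-based algorithm of Duchi et al. 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fuse_bipartite_graphs(Bs, n_iter=20, step=0.1):
    """Alternate updates for min || sum_i alpha_i B_i - P ||_F^2 with
    P row-stochastic and alpha on the simplex; the rank constraint of
    Eq. (24) is dropped in this minimal sketch."""
    V = len(Bs)
    alpha = np.full(V, 1.0 / V)
    # Gram matrix of the bipartite graphs, constant across iterations.
    G = np.array([[np.sum(Bi * Bj) for Bj in Bs] for Bi in Bs])
    for _ in range(n_iter):
        M = sum(a * B for a, B in zip(alpha, Bs))
        P = np.apply_along_axis(project_simplex, 1, M)  # P-step
        b = np.array([np.sum(Bi * P) for Bi in Bs])
        # alpha-step: projected gradient; a fixed step size is used here
        # for illustration, a line search would be safer.
        alpha = project_simplex(alpha - step * (G @ alpha - b))
    return P, alpha
```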

LMVSC points out that previous state-of-the-art multi-view clustering methods are hard to apply to large-scale multi-view clustering due to their quadratic or cubic computational complexity. The paper further argues that although an affinity matrix built on fixed anchors can reduce the computational complexity, it struggles to approximate the spatial structure of multi-view data; a large-scale multi-view clustering method with learnable anchors is therefore proposed. The paper approximates the multi-view data as the product of an anchor matrix and an affinity matrix and measures the approximation error by the F-norm. The optimized affinity matrices of all views are then concatenated, and the clustering results are obtained after post-processing. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned} \min \sum _{i=1}^{V}\left( \left\| X^{i}-A^{i} {Z^{i}}^{T}\right\| _{F}^{2}+\alpha \left\| Z^{i}\right\| _{F}^{2}\right) \; \text {s.t.}\; Z^{i} \ge 0,\; {Z^{i}}^{T} {\textbf{1}}={\textbf{1}}, \end{aligned}$$
(25)

where \({X}^i\) is the ith view data, \({A}^i\) is the anchor matrix obtained by random selection or k-means clustering, and \({Z}^i\) is the view-specific subspace representation.
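A minimal Python sketch of this pipeline follows; the simplex constraints on \(Z^i\) are dropped to keep a closed-form ridge solution, so it approximates Eq. (25) rather than reproducing the paper's solver, and all names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.sparse.linalg import svds

def lmvsc_sketch(Xs, m, c, alpha=1.0):
    """Anchors from k-means, a ridge-style solve per view, concatenation,
    and a spectral step on the concatenated anchor graph."""
    Zs = []
    for X in Xs:                      # X: N x d_i, one array per view
        A = KMeans(n_clusters=m, n_init=10).fit(X).cluster_centers_
        # min ||X - Z A||_F^2 + alpha ||Z||_F^2 => Z = X A^T (A A^T + aI)^-1
        Z = X @ A.T @ np.linalg.inv(A @ A.T + alpha * np.eye(m))
        Zs.append(np.maximum(Z, 0))   # crude surrogate for Z >= 0
    Zbar = np.hstack(Zs)              # N x (V*m) concatenated anchor graph
    # Spectral post-processing: left singular vectors of Zbar, then k-means.
    U, _, _ = svds(Zbar, k=c)
    return KMeans(n_clusters=c, n_init=10).fit_predict(U)
```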

FPMVC notes that the fixed-anchor approach can reduce the computational complexity of multi-view clustering, but that fixing anchors independently per view can harm the clustering results, and that hyperparameters in the objective function limit a method's applicability across datasets. The authors therefore propose a parameter-free multi-view clustering method. Specifically, they approximate the multi-view data as the product of a projection matrix, a consensus anchor matrix, and a consensus anchor graph. The projection matrix maps the consensus anchor matrix to each view, and the consensus anchor graph of all views is learned from the consensus anchor matrix, enabling the co-optimization of the two. Finally, the clustering results are obtained after post-processing. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned}&\min \sum _{i=1}^{V} \alpha _{i}^{2}\left\| X^{i}-W^{i} A Z\right\| _{F}^{2}\; \text {s.t.}\; \alpha ^{T} {\textbf{1}}={\textbf{1}},\; Z \ge 0, \\&\quad {W^{i}}^{T} W^{i}=I,\; A^{T} A=I,\; Z^{T} {\textbf{1}}={\textbf{1}}, \end{aligned}$$
(26)

where \({X}^i\) is the ith view data, \({W}^i\) is the projection matrix, and A and Z are the consensus anchor matrix and the consensus anchor graph, respectively.
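As an illustration of one step of the alternation, the \(W^i\)-update with A and Z fixed is an orthogonal Procrustes problem; the Python sketch below shows a single such update under that assumption, with hypothetical names.

```python
import numpy as np

def update_projection(X, AZ):
    """W-step of Eq. (26): with A and Z fixed, min ||X - W AZ||_F^2 over
    column-orthonormal W reduces to max Tr(W^T X (AZ)^T), solved by the
    SVD of X (AZ)^T (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X @ AZ.T, full_matrices=False)
    return U @ Vt
```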

The motivation and methodology of EOMSC are similar to those of FPMVC. The difference is an additional rank constraint, which forces the optimized consensus anchor graph to have k connected components so that clustering results can be obtained without post-processing. The proposed objective is shown below; alternating minimization optimizes each variable while the others are fixed.

$$\begin{aligned}&\min \sum _{i=1}^{V} \beta _{i}^{2}\left\| X^{i}-W^{i} A Z\right\| _{F}^{2}\; \text {s.t.}\; \beta ^{T} {\textbf{1}}={\textbf{1}},\; Z \ge 0,\; {W^{i}}^{T} W^{i}=I, \\&\quad A^{T} A=I,\; Z^{T} {\textbf{1}}={\textbf{1}},\; {\text {rank}}({\tilde{L}})=n+m-k, \end{aligned}$$
(27)

where \({X}^i\) is the ith view data, \({W}^i\) is the projection matrix, and A and Z are the consensus anchor matrix and the consensus anchor graph, respectively. rank(·) denotes the rank constraint, where n is the number of instances, m is the number of anchor points, and k is the number of clusters.
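The structural effect of the rank constraint can be verified directly: \({\text {rank}}({\tilde{L}})=n+m-k\) holds exactly when the bipartite graph between the n samples and m anchors has k connected components, from which labels follow without k-means. A small Python sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def has_k_components(Z, k):
    """Check whether the n x m anchor graph Z yields a bipartite graph
    with exactly k connected components; if so, sample labels can be
    read off directly from the component assignment."""
    n, m = Z.shape
    # Full (n+m) x (n+m) adjacency of the bipartite graph.
    S = np.zeros((n + m, n + m))
    S[:n, n:], S[n:, :n] = Z, Z.T
    n_comp, labels = connected_components(csr_matrix(S), directed=False)
    return n_comp == k, labels[:n]   # sample labels are the first n entries
```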

Four commonly used metrics, ACC (accuracy), NMI (normalized mutual information), Purity, and Fscore, are used in this paper. All experiments were run on an Intel Core i7-12700F CPU with 64 GB RAM, using Matlab R2017a (64-bit).
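Of these metrics, ACC requires matching predicted clusters to ground-truth classes; a standard Hungarian-matching computation is sketched below (in Python, independent of any particular algorithm above).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: build the cluster-class confusion matrix, find the best
    one-to-one mapping with the Hungarian algorithm, and return the
    fraction of correctly matched samples."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    lut = {l: i for i, l in enumerate(labels)}
    C = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[lut[p], lut[t]] += 1
    row, col = linear_sum_assignment(-C)  # maximize matched counts
    return C[row, col].sum() / len(y_true)
```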

Table 6 The datasets used in this paper and the corresponding anchors rate
Table 7 The comparison algorithms and the corresponding method they use, where \(\surd\) indicates that the method is used

For the FMGL-MVC, TBGL, and SFMC algorithms, the number of anchors used for each dataset is given in Table 6. The numbers of anchors and dimensions in the EOMSC and FPMVC algorithms are set to [k 2*k 3*k 4*k 5*k 6*k 7*k], where k is the number of categories. Following the original paper, the number of anchors in the LMVSC algorithm is set to [k 50 100]. All other parameters of the nine algorithms are set strictly following the original papers. In the experimental results, “OM” indicates out of memory. In addition, considering that excessively long clustering times have no practical value, this paper treats any clustering run over 2 h as a timeout, indicated by “OT”.

3.1 Experimental results

Comparing the experimental results in Table 8, the following can be observed:

  • Comparing the three multi-view clustering algorithms without optimization, it can be seen that on the datasets “Caltech101-7”, “Caltech101-20” and “CCV”, CGL and AMGL outperform DiMSC. This is because CGL and AMGL use fewer hyperparameters and are more robust than DiMSC. Moreover, in terms of algorithm design, AMGL only makes simple modifications to spectral clustering without adding complex performance-enhancing components, so its solution process is more straightforward than those of DiMSC and CGL, and its clustering takes less time.

  • On small datasets such as “Caltech101-7” and “Caltech101-20”, the FMGL-MVC and CGL algorithms better capture complementary and consensus information by constructing third-order tensors and thus exhibit better performance. Although the TBGL algorithm also constructs a third-order tensor, its many hyperparameters make good results difficult to obtain.

  • The FPMVC and EOMSC algorithms perform moderately well when clustering multi-view datasets. However, the introduction of learnable anchors allows them to be applied to large-scale multi-view datasets while remaining interpretable. In contrast, the LMVSC and SFMC algorithms use fixed anchors, which leads to poorer clustering results.

  • Since the FMGL-MVC, CGL, TBGL, AMGL, and DiMSC algorithms require constructing matrices in \(R^{n \times n}\) during the solution process, they exceed the available memory or the time limit when clustering the large-scale datasets YoutubeFace and NUSWIDEOBJ.

  • When clustering the datasets “CCV”, “Caltech101-all” and “SUNRGBD” with the DiMSC algorithm, solving the standard Sylvester equation and performing spectral clustering take a great deal of time due to the large scale of the matrices. Similarly, the CGL algorithm performs spectral clustering on full-size graphs, leading to a clustering timeout on the CCV dataset.

Table 8 Comparison of experimental results, where “OM” means out of memory and “OT” means timeout; the best results are highlighted in bold

As can be seen from Fig. 3:

  • SFMC is the fastest algorithm, followed by EOMSC; both use matrix blocking. In addition, the difference between the FPMVC and EOMSC algorithms is that EOMSC uses matrix blocking to solve the rank constraint and thus obtains the clustering results directly without post-processing. It can be seen that matrix blocking significantly reduces the clustering time.

  • The FMGL-MVC and TBGL algorithms use matrix blocking and anchors-based graph construction. However, the increase in computational complexity caused by the third-order tensor still makes them difficult to apply to larger-scale clustering.

  • The LMVSC and FPMVC algorithms can be effectively used for LSMVC due to the reduction in matrix scale brought by matrix factorization.

  • Neither the AMGL nor the DiMSC algorithm adopts any of the four summarized methods, and the CGL algorithm only uses third-order tensor rotation. Thus, they are not applicable to large-scale clustering.

    Fig. 3 Comparison of clustering time, where out-of-memory and timeout cases are not considered

In addition, we use two large-scale datasets, YTF-100 and EMINIST, to further demonstrate behavior at scale. The details of the two datasets are shown in Table 6, and the clustering performance and time consumption are shown in Table 9. First, the CGL, AMGL, DiMSC, FMGL, and TBGL algorithms exceed the available memory. Among them, CGL, AMGL, and DiMSC construct large-scale graphs for the multi-view data without employing the outlined optimization methods, such as constructing graphs based on anchor points; for FMGL and TBGL, the main cause is the memory consumed when solving the third-order tensor. Second, because the time complexity of SFMC, LMVSC, FPMVC, and EOMSC is linear and no large-scale matrices are used during solving, these algorithms are able to cluster YTF-100 and EMINIST. Furthermore, although the time complexity of all four is linear in the sample size, their clustering times differ because the sample dimension, the number of views, and the number of anchor points affect the computational complexity of each algorithm differently. Finally, combining Fig. 3 and Table 9, it can be seen that as the dataset size increases, the time complexity of the multi-view clustering algorithms based on the optimization methods is consistent with that summarized in Tables 2, 3, 4 and 5, and these algorithms can be applied to larger-scale multi-view clustering tasks than the unoptimized algorithms.

Table 9 The clustering performance and time consumption on large-scale datasets, where “OM” means out of memory; the best clustering results are highlighted in bold

4 Discussion

In Sect. 3, nine multi-view clustering algorithms covering the four complexity-reduction methods introduced in Sect. 2 are compared. The advantages and disadvantages of each method are discussed below.

Rotation-based third-order tensor t-SVD brings high-dimensional information into multi-view clustering. On the one hand, after rotation each frontal slice contains data from different views, avoiding the limitation of traditional multi-view clustering, which computes only on a single view or a linear combination of views. On the other hand, current research focuses mainly on exploring consensus information, while complementary information remains under-mined because it is difficult to describe quantitatively; constructing a third-order tensor from multi-view data provides a new way to explore complementary information and improve clustering performance. However, the large scale of the matrices involved in third-order tensor computation makes the approach difficult to apply to large-scale clustering. Although anchors-based graph construction can reduce the data size and bring the computational complexity of third-order-tensor-based multi-view clustering algorithms, such as FMGL-MVC and TBGL, down to linear in the number of samples, the large time consumption and memory requirements still make them difficult to use on datasets such as YoutubeFace.
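For concreteness, the t-SVD tensor nuclear norm underlying these regularizers is computed per frontal slice in the Fourier domain, which also shows why rotating the tensor so that a short mode (e.g., the view index) forms the slices keeps each SVD cheap. A minimal Python sketch, assuming a dense tensor:

```python
import numpy as np

def tensor_nuclear_norm(Z):
    """t-SVD tensor nuclear norm of an n1 x n2 x n3 tensor: FFT along the
    third mode, one SVD per frontal slice in the Fourier domain, and the
    averaged sum of all singular values. If rotation makes n2 small
    (e.g., the number of views), each slice SVD is inexpensive."""
    Zf = np.fft.fft(Z, axis=2)
    n3 = Z.shape[2]
    s = sum(np.linalg.svd(Zf[:, :, t], compute_uv=False).sum()
            for t in range(n3))
    return s / n3
```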

Anchors-based graph construction is an effective way to reduce the scale of the graph, which in turn reduces the computational complexity of clustering algorithms. On the one hand, using anchors to construct each view's graph before composing the third-order tensor effectively reduces the computational complexity of third-order-tensor-based multi-view clustering, bringing the time complexity of algorithms such as FMGL-MVC and TBGL down to linear in the number of samples. On the other hand, combining anchors-based graph construction with matrix blocking can dramatically reduce the computational complexity and is effectively used in LSMVC algorithms such as SFMC. Crucially, the selected anchors and the construction method directly affect the quality of the graph and thus the clustering results. Conventionally, randomly selected samples or k-means cluster centers are used as anchors, and the graphs are then constructed following Liu et al. (2010) or Nie et al. (2016b). However, such manually constructed graphs do not accurately depict the structure of the original multi-view data, which in turn degrades the clustering results.

Matrix blocking is often used to simplify solving rank constraints, yielding a consensus graph with k connected components. The clustering results can be read directly from the structured consensus graph, so post-processing can be omitted to further reduce the computational complexity. Since the rank constraint can be flexibly added to various clustering algorithms, matrix blocking is widely used. On the one hand, because the computational complexity of solving the third-order tensor t-SVD is high, the FMGL-MVC and TBGL algorithms use rank constraints to limit the additional complexity of the whole algorithm. On the other hand, the EOMSC and SFMC algorithms further reduce the computational complexity by utilizing rank constraints, making them efficient for clustering large-scale multi-view data. However, comparing the clustering results of the FPMVC and EOMSC algorithms shows that the clustering performance of EOMSC is relatively poor. Therefore, obtaining better and more stable clustering results when solving the rank constraint through matrix blocking remains a worthwhile research point.

Matrix factorization is closely related to subspace-based multi-view clustering. The subspace assumption holds that high-dimensional data are not randomly distributed in the ambient space but lie in subspaces of it, so the original data can be re-represented as linear combinations of a subspace matrix, i.e., expressed through matrix factorization. The factorization can be interpreted more intuitively through anchors. On the one hand, the LMVSC algorithm takes the cluster centers of k-means as anchors and learns the graph of each view. On the other hand, the FPMVC and EOMSC algorithms learn consensus anchors and each view's graph through projection, better mining the consensus information of multiple views. However, choosing the appropriate number of anchors and their dimension is still an open problem. Furthermore, complementary information is as important as consensus information for multi-view clustering, yet current anchor-based matrix factorization algorithms have not explored it.

The four methods this paper summarizes for reducing the computational complexity of clustering large-scale multi-view data are discussed above in the context of specific algorithms. It is not difficult to find that the four methods are not mutually exclusive. For example, the FMGL-MVC algorithm uses rotation-based third-order tensor t-SVD, anchors-based graph construction, and matrix blocking. Therefore, by using multiple methods together, the computational complexity of clustering can be significantly reduced.

5 Future direction

In Sect. 4, the advantages and disadvantages of the four methods for reducing the complexity of LSMVC are analyzed. Combining these analyses with the original purpose of multi-view clustering, namely improving clustering performance through the consistency and complementarity information of multiple views, this section outlines future directions for LSMVC.

  • Explore the use of complementary information. Past studies demonstrate that matrix factorization can effectively extract the consistency information of each view to improve clustering performance. However, matrix factorization alone underuses the complementary information of each view. Constructing high-dimensional connections between views through third-order tensor rotation helps explore complementary information and thus improve clustering performance, but it also increases the computational burden. How to reduce the computational burden while exploring complementary information is therefore a direction worthy of future research.

  • Explore more efficient algorithms that do not require post-processing. K-means is commonly used for the post-processing of multi-view clustering, but its results are affected by initialization. By solving rank constraints through matrix blocking, graphs with k connected components can be obtained directly, and clustering results follow without post-processing. The above experiments show that this strategy effectively reduces time consumption but can also make the clustering performance unstable. Exploring more efficient and stable post-processing-free algorithms is therefore a worthy direction for future research.

  • Further explore the anchors method. The anchors method is widely used because it effectively reduces the complexity of clustering while remaining interpretable. However, the dimensionality and number of anchors in current anchor-based algorithms remain an open issue and have a large impact on clustering performance. Further exploring the potential of the anchors method is therefore a worthy direction for future research.

6 Conclusion

This paper summarizes four widely used methods for improving the computational efficiency of large-scale multi-view clustering: third-order tensor t-SVD, anchors-based graph construction, matrix blocking, and matrix factorization. For each method, the underlying principle is introduced and the corresponding literature is summarized. In addition, nine multi-view clustering algorithms are selected, and their clustering results and time consumption on seven datasets are compared and analyzed. The experimental results show that constructing a third-order tensor can effectively improve clustering accuracy, and that, combined with anchors-based graph construction and matrix blocking, it can be applied to larger-scale multi-view datasets. Matrix-factorization-based algorithms can cluster larger multi-view datasets than third-order-tensor-based algorithms, and their computational complexity can be reduced further by combining them with matrix blocking. However, choosing the appropriate number of anchors and their dimension to improve clustering accuracy remains a problem worth studying. Finally, the advantages and disadvantages of each method are discussed, and potential improvements are proposed.