PMSSC: Parallelizable multi-subset based self-expressive model for subspace clustering

Subspace clustering methods which embrace a self-expressive model that represents each data point as a linear combination of other data points in the dataset provide powerful unsupervised learning techniques. However, when dealing with large datasets, representation of each data point by referring to all data points via a dictionary suffers from high computational complexity. To alleviate this issue, we introduce a parallelizable multi-subset based self-expressive model (PMS) which represents each data point by combining multiple subsets, with each consisting of only a small proportion of the samples. The adoption of PMS in subspace clustering (PMSSC) leads to computational advantages because the optimization problems decomposed over each subset are small, and can be solved efficiently in parallel. Furthermore, PMSSC is able to combine multiple self-expressive coefficient vectors obtained from subsets, which contributes to an improvement in self-expressiveness. Extensive experiments on synthetic and real-world datasets show the efficiency and effectiveness of our approach in comparison to other methods.


I. INTRODUCTION
In many real-world cases, approximating high-dimensional data as a union of low-dimensional subspaces is a beneficial technique for reducing computational complexity and the effects of noise. The task of subspace clustering [1], [2], which is the segmentation of a set of data points into those lying on certain subspaces, has been studied in many practical applications such as face clustering [3], image segmentation [4], motion segmentation [5], scene segmentation [6], and homography detection [7]. Recently, self-expressive models [8], [9] have been explored, which embrace the self-expressive property of subspaces to compute an affinity matrix. The self-expressive property states that each data point from a union of subspaces can be represented as a linear combination of other points. Specifically, given a data matrix $X \in \mathbb{R}^{D \times N}$ in which each data point is a column, the self-expressive model of data point $x_i \in \mathbb{R}^D$ can be described as

$$x_i = X c_i, \quad c_{ii} = 0, \tag{1}$$

where $c_i \in \mathbb{R}^N$ is a coefficient vector, and the constraint $c_{ii} = 0$ avoids the trivial solution of representing a point as a linear combination of itself. The feasible solutions of Eq. (1) are generally not unique because the number of data points lying on a subspace is larger than its dimensionality. However, at least one $c_i$ exists where $c_{ij}$ is nonzero only if data points $x_i$, $x_j$ are in the same subspace, and such a state is called subspace-preserving [10]. Previous works have tried to compute subspace-preserving representations by imposing a regularization term on the coefficients $c_i$. In particular, one algorithm for obtaining a sparse solution to Eq. (1), sparse subspace clustering (SSC) [8], [9], can recover subspaces under mild conditions by regularizing the coefficient matrix $C := [c_1, \dots, c_N] \in \mathbb{R}^{N \times N}$, whose columns are the coefficient vectors of the data points $x_i$. SSC not only achieves high clustering accuracy for datasets with outliers and missing entries, but also has the useful properties of giving theoretical guarantees and providing modeling flexibility, which have influenced many other approaches such as [11], [12].

However, SSC suffers from high computational and memory costs when dealing with a large-scale dataset because of the need to determine the $O(N^2)$ coefficients of $C$. In light of these problems, there has been much interest in recent years in developing scalable subspace clustering algorithms that can be applied to large-scale datasets, taking advantage of the ease of analyzing computational complexity due to the simplicity of the model. Several works have attempted to address the problem of computational cost for large-scale datasets using a sampling strategy, motivated by the sparsity assumption that each data point can be represented as a linear combination of a few basis vectors. In [13], the self-expressive property is applied to a few sampled data points, and the other data points are then classified. While this strategy can produce clustering results more efficiently for a large-scale dataset than directly applying SSC to all data, it leads to poor clustering performance when the sampled data is not representative of the original dataset. Although a learning-based sampling method has also been proposed for generating a coefficient matrix that is representative of the original dataset [14], the accuracy and computational complexity still depend largely on the size of the subsets, as these methods attempt to solve for a self-expressive model in a single subset. Also, no effort has been made to explicitly improve the self-expressiveness of the self-expressive coefficient vectors in these methods.
To further improve self-expressiveness without increasing the computational burden, in this paper, we propose a self-expressive model adopting multiple subsets, which is computable in parallel. Specifically, our model obtains a self-expressive coefficient matrix by combining multiple subsets; each subset consists of only a small proportion of the samples. This strategy not only enjoys the benefit of low computational cost like other single subset-based methods, but also is more effectively subspace-preserving because the representation of the original data is a linear combination of multiple self-expressive coefficient vectors.
Our contributions are highlighted as follows:
• a novel clustering approach that exploits a self-expressive model based on multiple subsets,
• a concisely formulated model,
• independent computation of each subset in parallel without additional computational overhead,
• extensive experiments on both synthetic data and real-world datasets showing that our proposed method can achieve better results without increasing processing time.

A. Background
In the past few years, there has been a surge of spectral clustering-based algorithms that segment a set of data points by performing spectral clustering. Previous classical methods, such as k-subspaces [15] and median k-flats [16], assume that the dimensionalities of the underlying subspaces are given in advance. This latent knowledge is generally hard to access in many real-world applications. In addition, these methods are usually non-convex and thus sensitive to initialization [17], [18]. Aiming to relax the limitations of the k-subspace algorithm, the majority of modern subspace clustering methods have turned to spectral clustering [19], [20], which segments data using an affinity matrix that captures whether a certain pair of data points lie on the same subspace. While many early methods [21], [22], [12], [23] achieve better segmentation than classical clustering algorithms even without the latent knowledge, these methods produce erroneous segmentation results for data points near the intersection of two subspaces due to the dense sampling of points lying on the subspace [24]. We now introduce previous subspace clustering approaches based on spectral clustering, then describe various techniques of scalable subspace clustering methods for dealing with large-scale datasets, which are closer to our proposed method.

B. Subspace Clustering Using Spectral Clustering
Most subspace clustering approaches based on spectral clustering consist of two phases: (i) computing an affinity matrix based on the nonzero coefficients that appear in the representation of each data point as a combination of other points, and (ii) segmenting data points from the computed affinity matrix by applying spectral clustering. The key to the success of segmentation is the phase of computing the affinity matrix. Therefore, many methods have been proposed to compute the affinity matrix. For example, local subspace affinity [24] and spectral curvature clustering (SCC) [25] find neighborhoods based on the observation that a point and its k-nearest neighbors often lie on the same subspace. However, the computational complexity of finding multi-way similarity in these methods grows exponentially with the number of subspace dimensions, motivating the use of a sampling strategy to lower the computational complexity [9]. Recently, the self-expressive model, which employs the self-expressive property in Eq. (1), has become the most popular approach. In particular, SSC takes advantage of sparsity [26] by adopting $\ell_1$ norm regularization of the coefficient vector to achieve high clustering performance. This idea has motivated many methods, using the $\ell_2$ norm in least squares regression [27], the nuclear norm in low rank representation (LRR) [28], the $\ell_1$ plus $\ell_2$ norm in elastic net subspace clustering (EnSC) [29], and the Frobenius norm in efficient dense subspace clustering [30]. In practice, however, solving the $\ell_p$ norm minimization problem for large-scale data may be prohibitive. Also, the memory required becomes larger as the amount of data increases.

C. Scalable Subspace Clustering
When constructing the affinity matrix, several methods based on spectral clustering suffer from high computational complexity. To reduce the computational complexity of this phase, a sparse self-expressive model adopting a greedy algorithm was proposed in [31], [10]. However, these approaches lead to unsatisfactory clustering results if the nonzero elements do not contain sufficient connections within each optimized coefficient vector [32]. Other popular approaches to alleviate the computational and memory loads were inspired by a sampling strategy. In [13], scalable sparse subspace clustering (SSSC) is computationally efficient, using a subset generated by random sampling. However, because the random sampling method relies on a single subset, data points from the same subspace will not be represented by the self-expressive model if they are not appropriately sampled. Exemplar-based subspace clustering [33], [12] is an efficient sampling technique that iteratively selects the least well-represented point as a subset to address the problem. Selective sampling-based scalable sparse subspace clustering (S$^5$C) [14], which generates a subset by selective sampling, provides approximation guarantees of the subspace-preserving property. In [34], the subspace-preserving representations are found by solving a consensus problem with multiple subsets to improve the connectivity of the affinity matrix. In [35], a divide-and-conquer framework using multiple subsets obtained by separating the entire dataset is proposed. While this approach can deal with large-scale data, final segmentation results depend on the self-expressive properties of the optimized self-expressive coefficient vectors of each subset. Our method differs significantly from [35] and [34] in that our proposed self-representation model is designed to minimize the difference from the original data points by combining the self-expressive property of multiple subsets. Lastly, in this paper, we limit our discussion to non-deep learning approaches, which are more mathematically straightforward to explain and rely less on parameter tuning.

A. Problem and Approach
As a problem definition, our final goal is to find the self-expressive coefficient vector $c_i$ that satisfies the subspace-preserving representation in Eq. (1). That is, the self-expressive residual is minimized by solving the following optimization problem,

$$c_i^* = \arg\min_{c_i} \| x_i - X c_i \|_2^2 \quad \text{s.t.} \quad \| c_i \|_0 \le s, \ c_{ii} = 0, \tag{2}$$

where $\| \cdot \|_0$ is the $\ell_0$ pseudo-norm that returns the number of nonzero entries in the vector. This optimization problem has been shown [36], [37] to recover provably subspace-preserving solutions using the orthogonal matching pursuit (OMP) algorithm [38]. Here, $s$ is a tuning parameter for the OMP algorithm, which controls the sparsity of the solution by selecting up to $s$ entries in the coefficient vector $c_i$. Although the OMP algorithm is computationally efficient and is guaranteed to give subspace-preserving solutions under mild conditions, it is unable to produce a subspace-preserving solution with a number of nonzero entries exceeding the dimensionality of the subspace [10]. This leads to poor clustering performance because the affinity between data points is too sparse, especially when the density of data points lying on the subspace is low.
We propose a novel subspace clustering algorithm with a parallelizable multi-subset based self-expressive model, as illustrated in Fig. 1. Sec. III-B introduces our proposed self-expressive model that extends the model in Eq. (2) to multiple subsets via a sampling technique. Sec. III-C then explains the solution of our self-expressive model by the OMP algorithm. Finally, we summarize the proposed subspace clustering algorithm in Algorithm 4.

B. Parallelizable Multi-Subset based Self-Expressive Model
To deal with large-scale data, we first generate $T$ index subsets $I^{(t)}$, $t = 1, \dots, T$, from the whole dataset by weighted random sampling [39], as given in Eq. (3), where $I^{(t)}$ is the index set of the $t$-th subset that is sampled with probability proportional to the elements of the weight vector $w^{(t)}$, $\delta$ is the sampling rate, and $n(\cdot)$ is the cardinality function that measures the number of elements. The $t$-th selected element $w_i$ of $w^{(t)}$ is then updated. In each sampled $t$-th subset, the optimization problem in Eq. (2) can be expressed as follows:

$$c_i^{*(t)} = \arg\min_{c_i^{(t)}} \| x_i - X^{(t)} c_i^{(t)} \|_2^2 \quad \text{s.t.} \quad \| c_i^{(t)} \|_0 \le s, \ c_{ii}^{(t)} = 0, \tag{4}$$

where $c_i^{(t)} \in \mathbb{R}^N$ is the coefficient vector of $x_i$ in the $t$-th subset. Note that to ensure the dimensionality of $c_i^{(t)}$ is $N$, the columns of each data matrix $X^{(t)}$ corresponding to the non-sampled indices are replaced by zero-vectors:

$$x_j^{(t)} = \begin{cases} x_j, & j \in I^{(t)}, \\ 0, & \text{otherwise}. \end{cases}$$

Then $x_i$ can be represented by a self-expressive model, given by:

$$y_i^{(t)} = X^{(t)} c_i^{*(t)}, \tag{5}$$

where $y_i^{(t)}$ is the data point computed by the self-expressive model from the $t$-th subset. In practice, however, the data point $y_i^{(t)}$ in Eq. (5) generally has an error term $z_i$, i.e., $y_i^{(t)} = x_i + z_i$, because of the limitations of using $X^{(t)}$ as a dictionary for reconstruction. To minimize $z_i$, we first represent $x_i$ as a linear combination of the $y_i^{(t)}$, as follows:

$$x_i = \sum_{t=1}^{T} b_t \, y_i^{(t)}, \tag{6}$$

where $b^{(i)} \in \mathbb{R}^T$ is the weight coefficient vector to represent $x_i$, and $b_t \in \mathbb{R}$ is the $t$-th entry of $b^{(i)}$. The coefficient vector $b^{(i)}$ of the linear combination in Eq. (6) can be obtained by solving the optimization problem in Eq. (7). For simplicity, we introduce a data matrix $Y = [y_i^{(1)}, \dots, y_i^{(T)}] \in \mathbb{R}^{D \times T}$ with the data points from Eq. (5) as columns, and rewrite Eq. (7) as Eq. (8). This is the formulation of the optimization problem for subspace clustering in Eq. (2), and can be further described as in Eq. (9). Unlike in Eq. (1), here $Y$ is the data matrix computed from each subset to represent $x_i$. Thus, no constraint is required to avoid the trivial solution of representing a point as a linear combination of itself. To explicitly express Eq. (2), the self-expressive coefficient vector $c_i^*$ corresponding to $X$ is obtained by

$$c_i^* = \sum_{t=1}^{T} b_t^* \, c_i^{*(t)}. \tag{10}$$

It is worth noting that each $c_i^{*(t)}$ can be determined independently from each subset, and so can be computed in parallel for speed.
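The combination step can be checked numerically. The following is a minimal sketch, assuming Eq. (5) has the form $y_i^{(t)} = X^{(t)} c_i^{*(t)}$ and Eq. (10) the form $c_i^* = \sum_t b_t^* c_i^{*(t)}$; the toy matrix, index sets, coefficient vectors, weights, and helper names are all hypothetical and chosen only for illustration. Because each per-subset vector is zero outside its index set, the weighted combination of the subset reconstructions should coincide with the reconstruction from the full matrix $X$ using the combined vector.

```python
def matvec(X, c):
    """Multiply a D x N matrix (list of rows) by a length-N vector."""
    return [sum(x * cj for x, cj in zip(row, c)) for row in X]


def mask_columns(X, index_set):
    """Build X^(t): keep the sampled columns of X and zero out the rest."""
    return [[v if j in index_set else 0.0 for j, v in enumerate(row)] for row in X]


def combine(coeffs, weights):
    """Weighted sum of per-subset coefficient vectors: sum_t b_t * c^(t)."""
    n = len(coeffs[0])
    return [sum(b * c[j] for b, c in zip(weights, coeffs)) for j in range(n)]


# Toy data (assumed values): D = 2, N = 4, columns are data points.
X = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
subsets = [{0, 1}, {2, 3}]            # hypothetical index sets I^(1), I^(2)
coeffs = [[0.5, -1.0, 0.0, 0.0],      # c^(1): nonzero only on I^(1)
          [0.0, 0.0, 2.0, 0.25]]      # c^(2): nonzero only on I^(2)
weights = [0.4, 0.6]                  # hypothetical b_1, b_2

# Columns of Y: y^(t) = X^(t) c^(t).
Y_cols = [matvec(mask_columns(X, I), c) for I, c in zip(subsets, coeffs)]
# Y b, accumulated one dimension at a time.
Yb = [sum(b * y[d] for b, y in zip(weights, Y_cols)) for d in range(len(X))]
# X c* with the combined coefficient vector c*.
c_star = combine(coeffs, weights)
Xc = matvec(X, c_star)

print(Yb, Xc)  # the two reconstructions agree up to rounding
```

The agreement relies only on the zero-padding of the non-sampled columns, so it holds for any choice of weights.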

C. Optimization with Orthogonal Matching Pursuit
In this section, we show that the parameters of the proposed PMS model can be determined by dividing the optimization problem into two small optimization problems, as summarized in Algorithm 1. Overall, Eq. (10) can be determined by solving the minimization problems in Eqs. (4) and (8). We introduce both of the minimization procedures below. Initially, $T$ subset data matrices $\{X^{(t)}\}_{t=1}^{T}$ are generated based on the sampling set $I^{(t)}$ in Eq. (3).
To efficiently solve for $c_i^{*(t)}$ in Eq. (4), we introduce Algorithm 2 based on the OMP algorithm. The support set and the residual are initialized to $S = \emptyset$ and $r = x_i^{(t)}$, respectively. $S$ denotes the index set, which is updated on each iteration by adding one index $j^*$, computed using Eq. (11). The coefficient vector $c_i^{(t)}$ is then found by solving the problem in Eq. (12), subject to a constraint on $\mathrm{supp}(c_i^{(t)})$, where $\mathrm{supp}(\cdot)$ is the support function that returns the subgroup of the domain containing the elements not mapped to zero. The residual $r$ is updated using Eq. (13). This process is repeated until the number of iterations $k$ reaches its limit $s$ or $\|r\|_2$ is smaller than the error $\epsilon$.
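The iteration described above follows the usual OMP pattern of index selection, restricted least squares, and residual update. The sketch below is a generic, dependency-free OMP written under that assumption; it is not a transcription of Algorithm 2, and the function names, the column-list data layout, and the toy points are hypothetical. Zero columns (such as the non-sampled columns of a subset matrix) have zero correlation with the residual, so they are never selected.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, rhs):
    """Solve the small square system A z = rhs by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [rhs[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][m] * z[m] for m in range(k + 1, n))
        z[k] = (M[k][n] - tail) / M[k][k]
    return z


def omp(columns, x, i, s, eps=1e-6):
    """Sparse code of x over `columns` (a list of data points), excluding column i."""
    n = len(columns)
    support, coef = [], []
    r = list(x)  # the residual starts at the point being represented
    while len(support) < s and dot(r, r) ** 0.5 > eps:
        candidates = [j for j in range(n) if j != i and j not in support]
        if not candidates:
            break
        # Select the column most correlated with the current residual.
        j_star = max(candidates, key=lambda j: abs(dot(columns[j], r)))
        if dot(columns[j_star], r) == 0.0:
            break  # no remaining column can reduce the residual
        support.append(j_star)
        # Least squares restricted to the support, via the normal equations.
        gram = [[dot(columns[a], columns[b]) for b in support] for a in support]
        rhs = [dot(columns[a], x) for a in support]
        coef = solve(gram, rhs)
        r = [xd - sum(c * columns[j][d] for c, j in zip(coef, support))
             for d, xd in enumerate(x)]
    c = [0.0] * n
    for j, v in zip(support, coef):
        c[j] = v
    return c


# Toy usage: represent the third point using at most two of the other points.
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c = omp(points, points[2], i=2, s=2)
print(c)
```

Solving the normal equations is adequate for the small supports used here; a QR-based update would be the more numerically robust choice for larger $s$.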
To find $b^{*(i)}$, Eq. (8) can also be solved via the OMP algorithm, as shown in Algorithm 3. The input data matrix $Y \in \mathbb{R}^{D \times T}$ is generated by Eq. (5) from $c_i^{*(t)}$. Note that the size of $Y$ depends on the number of subsets $T$ and is much smaller than the number of data points. The maximum number of repetitions is $T$, and $S$ is updated by finding the index $j^*$ satisfying Eq. (14). In addition, the weight coefficient vector $b^{*(i)}$ and the update of $r$ are determined by solving Eqs. (15) and (16), respectively.

For clarity, we summarize the entire framework of our proposed subspace clustering approach in Algorithm 4, calling it the parallelizable multi-subset based sparse subspace clustering (PMSSC) method. Given $X$ and parameters $T$, $\delta$, $s$, and $\epsilon$, the optimal solution $C^*$ can be found using Algorithm 1. We thus define the affinity matrix as $A = |C^*| + |C^{*\top}|$ using the computed $C^*$; the final clustering results can be obtained by applying spectral clustering to $A$ via normalized cut [19].
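The affinity construction $A = |C^*| + |C^{*\top}|$ can be illustrated with a few lines of code. This is only a sketch on a made-up $3 \times 3$ coefficient matrix (the values and the function name are hypothetical); the spectral clustering step itself is not shown.

```python
def affinity(C):
    """Return A = |C| + |C^T| for a square matrix given as a list of rows."""
    n = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical coefficient matrix with a zero diagonal (c_ii = 0).
C = [[0.0, -0.5, 0.0],
     [0.2,  0.0, 0.0],
     [0.0,  1.0, 0.0]]
A = affinity(C)

for row in A:
    print(row)
```

The result is symmetric and nonnegative by construction, which is what spectral clustering expects of an affinity matrix.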

IV. EXPERIMENTS AND RESULTS
We have evaluated our approach using both synthetic data and real-world benchmark datasets.
A. Baselines and Evaluation Metrics
We compare our approach to the following eight methods: SCC [25], LRR [28], thresholding-based subspace clustering (TSC) [40], low rank subspace clustering (LRSC) [41], SSSC [13], EnSC [29], SSC-OMP [10], and S$^5$C [14]. Tests for all comparative methods used the provided source code, and each parameter was carefully tuned to give the best clustering accuracy. For spectral clustering, except for SCC, S$^5$C, and SSSC, we applied normalized cut [19] to the affinity matrix $A = |C| + |C^{\top}|$. (SCC and S$^5$C have their own spectral clustering phase, while SSSC obtains clustering results from the data split into two parts.) Unlike SSC-OMP, our method, which involves independent calculation for each subset, can be implemented in parallel with multi-core processing. All algorithms ran on an AMD Ryzen 7 3700X processor with 32 GB RAM.

Following [10], as quantitative evaluation metrics, we evaluated each algorithm using clustering accuracy (acc: $a$%), subspace-preserving representation error (sre: $e$%), connectivity (conn: $c$), and runtime ($t$ seconds). Clustering accuracy represents the percentage of correctly labeled data points, taken over a permutation $\pi$ of the $L$ cluster groups. $Q^{\mathrm{est}}$ and $Q^{\mathrm{true}}$ are the estimated labeling result and ground-truth, respectively, scoring one in the $(i, j)$-th entry if data point $j$ belongs to the $i$-th cluster and zero otherwise. The subspace-preserving representation error indicates the average fraction of affinities formed from other subspaces in each $c_j$, computed from the true affinity $\omega_{ij} \in \{0, 1\}$ and the $\ell_1$ norm $\| \cdot \|_1$. The connectivity shows the average connection of the affinity matrix with $L$ cluster groups, where $\lambda_2$ is the second smallest eigenvalue of the normalized Laplacian for each of the $L$ subgraphs, and $\lambda_2^{(i)}$ indicates the algebraic connectivity for each cluster. If $c = 0$, then at least one subgraph is not connected [42].
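To make the first two metrics concrete, the sketch below computes clustering accuracy and subspace-preserving representation error on toy inputs. It assumes the commonly used definitions (accuracy as the best label agreement over all relabelings of the clusters, and sre as the average fraction of each column's $\ell_1$ mass placed on points from other subspaces); the function names and data are hypothetical, and treating an all-zero column as contributing zero error is an assumption of this sketch.

```python
from itertools import permutations


def clustering_accuracy(est, true, L):
    """Best label agreement in percent over all relabelings of the L clusters."""
    best = 0
    for perm in permutations(range(L)):
        hits = sum(1 for e, t in zip(est, true) if perm[e] == t)
        best = max(best, hits)
    return 100.0 * best / len(true)


def subspace_preserving_error(C, labels):
    """Average percent of each column's l1 mass on points from other subspaces.

    C[i][j] is the coefficient of point i in the representation of point j.
    """
    n = len(labels)
    total = 0.0
    for j in range(n):
        col = [abs(C[i][j]) for i in range(n)]
        norm1 = sum(col)
        if norm1 == 0.0:
            continue  # assumption: an all-zero column contributes no error
        same = sum(v for i, v in enumerate(col) if labels[i] == labels[j])
        total += 1.0 - same / norm1
    return 100.0 * total / n


# Toy usage with made-up labels and coefficients.
acc = clustering_accuracy([1, 1, 0, 0, 2], [0, 0, 1, 1, 1], L=3)
C = [[0.0, 1.0, 2.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
sre = subspace_preserving_error(C, labels=[0, 0, 1])
print(acc, sre)
```

Enumerating all permutations is only practical for a small number of clusters; for larger $L$, an assignment solver such as the Hungarian algorithm is the usual replacement.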

B. Experiments on Synthetic Data
1) Setup: We first report experimental results on data synthesised by randomly generating five linear subspaces of $\mathbb{R}^6$ as ground-truth in an ambient space of $\mathbb{R}^9$. Each subspace contains $n$ randomly sampled data points. To confirm the statistical results, we conducted the experiments by varying $n$ from 100 to 4,000, so the total number $N$ of data points varies from 500 to 20,000. We set the parameters $s = 6$, $\delta = 0.3$, and $T = 16$. The percentage of in-sample data in SSSC is set to 10% of the total number of data points. All experimental results recorded on synthetic data were averaged over 50 trials.
2) Results: The curve for each metric is shown as a function of $n$ in Fig. 2. We can observe from Fig. 2(a) that PMSSC outperforms SSC-OMP in terms of clustering accuracy. The difference is especially large when the density of data points on the underlying subspace is lower. This could be partly due to the fact that PMSSC succeeds in generating better connectivity than SSC-OMP (see Fig. 2(c)), and achieves lower subspace-preserving representation error (see Fig. 2(b)). On the other hand, as Fig. 2(d) shows, PMSSC is much faster with parallel implementation, which is advisable for solving problems involving large-scale data. In addition, compared to SSSC, which adopts a similar sampling approach to our method, PMSSC performs better in both clustering accuracy and runtime (using a parallel implementation).

C. Experiments on Benchmark Datasets for Real-world Applications
1) Setup: We conducted experiments on benchmark datasets covering five applications: Extended Yale B [43] and ORL [44] for face clustering, BBCSport [45] for text document clustering, German Traffic Sign Recognition Benchmark (GTSRB) [46] for street sign clustering, Modified National Institute of Standards and Technology database (MNIST) [47] and Extended MNIST (EMNIST) [48] for handwritten character clustering, and Canadian Institute For Advanced Research (CIFAR-10) [49] for object clustering. Parameter settings used for our method in these experiments are shown in Table I. Since the sparsity $s$ in PMSSC and SSC-OMP is related to the intrinsic dimensionality of the subspace, it is manually determined to be close to the dimensionality of the subspaces. For sampling rate $\delta$,
2) Extended Yale B: Extended Yale B contains 2,432 facial images in 38 classes; see Fig. 3(a). In this experiment, following [9], we concatenated the pixels of each image resized to 48 × 42, and used the 2016-dimensional vector as input data.
The results on Extended Yale B are shown in Table II. In each column, the best result is shown in bold, and the second-best result is underlined. The results confirm that PMSSC yields the best clustering accuracy, and improves the clustering accuracy over SSC-OMP by 6.42%. Although the subspace-preserving error and runtime are slightly worse than those of SSC-OMP, the connectivity is greatly improved compared to SSC-OMP, leading to a better clustering accuracy. LRR, LRSC, and S$^5$C have good connectivity, but poor subspace-preserving errors result in low clustering accuracy.
3) ORL: ORL contains 400 facial images in 40 classes, as shown in Fig. 3(b). In this experiment, following [50], we concatenate the pixels of each image resized to 32 × 32, and use a 1024-dimensional vector as input data. Compared to Extended Yale B, ORL is a more difficult problem setting for subspace clustering because the density of data lying near the same subspace (10 vs. 64) is lower due to the small number of images of each subject, and the subspaces have more nonlinearity due to changes in facial expressions and details.
The results for ORL are listed in Table III. We can again observe that PMSSC achieves the best clustering accuracy, and improves the connectivity compared to SSC-OMP. However, since PMSSC does not incorporate nonlinear constraints, the subspace-preserving error does not improve along with the improvement of the connectivity.
4) GTSRB: GTSRB contains over 50,000 street sign images in 43 classes; see Fig. 3(c). Following [33], we preprocess the dataset represented by a 1568-dimensional HOG feature to get an imbalanced dataset of 500-dimensional vectors with 12,390 samples in 14 classes.
The results on GTSRB are reported in Table IV. Again PMSSC yields the best clustering accuracy and runtime, and improves the clustering accuracy by roughly 10% compared to SSC-OMP. In particular, PMSSC has both good subspace-preserving error and connectivity. While EnSC and SSSC also achieve competitive clustering accuracy, their computational costs are much higher.
5) BBCSport: BBCSport contains 737 documents in five classes. The data provided by the database has been preprocessed by stemming, stop-word removal, and low term frequency filtering. In this experiment, we reduced the dimensionality of feature vectors to 500 by PCA.
The results on BBCSport are summarized in Table V. We can observe that PMSSC yields the second-best clustering accuracy and subspace-preserving error. LRSC yields the best clustering accuracy due to good connectivity. For small-scale datasets such as BBCSport and ORL, the speed of PMSSC is slightly lower than for SSC-OMP because the advantage of reducing data size by sampling multiple subsets is diminished.
6) MNIST and EMNIST-Letters: MNIST contains 70,000 images of handwritten digits (0-9), while EMNIST-Letters contains 145,600 images of handwritten characters in 26 classes, as shown in Figs. 3(d) and 3(e). In our experiments, following [34], we generate MNIST4000 and MNIST10000, which are produced by randomly sampling 400 and 1,000 images per class of digit, respectively. Each image is represented as a 3472-dimensional feature vector by using the scattering convolution network [51], and its dimensionality is reduced to 500 by PCA.
The results on MNIST and EMNIST-Letters are summarized in Tables VI-VIII. We can observe that PMSSC yields the best clustering accuracy on MNIST4000, MNIST10000, and EMNIST-Letters. In particular, PMSSC is remarkably faster than the comparative methods on MNIST70000 and EMNIST-Letters. In the case of MNIST70000, EnSC yields the best clustering accuracy and subspace-preserving error but its computational cost is high. Similarly, S$^5$C can achieve good connectivity, but is very slow.
7) CIFAR-10: CIFAR-10 includes 60,000 general objects in 10 classes, as illustrated in Fig. 3(f). Following [52], we employ the feature representations extracted by MCR$^2$ [53], which learns a union of low-dimensional subspaces representation via self-supervised learning. Each feature is represented by a 128-dimensional feature vector, further normalized to have unit $\ell_2$ norm.
The comparative results on CIFAR-10 are summarized in Table IX. It can be observed that our method outperforms others in terms of the runtime, while the clustering accuracy is competitive. However, as with SSC-OMP, we see that the connectivity is lower than for S$^5$C, which uses $\ell_1$ norm regularization.
8) Summary: Overall, our proposed method becomes significantly faster as the amount of input data increases. In addition, it achieves good clustering accuracy and connectivity, and provides subspace-preserving errors comparable to those of the comparative algorithms.

D. Analysis
1) Multi-subset Based Self-Expressive Model: Since our approach aims to minimize the self-expressive residual by the weight coefficient vector $b^*$ solved in Algorithm 3, we show the mean self-expressive residual of data points represented by the coefficient vectors in Fig. 4. This experiment was performed on synthetic data, and we fixed $T = 10$ and $\delta = 0.3$. Each blue bar indicates the mean self-expressive residual $\|z^{(t)}\|_2$, $t = 1, \dots, 10$, of the data points represented by Eq. (5), and the red bar indicates the mean self-expressive residual $\|z\|_2$ of the data points represented by Eq. (9). We can clearly observe that the mean self-expressive residual of PMS is lower than that of every single subset.

To highlight the benefit of $b^*$, we made a comparison to a variant of our approach, named PMSSC(avg), in which the weighting by $b^*$ in Eq. (10) is replaced by a simple average operation over the subsets. We performed experiments on synthetic data using the same setup as for Fig. 2 and present the results in Fig. 5. As can be seen, incorporating $b^*$ improves clustering performance; in particular, the subspace-preserving representation error is significantly reduced. These experiments indicate that the weight coefficient vector $b^*$ contributes to improving self-expressiveness.
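A short argument suggests why the combined residual should not exceed any single-subset residual. It assumes that $b^{*(i)}$ is the exact, unconstrained least-squares minimizer over $\mathbb{R}^T$ for the matrix $Y$ whose $t$-th column is the subset reconstruction $y_i^{(t)}$; an OMP solution stopped early at tolerance $\epsilon$ only approximates this. Writing $e_t$ for the $t$-th standard basis vector (a symbol introduced here for illustration), choosing $b = e_t$ reproduces the $t$-th subset alone, so

```latex
\| x_i - Y b^{*(i)} \|_2
  \;=\; \min_{b \in \mathbb{R}^T} \| x_i - Y b \|_2
  \;\le\; \| x_i - Y e_t \|_2
  \;=\; \| x_i - y_i^{(t)} \|_2 ,
  \qquad t = 1, \dots, T .
```

Under this assumption the bound holds for every data point, and therefore also for the mean residuals.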
2) Selection of Parameters: We performed multiple experiments on the GTSRB dataset with various choices of hyperparameters $(T, \delta)$ to evaluate the sensitivity of our approach to parameter choice. Changes in clustering accuracy, subspace-preserving representation error, connectivity, and runtime when varying each parameter are illustrated in Fig. 6. We can confirm that high clustering accuracy and low subspace-preserving representation error are maintained in most cases, except when both $T$ and $\delta$ are extremely small. This implies that the affinity matrix constructed by PMSSC provides subspace-preserving representations at most data points. We can also see that the connectivity improves as the number of subsets $T$ increases, because the affinity matrix contains at most $sTN$ nonzero entries in OMP optimization. Considering runtime, a practical choice of parameters is to increase $T$ for small values of $\delta$, and decrease $T$ for large values of $\delta$. In addition, the time taken can be kept low by picking a small value of $\delta$ for large-scale datasets.
3) Sampling Technique: Our approach adopts weighted random sampling to generate the subset data matrix $X^{(t)}$. To analyze the effect of sampling methods on our approach, we compared weighted random sampling to random sampling with uniform weights. The experimental settings used for synthetic data follow those in Fig. 2. Fig. 7 shows the clustering accuracy and connectivity as functions of $n$. Weighted random sampling clearly outperforms random sampling in terms of both clustering accuracy and connectivity. In particular, as the density of data points increases, the connectivity of the method with random sampling becomes zero, because imbalanced sampling leads to a disconnected subgraph in an affinity graph.
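For illustration, one common way to draw a weighted sample without replacement is to give each index a random key $u^{1/w}$ and keep the indices with the largest keys. The sketch below uses that scheme to draw $T$ index subsets of size $\lceil \delta N \rceil$. Both the key-based scheme and, in particular, the weight update between subsets (multiplying the weights of already selected indices by a hypothetical `decay` factor) are assumptions of this sketch, not the rule used in the paper.

```python
import math
import random


def weighted_sample_indices(weights, delta, rng):
    """Draw ceil(delta * N) distinct indices, favoring larger weights."""
    n = len(weights)
    k = math.ceil(delta * n)
    keys = []
    for i, w in enumerate(weights):
        if w > 0:
            # Key-based weighted sampling: larger weights tend to get larger keys.
            keys.append((rng.random() ** (1.0 / w), i))
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:k])


def draw_subsets(n, T, delta, decay=0.5, seed=0):
    """Draw T index subsets; `decay` is a hypothetical down-weighting rule."""
    rng = random.Random(seed)
    weights = [1.0] * n
    subsets = []
    for _ in range(T):
        idx = weighted_sample_indices(weights, delta, rng)
        subsets.append(idx)
        for i in idx:
            weights[i] *= decay  # assumed update: selected points become less likely
    return subsets


subsets = draw_subsets(n=8, T=3, delta=0.5)
for t, idx in enumerate(subsets, start=1):
    print(t, idx)
```

Setting `decay=1.0` keeps all weights equal, which reduces the procedure to uniform random sampling and gives a simple baseline for comparison.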
4) Computational Complexity: Algorithms 2 and 3 for affinity matrix construction consume most of the processing time. In Algorithm 2, finding the self-expressive coefficient vector $c_i^{*(t)}$ requires $O(Ds\lceil \delta N \rceil)$ time. In Algorithm 3, finding the weight coefficient vector $b^{*(i)}$ requires $O(DT^2)$ time. Because these two algorithms are performed on $N$ data points, the computational complexity of PMS is at least $O(N(TDs\lceil \delta N \rceil + DT^2))$. However, processing $T$ subsets (the part taking $O(TDs\lceil \delta N \rceil)$) can be performed in parallel, which reduces the computation time compared to methods that directly deal with the whole dataset. Fig. 2(d) supports this analysis.
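As a rough worked example, the per-point terms can be evaluated for the largest synthetic setting ($D = 9$, $s = 6$, $\delta = 0.3$, $T = 16$, $N = 20{,}000$). The figures below are order-of-magnitude operation counts with all constants dropped, not measured runtimes. The comparison with a full-dictionary OMP cost of $DsN$ per point and the assumption of $T$ ideal parallel workers are both illustrative assumptions.

```latex
\begin{align*}
\lceil \delta N \rceil &= \lceil 0.3 \times 20{,}000 \rceil = 6{,}000, \\
D s \lceil \delta N \rceil &= 9 \times 6 \times 6{,}000 = 324{,}000
  && \text{(one subset, one data point)}, \\
T D s \lceil \delta N \rceil + D T^2 &= 5{,}184{,}000 + 2{,}304 = 5{,}186{,}304
  && \text{(all subsets processed serially)}, \\
D s \lceil \delta N \rceil + D T^2 &= 324{,}000 + 2{,}304 = 326{,}304
  && \text{(subsets processed on $T$ parallel workers)}, \\
D s N &= 9 \times 6 \times 20{,}000 = 1{,}080{,}000
  && \text{(assumed full-dictionary OMP)}.
\end{align*}
```

Under these assumptions the parallel per-point count is roughly 3.3 times smaller than the full-dictionary count, while the serial count is larger, which is consistent with the observation that the speed advantage relies on the parallel implementation.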

V. CONCLUSIONS
We have proposed a parallelizable multi-subset based self-expressive model for subspace clustering. A representation of the input data is formulated by combining the solutions of small optimization problems with respect to multiple subsets generated by data sampling. We have shown that this strategy can significantly improve speed with a multi-core approach that can be easily implemented, especially for large-scale data. Moreover, it has been verified that combining multiple subsets can reduce the self-expressive residuals of data compared to a single subset. Extensive experiments on synthetic data and real-world datasets have demonstrated the efficiency and effectiveness of our approach. As a limitation, our method is still unable to handle nonlinear subspaces due to the problem setting. In the future, we would like to design a self-expressive model that can handle nonlinear subspaces, with the help of modeling capabilities from neural network architectures.

Fig. 1. Overview of our self-expressive model. Given a data matrix $X$ in which each data point $x_i$ is a column, our approach represents a self-expressive model over the entire data by combining multiple subsets generated by sampling (Algorithm 1). Specifically, our method computes the self-expressive data point $y_i^{(t)}$ by solving for the self-expressive coefficient vector $c_i^{*(t)}$ for each point $x_i$ in $T$ subsets (Algorithm 2). Then, the self-expressive properties of the entire data are obtained by solving for $b^*$ using $Y$ with each data point $y_i^{(t)}$ computed from each subset as columns (Algorithm 3).

Fig. 2. Comparison of PMSSC, SSC-OMP, and SSSC on synthetic data in terms of (a) clustering accuracy, (b) subspace-preserving representation error, (c) connectivity, and (d) runtime. For SSSC, only clustering accuracy and runtime are shown, as SSSC does not generate the self-expressive coefficient matrix and affinity matrix.

Fig. 4. Comparative results in terms of the mean residuals over data points represented by the self-expressive models with different subsets. Blue bars represent each single $t$-th subset, while the red bar is computed using multiple subsets.
Algorithm 1 Optimization for the parallelizable multi-subset based self-expressive model (PMS)
Input: Data matrix $X \in \mathbb{R}^{D \times N}$, number of subsets $T$, sampling rate $\delta$, maximum number of repetitions $s$, error term $\epsilon$
1: Generate $T$ index subsets $\{I^{(t)}\}_{t=1}^{T}$ via Eq. (3);
2: Generate $T$ subset data matrices $\{X^{(t)}\}_{t=1}^{T}$;
...

Algorithm 2 OMP algorithm for finding $c_i^{*(t)}$
Input: Data matrix $X^{(t)} \in \mathbb{R}^{D \times N}$, reference data point $x_i$
1: ..., support set $S = \emptyset$;
2: while $k < s$ and $\|r\|_2 > \epsilon$ do
...

TABLE I: PARAMETERS ($s$, $\delta$, AND $T$) USED IN PMSSC FOR BENCHMARK DATASETS.

TABLE II: COMPARATIVE RESULTS ON EXTENDED YALE B; '-' INDICATES THE METRIC CANNOT BE COMPUTED.

TABLE IV: COMPARATIVE RESULTS ON GTSRB.

TABLE IX: COMPARATIVE RESULTS ON CIFAR-10, WHERE 'M' MEANS THAT THE MEMORY LIMITATION OF 32 GB IS EXCEEDED.