Elastic Deep Sparse Self-representation Subspace Clustering Network

Subspace clustering models based on self-representation learning often use the ℓ1, ℓ2, or nuclear norm to constrain the self-representation matrix of the dataset. In theory, the ℓ1 norm can enforce the independence of subspaces, but the resulting sparsity of the self-representation matrix may lead to under-connection. ℓ2 and nuclear norm regularization can improve the connectivity within clusters, but may lead to over-connection of the self-representation matrix. Because a single regularization term may cause subspaces to be over- or insufficiently divided, this paper proposes an Elastic Deep Sparse Self-representation Subspace Clustering Network (EDS-SC), which imposes sparsity constraints on deep features and introduces an elastic net regularization mixing the ℓ1 and ℓ2 norms to constrain the self-representation matrix. The network can extract deep sparse features and strike a balance between subspace independence and connectivity. Experiments on human face, object, and medical imaging datasets demonstrate the effectiveness of the EDS-SC network.


Introduction
Subspace clustering based on spectral clustering needs to learn a self-representation matrix over the data dictionary and obtain the similarity between data points from the self-representation coefficients of the samples; that is, each sample point x_j is reconstructed as a linear combination of the other sample points {x_1, ..., x_{j-1}, x_{j+1}, ..., x_n}. This kind of method can be divided into two steps. First, using the self-representation learning model (X = XZ), the problem is transformed into an optimization problem with norm regularization terms that minimizes the self-representation reconstruction loss. Second, the model is solved to obtain the self-representation matrix Z, the symmetrized self-representation matrix is used as the adjacency matrix, and the N-cut algorithm is applied to segment the sample points. The main difference between subspace clustering methods lies in the choice of regularization term. Sparse subspace clustering (SSC) [1] adopts ℓ1-norm regularization to make the representation matrix Z as sparse as possible, but the sparsity of the adjacency matrix may lead to over-segmentation; that is, sample points from the same subspace may not lie in the same connected component of the affinity graph [2]. Least squares regression subspace clustering (LSR) [3] therefore uses the ℓ2 norm as the regularization term. One advantage of LSR is that the representation matrix Z is usually dense, which alleviates the connectivity problem of SSC. However, it easily connects sample points from different subspaces, making the sample points difficult to separate. Similarly, low-rank subspace clustering based on the nuclear norm [4] has analogous limitations in application. To alleviate the problems caused by single-form regularization, Zhu et al.
[5] propose sparse low-rank subspace clustering, introducing an elastic net regularization that mixes the ℓ1 and nuclear norms and has better subspace representation ability than SSC. You et al. [6] give a geometric explanation of the relationship between subspace independence and connectivity and analyze the role of elastic net regularization, but no prior work applies Elastic Net regularization to deep subspace clustering.
Deep autoencoders have achieved great success in unsupervised representation learning. Ji et al. [7] first proposed a deep subspace clustering network that combines a deep autoencoder with the self-representation model, introducing a self-representation layer to learn a self-representation matrix in the latent feature space extracted by the autoencoder.
Peng et al. [8] proposed structured deep subspace clustering, introducing a sparse prior constraint to learn the self-representation matrix while maintaining the subspace structure of the original data. Zhou et al. [9] proposed deep subspace clustering that preserves the distribution of latent features, using a distribution-consistency loss to make the distributions of the data in the original space and the feature space as similar as possible. Kang et al. [10] used the relationships between samples to guide the representation learning of the deep autoencoder and learned the self-representation matrix adaptively from the data. Zhou et al. [11] integrated a generative adversarial network with deep subspace clustering and proposed deep adversarial subspace clustering. The above deep subspace clustering methods all adopt a single form of ℓ1 or ℓ2 regularization and do not constrain the deep features extracted by the autoencoder, which may leave the learned self-representation matrix with insufficient information.
Based on the above discussion, this paper proposes the elastic deep sparse self-representation subspace clustering network (EDS-SC). EDS-SC constrains the deep features extracted by the autoencoder to be sparse, which facilitates learning the self-representation matrix. At the same time, an Elastic Net regularization constraint is applied to the self-representation matrix in the self-representation layer of the latent feature space, so that the final adjacency matrix carries discriminative information for spectral clustering. The main contributions of this paper are as follows: (1) Sparse regularization constraints are applied to the latent features extracted by the autoencoder, transforming the original high-dimensional data samples into a sparse nonlinear feature space, which is helpful for subspace clustering.
(2) An Elastic Net regularization term is used to learn the discriminative information of the self-representation matrix across classes, to better balance subspace independence and connectivity, and to alleviate the deficiencies caused by a single form of regularization.
(3) The effectiveness of the proposed method is verified on a variety of datasets, with improvements on object and face datasets. In addition, we apply the algorithm to medical images, clustering the CT images of COVID-19 patients and ordinary (non-COVID-19) patients. To the best of our knowledge, no previous work has applied unsupervised clustering to the diagnosis of COVID-19.

Subspace clustering based on self-representation learning
Given a data matrix X = [x_1, x_2, ..., x_n] ∈ R^{d×n}, d is the feature dimension of the samples and n is the number of samples. The core of the subspace clustering method based on self-representation learning is to use the self-representation model X = XZ to learn the self-representation matrix Z, where Z measures the similarity between samples. Ideally, the similarity matrix Z is a block-diagonal matrix, i.e., there is no correlation between clusters. The basic framework of the model is

min_Z f(Z)  s.t.  X = XZ, diag(Z) = 0,

where Z ∈ R^{n×n} represents the similarity between the original samples, and the constraint diag(Z) = 0 prevents the trivial solution in which each sample is represented by itself. After solving the optimization framework for the coefficient matrix Z, the adjacency matrix Z* = (|Z| + |Z|^T)/2 is constructed to compute the Laplacian matrix L. Most subspace clustering algorithms then obtain the eigenvectors of the Laplacian matrix through eigendecomposition and combine them with a classical clustering algorithm such as k-means to obtain the subspace segmentation. f(·) corresponds to different matrix norms: the ℓ1 norm corresponds to sparse subspace clustering, the nuclear norm to low-rank subspace clustering, and the F-norm to the least squares subspace clustering algorithm.
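The pipeline just described (learn Z, then spectrally segment) can be sketched in NumPy. The function name, the farthest-point k-means initialization, and the use of the symmetrically normalized Laplacian are illustrative choices, not details fixed by the paper:

```python
import numpy as np

def spectral_from_self_representation(Z, n_clusters, n_iter=50):
    """Spectral segmentation from a learned self-representation matrix Z:
    symmetrize Z into an adjacency matrix, form the normalized Laplacian,
    embed with the eigenvectors of its smallest eigenvalues, and cluster
    the embedding with k-means."""
    A = (np.abs(Z) + np.abs(Z).T) / 2.0         # adjacency Z* built from |Z|
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # minimal k-means with farthest-point initialization (keeps the sketch
    # dependency-free; a library k-means would normally be used here)
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dists = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels
```

On an ideal block-diagonal Z (two groups with no cross-group coefficients), the two connected components are recovered exactly.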

Regularization theory
In machine learning algorithms, bias and variance jointly determine the accuracy of a model. The error caused by bias is defined as the difference between the model's prediction (or average prediction) and the true value; the error caused by variance is defined as the variability of the prediction for a given data point. Bias measures the distance between the predicted and true values, while variance describes the spread of the predictions around their expected value. Figure 1 visualizes bias and variance [12]. High bias easily leads to underfitting of the model, and high variance leads to overfitting. Due to limited representation learning ability, the structure and parameter settings of a machine learning algorithm may lead to locally optimal solutions. Regularization methods can reduce the complexity of the model, avoid overfitting, and adapt the model to different tasks. Researchers have proposed a variety of regularization methods; among them, ℓ1 regularization, ℓ2 regularization, and Elastic Net regularization are the most commonly used, and they are defined as follows:

(1) ℓ1 regularization (Lasso regression). Given samples X and labels y, where β is the weight vector, ℓ1 regularization adds the regularization term λ∥β∥_1 to the objective loss function and learns a sparse weight vector β; λ is a parameter that balances the objective loss and the importance of the regularization. The objective function is

min_β ∥y − Xβ∥_2^2 + λ∥β∥_1.

(2) ℓ2 regularization (Ridge regression). ℓ2 regularization (also called ridge regression) adds the regularization term λ∥β∥_2^2 to the loss function and drives the weights β close to zero but not necessarily equal to zero, which yields a grouping effect. The objective function is

min_β ∥y − Xβ∥_2^2 + λ∥β∥_2^2.

(3) Elastic Net regularization (Elastic Net [6]). Elastic Net regularization integrates ℓ1 and ℓ2 regularization and can balance the sparsity and connectivity of the weight vector β; it is particularly useful when several features are correlated with one another. The objective function is

min_β ∥y − Xβ∥_2^2 + λ(α∥β∥_1 + (1 − α)∥β∥_2^2),

where α ∈ [0, 1] is the factor controlling the ratio between the ℓ1 and ℓ2 regularization; α∥β∥_1 + (1 − α)∥β∥_2^2 is exactly a convex combination of the ℓ1 and ℓ2 regularizers. When α = 1, this reduces to ℓ1-norm regularization; when α = 0, it reduces to ℓ2-norm regularization; when α ∈ (0, 1), Elastic Net integrates the advantages of ℓ1 and ℓ2 to achieve elasticity. By adjusting λ and α, it can better adapt to the characteristics of different data.
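As a concrete illustration, the following hypothetical helper evaluates the Elastic Net objective above for a given weight vector β (α = 1 recovers the lasso penalty, α = 0 ridge regression):

```python
import numpy as np

def elastic_net_objective(X, y, beta, lam, alpha):
    """||y - X beta||_2^2 + lam * (alpha * ||beta||_1 + (1 - alpha) * ||beta||_2^2).

    alpha = 1 reduces to the lasso objective, alpha = 0 to ridge regression.
    """
    resid = y - X @ beta
    l1 = np.abs(beta).sum()          # ||beta||_1
    l2 = (beta ** 2).sum()           # ||beta||_2^2
    return resid @ resid + lam * (alpha * l1 + (1 - alpha) * l2)
```

For example, with X the 2×2 identity, y = β = (1, 2), and λ = 1, the residual vanishes and only the penalty remains, so the value moves smoothly between the pure ℓ1 and pure ℓ2 penalties as α varies.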

Autoencoder
The purpose of the autoencoder is to avoid the need for supervision in back propagation and to automatically obtain latent features from a dataset under the supervision of its own data. By encoding and decoding the input to reconstruct the data, it essentially serves as a feature extractor [13]. With the development of deep learning, autoencoders have gradually been combined with deep frameworks, optimizing parameters through multi-layer neural network training in a self-supervised way. Assume that X = [x_1, x_2, ..., x_n] ∈ R^{d×n} is the original input data, where d is the dimension of the samples and n is the number of samples. Taking two encoding layers as an example, the latent features obtained by encoding are

Y = f_2(W^(2) f_1(W^(1) X + b^(1)) + b^(2)),

where W^(1), b^(1) and W^(2), b^(2) are the weights and biases of the first and second encoding layers, respectively, f_1 and f_2 are the activation functions of the corresponding layers, and Y is the latent feature extracted by the encoder. The original input data itself guides the feature extraction, and the latent feature Y is decoded to obtain the reconstruction X̂. The goal of the autoencoder is to make X and X̂ as close as possible. The objective function is

min_{Θ_e, Θ_d} ∥X − X̂∥_F^2,  with  X̂ = h(g(X; Θ_e); Θ_d),

where Θ_e is the parameter of the encoder, g(·) is the nonlinear mapping of the encoder, Θ_d is the parameter of the decoder, and h(·) is the nonlinear mapping of the decoder. For example, g(·) can be a convolution operation, while h(·) is the deconvolution operation, and X̂ is a function of the parameters {Θ_e, Θ_d}.
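A minimal NumPy sketch of the two-layer encoder and its mirror decoder described above; the ReLU activations and the layer shapes are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def encode(X, W1, b1, W2, b2):
    # Y = f2(W2 f1(W1 X + b1) + b2), with ReLU chosen for f1 and f2
    return relu(W2 @ relu(W1 @ X + b1) + b2)

def decode(Y, V1, c1, V2, c2):
    # mirror-image decoder producing the reconstruction X_hat
    return relu(V2 @ relu(V1 @ Y + c1) + c2)

def reconstruction_loss(X, X_hat):
    # autoencoder objective: (1/2) * ||X - X_hat||_F^2
    return 0.5 * np.linalg.norm(X - X_hat) ** 2
```

Columns of X are samples, so the weights act on the feature dimension and the biases broadcast across samples.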

Deep subspace clustering algorithm based on autoencoder
In subspace clustering, methods that perform deep feature extraction and self-representation learning simultaneously are called deep subspace clustering. They differ from traditional subspace clustering in that they study the subspace structure after feature extraction. Peng et al. [14] proposed a structured deep autoencoder (StructAE), which introduces a self-representation matrix Z, learned in advance under a sparse or non-sparse prior, to guide the encoder's dimensionality reduction to the latent space. Clustering is performed in the latent feature space, and the objective function (written here in a simplified form consistent with the description) is

min_{Θ_e, Θ_d} ∥X − X̂∥_F^2 + λ∥Y − Y Z∥_F^2,

where the first term optimizes the reconstruction error (the smaller the better), and X̂ is a function of the parameters {Θ_e, Θ_d, Z}. However, the deficiency of this method is that the prior self-representation matrix Z depicts the relationships between the original high-dimensional samples, and the original sample data may be redundant, so the prior guidance may not be ideal.
The self-representation matrix can be used as a prior constraint to better guide the learning of the neural network parameters, and it can also be learned jointly with the network as a parameter. Ji et al. [7] proposed a Deep Subspace Clustering (DSC) model that combines the autoencoder with the self-representation model: the input is mapped to the latent feature space Y by the encoder, and a self-representation layer is introduced between the encoder and the decoder. The projected points in the latent space are used as nodes that represent each other, and Y and Z are solved by minimizing the reconstruction error and the representation error. The objective function is

min_{Θ_e, Θ_d, Z} (1/2)∥X − X̂∥_F^2 + λ_1∥Z∥_p + (λ_2/2)∥Y − Y Z∥_F^2  s.t.  diag(Z) = 0,

where ∥Z∥_p is the regularization term, which can be ℓ1 or ℓ2 regularization. DSC trains the autoencoder network and the self-representation model jointly and learns the self-representation coefficients of the samples from the latent features. The self-representation matrix thus captures the relationships between samples after feature extraction, which better describes the similarity between samples in the low-dimensional space, is closer to the essential relationships of the data, and is conducive to spectral clustering.
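The DSC objective above can be evaluated directly; the helper below is a sketch, with the diag(Z) = 0 constraint assumed to be enforced elsewhere (in DSC-Net it is handled by the self-expression layer), and the ℓ2 case written with the squared Frobenius norm:

```python
import numpy as np

def dsc_loss(X, X_hat, Y, Z, lam1, lam2, p=1):
    """(1/2)||X - X_hat||_F^2 + lam1 * ||Z||_p + (lam2/2) * ||Y - Y Z||_F^2.

    p = 1 uses the entrywise l1 norm of Z; p = 2 uses ||Z||_F^2 (an assumption
    about how the squared norm is applied in the l2 variant).
    """
    rec = 0.5 * np.linalg.norm(X - X_hat) ** 2
    reg = np.abs(Z).sum() if p == 1 else np.linalg.norm(Z) ** 2
    selfexp = 0.5 * np.linalg.norm(Y - Y @ Z) ** 2
    return rec + lam1 * reg + lam2 * selfexp
```

With perfect reconstruction (X̂ = X), only the regularizer and the self-expression residual contribute.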

Elastic Deep Sparse Self-representation Subspace Clustering Network
Inspired by deep subspace clustering based on the autoencoder, this paper extends the DSC-Net framework and proposes the elastic deep sparse self-representation subspace clustering network (EDS-SC). ℓ2 regularization can effectively control overfitting of the training model, while ℓ1 regularization enforces sparsity of the self-representation matrix; they play opposite roles in balancing the connectivity and independence of subspaces. In general, ℓ2 regularization drives some parameters close to 0, and ℓ1 regularization drives some parameters exactly to 0. Combining the two, an elastic net can be both regular and sparse [15]. To sparsify the deep features extracted by the autoencoder, a sparsity constraint on the latent feature space Y is introduced into the objective function, yielding a sparse deep subspace clustering algorithm based on elastic net regularization. The objective function is

min_{Θ_e, Θ_d, Z} ζ_AE + α_SC ζ_SC + λ ζ_EN + α_S ζ_S,

where ζ_AE = (1/2)∥X − X̂∥_F^2 is the reconstruction error of the autoencoder, and ζ_SC = (1/2)∥Y − Y Z∥_F^2 is the self-representation error of subspace clustering in the latent feature space produced by the encoding layers.
ζ_EN = α_EN∥Z∥_1 + (1 − α_EN)∥Z∥_F^2 is the regularization term based on the Elastic Net; the parameter α_EN of the elastic net adjusts the ratio between the Lasso and ridge regularizers. ζ_S = ∥Y∥_1 encourages the extracted deep features to be sparse, so that the latent features help improve the distinguishability between different clusters. Substituting the terms gives the EDS-SC objective function:

min_{Θ_e, Θ_d, Z} (1/2)∥X − X̂∥_F^2 + (α_SC/2)∥Y − Y Z∥_F^2 + λ(α_EN∥Z∥_1 + (1 − α_EN)∥Z∥_F^2) + α_S∥Y∥_1  s.t.  diag(Z) = 0,   (10)

where α_SC is the balance parameter of the self-representation layer error in subspace clustering, α_EN adjusts the proportion of the two elastic net regularization terms, λ controls the overall weight of the elastic net term, and α_S is the balance parameter of the sparse features. Θ_e and Θ_d are the network parameters of the encoding and decoding layers, respectively. The network structure is shown in Figure 2.
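Putting the four terms together, a hypothetical helper for evaluating the EDS-SC objective (10) on given tensors might look as follows; the 1/2 factors and the exact placement of the balance parameters follow the reading above and are assumptions:

```python
import numpy as np

def eds_sc_loss(X, X_hat, Y, Z, a_sc, a_en, lam, a_s):
    """EDS-SC objective (10): reconstruction + a_sc * self-representation error
    + lam * elastic net penalty on Z + a_s * l1 sparsity on the deep features Y.
    diag(Z) = 0 is assumed to be enforced by the self-representation layer."""
    rec = 0.5 * np.linalg.norm(X - X_hat) ** 2
    selfexp = 0.5 * np.linalg.norm(Y - Y @ Z) ** 2
    elastic = a_en * np.abs(Z).sum() + (1 - a_en) * np.linalg.norm(Z) ** 2
    sparse = np.abs(Y).sum()
    return rec + a_sc * selfexp + lam * elastic + a_s * sparse
```

Setting α_EN to 1 or 0 collapses the elastic term to the pure ℓ1 or pure Frobenius penalty, matching the limiting cases discussed in the regularization section.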
In the above objective function, the elastic net regularizer can trade off the ℓ1 norm and the F-norm, so that the self-representation matrix keeps the subspaces independent while remaining internally connected. The learning of the self-representation matrix can be optimized by jointly adjusting the parameters. In contrast, reference [7] only considers a single form of regularization for the self-representation matrix Z: (1) the ℓ1 norm, denoted DSC-Net-L1; (2) the ℓ2 norm, denoted DSC-Net-L2. In subspace clustering based on spectral graph learning, the self-representation matrix is generally constrained to be sparse, to be low-rank, or to have a grouping effect. However, applying these constraints individually easily causes subspace structure imbalance and is not conducive to spectral clustering. The algorithm in this paper integrates the two regularization terms to optimize the self-representation matrix of the latent feature space. Considering that the image has no effective constraint information after convolutional encoding, a sparsity constraint on the encoding layer is introduced; the extracted sparse feature Y is more helpful for learning the subspace structure of the self-representation model. The algorithm flow of EDS-SC is shown in Algorithm 1.
Assume that the network has 2N + 1 layers: N layers each for the encoder and decoder, plus a self-representation layer. If the number of channels in the i-th layer of the encoder is c_i (c_0 = 1) and the kernel size is k_i × k_i, then that layer has k_i^2 c_{i−1} c_i kernel parameters. Because the encoder and decoder are symmetric, the total number of kernel parameters is 2 Σ_{i=1}^{N} k_i^2 c_{i−1} c_i. The number of encoder biases is Σ_{i=1}^{N} c_i, and the number of decoder biases is 1 + Σ_{i=1}^{N−1} c_i. The self-representation layer solves for the self-representation matrix; assuming the number of samples in the dataset is n, the self-representation layer has n^2 parameters. Therefore, the total number of parameters of the EDS-SC algorithm is

2 Σ_{i=1}^{N} k_i^2 c_{i−1} c_i + Σ_{i=1}^{N} c_i + 1 + Σ_{i=1}^{N−1} c_i + n^2.

The input X is mapped to the latent space Y by the encoder, Y Z is obtained by self-representation learning, and the decoder uses it to produce X̂. The objective function (10) is used to train the neural network, fine-tune the autoencoder parameters, and learn the self-representation matrix Z.

Fig. 3: Example images from the datasets used in this experiment.
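The parameter count above can be checked with a small helper; the list layout of channels and kernels is an illustrative convention:

```python
def eds_sc_param_count(channels, kernels, n_samples):
    """Total EDS-SC parameters, assuming a symmetric encoder/decoder.

    channels: [c_1, ..., c_N]  (the input has c_0 = 1 channel)
    kernels:  [k_1, ..., k_N]  (square k_i x k_i kernels)
    """
    c = [1] + list(channels)
    kernel_params = sum(k * k * c[i] * c[i + 1] for i, k in enumerate(kernels))
    total_kernels = 2 * kernel_params                # encoder + mirrored decoder
    encoder_biases = sum(channels)                   # sum_{i=1}^{N} c_i
    decoder_biases = 1 + sum(channels[:-1])          # 1 + sum_{i=1}^{N-1} c_i
    self_rep = n_samples ** 2                        # self-representation layer
    return total_kernels + encoder_biases + decoder_biases + self_rep
```

For instance, a one-layer encoder with 2 channels and a 3×3 kernel on 4 samples gives 2·(9·1·2) + 2 + 1 + 16 = 55 parameters.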

Experimental setup
The EDS-SC network proposed in this paper is composed of a sparse convolutional encoder, a self-representation layer with Elastic Net regularization, and a deconvolutional decoder. The stride of the convolution kernels is 2, the learning rate is η = 10^−3, and the ReLU activation function is used. Each input image is mapped by the sparse convolutional encoding layers to the latent feature space Y. In the self-representation layer, the nodes are fully connected with linear weights, without bias or nonlinear activation. Finally, the latent feature space is reconstructed back to the original size by the deconvolutional decoder. The experiments use the TensorFlow 1.15 framework and dual GTX 2080Ti GPUs. First, the sparse autoencoder is pretrained to initialize the network parameters, and then the network is fine-tuned with the elastic net regularization and deep sparse feature terms of EDS-SC.
For each dataset, the value of the Elastic Net balance parameter α_EN is selected from {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, and the deep sparse feature parameter is fixed to α_S = 1. This paper compares shallow and deep methods closely related to the development of subspace clustering, including the following: (1) SSC, LRR: sparse subspace clustering and low-rank representation subspace clustering.
(2) KSSC: kernel sparse subspace clustering, which applies SSC after a kernel mapping of the samples.
(3) EDSC: Efficient Dense Subspace Clustering [17], which introduces a new dictionary variable to approximate the original data, uses the dictionary variable to learn a representation coefficient matrix with a grouping effect, and constrains sample points belonging to the same subspace to be closely connected.
(4) AE+SSC: a convolutional autoencoder (AE) first extracts nonlinear features; the extracted latent features are then used as new samples for SSC clustering, with the two processes kept separate.
The clustering evaluation indices are clustering accuracy (ACC) [18] and normalized mutual information (NMI) [19]. ACC is calculated as

ACC = (1/n) Σ_{i=1}^{n} δ(s_i, map(r_i)),

where n is the number of samples, and s_i and r_i are the true and predicted labels, respectively. The map(·) function maps the class labels obtained by clustering to the equivalent true class labels; in general, the Kuhn-Munkres algorithm is used to obtain this mapping. When s_i = map(r_i), δ(s_i, map(r_i)) = 1; otherwise δ(s_i, map(r_i)) = 0. Assume that Ω = {w_1, w_2, ..., w_k} is the partition obtained by a clustering algorithm and C = {c_1, c_2, ..., c_k} is the partition given by the true clusters of the original data. NMI is defined as

NMI(Ω, C) = I(Ω; C) / ((H(Ω) + H(C))/2),

where I(Ω; C) is the mutual information, H(Ω) is the entropy of Ω, and (H(Ω) + H(C))/2 normalizes the mutual information. I(Ω; C) and H(Ω) are defined as

I(Ω; C) = Σ_k Σ_j P(w_k ∩ c_j) log( P(w_k ∩ c_j) / (P(w_k) P(c_j)) ),
H(Ω) = −Σ_k P(w_k) log P(w_k),

where P(w_k) and P(c_j) are the probabilities that a sample belongs to class w_k and class c_j, respectively, and P(w_k ∩ c_j) is the probability that it belongs to both class w_k and class c_j.
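Both metrics can be implemented compactly; the ACC helper below uses SciPy's Hungarian solver (`scipy.optimize.linear_sum_assignment`) for the map(·) step, and the NMI helper follows the definitions above with natural logarithms:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(true, pred):
    """ACC via Kuhn-Munkres matching of predicted clusters to true classes."""
    true, pred = np.asarray(true), np.asarray(pred)
    k = max(true.max(), pred.max()) + 1
    count = np.zeros((k, k), dtype=int)
    for t, p in zip(true, pred):
        count[p, t] += 1                       # co-occurrence of (pred, true)
    row, col = linear_sum_assignment(-count)   # negate to maximize matches
    return count[row, col].sum() / len(true)

def nmi(true, pred):
    """NMI = I(Omega; C) / ((H(Omega) + H(C)) / 2)."""
    true, pred = np.asarray(true), np.asarray(pred)
    mi = 0.0
    for w in np.unique(pred):
        for c in np.unique(true):
            p_wc = np.mean((pred == w) & (true == c))
            p_w, p_c = np.mean(pred == w), np.mean(true == c)
            if p_wc > 0:
                mi += p_wc * np.log(p_wc / (p_w * p_c))
    H = lambda x: -sum(np.mean(x == v) * np.log(np.mean(x == v))
                       for v in np.unique(x))
    return mi / max((H(true) + H(pred)) / 2, 1e-12)
```

A clustering that is perfect up to label permutation scores ACC = NMI = 1.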

Experimental Result
The original size of the EYaleB images is 192×168, and we downsample them to 42×42. Similarly, we downsample ORL from 112×92 to 32×32 and COIL20 to 32×32. The CT images in the COVID19 dataset are 1024×1024; we downsample them to 224×224 and normalize them. The specific network structure differs per dataset according to its characteristics. The convolution kernel size, the number of channels per layer, and the size of the self-representation matrix Z for the four datasets are shown in Table 1.
From the experimental results in Table 2, as the number of subjects in the experiment increases, the clustering accuracy decreases. Compared with traditional subspace clustering methods, the results of the deep subspace clustering methods are 10%-30% higher. Compared with the deep subspace clustering methods DSC-L1 and DSC-L2, the accuracy of EDS-SC is higher for every number of clusters. Theoretically, the Elastic Net can balance the connectivity and independence between the classes of the self-representation matrix, and the deep sparse representation in the algorithm can suppress outliers and noise.
Table 3 shows the clustering results of all four image datasets under the different comparison methods; in most cases, the proposed EDS-SC method performs best. In addition, the following conclusions can be drawn from the table: (1) Compared with traditional shallow subspace clustering (SSC), KSSC based on kernel mapping does not consistently improve the clustering results, indicating that the kernel trick is not necessarily effective. Compared with the shallow subspace clustering methods, the deep subspace clustering methods improve the clustering results considerably, owing to the powerful representation learning ability of the autoencoder neural network.
(2) On the basis of DSC, RGRL introduces sample relationships to guide representation learning, and the clustering results improve; both methods perform better with ℓ2 regularization. The EDS-SC proposed in this paper introduces the Elastic Net regularization term and the deep feature sparsity term, improving the accuracy on EYaleB, ORL, COIL20, and COVID19 by 0.47%, 1.75%, 0.98%, and 3.62%, respectively, over RGRL-L2, the method with the second-best clustering results. (3) EDS-SC brings the most significant improvement on the COVID19 dataset: because of the complex structure of medical images, the sparse deep-feature term introduced by EDS-SC helps clustering by extracting sparse latent features of the medical images. In addition, some deep subspace clustering code is not open source; we therefore compare the published experimental results on the COIL20 dataset, as shown in Table 4.
StructAE maintains the original high-dimensional subspace structure, which is not conducive to learning the low-dimensional subspace structure. DPSC keeps the sample distributions before and after feature extraction consistent, and its clustering accuracy is high. DASC generates "fake" samples in the sampling layer and learns the subspace structure during adversarial generation, but its time complexity is high and it does not converge easily. The proposed EDS-SC combines Elastic Net regularization with a sparsity term on the autoencoder's deep features, reaching a clustering accuracy of 97.99% and normalized mutual information of 98.10%, better than the single-form-regularization DSC and RGRL methods. The clustering results demonstrate the effectiveness of the proposed method. The energy of the COIL20 similarity graph is mainly distributed on the diagonal and its small neighborhood: because each object in COIL20 is photographed every 5 degrees of rotation, each image is similar only to the images at the nearest angles, but the energy distribution still satisfies the block-diagonal property, and the similarity within each class is relatively uniform, with less noise than the ORL dataset. As seen in Figure 4, the adjacency matrix A learned by EDS-SC has a better block-diagonal structure and fewer noise points than the single-form regularization of RGRL-L1 or RGRL-L2. Because EDS-SC constrains the encoder features to be sparse, the self-representation matrix based on the sparse features performs better.

Parametric analysis
The EDS-SC model has four parameters, α_SC, α_EN, α_S, and λ, where the sparse feature parameter α_S is fixed to 1. First, we explore the influence of the Elastic Net balance parameter α_EN, which constrains the self-representation matrix, on the clustering accuracy; α_EN is selected from the range {0, 0.1, ..., 1.0}. EDS-SC reduces to a single form of ℓ2 or ℓ1 regularization when α_EN = 0 or α_EN = 1, respectively. Figure 5 shows that the clustering performance is better when α_EN lies between 0.1 and 0.9, which demonstrates the effectiveness of Elastic Net regularization.
In the experiments, parameter analysis shows that the best clustering results on the four datasets EYaleB, COIL20, ORL, and COVID19 are obtained with α_EN equal to 0.4, 0.2, 0.7, and 0.5, respectively. Therefore, taking COIL20 and ORL as examples, we discuss the influence of the self-representation error parameter α_SC and the regularization parameter λ on the clustering results under the optimal α_EN. Figure 6 shows that when the balance parameter α_SC is between 4 and 10 and λ is between 0.001 and 0.1, the clustering accuracy of COIL20 and ORL is better.

Ablation experiment
The innovations of the proposed EDS-SC consist of two parts: Elastic Net regularization balances the connectivity and independence of the self-representation matrix, and the deep sparse feature term suppresses outliers and noise. This section conducts an ablation study on the effectiveness of these two parts: (1) only the Elastic Net regularization term, without the feature sparsity term (α_S = 0), denoted ED-SC; (2) only the feature sparsity term, without the Elastic Net regularization term: when α_EN = 1 this corresponds to DS-SC-L1, and when α_EN = 0 to DS-SC-L2.
To explore the effectiveness of the two parts, the clustering accuracy of the above methods and the time taken to train one epoch are calculated respectively, as shown in Table 5.
The following conclusions can be drawn from the table: (1) Compared with DSC-L1 and DSC-L2, the clustering results of ED-SC based on Elastic Net regularization increase by 0.40%, 2.70%, 0.25%, and 1.55%, respectively, showing that Elastic Net regularization achieves a better combination of the ℓ1 and ℓ2 regularizers. Meanwhile, DS-SC-L1 and DS-SC-L2 based on the deep feature sparsity term increase by 0.49%, 2.79%, 2.00%, and 4.35%, respectively, which is more effective than ED-SC. Theoretically, the feature sparsity constraint suppresses outliers and noise and helps learn more effective latent features; the improvement is best on the complex medical imaging dataset COVID19.
(2) The proposed EDS-SC integrates the Elastic Net regularization term and the feature sparsity term, reaching clustering accuracies of 97.99% on COIL20 and 93.12% on COVID19. Compared with DSC-L2, which has neither term, the improvement is significant, verifying the effectiveness of the method proposed in this paper.
(3) The EDS-SC method takes only 0.21s, 0.08s, 0.02s, and 0.15s longer than DSC-L1 to train one epoch on the four datasets. The images in the ORL and COIL20 datasets are less complex than the medical CT images of COVID19, so the added time is relatively small; because of the large number of training samples in the EYaleB dataset, its added time cost is the largest.

Conclusion
The high-dimensional characteristics of data often hinder capturing its essential geometric structure, so clustering results are not ideal. In theory, the ℓ1 norm helps restrict affinities to within each subspace, while ℓ2 and nuclear norm regularization can improve the group effect and connectivity of the sample points but only maintain affinity between independent subspaces. Elastic Net regularization can balance the maintenance of subspace independence with subspace connectivity and prevent excessive segmentation or excessive group effects. This paper proposes the EDS-SC algorithm, which balances the intra-class consistency and inter-class separation of the data points through Elastic Net regularization and sparsifies the features extracted by the autoencoder to obtain deep sparse features. Experiments on datasets from several fields show that the algorithm achieves good clustering performance and is suitable for high-dimensional datasets. However, this paper has not studied how to balance the two regularizers in the Elastic Net automatically or how to visualize the sparse feature selection; these are important directions and goals of future research. Moreover, although deep autoencoder networks have been applied to subspace clustering, deep subspace clustering frameworks load all images into one batch, which is memory-consuming for neural networks and not conducive to parameter updates; studying scalable deep subspace clustering is therefore another direction for future work.

Experiment and result analysis

Experimental Datasets

Four experimental datasets are used in our experiments: the face datasets Extended Yale B (abbreviated EYaleB) and ORL, the object dataset COIL20, and the COVID19 dataset. EYaleB includes face images of 38 people; each person has 64 images taken under different lighting conditions, 2432 images in total. ORL contains 40 subjects with 10 samples each, covering different facial expressions (eyes open/closed, smiling/not smiling) and facial details (glasses/no glasses) under different lighting conditions. COIL20 contains 1440 grayscale images of 20 objects; each object is photographed on a turntable at 5° intervals. COVID19 consists of 17 patients, each with 20 lung CT images, comprising 200 confirmed and 140 unconfirmed images. Some examples of the datasets are shown in Figure 3. For the COVID19 dataset, the upper two rows are non-COVID-19 patients and the lower two rows are images of patients with COVID-19; comparison shows that the CT images of COVID-19 patients exhibit characteristics such as ground-glass opacity.

Figure 4 takes COIL20 (first row) and ORL (second row) as examples to show the adjacency matrix A learned by the proposed EDS-SC network. Ideally, A should have a block-diagonal structure; each diagonal block of the similarity matrix corresponds to a class and to a subspace. The similarity matrix reflects the relationships between data points: the denser the relationship, the greater the correlation. RGRL-L1 and RGRL-L2, which achieve the second-best clustering results, are selected for comparison with the EDS-SC method.

Fig. 4: Visualization of affinity matrix A on the COIL20 and ORL datasets.

Fig. 6: The influence of parameters α_SC and λ on accuracy.
Algorithm 1 EDS-SC
Require: sample X; parameters {α_SC, α_EN, α_S, λ}; number of clusters k; learning rate η; maximum number of epochs M.
Ensure: self-representation matrix Z; autoencoder network parameters Θ_e, Θ_d; clustering result.
1: Initialization: pre-train the deep sparse autoencoder network AE without the self-representation layer to initialize the network parameters; set the epoch number m = 0.
2: Fine-tune the full network with the objective function (10) to learn Z and {Θ_e, Θ_d}; construct the adjacency matrix Z* and the Laplacian matrix L. Compute the k eigenvectors corresponding to the first k eigenvalues of L as the columns of a new sample matrix, and normalize the matrix so that its rows are samples.
3: Apply k-means clustering to the new sample matrix to obtain the cluster labels of the images, and return.

Table 2: Clustering accuracy (in %) on Extended Yale B.

Table 4: Clustering accuracy (%) of deep subspace clustering methods on the COIL20 dataset.

Table 5: Comparison of experimental results for the ablation experiments.