Robust semi-supervised non-negative matrix factorization for binary subspace learning

Non-negative matrix factorization and its extensions have been applied in various areas (e.g., dimensionality reduction, clustering). When the original data are corrupted by outliers and noise, most non-negative matrix factorization methods cannot achieve robust factorization or learn a subspace with binary codes. This paper puts forward a robust semi-supervised non-negative matrix factorization method for binary subspace learning, called RSNMF, for image clustering. To achieve better clustering performance on data contaminated by outliers and noise, we propose a weighted constraint on the noise matrix and incorporate manifold learning into non-negative matrix factorization. Moreover, we utilize discrete hashing learning to constrain the learned subspace, which yields a binary subspace from the original data. Experimental results validate the robustness and effectiveness of RSNMF in binary subspace learning and image clustering on face datasets corrupted by Salt and Pepper noise and Contiguous Occlusion.


Introduction
The processing and application of high-dimensional data are very challenging. To address this problem, many dimensionality reduction methods have been applied to image retrieval, image indexing [1,2], and image classification [3]. To learn a satisfactory subspace through dimensionality reduction, most studies mainly consider how to discover an effective low-dimensional representation of the original data. A common scheme is to exploit the geometrical structure information of the original data, which can lead to a more discriminative representation.
In the past decades, several dimensionality reduction techniques have been presented, such as principal component analysis (PCA) [4] and non-negative matrix factorization (NMF) [5], which can learn an effective subspace for classification and clustering. NMF decomposes an original non-negative matrix into two non-negative matrices such that their product approximates the original data matrix. The non-negativity property is consistent with human perception, which makes it more meaningful for image representation. Due to the satisfactory performance of NMF, some extensions [6][7][8][9][10][11][12][13][14][15][16][17][18][19] were proposed and utilized to improve the clustering effect.
Traditional NMF is an unsupervised method and is not specifically designed for clustering. To achieve better clustering performance, constraints such as label propagation, manifold learning, and pairwise constraints were imposed on the subspace, which can yield a more effective parts-based representation. When the original data are heavily corrupted, NMF fails to achieve good clustering because its loss function is sensitive to outliers and noise. Therefore, some researchers [6][7][8][9][10][11][12] proposed alternative loss functions for more robust factorization. Gao et al. [9] presented a capped norm as the loss function with an outlier threshold to reduce outliers; however, there is no principled way to adjust the threshold. Recently, Guan et al. [12] proposed the Truncated Cauchy loss (CauchyNMF) together with the three-sigma rule to filter outliers. Although CauchyNMF learns a better subspace than other NMF methods from data contaminated by outliers and noise, it leads to an unsatisfactory subspace when the outliers do not follow a Gaussian distribution.
After dimensionality reduction by non-negative matrix factorization, clustering on a parts-based representation composed of real numbers takes considerable time. Recently, data-dependent hashing methods [20][21][22][23][24][25][26][27][28] were put forward to learn the latent features of training data and obtain effective binary codes from different hash functions. A subspace composed of binary codes (−1 and 1, or 0 and 1) can clearly reduce the clustering time. However, traditional NMF cannot learn a parts-based representation composed of binary codes.
In this paper, based on data-dependent hashing methods, non-negative matrix factorization, and manifold learning, a novel dimensionality reduction method is presented to learn a subspace composed of binary codes from the original data space. Our contributions are as follows:
• A robust non-negative matrix factorization framework is proposed to remove outliers in the subspace. Moreover, the learned subspace composed of binary codes obeys the geometrical assumption of the original data.
• Our problem can be formulated as a mixed-integer optimization problem. We transform it into several subproblems and solve each of them efficiently.
• Extensive experiments show that our method can learn a subspace composed of binary codes from datasets corrupted by Salt and Pepper noise and Contiguous Occlusion. Moreover, the clustering performance in the learned subspace demonstrates that our method achieves better clustering results than other dimensionality reduction methods.

Non-negative matrix factorization and its extensions
Suppose that any image can be represented by a vector $x_i \in \mathbb{R}^m$ and that the matrix $V = [x_1, \ldots, x_n]$ denotes an original image space composed of $n$ images. NMF seeks two low-dimensional matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ whose product best approximates $V$, where $r$ is the factorization rank and $r \ll \min\{m, n\}$. Generally, NMF can be mathematically formulated as follows:
$$\min_{W \ge 0,\, H \ge 0} \mathrm{Loss}(V, WH), \tag{1}$$
where the function $\mathrm{Loss}$ measures the error between $V$ and $WH$; common choices are the $L_1$ norm, the $L_{2,1}$ norm, and the Frobenius norm. Guan et al. [12] put forward a Truncated Cauchy loss to reduce outliers (CauchyNMF), which can be written in the following form:
$$\min_{W \ge 0,\, H \ge 0} \sum_{i=1}^{m} \sum_{j=1}^{n} g\!\left(\frac{(V - WH)_{ij}}{\sigma}\right), \tag{2}$$
where
$$g(x) = \begin{cases} \ln(1 + x^2), & |x| \le \gamma, \\ \ln(1 + \gamma^2), & |x| > \gamma. \end{cases}$$
The scale parameter $\sigma$ can be computed by the three-sigma rule, and the truncation parameter $\gamma$ can be obtained by the Nagy algorithm [12]. Most NMF variants employ various loss functions to handle outliers, but these approaches cannot remove the outliers from the subspace. To address this problem, a robust NMF framework was put forward as follows:
$$\min_{W \ge 0,\, H \ge 0,\, E} \mathrm{Loss}(M - WH - E) + \lambda\, \Omega(E), \tag{3}$$
where $M$ is the original data matrix contaminated by outliers, $E$ is an error matrix, $\lambda$ is a hyper-parameter, and $\Omega(E)$ is the constraint term. Based on problem (3), Zhang et al. [11] proposed the following robust NMF problem:
$$\min_{W \ge 0,\, H \ge 0,\, E} \|M - WH - E\|_F^2 + \lambda \|E\|_M, \tag{4}$$
where $\|E\|_M = \sum_{ij} |e_{ij}|$.
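As a concrete illustration of problem (1) with the Frobenius loss, the following NumPy sketch implements the classical multiplicative updates of Lee and Seung; the function name and toy data are our own, and this is a baseline solver, not the robust variants discussed in this paper.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for min ||V - WH||_F^2 s.t. W, H >= 0.
    A minimal sketch of plain NMF (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update keeps H non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update keeps W non-negative
    return W, H

# Toy usage: factorize a random non-negative 20 x 30 matrix with rank 5.
V = np.random.default_rng(1).random((20, 30))
W, H = nmf_multiplicative(V, r=5)
err = np.linalg.norm(V - W @ H)
```

Both updates monotonically decrease the Frobenius error while preserving non-negativity, which is why this scheme is a common starting point before adding robust losses or constraints.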

Data-dependent hashing methods
Assume that an image sample is expressed by a vector $v \in \mathbb{R}^m$ and that the original data space is given by the matrix $V = [v_1, \cdots, v_n] \in \mathbb{R}^{m \times n}$. Data-dependent hashing methods aim to find a binary code matrix $B \in \{-1, 1\}^{L \times n}$ that preserves the semantic similarities of the data space. Each column of $B$ is the $L$-bit code of the corresponding sample $v$, where $L \ll m$.
To make full use of the label information of the original data, Shen et al. [25] put forward a supervised discrete hashing (SDH) framework, which can generate binary codes with satisfactory linear classification. Suppose that $Y \in \mathbb{R}^{c \times n}$ is the label matrix of the original data, $G \in \mathbb{R}^{L \times c}$ is a linear classifier, and $P \in \mathbb{R}^{m \times L}$ is a projection matrix. SDH can be summed up by:
$$\min_{B, G, P} \|Y - G^T B\|_F^2 + \lambda \|G\|_F^2 + \mu \|B - P^T \phi(V)\|_F^2, \quad \text{s.t. } B \in \{-1, 1\}^{L \times n}, \tag{5}$$
where $\phi(\cdot)$ is the RBF kernel mapping, and $\mu$ and $\lambda$ are penalty parameters. To solve problem (5), we can alternately optimize the following three subproblems:
$$G^{k+1} = \arg\min_{G} \|Y - G^T B^k\|_F^2 + \lambda \|G\|_F^2, \tag{6}$$
$$B^{k+1} = \arg\min_{B \in \{-1,1\}^{L \times n}} \|Y - (G^{k+1})^T B\|_F^2 + \mu \|B - (P^k)^T \phi(V)\|_F^2, \tag{7}$$
$$P^{k+1} = \arg\min_{P} \|B^{k+1} - P^T \phi(V)\|_F^2, \tag{8}$$
until the stopping condition is satisfied. Thus, we are able to obtain a local optimal solution of problem (5).
Problem (6) is a ridge regression with the closed-form solution:
$$G = (B B^T + \lambda I)^{-1} B Y^T. \tag{9}$$
For problem (7), we make the following assumptions: $z^T$ is the $l$th row of $B$, $B'$ is the matrix $B$ not including $z$, $g^T$ is the $l$th row of $G$, $G'$ is the matrix $G$ not including $g$, and $q^T$ is the $l$th row of $Q = GY + \mu P^T \phi(V)$. We can then conclude that each row of $B$ is updated by:
$$z = \mathrm{sgn}\big(q - B'^T G' g\big). \tag{10}$$
With regard to problem (8), the solution is:
$$P = \big(\phi(V)\, \phi(V)^T\big)^{-1} \phi(V)\, B^T. \tag{11}$$
Consequently, a local optimal solution of problem (5) can be found by iterating (9), (10), and (11).
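The alternating scheme above can be sketched as follows. This is a simplified toy version with an identity feature map in place of the RBF kernel $\phi$ and hypothetical names, not the reference SDH implementation:

```python
import numpy as np

def sdh_steps(Phi, Y, L=8, lam=1.0, mu=1e-2, n_outer=5, seed=0):
    """Toy SDH-style alternation: ridge G-step, least-squares P-step,
    and a discrete coordinate-descent B-step updating one bit row at a time."""
    rng = np.random.default_rng(seed)
    d, n = Phi.shape
    B = np.sign(rng.standard_normal((L, n)))
    for _ in range(n_outer):
        # G-step, cf. (9): G = (B B^T + lam I)^{-1} B Y^T
        G = np.linalg.solve(B @ B.T + lam * np.eye(L), B @ Y.T)
        # P-step, cf. (11): least-squares fit of P^T Phi to B
        P = np.linalg.lstsq(Phi.T, B.T, rcond=None)[0]
        # B-step, cf. (10): cyclic updates of each bit row z
        Q = G @ Y + mu * (P.T @ Phi)
        for l in range(L):
            mask = np.arange(L) != l
            z = np.sign(Q[l] - (G[mask] @ G[l]) @ B[mask])
            z[z == 0] = 1.0  # break sign ties toward +1
            B[l] = z
    return B, G, P

# Toy usage: 40 samples with 10-dim features and 3 classes (one-hot labels).
rng = np.random.default_rng(2)
Phi = rng.standard_normal((10, 40))
Y = np.eye(3)[rng.integers(0, 3, 40)].T
B, G, P = sdh_steps(Phi, Y, L=8)
```

Note the bit-wise B-step: because each bit row interacts with the others only through the quadratic classification term, fixing all other rows leaves a problem whose optimum is a simple sign function.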

Problem formulation
When the original data are corrupted by outliers and noise, existing NMF methods have the following shortcomings: (1) they are unable to learn an effective and discriminative subspace from the original data space; (2) the parts-based representation is unable to retain the geometrical structure information of the original data; (3) they are unable to learn a subspace with binary codes.
For problem (4), Zhang et al. [11] assumed that the outliers in the error matrix $E$ are very sparse, yet the positions of the abnormal entries in the original data are ignored. Suppose that some outlier locations are known. For an image space $M \in \mathbb{R}^{m \times n}$, a weight matrix $S$ marks the locations of outliers by the following equation:
$$S_{ij} = \begin{cases} 0, & \text{if } M_{ij} \text{ is a known outlier,} \\ 1, & \text{otherwise.} \end{cases} \tag{12}$$
Hence, we rewrite the constraint on $E$ as follows:
$$\|E\|_S = \sum_{ij} S_{ij} |e_{ij}|. \tag{13}$$
To preserve the geometric information in the subspace, manifold regularization can be used to establish the relation between the original data space and the subspace. A commonly used method, manifold learning [29], is as follows:
$$\frac{1}{2} \sum_{i,j} \|h_i - h_j\|^2\, U_{ij} = \mathrm{tr}\big(H (D - U) H^T\big), \tag{14}$$
where $\mathrm{tr}(\cdot)$ is the trace of a matrix, $U$ is the affinity matrix of the data graph, and $D$ is the diagonal degree matrix with $D_{ii} = \sum_j U_{ij}$.
In summary, combining (13), (14), and (4) results in our robust semi-supervised non-negative matrix factorization for binary subspace learning (RSNMF):
$$\min_{W, H, B, P, E} \|V - WH - E\|_F^2 + \lambda \|E\|_S + \alpha \|B - PH\|_F^2 + \gamma\, \mathrm{tr}\big(B (D - U) B^T\big), \quad \text{s.t. } W \ge 0,\ H \ge 0,\ B \in \{-1, +1\}^{r \times n}. \tag{15}$$
Given a non-negative data matrix $V \in \mathbb{R}^{m \times n}$ and a factorization rank $r$, one hopes to achieve a code matrix $B \in \{-1, +1\}^{r \times n}$ from $V$. RSNMF can be utilized to learn binary codes for clustering and has three properties:
• The learned subspace can remove outliers and noise, similar to (4).
• A subspace composed of binary codes can be learned from the data space, similar to (5). Since (5) is a supervised problem, we delete its first two terms to obtain an unsupervised problem.
• The low-dimensional space composed of binary codes should preserve the similarity or dissimilarity of the original data space, similar to (14).
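Two ingredients of the formulation are easy to sketch: the weight matrix S marking known outlier positions, and the graph Laplacian D − U used by the manifold term. Both helpers below are illustrative; the symmetrized binary k-nearest-neighbour affinity is one assumed choice of U, not necessarily the paper's.

```python
import numpy as np

def outlier_mask(corrupted):
    """S_ij = 0 at known outlier positions, 1 elsewhere (sketch of the weight S);
    `corrupted` is a boolean array marking known outlier locations."""
    return np.where(corrupted, 0.0, 1.0)

def knn_laplacian(V, k=3):
    """Graph Laplacian D - U over the columns of V, using a symmetrized
    binary k-nearest-neighbour affinity U (an assumed, common choice)."""
    n = V.shape[1]
    d2 = ((V[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. distances
    U = np.zeros((n, n))
    for i in range(n):
        U[i, np.argsort(d2[i])[1:k + 1]] = 1.0  # k nearest neighbours, skip self
    U = np.maximum(U, U.T)                      # symmetrize the affinity
    D = np.diag(U.sum(axis=1))
    return D - U

# Toy usage: one known bad entry, and a 5-dim data space with 12 samples.
corrupted = np.zeros((4, 6), dtype=bool)
corrupted[1, 2] = True
S = outlier_mask(corrupted)
Lap = knn_laplacian(np.random.default_rng(0).random((5, 12)), k=3)
```

With this S, the weighted penalty leaves the error matrix unpenalized exactly at the marked positions, so E is free to absorb those corruptions while staying sparse elsewhere.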

Optimization scheme
Problem (15) is a non-convex mixed-integer optimization problem; thus, the global optimal solution cannot be guaranteed. A generic framework for solving problem (15) is the block coordinate descent (BCD) method [30], in which one block of variables is solved under the relevant constraints while the remaining variables stay fixed. Thus, problem (15) can be converted into several convex subproblems that are solved in turn until convergence. Problem (15) has five block variables $W$, $H$, $B$, $P$, and $E$, so BCD optimizes the five matrices in turn. Suppose that the $k$th solution of problem (15) has been obtained. The $(k+1)$th solution can be searched by:
$$W^{k+1} = \arg\min_{W \ge 0} \|V - W H^k - E^k\|_F^2, \tag{16}$$
$$E^{k+1} = \arg\min_{E} \|V - W^{k+1} H^k - E\|_F^2 + \lambda \|E\|_S, \tag{17}$$
$$H^{k+1} = \arg\min_{H \ge 0} \|V - W^{k+1} H - E^{k+1}\|_F^2 + \alpha \|B^k - P^k H\|_F^2, \tag{18}$$
$$P^{k+1} = \arg\min_{P} \|B^k - P H^{k+1}\|_F^2, \tag{19}$$
$$B^{k+1} = \arg\min_{B \in \{-1,1\}^{r \times n}} \alpha \|B - P^{k+1} H^{k+1}\|_F^2 + \gamma\, \mathrm{tr}(B K B^T). \tag{20}$$
It is easy to get the solutions of problems (16) and (17) as follows:
$$W^{k+1} = \Big[(V - E^k)(H^k)^T \big(H^k (H^k)^T\big)^{-1}\Big]_+, \tag{21}$$
where $[\cdot]_+$ projects onto the non-negative orthant, and
$$E^{k+1}_{ij} = \mathrm{sgn}(R_{ij}) \max\big(|R_{ij}| - \tfrac{\lambda}{2} S_{ij},\, 0\big), \quad R = V - W^{k+1} H^k. \tag{22}$$
For (18), we can utilize Nesterov's optimal method [31]; to save space, we do not introduce this algorithm. For problem (19), we have the optimal solution:
$$P = B H^T (H H^T)^{-1}, \tag{23}$$
or, with a small ridge term $\varepsilon > 0$ for numerical stability,
$$P = B H^T (H H^T + \varepsilon I)^{-1}. \tag{24}$$
Using the function RRC in [25], Eq. (24) can be realized. Therefore, the solution of (19) can be realized by:
$$P^{k+1} = B^k (H^{k+1})^T \big(H^{k+1} (H^{k+1})^T + \varepsilon I\big)^{-1}. \tag{25}$$
For problem (20), we can achieve its local optimal solution with the discrete cyclic coordinate descent method. First, problem (20) can be converted into the following form:
$$\min_{B \in \{-1,1\}^{r \times n}} \mathrm{tr}\big(B (\alpha C + \gamma K) B^T\big) - 2\alpha\, \mathrm{tr}(B^T P H), \tag{26}$$
where $K = D - U$ and $C \in \mathbb{R}^{n \times n}$ is an identity matrix. Second, some assumptions are made as follows: $b_l$ is the $l$th column of $B$, $B'$ is the matrix $B$ not including $b_l$, $k_l$ is the $l$th column of $K$ not including $K_{ll}$, and $p_l$ is the $l$th column of $PH$. Third, $B$ can be learned column by column. The first term of problem (26) is rewritten as follows:
$$\mathrm{tr}\big(B(\alpha C + \gamma K)B^T\big) = \gamma\, \big(2\, b_l^T B' k_l + K_{ll}\, b_l^T b_l\big) + \alpha\, b_l^T b_l + \text{const}. \tag{27}$$
Similarly, for the second term of problem (26), we are able to conclude that:
$$\mathrm{tr}(B^T P H) = b_l^T p_l + \text{const}. \tag{28}$$
Since $b_l^T b_l = r$ is constant, problem (20) can be rewritten column-wise as:
$$\min_{b_l \in \{-1,1\}^{r}} \gamma\, b_l^T B' k_l - \alpha\, b_l^T p_l. \tag{29}$$
This problem brings about the following optimal solution:
$$b_l = \mathrm{sgn}\big(\alpha\, p_l - \gamma\, B' k_l\big). \tag{30}$$
Obviously, each $b_l$ is calculated from the other, previously learned columns of $B$, so the columns are swept in turn, each update using the current $B$. Following [25], we recommend sweeping (30) over the columns of $B$ for $t$ times, where $t = 5$.
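Two of the closed-form pieces above are simple enough to sketch directly: the soft-thresholding solution of the weighted-L1 E-subproblem, and the column-wise sign update of B in the spirit of (30). Variable names are ours and the snippets are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

def update_E(R, S, lam):
    """Soft-thresholding solution of min_E ||R - E||_F^2 + lam * sum S_ij |E_ij|,
    where R is the current residual V - W H (sketch of the E-step)."""
    return np.sign(R) * np.maximum(np.abs(R) - lam * S / 2.0, 0.0)

def update_B_column(B, PH, K, alpha, gamma, l):
    """One discrete coordinate-descent update of column l of B, in the spirit
    of (30): b_l = sgn(alpha * p_l - gamma * B' k_l), with PH = P @ H."""
    k = K[:, l].copy()
    k[l] = 0.0                                   # exclude the diagonal (self) term
    z = np.sign(alpha * PH[:, l] - gamma * (B @ k))
    z[z == 0] = 1.0                              # break sign ties toward +1
    return z

# Toy usage for both updates.
R = np.array([[1.0, -0.2], [0.05, 3.0]])
E = update_E(R, np.ones_like(R), 1.0)            # threshold lam/2 = 0.5
K = np.array([[2.0, -1.0, -1.0], [-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
B = np.array([[1.0, -1.0, 1.0], [-1.0, 1.0, 1.0]])
PH = np.array([[0.5, 0.0, 0.0], [-0.7, 0.0, 0.0]])
z = update_B_column(B, PH, K, alpha=0.1, gamma=1e-5, l=0)
```

The E-step zeroes residual entries smaller than the weighted threshold, while entries at known-outlier positions (where S is 0) pass through unpenalized.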

Algorithm 1 RSNMF
Input: data matrix V, weight matrix S, rank r, parameters α, γ, λ, iter
Output: W, H, E, P, B
Initialize W, H, E, P, and B
for k = 1 to iter do
  Computing W by (21)
  Computing E by (22)
  Computing H by Nesterov's optimal method [31]
  Computing P by (25)
  Computing B by (30)
end for
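Algorithm 1 can be condensed into a toy block-coordinate-descent loop. Note the simplifications in this sketch: W and H use projected least squares instead of non-negative least squares and Nesterov's method, the H-step ignores the hashing term, and the B-step drops the manifold term; names and defaults are ours.

```python
import numpy as np

def rsnmf_sketch(V, S, r=8, lam=1.0, n_iter=50, seed=0):
    """Simplified block-coordinate-descent loop in the spirit of Algorithm 1."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    H = np.abs(rng.standard_normal((r, n)))
    E = np.zeros((m, n))
    B = np.sign(rng.standard_normal((r, n)))
    for _ in range(n_iter):
        X = V - E
        W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0)  # W-step
        H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 0)        # H-step
        R = V - W @ H
        E = np.sign(R) * np.maximum(np.abs(R) - lam * S / 2.0, 0.0)    # E-step
        P = np.linalg.lstsq(H.T, B.T, rcond=None)[0].T                 # P-step
        B = np.sign(P @ H)                                             # B-step
        B[B == 0] = 1.0                                                # tie-break
    return W, H, E, B

# Toy usage: 15 x 20 non-negative data with no known outlier positions (S = 1).
V = np.random.default_rng(3).random((15, 20))
W, H, E, B = rsnmf_sketch(V, np.ones_like(V), r=4, n_iter=10)
```

Even in this stripped-down form, the loop exhibits the key behavior: the real-valued factors absorb the data, the error matrix absorbs large residuals, and the codes B stay strictly binary throughout.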
Experiments
ORL contains frontal face images collected between 1992 and 1994 at Cambridge University. There are 40 different persons, and each person has 10 images taken with varying facial expressions, at different times, and so on. Each image is in PGM format with a size of 92 × 112 pixels; we scale each image down to 32 × 32 pixels. YALE was constructed by the Center for Computational Vision and Control of Yale University. It contains 165 images of 15 persons, with 11 pictures per person taken under different facial expressions or configurations, each of size 100 × 100 pixels. Some example images from ORL and YALE are presented in Fig. 1.
To verify the clustering ability on corrupted data, we consider two corruptions: Salt and Pepper noise and Contiguous Occlusion. Salt and Pepper noise changes a portion of pixel values to 0 or 255; the corrupted percentage of pixels ranges from 0.05 to 0.8 with a step size of 0.05. Contiguous Occlusion randomly corrupts a block of each image, filling the pixels of the block with 255; the block size ranges from 1 to 24 with a step size of 2. We set r = 32, α = 0.1, γ = 1e−5, λ = 100, and iter = 200. We adopt Accuracy (AC) and Normalized Mutual Information (NMI) [32] to evaluate the clustering performance of each algorithm.

Fig. 1 Sample images from (a) ORL and (b) YALE

Figure 2 presents the clustering performance on ORL and YALE when the two datasets are contaminated by Salt and Pepper noise. Figure 3 presents the clustering performance on ORL and YALE when the two datasets are contaminated by Contiguous Occlusion. From these figures, there are some interesting observations:

Contiguous occlusion
• NMF, PCA, and RNMF_L1 achieve satisfactory clustering results when the block size is very small.
• CauchyNMF achieves the best AC and NMI at smaller block sizes (i.e., block sizes less than 6). Once the block size exceeds 10, the performance of CauchyNMF degrades rapidly.
• Although RSNMF does not outperform CauchyNMF at small block sizes, its clustering results remain stable as the block size increases.
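The two corruption processes used in these experiments can be sketched as follows; function names and the seeding are our own, while the pixel conventions follow the text (salt/pepper values 0 or 255, occlusion blocks filled with 255).

```python
import numpy as np

def salt_and_pepper(img, p, seed=0):
    """Set a fraction p of pixels to 0 or 255 at random (Salt and Pepper noise)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape) < p
    out[mask] = rng.choice([0, 255], size=int(mask.sum()))
    return out

def occlude(img, block, seed=0):
    """Fill a random block x block patch with 255 (Contiguous Occlusion)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape
    i = rng.integers(0, h - block + 1)
    j = rng.integers(0, w - block + 1)
    out[i:i + block, j:j + block] = 255
    return out

# Toy usage on a blank 32 x 32 image, matching the rescaled ORL size.
img = np.zeros((32, 32), dtype=int)
noisy = salt_and_pepper(img, 0.2)
occluded = occlude(img, 8)
```

Salt and Pepper noise scatters extreme values independently across the image, whereas occlusion concentrates the corruption in one contiguous region, which explains why methods robust to one corruption may fail on the other.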

Conclusion
This paper presented a robust semi-supervised non-negative matrix factorization method for binary subspace learning (RSNMF) to handle Salt and Pepper noise and Contiguous Occlusion. The clustering results demonstrate that our method has the following advantages. First, RSNMF learns a more effective and discriminative parts-based representation composed of binary codes from ORL and YALE corrupted by Salt and Pepper noise and Contiguous Occlusion. Second, RSNMF is more robust to outliers than existing dimensionality reduction methods.

Acknowledgements This work was supported by the …tion of Chongqing (Grant No. cstc2019jcyj-bshX0101), the Foundation of Chongqing Three Gorges University, and National Science Foundation (NSF) Grant #2011927 and DoD Grant #W911NF1810475.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.