Collaborating filtering using unsupervised learning for image reconstruction from missing data
Abstract
In the image acquisition process, important information in an image can be lost due to noise, occlusion, or even faulty image sensors. Therefore, we often have images with missing and/or corrupted pixels. In this work, we address the problem of image completion using a matrix completion approach that minimizes the nuclear norm to recover missing pixels in the image. The image matrix is assumed to be low rank. The proposed approach uses the nuclear norm as a surrogate for the rank function in order to avoid the rank minimization problem, which is known to be NP-hard. It is an adaptation of the collaborative filtering approach used for user profile construction. The main advantage of this approach is that it uses a learning process to group pixels into clusters and exploits these clusters in a prediction method that recovers the missing or unknown data. For performance evaluation, the proposed approach and existing matrix completion methods are compared for image reconstruction using the PSNR measure. These methods are applied to a dataset of standard images used in image processing. All the images recovered during the experiments are also displayed for visual comparison. Simulation results show that the proposed approach achieves better performance than the existing matrix completion methods used for image reconstruction from missing data.
Keywords
Image reconstruction, Biclustering, Matrix completion, Unsupervised learning, Prediction, Rank function, Nuclear norm function, Surrogate model
Abbreviations
 ALM
Augmented Lagrange multiplier
 FPCA
Fixed point continuation algorithm
 IALM
Inexact augmented Lagrange multiplier
 PCA
Principal component analysis
 PG
Proximal gradient
 PPG
Partial proximal gradient
 PSNR
Peak signal-to-noise ratio
 SVT
Singular value thresholding
 SVD
Singular value decomposition
1 Introduction
The reconstruction of missing pixels in an incomplete image is a very active research area in image processing. A simple model for this problem is as follows: given an incomplete image, i.e., one with missing pixels, the goal is to fill in the missing pixels based on the observed ones. By analogy with the matrix completion problem, the problem of recovering missing pixels in an image can be called the image completion problem.
In this work, we are interested in recovering missing pixels from an incomplete image using a matrix completion method based on minimizing the nuclear norm of a matrix. Nuclear norm minimization belongs to the family of low-rank matrix approximation methods. Mathematically speaking, given an incomplete image X, the missing values are estimated from the observed pixels \(\{D_{ij} : (i,j)\in\Omega\}\), where Ω denotes the set of observed entries. The common assumption is that the matrix is low rank (many natural images are well approximated by low-rank matrices). A direct approach is then to minimize the rank of the matrix subject to the observation constraints. This problem is NP-hard, so a convex relaxation is often used to make the minimization tractable. Since the rank of a matrix is the number of its nonvanishing singular values, a natural convex surrogate is the nuclear norm, the sum of the singular values. The proposed approach is therefore based on nuclear norm minimization, the convex surrogate of rank minimization.
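To make the distinction between rank and nuclear norm concrete, the following NumPy sketch (an illustration only, not part of the proposed method) builds a small rank-2 matrix and computes both quantities from its singular values. The matrix size and random seed are arbitrary choices.

```python
import numpy as np

# A 6 x 5 matrix of rank 2, built as the product of two thin random factors.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

sigma = np.linalg.svd(X, compute_uv=False)   # singular values, in descending order

# Rank = number of singular values above a small numerical tolerance
# (the same default tolerance that np.linalg.matrix_rank uses).
tol = sigma.max() * max(X.shape) * np.finfo(float).eps
rank = int(np.sum(sigma > tol))

# Nuclear norm = sum of the singular values (NumPy also offers ord="nuc").
nuclear = float(sigma.sum())

print(rank, nuclear)
```

The rank is a discontinuous count, while the nuclear norm varies continuously (and convexly) with X, which is what makes it usable as an optimization objective.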
The approach used in this work was proposed in [1] for user profile construction. It uses a matrix completion method based on nuclear norm optimization to predict users’ preferences for items. A biclustering process detects clusters of users and clusters of items in order to promote the concept of personal relevance [2]. The prediction process is then applied to the ratings given by users who share almost the same preferences.
The main difficulty in recovering missing data in images is the sparsity of the matrix that models them. Following the same principle, we adapt the user profile construction method to recover the missing pixels. The experimental results show the effectiveness of the proposed prediction process. The proposed approach is applied to a benchmark of standard images used in image processing: gray-level images with different histograms. The results are compared visually with those obtained by different nuclear norm optimization algorithms, and the peak signal-to-noise ratio (PSNR) is computed for each recovered image.
The remainder of this article is organized as follows. Section 2 presents the role of nuclear norm minimization in the optimization of low-rank matrices, then states the problem and explains the proposed approach. It also reviews related work. Section 3 describes the experimental protocol and discusses the results. Section 4 concludes the paper.
2 Methods
2.1 Minimization of low-rank matrices using nuclear norm minimization
In engineering and applied science, including machine learning and computer vision, a wide range of problems can be, or have been, formulated in a low-rank minimization framework, since low-rank formulations capture the low-order structure of the underlying problems.
In many practical problems, one would like to guess the missing entries of an \(n_{1}\times n_{2}\) matrix from a sample Ω of its entries. This is known as the matrix completion problem. It arises in many applications, including collaborative filtering, which is the task of automatically predicting the entries of an unknown data matrix. A popular example is movie recommendation, where the task is to make automatic predictions about a user’s interests by collecting taste information from that user’s past ratings or from those of other users.
In mathematical terms, this problem is posed as follows:
A data matrix \(M\in \mathbb {R}^{n_{1} \times n_{2}}\) is the matrix we wish to recover as accurately as possible. The only information available about it is a sample of its entries \(M_{ij}, (i,j)\in\Omega\), where Ω is a subset of the complete set of entries \(\{1,\ldots,n_{1}\}\times\{1,\ldots,n_{2}\}\).
Very few factors contribute to an individual’s tastes. Therefore, matrix completion can be posed as the recovery of a matrix of low rank r from a sample of its entries, where \(r\le \min(n_{1},n_{2})\). Such a matrix is described by \(n_{1}\times n_{2}\) numbers but has only \(r(n_{1}+n_{2}-r)\) degrees of freedom. When the rank is small and the dimensions are large, the data matrix carries much less information than its ambient dimension suggests. In a collaborative movie recommendation system, users (the rows of the data matrix) are given the opportunity to rate items (the columns). However, they usually rate very few items, so only a few scattered entries of the data matrix are observed. In this case, the users-ratings matrix is approximately low rank because, as mentioned, it is commonly believed that only very few factors contribute to an individual’s tastes or preferences. These preferences are stored in a user profile [1]. By analogy, matrix completion can be used to restore images with missing data: from limited information, we aim to recover the image, i.e., infer the many missing pixels.
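As a worked illustration of the degrees-of-freedom count, take a 256×256 image matrix (the image size used in Section 3) and a hypothetical rank r = 20, chosen only for the arithmetic:

```latex
% Hypothetical example: n_1 = n_2 = 256, r = 20
n_1 n_2 = 256 \times 256 = 65\,536,
\qquad
r\,(n_1 + n_2 - r) = 20\,(256 + 256 - 20) = 20 \times 492 = 9\,840 .
```

Such a rank-20 matrix therefore has only about 15% as many degrees of freedom as it has entries.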
2.2 Problem statement
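The known entries of M are given by \(P_{\Omega}(M)\), where the sampling operator \(P_{\Omega}\) keeps the entries indexed by Ω and sets all other entries to zero: \([P_{\Omega}(X)]_{ij} = X_{ij}\) if \((i,j)\in\Omega\) and 0 otherwise. The matrix M can then be recovered from \(P_{\Omega}(M)\) if it is the unique matrix of rank less than or equal to r that is consistent with the data, i.e., the unique solution of
\(\min_{X}\ \operatorname{rank}(X) \quad \text{subject to} \quad P_{\Omega}(X) = P_{\Omega}(M). \qquad (2)\)
Because (2) is NP-hard, it is replaced by its convex relaxation
\(\min_{X}\ \Vert X\Vert_{*} \quad \text{subject to} \quad P_{\Omega}(X) = P_{\Omega}(M), \qquad (3)\)
where the nuclear norm \(\Vert X\Vert_{*}\) is defined as the sum of the singular values of X: \(\Vert X\Vert _{*} = \sum _{i} \sigma _{i}(X)\).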
Since the nuclear norm ball \(\{X:\Vert X\Vert_{*}\le 1\}\) is the convex hull of the set of rank-one matrices with spectral norm bounded by one, the authors of [4] argue that, under suitable conditions, the rank minimization program (2) and the convex program (3) are formally equivalent in the sense that they have exactly the same unique solution.
The matrix completion problem is therefore not as ill-posed as it might seem: it can be solved by convex programming. The rank function counts the number of nonvanishing singular values, whereas the nuclear norm sums their magnitudes. The nuclear norm is a convex function and can be optimized efficiently via semidefinite programming.
The following theorem is proved in [4].
Theorem 1
Under the hypothesis of Theorem 1, there is a unique low-rank matrix consistent with the observed entries, and this matrix can be recovered by the convex optimization (3). For most problems, the nuclear norm relaxation is formally equivalent to the combinatorially hard rank minimization problem.
If the coherence is low, few samples are required to recover M. Examples of low-coherence matrices include matrices whose column and row spaces are incoherent, such as those generated by the random orthogonal model, and matrices whose singular vectors have entries of small magnitude.
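Coherence can be made concrete with the definition commonly used in the matrix completion literature: for an r-dimensional subspace U of \(\mathbb{R}^{n}\), \(\mu(U) = \frac{n}{r}\max_{i}\Vert P_{U} e_{i}\Vert^{2}\), which ranges from 1 to n/r. The sketch below (NumPy; the sizes are arbitrary assumptions) contrasts a random subspace, which has low coherence, with a subspace spanned by standard basis vectors, which has the maximal value n/r.

```python
import numpy as np

def coherence(basis: np.ndarray) -> float:
    """mu(U) = (n / r) * max_i ||P_U e_i||^2 for U = column span of `basis`."""
    n, r = basis.shape
    q, _ = np.linalg.qr(basis)          # orthonormal basis of the same subspace
    leverage = np.sum(q ** 2, axis=1)   # ||P_U e_i||^2 = squared norm of row i of q
    return float(n / r * leverage.max())

n, r = 200, 5
rng = np.random.default_rng(1)
mu_random = coherence(rng.standard_normal((n, r)))  # random subspace: close to 1
mu_spiky = coherence(np.eye(n)[:, :r])              # spanned by e_1..e_5: equals n / r

print(mu_random, mu_spiky)
```

A spiky subspace concentrates its energy on a few coordinates, so most sampled entries carry no information about it; this is why high coherence demands many more samples.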
Conventional semidefinite programming solvers such as SDPT3 [5] and SeDuMi [6] can solve problem (3). However, such solvers are usually based on interior-point methods and cannot handle large matrices: on a moderate computer, they can only solve problems of size at most a few hundred by a few hundred, because they need to solve huge systems of linear equations to compute the Newton direction. To be precise, SDPT3 handles only square matrices of size less than 100. An alternative is to solve the Newton system with iterative solvers such as the conjugate gradient method. However, this is still problematic, since the condition number of the Newton system is well known to increase rapidly as one gets closer to the solution. Furthermore, none of these general-purpose solvers exploits the fact that the solution may have low rank.
Therefore, first-order methods are used to complete large low-rank matrices by solving (3). The SVT algorithm [4] starts from \(Y_{0} = 0\) and iterates
\(X_{k} = \text{shrink}(Y_{k-1}, \tau), \qquad Y_{k} = Y_{k-1} + \delta_{k}\, P_{\Omega}(M - X_{k}),\)
where \(\delta_{k}\) is a step size. shrink(Y, τ) is a nonlinear function that applies a soft-thresholding rule at level τ to the singular values of the input matrix. The key property is that, for large values of τ, the sequence \(X_{k}\) converges to a solution that very nearly minimizes (3). Hence, at each step, one only needs to compute at most one singular value decomposition and perform a few elementary matrix additions.
2.3 The singular value thresholding algorithm
The most popular approaches to matrix completion in the literature are thresholding methods, which can be divided into two groups: one-step thresholding methods and iterative thresholding methods. Despite the strong theoretical guarantees obtained for one-step thresholding procedures, they behave poorly in practice and only work under the uniform sampling distribution, which is not realistic in many practical situations [7]. Iterative thresholding methods, on the other hand, are well adapted to general nonuniform distributions and perform well in practice, as in [4]. The authors of [8] proposed a first-order singular value thresholding (SVT) algorithm, which is a key subroutine in many numerical schemes for solving nuclear norm minimization. The conventional approach to SVT is to compute the singular value decomposition (SVD) of the matrix and then shrink its singular values.
The singular value decomposition step
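For a matrix \(X\in\mathbb{R}^{n_{1}\times n_{2}}\) of rank r, consider its singular value decomposition \(X = U\Sigma V^{*}\), \(\Sigma = \operatorname{diag}(\{\sigma_{i}\}_{1\le i\le r})\). For each τ ≥ 0, the singular value shrinkage operator is defined by \(D_{\tau}(X) = U D_{\tau}(\Sigma) V^{*}\), with \(D_{\tau}(\Sigma) = \operatorname{diag}(\{(\sigma_{i}-\tau)_{+}\})\) and \(t_{+} = \max(0,t)\). In other words, \(D_{\tau}\) keeps the singular vectors of X and shrinks its singular values by soft-thresholding.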
Even though the SVD may not be unique, it is easy to see that the singular value shrinkage operator is well defined. In some sense, this shrinkage operator is a straightforward extension of the soft-thresholding rule for scalars and vectors. In particular, if many of the singular values of X are below the threshold τ, the rank of \(D_{\tau}(X)\) may be considerably lower than that of X, just as the soft-thresholding rule applied to vectors leads to sparser outputs whenever some entries of the input are below the threshold.
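The rank-reducing effect of \(D_{\tau}\) can be checked with a short NumPy sketch. The function name `shrink_singular_values` and the choice of τ equal to the third-largest singular value are illustrative assumptions.

```python
import numpy as np

def shrink_singular_values(X, tau):
    """D_tau(X) = U diag((sigma_i - tau)_+) V^T for the thin SVD X = U diag(sigma) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt   # same singular vectors, shrunk values

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))
s = np.linalg.svd(X, compute_uv=False)
tau = s[2]                        # threshold equal to the third-largest singular value
Y = shrink_singular_values(X, tau)

# Only the two singular values above tau survive, so the rank drops from 6 to 2.
print(np.linalg.matrix_rank(X, tol=1e-8), np.linalg.matrix_rank(Y, tol=1e-8))
```

An explicit tolerance is passed to `matrix_rank` because the zeroed singular values of Y come back from the SVD as tiny floating-point values rather than exact zeros.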
The singular value thresholding operator is the proximal operator associated with the nuclear norm: \(D_{\tau}(Y) = \arg\min_{X} \tfrac{1}{2}\Vert X - Y\Vert_{F}^{2} + \tau\Vert X\Vert_{*}\). The proximal operator has its origins in convex optimization theory and has been widely used for nonsmooth convex optimization problems, such as the ℓ1-norm minimization problems arising in compressed sensing [9] and related areas. It is well known that the proximal operator of the ℓ1-norm is the soft-thresholding operator, and soft-thresholding-based algorithms have been proposed to solve ℓ1-norm minimization problems [10].
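The scalar case of this correspondence can be verified numerically: minimizing \(\tfrac{1}{2}(x-v)^{2} + \lambda|x|\) by brute force on a fine grid gives the same answer as the closed-form soft-thresholding rule. The value λ = 0.7, the grid, and the test points are arbitrary assumptions for the check.

```python
import numpy as np

def soft_threshold(v: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||x||_1: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Brute-force check in 1-D: argmin_x 0.5 * (x - v)^2 + lam * |x| on a fine grid.
lam = 0.7
grid = np.linspace(-5.0, 5.0, 100001)      # grid spacing 1e-4
for v in (-2.3, -0.4, 0.0, 0.5, 3.1):
    objective = 0.5 * (grid - v) ** 2 + lam * np.abs(grid)
    x_star = grid[np.argmin(objective)]
    assert abs(x_star - soft_threshold(np.array(v), lam)) < 1e-3
```

Applying the same rule to the vector of singular values, while keeping the singular vectors, gives exactly the operator \(D_{\tau}\) defined above.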
Shrinkage iteration step

Low-rank property: the matrices X_{k} turn out to have low rank; hence, the algorithm has minimal storage requirements, since it only needs to keep the principal factors in memory.

Sparsity: for each k≥0, Y_{k} vanishes outside of Ω and is, therefore, sparse, a fact that can be used to evaluate the shrink function rapidly.
The SVT algorithm
The matrix completion problem can be viewed as a special case of the matrix recovery problem, where one has to recover the missing entries of a matrix given a limited number of known entries.
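The SVT iteration \(X_{k} = \text{shrink}(Y_{k-1},\tau)\), \(Y_{k} = Y_{k-1} + \delta P_{\Omega}(M - X_{k})\) can be sketched in NumPy as follows. This is a minimal dense-matrix illustration, not an optimized implementation (it computes a full SVD each step instead of exploiting the sparsity of \(Y_{k}\)). The defaults τ = 5·max(n1, n2) and δ = 1.2/p, with p the fraction of observed entries, follow heuristics commonly used with SVT and should be treated as assumptions, as should the stopping rule on the relative residual over Ω.

```python
import numpy as np

def svt_complete(M_obs, mask, tau=None, delta=None, tol=1e-4, max_iter=1000):
    """Singular value thresholding (SVT) iteration for matrix completion.

    M_obs: matrix whose entries are trusted only where `mask` is True.
    mask:  boolean matrix, True on the observed set Omega.
    """
    n1, n2 = M_obs.shape
    p = mask.mean()                                   # fraction of observed entries
    tau = 5.0 * max(n1, n2) if tau is None else tau   # assumed threshold heuristic
    delta = 1.2 / p if delta is None else delta       # assumed step-size heuristic
    b = np.where(mask, M_obs, 0.0)                    # P_Omega(M)
    norm_b = np.linalg.norm(b)

    Y = np.zeros_like(b)                              # Y_0 = 0
    X = np.zeros_like(b)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt       # X_k = shrink(Y_{k-1}, tau)
        residual = np.where(mask, M_obs - X, 0.0)     # P_Omega(M - X_k)
        if np.linalg.norm(residual) <= tol * norm_b:  # relative residual on Omega
            break
        Y = Y + delta * residual                      # Y_k; stays zero outside Omega
    return X
```

For example, on a synthetic 50×50 rank-2 matrix with about half of its entries observed, `svt_complete(np.where(mask, M, 0.0), mask)` is expected to return a close approximation of the full matrix. Note that `Y` is only ever updated by `residual`, which is zero outside Ω; this is the sparsity property listed above.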
2.4 Literature review
For all these algorithms, the SVT operator is the key to making them converge to low-rank matrices.
Just like the FPC and SVT algorithms, the proximal gradient (PG) algorithm [16] for matrix completion needs to compute an SVD at each iteration, and it is as simple to implement as these algorithms.
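For comparison with SVT, the textbook proximal gradient iteration for the regularized problem \(\min_{X} \tfrac{1}{2}\Vert P_{\Omega}(X - M)\Vert_{F}^{2} + \lambda\Vert X\Vert_{*}\) can be sketched as below. This is a generic illustration (the same update is also known as SoftImpute), not necessarily the exact variant of [16]; the regularization weight λ is an assumed parameter. Because the gradient \(P_{\Omega}(X - M)\) is 1-Lipschitz, a step size of 1 is used, which guarantees a non-increasing objective.

```python
import numpy as np

def nuclear_pg(M_obs, mask, lam, n_iter=200):
    """Proximal gradient for min_X 0.5*||P_Omega(X - M)||_F^2 + lam*||X||_*.

    The smooth part has a 1-Lipschitz gradient P_Omega(X - M), so step size 1 is used.
    """
    X = np.zeros_like(M_obs, dtype=float)
    history = []
    for _ in range(n_iter):
        grad = np.where(mask, X - M_obs, 0.0)          # gradient of the smooth term
        # X - grad = P_Omega(M) + P_Omega_perp(X): in general a dense matrix.
        U, s, Vt = np.linalg.svd(X - grad, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                   # prox of lam*||.||_* is D_lam
        X = (U * s) @ Vt
        obj = 0.5 * np.sum(np.where(mask, X - M_obs, 0.0) ** 2) + lam * s.sum()
        history.append(obj)
    return X, history
```

The comment on `X - grad` illustrates the contrast drawn below: the matrix fed to the SVD at each PG step mixes observed values with the current dense estimate, whereas SVT works on a matrix supported on Ω.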
The SVT algorithm has two main advantages over the FPC and PG algorithms when applied to the matrix completion problem.
First, in some cases, SVT produces a sequence of low-rank iterates; in contrast, many iterates in the initial phase of the FPC or PG algorithms may not have low rank even though the optimal solution itself does. We observed this behavior when we applied them to the matrix completion problem.
Second, the intermediate matrices generated while solving our problem are sparse because of the sparsity of Ω, the set of observations. This makes the SVT algorithm computationally more attractive. In contrast, the matrices generated by the FPC and PG algorithms may not be sparse, especially for PG.
The first-order methods presented above are the basis of a number of recent works that minimize the nuclear norm of a matrix to recover an image with missing data.
In [17], the authors proposed a two-step proximal gradient algorithm to solve nuclear norm regularized least squares problems in order to recover a low-rank data matrix from a sample of its entries. Each iterate generated by the proposed algorithm is a combination of the latest three points, namely, the previous point, the current iterate, and its proximal gradient point. This algorithm preserves the computational simplicity of the classical proximal gradient algorithm [16], in which a singular value decomposition is involved in the proximal operator. Global convergence follows directly from existing results in the literature.
The authors of [18, 19] adopted the SVT algorithm to complete the matrix, but used the power method [20] instead of PROPACK [21] to compute the singular value decomposition of large sparse matrices. They showed that accelerating SoftImpute is indeed possible while still preserving the “sparse plus low rank” structure. To further reduce the per-iteration time complexity, instead of computing the SVT exactly using PROPACK, they proposed an approximate SVT scheme based on the power method. Although the SVT obtained in each iteration is only approximate, they showed that convergence can still be as fast as with exact SVT. Hence, the resulting algorithm has low per-iteration complexity and a fast convergence rate. Our objective is to increase the accuracy and precision of image completion results by adopting an unsupervised learning process that takes into account the characteristics of image pixels.
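The idea of replacing an exact SVD by a power-method approximation can be illustrated with a randomized subspace (power) iteration in the style of [20]. The sketch below is an assumption-laden illustration: the number of power steps, the oversampling amount, and the fixed seed are arbitrary, and the specific scheme of [18, 19] (for example, how it warm-starts from previous iterations) may differ.

```python
import numpy as np

def approx_top_svd(A, k, n_power=3, oversample=5, seed=0):
    """Approximate top-k SVD of A via a randomized range finder with power iterations."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for an approximate range
    for _ in range(n_power):                  # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                               # small (k + oversample) x n2 matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

def approx_svt(A, tau, k):
    """Approximate D_tau(A), assuming at most k singular values of A exceed tau."""
    U, s, Vt = approx_top_svd(A, k)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Because SVT only needs the singular values above τ, computing the leading k singular triplets is enough whenever the iterates are low rank, which is what makes this approximation cheap.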
2.5 Nuclear norm minimization-based collaborative filtering for image reconstruction
In collaborative filtering based on nuclear norm minimization, the goal is to predict the entries of an unknown matrix from a subset of its observed entries. For example, in a collaborative movie recommendation system, where the rows of the matrix represent users and the columns represent movies, the task is to predict the ratings users would give to movies based on their preferences. Predictions of users’ preferences for movies they have not yet seen are then based on patterns in the partially observed rating matrix. This setting can be formalized as a matrix completion problem, i.e., completing the entries of a partially observed data matrix. Our adaptation of this approach to image reconstruction consists of two steps:

Clustering step: uses a learning process to identify pixels’ clusters.

Prediction step: uses a predictive method based on clusters found in the first step to predict the unknown pixels.
Clustering seeks an optimal partition of a given set of N data points into K subgroups, such that points belonging to the same group are as similar as possible while points in different groups are as dissimilar as possible.
The first step of our approach filters the data. The learning process starts by applying principal component analysis (PCA) to reduce the number of variables and the redundancy of the information; as part of PCA, the data are centered. To detect the pixels’ clusters, the process then performs biclustering based on prototype-based clustering: the K-means algorithm is applied to the principal component scores (that is, the representation of the data matrix in the principal component space) and to its correlation matrix.
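The text does not specify exactly how the PCA scores and the correlation matrix are combined, so the following NumPy sketch is one hypothetical reading, not the authors’ implementation: rows of the image are clustered with K-means on their PCA scores, and columns are clustered with K-means on their rows of the column correlation matrix. The pre-filling of missing pixels with the mean observed intensity, the number of components, and the cluster counts are all assumptions, and K-means is written out (Lloyd’s algorithm with deterministic initial centers) to keep the example dependency-free.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Lloyd's K-means with deterministic, evenly spaced initial centers."""
    init = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[init].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):                  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def pca_scores(data, n_components):
    """Principal component scores of the rows of `data` (columns = variables)."""
    centered = data - data.mean(axis=0)       # PCA centers each variable
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def bicluster(image, mask, k_rows, k_cols, n_components=5):
    """Hypothetical biclustering: rows clustered on their PCA scores, columns
    clustered on their rows of the column correlation matrix."""
    filled = np.where(mask, image, image[mask].mean())   # assumed pre-fill of missing pixels
    row_labels = kmeans(pca_scores(filled, n_components), k_rows)
    col_corr = np.corrcoef(filled, rowvar=False)         # n2 x n2 correlation matrix
    col_labels = kmeans(col_corr, k_cols)
    return row_labels, col_labels
```

Each pair (row cluster, column cluster) then defines a submatrix of pixels that the prediction step can complete separately.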
The second step predicts the missing pixels using the clusters found in the first step, which provides a new framework for predicting missing pixels. The clustering phase automatically groups the pixels of an image into homogeneous regions, which usually contain similar objects or parts of objects. As a result, the prediction step is expected to perform better.

Ω the set of locations corresponding to the observed entries.

b the linear vector which contains the observed elements.

μ the smoothing degree.

The first, as a sparse matrix in which only the nonzero elements are taken into account.

The second, as a linear vector that contains the positions of the observed elements.

The third, in which Ω is specified as index pairs (i,j) with \((i,j)\in \mathbb {N}^{2}\).
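The three ways of specifying Ω listed above, together with the vector b of observed values, can be illustrated in NumPy as follows. The toy 4×5 image is an assumption, and so is the column-major (MATLAB-style) convention for linear indices, selected with `order="F"`.

```python
import numpy as np

shape = (4, 5)
rng = np.random.default_rng(0)
M = rng.integers(1, 256, size=shape).astype(float)  # toy image, values 1..255
mask = rng.random(shape) < 0.6                        # True = observed (the set Omega)

# (1) Sparse-matrix form: zero everywhere except at the observed entries.
#     Zero is used as "missing", so an observed value of 0 would be ambiguous.
sparse_form = np.where(mask, M, 0.0)

# (2) Linear indices of the observed entries (column-major, as in MATLAB).
rows, cols = np.nonzero(mask)
linear_idx = np.ravel_multi_index((rows, cols), shape, order="F")

# (3) Explicit (i, j) index pairs, one row per observed entry.
pairs = np.column_stack((rows, cols))

# b: the vector of observed values; b[k] is the entry at pairs[k] / linear_idx[k].
b = M[rows, cols]

# The linear indices convert back to the same (i, j) pairs.
rows_back, cols_back = np.unravel_index(linear_idx, shape, order="F")
```

The comment in form (1) points out the one practical pitfall of the sparse-matrix form for images: a genuinely black observed pixel cannot be distinguished from a missing one.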
In some cases, applying the proposed algorithm to image completion yields predicted values that are out of range. In such cases, we propose to apply median filtering to the predicted pixels. The median filter is often used as a preprocessing step to improve the results of later processing in signal processing (for example, edge detection in an image). Here, we use it as a final step that replaces each predicted pixel with the median of its neighboring pixels, which yields good reconstruction results, as shown in the experimental results.
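A minimal sketch of this post-processing step is given below, assuming a 3×3 neighborhood (radius 1) and edge replication at the image border, neither of which is specified in the text. Only predicted pixels (those outside the observed mask) are modified, and each is replaced by the median of its neighborhood in the unfiltered image.

```python
import numpy as np

def median_on_predicted(image, mask, radius=1):
    """Replace each predicted pixel (mask == False) with the median of its
    (2*radius+1) x (2*radius+1) neighborhood; observed pixels are unchanged."""
    padded = np.pad(image.astype(float), radius, mode="edge")  # replicate borders
    out = image.astype(float).copy()
    for i, j in zip(*np.nonzero(~mask)):
        # Pixel (i, j) sits at (i + radius, j + radius) in the padded image.
        window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
        out[i, j] = np.median(window)
    return out
```

For instance, an out-of-range prediction of 1000 surrounded by pixels of intensity 100 is replaced by 100, while all observed pixels keep their original values.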
The result of the proposed approach is a completed data matrix that contains all the pixel values. The goal of the proposed approach is to predict the missing pixels in the image matrix. The learning process partitions the pixel indices into clusters, and the prediction process exploits these clusters to predict the missing values. It relies on the assumption that pixels in the same cluster share almost the same characteristics in the image.
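The cluster-wise prediction can be sketched as a loop over biclusters: each (row cluster, column cluster) submatrix is completed separately and its predictions are written back, while observed pixels are kept. This is a simplified, hypothetical rendering: the names `complete_by_biclusters` and `block_mean` are illustrative, each block is processed once rather than rebuilding an observation matrix for every selected missing pixel, and `block_mean` is only a trivial placeholder completer that keeps the example self-contained. In the approach described here, an SVT routine would take its place.

```python
import numpy as np

def complete_by_biclusters(image, mask, row_labels, col_labels, complete_fn):
    """Run complete_fn(block, block_mask) on every (row cluster, column cluster)
    submatrix and write the predictions back; observed pixels are kept as-is."""
    out = np.where(mask, image, 0.0).astype(float)
    for rc in np.unique(row_labels):
        rows = np.flatnonzero(row_labels == rc)
        for cc in np.unique(col_labels):
            cols = np.flatnonzero(col_labels == cc)
            block_mask = mask[np.ix_(rows, cols)]
            if block_mask.all() or not block_mask.any():
                continue  # nothing to predict, or no observed data (pixels stay 0 here)
            block = out[np.ix_(rows, cols)]
            pred = complete_fn(block, block_mask)
            out[np.ix_(rows, cols)] = np.where(block_mask, block, pred)
    return out

def block_mean(block, block_mask):
    """Placeholder completer: fill missing entries with the block's observed mean."""
    return np.full(block.shape, block[block_mask].mean())
```

Restricting each completion to one block is what lets the per-cluster low-rank model fit pixels that share similar characteristics, which is the motivation given for the clustering step.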
3 Results and discussion
The proposed approach is compared with several state-of-the-art matrix completion methods: the fixed point continuation (FPC) algorithm [22], the proximal gradient (PG) algorithm [16, 18, 19], the partial proximal gradient (PPG) algorithm [16], the augmented Lagrange multiplier (ALM) algorithm [15], and the inexact augmented Lagrange multiplier (IALM) algorithm [15]. All these methods use the PROPACK package [21] to compute the SVD of large sparse matrices. Our approach was also compared with the method presented in [18, 19], which uses the power method [20].
From the benchmark images, we constructed images with arbitrarily located missing pixels, with 15%, 25%, and 35% of the pixels missing.
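A minimal sketch of this protocol is shown below. It assumes that pixels are removed independently and uniformly at random and that PSNR is computed over the whole image with a peak value of 255 for 8-bit gray-level images; the text specifies neither detail, so both are assumptions.

```python
import numpy as np

def random_mask(shape, missing_ratio, seed=0):
    """True = observed; each pixel is dropped independently with prob. missing_ratio."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) >= missing_ratio

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB over all pixels: 10 * log10(peak**2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

For example, `random_mask((256, 256), 0.25)` removes about a quarter of the pixels of a 256×256 image, and a reconstruction that is off by exactly one gray level everywhere scores \(10\log_{10}(255^{2}) \approx 48.13\) dB.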
PSNR values (dB) of the reconstructed images for different ratios of missing data

Image       Method                  15% missing data  25% missing data  35% missing data
Lena        FPC                     46.2950           45.8888           45.6461
Lena        ALM                     48.2790           48.2248           45.5295
Lena        IALM                    48.2673           48.2278           46.0819
Lena        PPG                     45.0964           41.9579           40.2865
Lena        PG                      45.0966           41.9582           40.2863
Lena        SVT with power method   48.3221           48.2745           48.2295
Lena        Proposed approach       48.9848           48.7817           48.5952
Cameraman   FPC                     28.2855           32.3433           31.7894
Cameraman   ALM                     36.8075           33.9694           29.5169
Cameraman   IALM                    36.7101           33.9849           29.8495
Cameraman   PPG                     31.3222           27.9425           24.3797
Cameraman   PG                      31.3199           27.9458           24.3757
Cameraman   SVT with power method   40.9528           40.5182           40.0158
Cameraman   Proposed approach       43.3525           42.4823           40.4596
Flinstones  FPC                     28.9102           37.4826           37.2896
Flinstones  ALM                     42.9923           37.3583           34.3512
Flinstones  IALM                    42.8769           37.4573           34.6774
Flinstones  PPG                     31.5781           29.5989           24.3693
Flinstones  PG                      31.5936           29.5887           24.3708
Flinstones  SVT with power method   43.6614           43.5614           43.2075
Flinstones  Proposed approach       47.8234           47.3814           44.3920
House       FPC                     28.6407           27.7999           27.4826
House       ALM                     37.0989           36.4579           36.3583
House       IALM                    36.8952           36.5262           36.4573
House       PPG                     31.5992           29.8729           29.5998
House       PG                      31.6005           29.8769           29.5701
House       SVT with power method   38.1457           38.0827           37.6494
House       Proposed approach       39.0707           38.1743           37.8234
Man         FPC                     39.2493           38.5864           38.1565
Man         ALM                     41.3640           41.2133           40.4455
Man         IALM                    41.2952           41.2191           40.8616
Man         PPG                     39.4997           39.1895           39.3266
Man         PG                      39.5008           39.1889           39.3264
Man         SVT with power method   42.9572           42.4158           42.0357
Man         Proposed approach       44.3212           43.9939           43.6724
On average, the main proposed algorithm requires 2 to 10 min to process a 256×256 grayscale image on a 2.60 GHz Intel Core i7 computer.
Because our approach adopts a clustering step to detect regions with similar pixels, it increases the relevance and precision of our SVT-based prediction process. Indeed, when the SVT algorithm is applied to the submatrix containing the pixels of the same cluster, the predicted values yield higher PSNR and visually more consistent reconstructed images than the SVT algorithm using the power method presented in [18, 19, 20]. In addition, the sparsity of the observation matrix made the SVT algorithm the most suitable method for solving the matrix completion problem. Indeed, when recovering the missing image pixels, the FPC, PG, and ALM algorithms produced, in their initial phase, many iterates that were not low rank even though the optimal solution itself is low rank.
4 Conclusions
We propose in this work a new method for image reconstruction from missing data. It is based on two main steps. The first is a biclustering process that uses the K-means algorithm to identify pixel clusters; it is applied to the matrix of PCA scores and to its correlation matrix. The second step predicts the missing pixels by applying a matrix completion algorithm to the observation matrices obtained from the clusters found in the first step. In each iteration, a matrix of observations is constructed; it contains the values of the pixels that are in the same cluster as the selected missing pixel.
The experiments are conducted on a benchmark of five standard gray-level images used in image processing. The proposed approach is compared with different nuclear norm minimization methods for matrix completion, both visually and by measuring the PSNR for different percentages of missing data. The proposed approach increases the PSNR of the completion by applying the SVT algorithm block-wise, i.e., to matrices containing the pixels grouped in the same cluster, where each cluster ideally contains pixels that share almost the same characteristics.
Notes
Acknowledgements
Not applicable.
Funding
Not applicable.
Availability of data and materials
The benchmark used to demonstrate the effectiveness of the proposed approach is composed of five standard images used for image processing. These images are frequently found in literature and available on the following site: http://www.imageprocessingplace.com/root_files_V3/image_databases.htm
Authors’ contributions
OB carried out the experiments. OB, SM, and SR contributed equally to the rest of the work. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. O. Banouar, S. Raghay, Novel method for users profiles construction through collaborative filtering. IJCSNS 17, 170–176 (2017)
2. G. Koutrika, Y. Ioannidis, Personalizing queries based on networks of composite preferences. ACM Trans. Database Syst. 35, 1–50 (2010)
3. M. Fazel, H. Hindi, S.P. Boyd, A rank minimization heuristic with application to minimum order system approximation, in Proceedings of the American Control Conference (IEEE, Arlington, 2001)
4. J.-F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
5. K. Toh, M. Todd, R. Tutuncu, SDPT3, a Matlab software package for semidefinite programming. Optim. Methods Softw. 11, 545–581 (1999)
6. J. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11, 625–653 (1999)
7. O. Klopp, Noisy low-rank matrix completion with general sampling distribution. Bernoulli 20, 282–303 (2014)
8. E.J. Candès, Y. Plan, Matrix completion with noise. Proc. IEEE 98, 925–936 (2010). https://doi.org/10.1109/JPROC.2009.2035722
9. E.J. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)
10. E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis. J. ACM 58, 37 (2011). https://doi.org/10.1145/1970392.1970395
11. O. Banouar, S. Raghay, User profile construction for personalized access to multiple data sources through matrix completion method. IJCSNS 16, 51–57 (2016)
12. S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128, 321–353 (2011)
13. P.L. Combettes, V.R. Wajs, Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
14. S. Osher, M. Burger, D. Goldfarb, J. Xu, W. Yin, An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)
15. Z. Lin, M. Chen, L. Wu, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215 (2010). arXiv:1009.5055 [math.OC]
16. Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix, in Intl. Workshop on Comp. Adv. in Multi-Sensor Adapt. Processing (Aruba, Dutch Antilles, 2009). UIUC Technical Report UILU-ENG-09-2214
17. Q. Wang, W. Cao, Z. Jin, Two-step proximal gradient algorithm for low-rank matrix completion. Stat. Optim. Inf. Comput. 4(2), 201–210 (2016)
18. Q. Yao, J.T. Kwok, Accelerated inexact soft impute for fast large scale matrix completion and tensor completion, in Proc. of the Int. Joint Conf. on Art. Intel. (AAAI Press, Buenos Aires, 2015)
19. Q. Yao, J.T. Kwok, Accelerated inexact soft impute for fast large scale matrix completion and tensor completion. IEEE Trans. Knowl. Data Eng. (2017). arXiv:1703.05487v2 [cs.NA]
20. N. Halko, P.G. Martinsson, J.A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
21. K.C. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 6, 615–640 (2010)
22. E.T. Hale, W. Yin, Y. Zhang, Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.