
Graph regularized discriminative non-negative matrix factorization for face recognition


Abstract

Non-negative matrix factorization (NMF) has been widely employed in computer vision and pattern recognition, since the learned bases can be interpreted as a natural parts-based representation of the input space, which is consistent with the psychological intuition of combining parts to form a whole. In this paper, we propose a novel constrained non-negative matrix factorization algorithm, called graph regularized discriminative non-negative matrix factorization (GDNMF), which incorporates into the NMF model both the intrinsic geometrical structure and the discriminative information that have been essentially ignored in prior works. Specifically, both the graph Laplacian and supervised label information are jointly utilized to learn the projection matrix in the new model. Furthermore, we provide the corresponding multiplicative update solutions for the optimization framework, together with a convergence proof. A series of experiments conducted on several benchmark face datasets demonstrates the efficacy of the proposed GDNMF.
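For readers who wish to experiment, the following is a minimal NumPy sketch of the multiplicative updates derived in the appendix: the update for H is rule (31), while the updates for W and A reduce to the standard NMF rules, as noted there. The names X, W, H, A, S, B, C and the weights λ and γ follow the appendix; the shapes and the construction of the label matrix S and graph weight matrix C are assumptions in the spirit of graph-regularized NMF, not a definitive reproduction of the paper's experimental setup.

```python
import numpy as np

def gdnmf(X, S, C, r, lam=1.0, gamma=1.0, n_iter=200, seed=0):
    """Sketch of the GDNMF multiplicative updates (appendix, rules (19)-(21)).

    X : (m, n) non-negative data, one vectorized face image per column.
    S : (c, n) non-negative label matrix (assumed one-hot columns).
    C : (n, n) non-negative graph weight matrix; its degree matrix B
        gives the graph Laplacian L = B - C (appendix notation).
    r : number of basis vectors.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))            # basis matrix
    H = rng.random((r, n))            # coefficient matrix
    A = rng.random((S.shape[0], r))   # projection matrix for the label term
    B = np.diag(C.sum(axis=1))        # diagonal degree matrix of the graph
    eps = 1e-10                       # guard against division by zero
    for _ in range(n_iter):
        # H update -- rule (31) in the appendix.
        H *= (W.T @ X + gamma * (A.T @ S) + lam * (H @ C)) / \
             (W.T @ W @ H + gamma * (A.T @ A @ H) + lam * (H @ B) + eps)
        # W and A updates -- the standard NMF rules (see [14]).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        A *= (S @ H.T) / (A @ H @ H.T + eps)
    return W, H, A
```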


References

  1. Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  2. Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inform Process Syst 14:585–591

  3. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  4. Berry M, Browne M, Langville A, Pauca V, Plemmons R (2007) Algorithms and applications for approximate nonnegative matrix factorization. Comput Stat Data Anal 52(1):155–173

  5. Brunet J-P, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci 101(12):4164–4169

  6. Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560

  7. Chung FRK (1997) Spectral graph theory. CBMS Regional Conference Series in Mathematics, No. 92. American Mathematical Society

  8. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc Ser B (Methodological) 39(1):1–38

  9. Donoho DL, Stodden VC (2004) When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in neural information processing systems 16: proceedings of the 2003 conference. MIT Press

  10. Graham DB, Allinson NM (1998) Characterising virtual eigensignatures for general purpose face recognition. In: Face recognition. Springer, pp 446–456

  11. Guan N, Tao D, Luo Z, Yuan B (2011) Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Trans Image Process 20(7):2030–2048

  12. Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: IEEE conference on computer vision and pattern recognition, vol 2, pp 1735–1742

  13. Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

  14. Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. Adv Neural Inform Process Syst 13:556–562

  15. Li S, Hou X, Zhang H, Cheng Q (2001) Learning spatially localized, parts-based representation. In: IEEE conference on computer vision and pattern recognition, vol 1, pp I–207

  16. Logothetis N, Sheinberg D (1996) Visual object recognition. Ann Rev Neurosci 19(1):577–621

  17. Palmer S (1977) Hierarchical structure in perceptual representation. Cognitive Psychol 9(4):441–474

  18. Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

  19. Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision, pp 138–142

  20. Sim T, Baker S, Bsat M (2003) The CMU pose, illumination, and expression database. IEEE Trans Pattern Anal Mach Intell 25(12):1615–1618

  21. Tenenbaum J, De Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323

  22. Wachsmuth E, Oram M, Perrett D (1994) Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque. Cereb Cortex 4(5):509–522

  23. Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, pp 267–273

  24. Zhang Q, Li B (2010) Discriminative K-SVD for dictionary learning in face recognition. In: IEEE conference on computer vision and pattern recognition, pp 2691–2698


Acknowledgements

This work is supported in part by the National Basic Research Program of China (973 program) under Grant 2009CB320901, the National Natural Science Foundation of China under Grant 61272247, the National High Technology Research and Development Program of China (863 program) under Grant 2008AA02Z310, the European Union Seventh Framework Programme under Grant 247619, the Shanghai Committee of Science and Technology under Grant 08411951200, and the Innovation Ability Special Fund of Shanghai Jiao Tong University under Grant Z030026.

Author information

Corresponding author

Correspondence to Xianzhong Long.

Appendix (Proof of Theorem)

In order to prove the Theorem, we need to show that \(O_1\) is non-increasing under the updating steps in (19), (20) and (21). Note that when we update \(\mathbf{W}\), the matrices \(\mathbf{H}\) and \(\mathbf{A}\) are fixed, so only the first term of \(O_1\) depends on \(\mathbf{W}\); similarly, when we update \(\mathbf{A}\), the matrices \(\mathbf{W}\) and \(\mathbf{H}\) are fixed, so only the third term of \(O_1\) depends on \(\mathbf{A}\). Consequently, the update formulas for \(\mathbf{W}\) and \(\mathbf{A}\) in GDNMF are exactly the same as in the original NMF, and the convergence proof of NMF directly shows that \(O_1\) is non-increasing under the update steps (20) and (21); the details can be found in [14].

Hence, we only need to prove that \(O_1\) is non-increasing under the updating step in (19). We follow a process similar to that depicted in [14]. Our proof makes use of an auxiliary function analogous to the one used in the Expectation-Maximization algorithm [8]. We first give the definition of an auxiliary function.

Definition

\(G(h,h^{'})\) is an auxiliary function of \(F(h)\) if the following conditions are satisfied:

$$ G(h,h^{'})\geq F(h),~~~~ G(h,h)=F(h) $$
(22)

This auxiliary function is useful because of the following lemma.

Lemma 1

If G is an auxiliary function of F, then F is non-increasing under the update

$$ h^{(t+1)}=\arg\min\limits_{h} G(h, h^{(t)}) $$
(23)

Proof

$$ F(h^{(t+1)})\leq G(h^{(t+1)},h^{(t)})\leq G(h^{(t)},h^{(t)})=F(h^{(t)}) $$ □
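As a concrete illustration of Lemma 1 (an example of ours, not taken from the paper), consider a scalar \(F\) and a quadratic surrogate that matches \(F\) and its slope at \(h^{(t)}\) but has larger curvature; minimizing the surrogate gives a damped gradient step that never increases \(F\):

```python
# Illustrative instance of Lemma 1 (not from the paper): F(h) = (h - 2)^2 with
# surrogate G(h, h_t) = F(h_t) + F'(h_t)(h - h_t) + k (h - h_t)^2 for k >= 1,
# so that G(h, h_t) >= F(h) and G(h_t, h_t) = F(h_t), i.e., G is auxiliary.
F = lambda h: (h - 2.0) ** 2
dF = lambda h: 2.0 * (h - 2.0)
k = 3.0          # surrogate curvature; any k >= 1 majorizes F here
h = 10.0
for t in range(5):
    h_next = h - dF(h) / (2.0 * k)  # argmin_h G(h, h_t), from dG/dh = 0
    assert F(h_next) <= F(h)        # Lemma 1: F is non-increasing
    h = h_next
print(h)  # approaches the minimizer h = 2
```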

Now, we show that the update step for H in (19) is exactly the update in (23) with a proper auxiliary function.

Considering any element \(h_{ab}\) in \(\mathbf{H}\), we use \(F_{ab}\) to denote the part of \(O_1\) which is relevant only to \(h_{ab}\). It is easy to obtain the following derivatives.

$$ F_{ab}^{'}=\left(\dfrac{\partial O_{1}}{\partial\mathbf{H}}\right)_{ab}=[2\mathbf{W}^{T}(\mathbf{W}\mathbf{H}-\mathbf{X})+2\lambda\mathbf{H}(\mathbf{B}-\mathbf{C})+2\gamma\mathbf{A}^{T}(\mathbf{A}\mathbf{H}-\mathbf{S}) ]_{ab} $$
(24)
$$ F_{ab}^{''}=2(\mathbf{W}^{T}\mathbf{W})_{aa}+2\lambda\mathbf{B}_{bb}-2\lambda\mathbf{C}_{bb}+2\gamma(\mathbf{A}^{T}\mathbf{A})_{aa} $$
(25)

Because the update is essentially element-wise, it suffices to show that each \(F_{ab}\) is non-increasing under the update step of (19). To this end, we introduce the following lemma.

Lemma 2

Function

$$\begin{array}{rll} G(h,h_{ab}^{(t)})&=&F_{ab}(h_{ab}^{(t)})+F_{ab}^{'}(h_{ab}^{(t)})(h-h_{ab}^{(t)})\\ &&+\,\frac{(\mathbf{W}^{T}\mathbf{W}\mathbf{H}+\gamma\mathbf{A}^{T}\mathbf{A}\mathbf{H}+\lambda\mathbf{H}\mathbf{B})_{ab}}{h_{ab}^{(t)}}(h-h_{ab}^{(t)})^2 \end{array}$$
(26)

is an auxiliary function of \(F_{ab}\).

Proof

We only need to prove that \(G(h,h_{ab}^{(t)})\geq F_{ab}(h)\), since \(G(h,h)=F_{ab}(h)\) is obvious. To this end, we consider the Taylor series expansion of \(F_{ab}(h)\).

$$\begin{array}{rll} F_{ab}(h)&=&F_{ab}(h_{ab}^{(t)})+F_{ab}^{'}(h_{ab}^{(t)})(h-h_{ab}^{(t)})\\ &&+\,[(\mathbf{W}^{T}\mathbf{W})_{aa}+\lambda\mathbf{B}_{bb}-\lambda\mathbf{C}_{bb}+\gamma(\mathbf{A}^{T}\mathbf{A})_{aa}](h-h_{ab}^{(t)})^{2} \end{array}$$
(27)

Comparing (27) with (26), we find that \(G(h,h_{ab}^{(t)})\geq F_{ab}(h)\) is equivalent to

$$\dfrac{(\mathbf{W}^{T}\mathbf{W}\mathbf{H}+\gamma\mathbf{A}^{T}\mathbf{A}\mathbf{H}+\lambda\mathbf{H}\mathbf{B})_{ab}}{h_{ab}^{(t)}}\geq (\mathbf{W}^{T}\mathbf{W})_{aa}+\lambda\mathbf{B}_{bb}-\lambda\mathbf{C}_{bb}+\gamma(\mathbf{A}^{T}\mathbf{A})_{aa} $$
(28)

In fact, we have

$$\begin{array}{rll} (\mathbf{W}^{T}\mathbf{W}\mathbf{H}+\gamma\mathbf{A}^{T}\mathbf{A}\mathbf{H})_{ab}&=&\sum\limits_{q=1}^{r}(\mathbf{W}^{T}\mathbf{W})_{aq}h_{qb}^{(t)}+\gamma\sum\limits_{q=1}^{r}(\mathbf{A}^{T}\mathbf{A})_{aq}h_{qb}^{(t)} \\&\geq& (\mathbf{W}^{T}\mathbf{W})_{aa}h_{ab}^{(t)}+\gamma(\mathbf{A}^{T}\mathbf{A})_{aa}h_{ab}^{(t)} \end{array}$$
(29)

and

$$ (\lambda\mathbf{H}\mathbf{B})_{ab}=\lambda\sum\limits_{j=1}^{n}h_{aj}^{(t)}\mathbf{B}_{jb}\geq \lambda h_{ab}^{(t)}\mathbf{B}_{bb}\geq\lambda h_{ab}^{(t)}\mathbf{B}_{bb}-\lambda h_{ab}^{(t)}\mathbf{C}_{bb} $$
(30)

Thus, (28) holds and \(G(h,h_{ab}^{(t)})\geq F_{ab}(h)\), which completes the proof of Lemma 2. □
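Inequality (28) can also be spot-checked numerically on random non-negative matrices (an illustrative sanity check under assumed toy shapes, not a substitute for the proof above):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, c = 8, 6, 4, 3                     # assumed toy dimensions
lam, gamma = 0.5, 0.5
W, H, A = rng.random((m, r)), rng.random((r, n)), rng.random((c, r))
C = rng.random((n, n)); C = (C + C.T) / 2   # symmetric graph weights
B = np.diag(C.sum(axis=1))                  # degree matrix, so L = B - C
for a in range(r):
    for b in range(n):
        lhs = (W.T @ W @ H + gamma * (A.T @ A @ H) + lam * (H @ B))[a, b] / H[a, b]
        rhs = (W.T @ W)[a, a] + lam * B[b, b] - lam * C[b, b] + gamma * (A.T @ A)[a, a]
        assert lhs >= rhs - 1e-12           # inequality (28)
print("inequality (28) holds on this sample")
```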

Proof of Theorem

Replacing \(G(h,h_{ab}^{(t)})\) in (23) by (26) results in the following update rule:

$$\begin{array}{rll} h_{ab}^{(t+1)}&=&h_{ab}^{(t)}-h_{ab}^{(t)}\frac{F_{ab}^{'}(h_{ab}^{(t)})}{2(\mathbf{W}^{T}\mathbf{W}\mathbf{H}+\gamma\mathbf{A}^{T}\mathbf{A}\mathbf{H}+\lambda\mathbf{H}\mathbf{B})_{ab}}\\ &=&h_{ab}^{(t)}\frac{(\gamma\mathbf{A}^{T}\mathbf{S}+\mathbf{W}^{T}\mathbf{X}+\lambda\mathbf{H}\mathbf{C})_{ab}}{(\mathbf{W}^{T}\mathbf{W}\mathbf{H}+\gamma\mathbf{A}^{T}\mathbf{A}\mathbf{H}+\lambda\mathbf{H}\mathbf{B})_{ab}} \end{array}$$
(31)

Since (26) is an auxiliary function of \(F_{ab}\), Lemma 1 guarantees that \(F_{ab}\) is non-increasing under this update rule. □
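The Theorem can likewise be sanity-checked numerically. The sketch below reconstructs \(O_1\) from its gradient (24) as \(\|\mathbf{X}-\mathbf{W}\mathbf{H}\|_{F}^{2}+\lambda\,\mathrm{tr}(\mathbf{H}(\mathbf{B}-\mathbf{C})\mathbf{H}^{T})+\gamma\|\mathbf{S}-\mathbf{A}\mathbf{H}\|_{F}^{2}\) (an assumption consistent with (24), up to an additive constant) and verifies that it never increases under update (31), with W and A held fixed:

```python
import numpy as np

# Illustrative check that O1 is non-increasing under update rule (31).
rng = np.random.default_rng(2)
m, n, r, c = 10, 12, 5, 4                   # assumed toy dimensions
lam, gamma = 0.5, 0.5
X, S = rng.random((m, n)), rng.random((c, n))
W, H, A = rng.random((m, r)), rng.random((r, n)), rng.random((c, r))
C = rng.random((n, n)); C = (C + C.T) / 2   # symmetric graph weights
B = np.diag(C.sum(axis=1))

def O1(H):
    # Objective reconstructed from gradient (24), up to a constant.
    return (np.linalg.norm(X - W @ H) ** 2
            + lam * np.trace(H @ (B - C) @ H.T)
            + gamma * np.linalg.norm(S - A @ H) ** 2)

prev = O1(H)
for _ in range(50):
    H *= (W.T @ X + gamma * (A.T @ S) + lam * (H @ C)) / \
         (W.T @ W @ H + gamma * (A.T @ A @ H) + lam * (H @ B))
    cur = O1(H)
    assert cur <= prev + 1e-9               # the Theorem: O1 never increases
    prev = cur
```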


Cite this article

Long, X., Lu, H., Peng, Y. et al. Graph regularized discriminative non-negative matrix factorization for face recognition. Multimed Tools Appl 72, 2679–2699 (2014). https://doi.org/10.1007/s11042-013-1572-z
