Nonnegative class-specific entropy component analysis with adaptive step search criterion


Nonnegative learning aims to learn part-based representations of nonnegative data and has received much attention in recent years. Nonnegative matrix factorization is the popular way to make nonnegative learning practical, and it can also be formulated as an optimization problem with bound constraints. To exploit the informative components hidden in nonnegative patterns, this work develops a novel nonnegative learning method, termed nonnegative class-specific entropy component analysis (NCECA). Distinct from existing methods, the proposed method is designed to handle general objective functions, and the conjugate gradient technique is applied to enhance the iterative optimization. Building on this development, a general nonnegative learning framework is presented for nonnegative optimization problems with general objective costs. Owing to the general objective costs and the nonnegative bound constraints, the diseased nonnegative learning problem usually occurs. To address this limitation, a modified line search criterion is proposed, which prevents the null trap under guaranteed conditions while keeping the feasible step a descent step. In addition, a numerical stopping rule is employed in place of the popular gradient-based one to improve efficiency. Experiments on face recognition under a variety of conditions show that the proposed method performs better than the other methods compared.

  1. Note that for specific purposes where another upper bound is already available, it is unnecessary to introduce such a constraint. In NCECA, the NMF approximation is still included to keep the formulation general.



The authors would like to thank the handling associate editor and the anonymous reviewers for their constructive comments, and the US Army Research Laboratory for the FERET database. This work was supported by a research grant from the research committee of the University of Macau.

Author information

Corresponding author

Correspondence to Miao Cheng.

Appendix: A calculation of \( \nabla J_{E} (W) \)

The gradient \( \nabla \mathcal {H} ({W^T X | C}) \) is closely related to the gradient of the information potential \( \nabla \mathcal {V} ({W^T X | c}) \) of each class \( c \). The gradient \( \nabla \mathcal {V} ({W^T X | c}) \) is calculated as

$$ \begin{aligned} &\frac{{\partial \mathcal {V} ({W^{T} X |c})}}{{\partial W}} \\ &= \frac{1}{{n_{c}^{2}}}\frac{{\partial \sum\limits_{i=1}^{n_{c}} {\sum\limits_{j = 1}^{n_{c}} {G ({W^{T} x_{i}^{c}-W^{T} x_{j}^{c},2\sigma^{2}})}}}}{{\partial W}} \\ &= -\frac{1}{{n_{c}^{2} \sigma^{2}}}\sum\limits_{i = 1}^{{n}_{c}} {\sum\limits_{j = 1}^{n_{c}} {G ({W^T x_{i}^{c}-W^{T} x_{j}^{c},2\sigma^{2}}) ({x_{i}^{c}-x_{j}^{c}}) ({x_{i}^{c}- x_{j}^{c}})^{T}W}}. \end{aligned} $$
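As an illustrative sketch of the expression above (not the authors' implementation), the information potential of one class and its gradient can be evaluated with NumPy. The function names, the column-wise data layout and the unnormalized kernel convention \( G(y, 2\sigma^{2}) = \exp(-\|y\|^{2}/(2\sigma^{2})) \) are assumptions made here; this convention is chosen because it reproduces the \( 1/\sigma^{2} \) factor in the formula, whereas a Gaussian density with variance \( 2\sigma^{2} \) would give \( 1/(2\sigma^{2}) \) instead.

```python
import numpy as np

def pairwise_differences(Xc):
    """Return all x_i - x_j for one class as rows; Xc has shape (d, n_c)."""
    d = Xc.shape[0]
    return (Xc[:, :, None] - Xc[:, None, :]).reshape(d, -1).T  # (n_c^2, d)

def information_potential(W, Xc, sigma):
    """V(W^T X | c) = (1 / n_c^2) * sum_ij G(W^T x_i - W^T x_j, 2 sigma^2).

    Assumed kernel convention: G(y, 2 sigma^2) = exp(-||y||^2 / (2 sigma^2)).
    """
    proj = pairwise_differences(Xc) @ W           # rows are (W^T (x_i - x_j))^T
    g = np.exp(-np.sum(proj ** 2, axis=1) / (2.0 * sigma ** 2))
    return g.mean()                               # mean over n_c^2 pairs

def potential_gradient(W, Xc, sigma):
    """dV/dW = -(1 / (n_c^2 sigma^2)) * sum_ij G_ij (x_i - x_j)(x_i - x_j)^T W."""
    n_c = Xc.shape[1]
    diffs = pairwise_differences(Xc)              # (n_c^2, d)
    proj = diffs @ W                              # (n_c^2, k)
    g = np.exp(-np.sum(proj ** 2, axis=1) / (2.0 * sigma ** 2))
    # sum_ij g_ij * d_ij d_ij^T W equals diffs^T @ (g * proj)
    return -(diffs.T @ (g[:, None] * proj)) / (n_c ** 2 * sigma ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xc = rng.random((4, 5))   # 5 nonnegative samples of dimension 4 (toy data)
    W = rng.random((4, 2))    # nonnegative projection onto 2 components
    print(information_potential(W, Xc, sigma=0.8))
    print(potential_gradient(W, Xc, sigma=0.8).shape)
```

Forming all \( n_{c}^{2} \) pairwise differences at once keeps the code close to the double sum in the formula; for large classes a blocked loop over pairs would use less memory.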

Thus, the partial derivative \( {{\partial \mathcal{H} ({W^T X | C})} / {\partial W}} \) is given by

$$ \begin{aligned} &\frac{{\partial{\mathcal{H}} ({W^{T}X | C})}}{{\partial W}} \\ &= -\sum\limits_{c=1}^{N} {\frac{{n_{c}}}{n}} \frac{{\frac{{\partial {\mathcal{V}} ({W^{T} X |c})}}{{\partial W}}}}{{{\mathcal{V}} ({W^{T} X | c})}} \\ &= \sum\limits_{c = 1}^N {\frac{{n_{c}}}{{n\sigma^{2}}}} \frac{{\sum\limits_{i = 1}^{n_{c}} {\sum\limits_{j = 1}^{n_{c}} {G ({W^{T} x_{i}^{c}- W^{T} x_{j}^{c},2\sigma^{2}}) ({x_{i}^{c}-x_{j}^{c}}) ({x_{i}^{c}- x_{j}^{c}})^{T} W}}}}{{\sum\limits_{i = 1}^{n_{c}} {\sum\limits_{j = 1}^{n_{c}} {G ({W^{T} x_{i}^{c}-W^{T} x_{j}^{c},2\sigma^{2}})}}}}. \end{aligned} $$

With the above partial derivatives, it is straightforward to obtain the gradient \( \nabla J_E (W) \).
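The class-weighted gradient of the conditional entropy can be sketched in the same spirit. This is a minimal NumPy illustration, assuming \( \mathcal{H}(W^{T}X|C) = -\sum_{c} (n_{c}/n) \log \mathcal{V}(W^{T}X|c) \), which is the form implied by the first equality of the derivation. The function names, the `labels` array and the unnormalized kernel \( \exp(-\|y\|^{2}/(2\sigma^{2})) \) are hypothetical choices made here, not taken from the paper.

```python
import numpy as np

def class_potential_and_grad(W, Xc, sigma):
    """Information potential V(W^T X | c) and dV/dW for one class.

    Xc: (d, n_c) samples as columns; W: (d, k).
    Assumed kernel: G(y, 2 sigma^2) = exp(-||y||^2 / (2 sigma^2)).
    """
    d, n_c = Xc.shape
    diffs = (Xc[:, :, None] - Xc[:, None, :]).reshape(d, -1).T  # (n_c^2, d)
    proj = diffs @ W                                            # (n_c^2, k)
    g = np.exp(-np.sum(proj ** 2, axis=1) / (2.0 * sigma ** 2))
    V = g.sum() / n_c ** 2
    grad_V = -(diffs.T @ (g[:, None] * proj)) / (n_c ** 2 * sigma ** 2)
    return V, grad_V

def conditional_entropy_and_grad(W, X, labels, sigma):
    """H(W^T X | C) and dH/dW = -sum_c (n_c / n) * (dV_c/dW) / V_c."""
    n = X.shape[1]
    H = 0.0
    grad_H = np.zeros_like(W, dtype=float)
    for c in np.unique(labels):
        Xc = X[:, labels == c]            # columns belonging to class c
        n_c = Xc.shape[1]
        V, grad_V = class_potential_and_grad(W, Xc, sigma)
        H -= (n_c / n) * np.log(V)        # assumed entropy definition
        grad_H -= (n_c / n) * grad_V / V  # chain rule on -log V
    return H, grad_H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((4, 7))                       # toy nonnegative data
    labels = np.array([0, 0, 0, 1, 1, 1, 1])     # two classes
    W = rng.random((4, 2))
    H, grad_H = conditional_entropy_and_grad(W, X, labels, sigma=0.9)
    print(H, grad_H.shape)
```

Dividing each class gradient by its own potential cancels the \( 1/n_{c}^{2} \) normalization, which is why only \( n_{c}/(n\sigma^{2}) \) remains as the class weight in the closed form.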

Cheng, M., Pun, CM. & Tang, Y.Y. Nonnegative class-specific entropy component analysis with adaptive step search criterion. Pattern Anal Applic 17, 113–127 (2014).


  • Nonnegative learning
  • General objective functions
  • Diseased nonnegative learning problem
  • Line search