## Abstract

Nonnegative learning aims to learn the part-based representation of nonnegative data and has received much attention in recent years. Nonnegative matrix factorization (NMF) has been a popular way to make nonnegative learning applicable, and it can also be interpreted as an optimization problem with bound constraints. In order to exploit the informative components hidden in nonnegative patterns, a novel nonnegative learning method, termed nonnegative class-specific entropy component analysis, is developed in this work. In contrast to existing methods, the proposed method accommodates general objective functions, and the conjugate gradient technique is applied to accelerate the iterative optimization. In view of this development, a general nonnegative learning framework is presented to handle the nonnegative optimization problem with general objective costs. Owing to the general objective costs and the nonnegative bound constraints, the diseased nonnegative learning problem frequently arises. To address this limitation, a modified line search criterion is proposed, which prevents the null trap under assured conditions while keeping the feasible step descent. In addition, a numerical stopping rule, instead of the popular gradient-based one, is employed to improve efficiency. Experiments on face recognition under a variety of conditions show that the proposed method outperforms related methods.
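The modified line search operates under nonnegative bound constraints. As a rough illustration of the general idea only (not the paper's exact criterion), the following sketch implements a generic Armijo-type backtracking line search with projection onto the nonnegative orthant; the function names, parameter values, and the objective used in the example are all assumptions for illustration.

```python
import numpy as np

def projected_armijo_step(W, grad, objective, beta=0.5, sigma=1e-4, max_backtracks=20):
    """One projected line search step for a nonnegativity-constrained problem.

    W         : current nonnegative iterate (ndarray)
    grad      : gradient of `objective` at W
    objective : callable returning the scalar cost

    The candidate point is projected onto the nonnegative orthant before
    the sufficient-decrease test, so every trial point stays feasible.
    """
    f0 = objective(W)
    alpha = 1.0
    for _ in range(max_backtracks):
        W_new = np.maximum(W - alpha * grad, 0.0)   # projection onto W >= 0
        # Armijo-type sufficient decrease, measured along the projected step
        if objective(W_new) <= f0 + sigma * np.sum(grad * (W_new - W)):
            return W_new
        alpha *= beta                               # backtrack
    return W  # no acceptable step found; keep the current iterate
```

Measuring the decrease along the projected direction `W_new - W` (rather than the raw gradient direction) is what keeps the test meaningful at the boundary of the feasible region, where some coordinates are clamped at zero.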


## Notes

Note that, for specific purposes where another upper bound is already available, it is actually unnecessary to impose such a constraint. In NCECA, the NMF approximation is still involved so as to provide a universal arrangement.


## Acknowledgments

The authors would like to thank the handling associate editor and the anonymous reviewers for their constructive comments, and the US Army Research Laboratory for the FERET database. This work was supported by a research grant from the Research Committee of the University of Macau.


## Appendix A: Calculation of \( \nabla J_{E} (W) \)

The gradient \( \nabla \mathcal {H} ({W^T X | C}) \) is closely related to the gradient of the information potential \( \nabla \mathcal {V} ({W^T X | c}) \) for each class. The gradient \( \nabla \mathcal {V} ({W^T X | c}) \) is calculated as

Thus, the partial derivative \( {{\partial \mathcal {H} ({W^T X | C})} / {\partial W}} \) is given by

With the above partial derivatives, it is straightforward to obtain the gradient \( \nabla J_E (W) \).
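The equations themselves are not reproduced in this excerpt. As a sketch only, assuming the standard Parzen window estimator with a Gaussian kernel \( G_{\sigma} \) as is common in information-theoretic learning (the paper's exact definitions, normalizations, and class weights may differ), the quantities involved typically take the form

\[
\mathcal{V} ({W^T X | c}) = \frac{1}{N_c^2} \sum_{i, j \in c} G_{\sigma}\!\left( W^T (x_i - x_j) \right),
\qquad
\nabla_W \mathcal{V} ({W^T X | c}) = -\frac{1}{N_c^2 \sigma^2} \sum_{i, j \in c} G_{\sigma}\!\left( W^T (x_i - x_j) \right) (x_i - x_j)(x_i - x_j)^T W,
\]

and, with the Renyi quadratic entropy \( \mathcal{H} = -\log \mathcal{V} \),

\[
\frac{\partial \mathcal{H} ({W^T X | C})}{\partial W} = -\sum_c P(c)\, \frac{\nabla_W \mathcal{V} ({W^T X | c})}{\mathcal{V} ({W^T X | c})},
\]

where \( N_c \) is the number of samples in class \( c \) and \( P(c) \) its prior probability.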


## About this article

### Cite this article

Cheng, M., Pun, CM. & Tang, Y.Y. Nonnegative class-specific entropy component analysis with adaptive step search criterion.
*Pattern Anal Applic* **17**, 113–127 (2014). https://doi.org/10.1007/s10044-011-0258-2


### Keywords

- Nonnegative learning
- General objective functions
- Diseased nonnegative learning problem
- Line search