Nonnegative Matrix Factorization for Document Clustering: A Survey

  • Ehsan Hosseini-Asl
  • Jacek M. Zurada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8468)

Abstract

Nonnegative Matrix Factorization (NMF) is a popular dimension reduction technique that clusters data by extracting latent features from high-dimensional data, and it is widely used for text mining. Several optimization algorithms have been developed for NMF with different cost functions. In this paper we apply several NMF methods that have been developed for data analysis. These methods differ in the cost function used for the matrix factorization and in the optimization algorithm used to minimize it. The Reuters Document Corpus is used to evaluate the performance of each method. The methods are compared with respect to their accuracy, entropy, purity, computational complexity, and root mean square residual error. The most efficient method with respect to each performance measure is also identified.
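
As a rough illustration of the kind of pipeline the abstract describes, the sketch below factors a small term-by-document matrix with the classic multiplicative updates for the squared Frobenius cost, assigns each document to its dominant latent feature, and computes purity and the root mean square residual. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function names (nmf_multiplicative, cluster_labels, purity, rmse), the toy matrix, and the choice of cost function and update rule are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (terms x documents) as W @ H.

    Uses multiplicative updates for the squared Frobenius cost.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps  # term-by-feature factor, kept nonnegative
    H = rng.random((k, n)) + eps  # feature-by-document factor, kept nonnegative
    for _ in range(n_iter):
        # Elementwise updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_labels(H):
    """Assign each document (column of H) to its largest latent feature."""
    return np.argmax(H, axis=0)


def purity(labels, truth):
    """Fraction of documents that belong to the majority class of their cluster."""
    labels = np.asarray(labels)
    truth = np.asarray(truth)  # class ids must be nonnegative integers
    total = 0
    for c in np.unique(labels):
        members = truth[labels == c]
        total += np.bincount(members).max()
    return total / len(truth)


def rmse(V, W, H):
    """Root mean square of the residual V - W @ H."""
    return np.sqrt(np.mean((V - W @ H) ** 2))


# Toy example: two groups of three documents, each using its own three terms.
V = np.kron(np.eye(2), np.ones((3, 3)))
truth = np.array([0, 0, 0, 1, 1, 1])

W, H = nmf_multiplicative(V, k=2)
labels = cluster_labels(H)
print("labels:", labels)
print("purity:", purity(labels, truth))
print("rmse:", rmse(V, W, H))
```

Taking the arg max over the columns of H is one common way to turn the factorization into a hard clustering. Other cost functions (for example divergence-based ones) would change the update rules but not the overall structure of the sketch.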

Keywords

Nonnegative Matrix Factorization, Document clustering, Optimization algorithm

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ehsan Hosseini-Asl (1)
  • Jacek M. Zurada (1, 2)

  1. Electrical and Computer Engineering Department, University of Louisville, Louisville, USA
  2. Information Technology Institute, Academy of Management, Lodz, Poland
