Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification

  • Catarina Silva
  • Bernardete Ribeiro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5788)


Text classification has received increasing interest over the past decades because of its wide range of applications, driven by the ubiquity of textual information. The high dimensionality of these applications has led to pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques.

We show how Non-Negative Matrix Factorization (NMF), an algorithm that learns a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced set of features is tested with kernel-based machines on the Reuters-21578 benchmark, showing that the method performs competitively.
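As a rough illustration of the factorization the abstract refers to, the sketch below approximates a non-negative term-document matrix V by the product W H, using the multiplicative update rules of Lee and Seung for the Frobenius-norm objective. It is not the authors' implementation: the function name `nmf`, its parameters, and the toy matrix are all assumptions made for this example, and the paper's actual preprocessing and choice of rank are not given here.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (terms x documents) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    Illustrative sketch only; all names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random start, so every update stays non-negative.
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy term-document counts: rows are terms, columns are documents.
V = np.array([
    [2.0, 3.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 2.0, 2.0],
])
W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)
```

In this reading, each column of W is a non-negative group of terms (a "part"), and each column of H gives one document's weights over those groups, which is the reduced feature vector that could then be passed to a classifier such as a support vector machine.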


Keywords: Support Vector Machine; Semantic Feature; Nonnegative Matrix Factorization; Positive Matrix Factorization; Knowledge Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Catarina Silva (1, 2)
  • Bernardete Ribeiro (2)
  1. School of Technology and Management of the Polytechnic Institute of Leiria, Leiria, Portugal
  2. Department of Informatics Engineering, Center for Informatics and Systems (CISUC), University of Coimbra, Polo II, Coimbra, Portugal
