Block-Diagonal Approach to Non-Negative Factorization of Sparse Linguistic Matrices and Tensors of Extra-Large Dimension Using the Latent Dirichlet Distribution

  • CYBERNETICS
Cybernetics and Systems Analysis

Abstract

This paper describes algorithms for non-negative factorization of sparse matrices and tensors, a technique widely used in artificial intelligence in general and in computational linguistics in particular. It proposes using the latent Dirichlet distribution to reduce matrices and tensors to block-diagonal form, which allows computations to be parallelized and accelerates non-negative factorization of linguistic matrices and tensors of extremely large dimension. The proposed approach also allows existing models to be supplemented with new data without repeating the non-negative factorization of the entire very large tensor from scratch.
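The block-diagonal idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The topic labels `row_topic` and `col_topic` are written by hand here, standing in for the groups that a latent Dirichlet topic model might assign to documents and words. The helper names `nmf_multiplicative` and `blockwise_nmf` are hypothetical. The per-block solver is the standard Lee-Seung multiplicative update for the Frobenius norm, chosen only because it is short. Each diagonal block is factorized independently, so the loop over blocks could in principle be distributed across workers.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def blockwise_nmf(V, row_topic, col_topic, rank_per_block):
    """Factorize each (row group, column group) block on its own.

    row_topic / col_topic are assumed inputs: one topic label per row
    and per column, e.g. the dominant topic from a topic model.
    Entries of V outside the diagonal blocks are ignored.
    """
    topics = np.unique(row_topic)
    n, m = V.shape
    total_rank = len(topics) * rank_per_block
    W = np.zeros((n, total_rank))
    H = np.zeros((total_rank, m))
    for b, t in enumerate(topics):  # blocks are independent of each other
        rows = np.flatnonzero(row_topic == t)
        cols = np.flatnonzero(col_topic == t)
        Wb, Hb = nmf_multiplicative(V[np.ix_(rows, cols)], rank_per_block, seed=b)
        s = slice(b * rank_per_block, (b + 1) * rank_per_block)
        W[rows, s] = Wb  # each block owns its own factor columns
        H[s, cols] = Hb
    return W, H


# Toy data: two interleaved rank-1 blocks, zeros everywhere else.
rng = np.random.default_rng(42)
row_topic = np.array([0, 1, 0, 1, 0, 1])
col_topic = np.array([1, 1, 0, 0, 1, 0, 0])
V = np.zeros((6, 7))
for t in (0, 1):
    r = np.flatnonzero(row_topic == t)
    c = np.flatnonzero(col_topic == t)
    V[np.ix_(r, c)] = rng.random((len(r), 1)) @ rng.random((1, len(c)))

W, H = blockwise_nmf(V, row_topic, col_topic, rank_per_block=1)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because every block writes to its own rows of `W` and its own columns of `H`, data belonging to a new topic group could be handled by factorizing only the new block and appending its factors, without touching the blocks already computed. Real linguistic matrices are only approximately block-diagonal after such a grouping, so discarding the off-block entries is a simplification made in this sketch.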

Author information

Corresponding author

Correspondence to A. V. Anisimov.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 6, November–December, 2018, pp. 3–10.

About this article

Cite this article

Anisimov, A.V., Marchenko, O.O. & Nasirov, E.M. Block-Diagonal Approach to Non-Negative Factorization of Sparse Linguistic Matrices and Tensors of Extra-Large Dimension Using the Latent Dirichlet Distribution. Cybern Syst Anal 54, 853–859 (2018). https://doi.org/10.1007/s10559-018-0087-z

  • DOI: https://doi.org/10.1007/s10559-018-0087-z
