Abstract
This paper describes algorithms for non-negative factorization of sparse matrices and tensors, a technique widely used in artificial intelligence in general and in computational linguistics in particular. It proposes using the latent Dirichlet distribution to reduce matrices and tensors to block-diagonal form, which makes it possible to parallelize the computations and to accelerate non-negative factorization of linguistic matrices and tensors of extremely large dimension. The proposed model also allows new data to be added to an existing model without repeating the non-negative factorization of the entire very large tensor from scratch.
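The abstract does not spell out the algorithm, so the following is only a minimal sketch of why a block-diagonal form helps, under several assumptions. It assumes the rows and columns have already been grouped into diagonal blocks (for example by a topic assignment), and it uses the standard Lee and Seung multiplicative updates as a stand-in for whatever factorization routine the paper applies. The names `nmf` and `blockwise_nmf` are hypothetical and introduced here for illustration.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V ~ W @ H with Lee-Seung
    multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def blockwise_nmf(blocks, rank):
    """Factor each diagonal block independently, then stitch the
    per-block factors into factors of the full block-diagonal matrix.
    Assumption: `blocks` is a list of dense non-negative arrays that
    lie on the diagonal; everything off the diagonal is zero."""
    # The per-block calls share no state, so they could run in parallel.
    factors = [nmf(B, rank, seed=i) for i, B in enumerate(blocks)]
    n_rows = sum(B.shape[0] for B in blocks)
    n_cols = sum(B.shape[1] for B in blocks)
    W = np.zeros((n_rows, rank * len(blocks)))
    H = np.zeros((rank * len(blocks), n_cols))
    r = c = 0
    for k, ((Wk, Hk), B) in enumerate(zip(factors, blocks)):
        W[r:r + B.shape[0], k * rank:(k + 1) * rank] = Wk
        H[k * rank:(k + 1) * rank, c:c + B.shape[1]] = Hk
        r += B.shape[0]
        c += B.shape[1]
    return W, H
```

In this sketch, each block is factored without reference to the others, which is what makes parallel execution possible. Under the same assumptions, a newly arrived block could be factored on its own and appended to `W` and `H` without recomputing the existing factors.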
Additional information
Translated from Kibernetika i Sistemnyi Analiz, No. 6, November–December, 2018, pp. 3–10.
Cite this article
Anisimov, A.V., Marchenko, O.O. & Nasirov, E.M. Block-Diagonal Approach to Non-Negative Factorization of Sparse Linguistic Matrices and Tensors of Extra-Large Dimension Using the Latent Dirichlet Distribution. Cybern Syst Anal 54, 853–859 (2018). https://doi.org/10.1007/s10559-018-0087-z
DOI: https://doi.org/10.1007/s10559-018-0087-z