Abstract
Modern natural language processing models such as transformers operate on multimodal data. In this paper, multimodal topic modeling is applied to transactional data of bank corporate clients. A definition of the importance of a modality for the model is proposed, on the basis of which improvements are considered for two modeling scenarios: preserving the maximum amount of information by balancing modalities, and automatic selection of modality weights to optimize auxiliary criteria based on topic representations of documents.
A model is proposed for adding numerical data to topic models in the form of modalities: each topic is assigned a normal distribution with learnable parameters. Significant improvements over standard topic models are demonstrated on the problem of modeling bank corporate clients. Based on the topic representations of the bank’s clients, a 90-day loan delinquency is predicted.
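As a rough illustration of the Gaussian-modality idea described above, each topic can carry a normal distribution over a numeric feature, with its mean and variance re-estimated at the M-step from the topical posterior weights of the observations. The sketch below is a minimal, hypothetical version of that update (function names and the toy data are assumptions, not the paper’s implementation):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; used at the E-step to score a topic."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def m_step_gaussian(values, posteriors):
    """Weighted M-step update of one topic's normal distribution.

    values: numeric observations x_d (e.g., transaction amounts);
    posteriors: posterior weights p(t | d, x_d) of this topic for each observation.
    Returns the updated (mean, std) for the topic.
    """
    w = sum(posteriors)
    mu = sum(p * x for p, x in zip(posteriors, values)) / w
    var = sum(p * (x - mu) ** 2 for p, x in zip(posteriors, values)) / w
    return mu, math.sqrt(var)

# Toy example: one topic mostly "owns" the small amounts.
values = [100.0, 110.0, 500.0, 520.0]
post_t = [0.9, 0.8, 0.1, 0.05]   # hypothetical posterior weights of one topic
mu, sigma = m_step_gaussian(values, post_t)
```

The update pulls the topic’s mean toward the observations it is most responsible for, exactly as in a Gaussian mixture M-step.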
Notes
A matrix \(F \in \mathbb {R}^{m \times n} \) is said to be stochastic if \(F_{ij} \geqslant 0 \) and \(\sum \nolimits _{i = 1}^m F_{ij} = 1\), so that the columns form probability distributions.
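The column-stochasticity condition above is easy to check and to enforce by normalizing a nonnegative count matrix column-wise; a minimal sketch (helper names are illustrative, not from the paper):

```python
def is_column_stochastic(F, tol=1e-9):
    """Check F_ij >= 0 and that every column of F sums to 1."""
    for j in range(len(F[0])):
        col = [row[j] for row in F]
        if any(x < 0 for x in col):
            return False
        if abs(sum(col) - 1.0) > tol:
            return False
    return True

def column_normalize(F):
    """Divide each column of a nonnegative matrix by its sum,
    so that the columns become probability distributions."""
    sums = [sum(row[j] for row in F) for j in range(len(F[0]))]
    return [[row[j] / sums[j] for j in range(len(row))] for row in F]

counts = [[2.0, 1.0], [6.0, 3.0]]   # raw nonnegative counts
Phi = column_normalize(counts)      # column-stochastic matrix
```

This normalization is the standard projection used for the matrices \(\Phi\) and \(\Theta\) in topic models.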
Unimodal representations are obtained using the M-step of a topic model with a single modality and the same values of \(p_{tdw} \).
The cardinality of a modality is the number of tokens of that modality in the document.
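In code, the cardinality of each modality in a document is a simple per-modality token count; a small sketch (the modality names and tokens are hypothetical):

```python
from collections import Counter

def modality_cardinalities(doc_tokens):
    """doc_tokens: list of (modality, token) pairs for one document.
    Returns n_dm: the number of tokens of each modality m in document d."""
    return Counter(modality for modality, _ in doc_tokens)

# Hypothetical transactional document: text tokens plus MCC-code tokens.
doc = [("words", "payment"), ("words", "invoice"),
       ("mcc", "5411"), ("mcc", "5411"), ("mcc", "4829")]
n_dm = modality_cardinalities(doc)
```

These cardinalities are what modality weights have to balance: a modality with many tokens otherwise dominates the likelihood.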
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 20-07-00936.
Translated by V. Potapchouck
Cite this article
Khrylchenko, K.Y., Vorontsov, K.V. Optimizing Modality Weights in Topic Models of Transactional Data. Autom Remote Control 83, 1908–1922 (2022). https://doi.org/10.1134/S00051179220120050