Optimizing Modality Weights in Topic Models of Transactional Data

Automation and Remote Control · Thematic Issue

Abstract

Modern natural language processing models such as transformers operate on multimodal data. In the present paper, multimodal data are explored by applying multimodal topic modeling to the transactional data of bank corporate clients. A definition of the importance of a modality for the model is proposed; on its basis, improvements are considered for two modeling scenarios: preserving the maximum amount of information by balancing the modalities, and automatically selecting modality weights to optimize auxiliary criteria based on the topic representations of documents.
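
To make the role of modality weights concrete, here is a minimal sketch in the spirit of multimodal PLSA/ARTM rather than the exact algorithm of the paper; every function and variable name below (em_multimodal, tau, n_dw_by_mod, and so on) is an illustrative assumption.

    import numpy as np

    def _normalize(a, axis):
        """Rescale so that slices along `axis` sum to one (column-stochastic when axis=0)."""
        return a / (a.sum(axis=axis, keepdims=True) + 1e-12)

    def em_multimodal(n_dw_by_mod, tau, n_topics, n_iter=50, seed=0):
        """EM iterations for a multimodal topic model with modality weights.

        n_dw_by_mod : dict  modality name -> (n_docs, vocab_size_m) token count matrix
        tau         : dict  modality name -> weight of that modality
        Returns per-modality word-topic matrices Phi_m and a shared topic-document matrix Theta.
        """
        rng = np.random.default_rng(seed)
        n_docs = next(iter(n_dw_by_mod.values())).shape[0]
        phi = {m: _normalize(rng.random((n.shape[1], n_topics)), axis=0)
               for m, n in n_dw_by_mod.items()}
        theta = _normalize(rng.random((n_topics, n_docs)), axis=0)

        for _ in range(n_iter):
            phi_new = {m: np.zeros_like(p) for m, p in phi.items()}
            theta_new = np.zeros_like(theta)
            for m, n_dw in n_dw_by_mod.items():
                # E-step: p(t|d,w) is proportional to phi_wt * theta_td for tokens of modality m.
                p_dwt = phi[m][None, :, :] * theta.T[:, None, :]        # (D, W_m, T)
                p_dwt /= p_dwt.sum(axis=2, keepdims=True) + 1e-12
                # M-step counters: the weight tau[m] scales the modality's counts.
                # It cancels inside each Phi_m after column normalization, but it changes
                # how strongly modality m pulls on the shared Theta.
                weighted = tau[m] * n_dw[:, :, None] * p_dwt            # (D, W_m, T)
                phi_new[m] += weighted.sum(axis=0)                      # (W_m, T)
                theta_new += weighted.sum(axis=1).T                     # (T, D)
            phi = {m: _normalize(p, axis=0) for m, p in phi_new.items()}
            theta = _normalize(theta_new, axis=0)
        return phi, theta

In this sketch, balancing the modalities amounts to choosing tau so that no single modality dominates the Theta updates, while automatic selection of modality weights would treat tau as quantities tuned against an auxiliary criterion computed on Theta.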

A model is proposed for adding numerical data to topic models in the form of modalities: each topic is assigned a normal distribution with learnable parameters. Significant improvements over standard topic models are demonstrated on the problem of modeling bank corporate clients. Based on the topic representations of the bank’s customers, a 90-day loan delinquency is predicted.
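
As a rough illustration of attaching a numerical modality, the sketch below re-estimates a per-topic normal distribution in closed form at the M-step, in the style of Gaussian mixture updates; the paper's actual parameterization and training procedure may differ, and the names used here are assumptions.

    import numpy as np

    def gaussian_likelihood(x, mu, var):
        """p(x_d | t): density of the document feature x_d under topic t's normal
        distribution; used in the E-step next to the categorical token modalities."""
        return np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

    def update_gaussian_modality(x, p_td, eps=1e-12, min_var=1e-6):
        """M-step-style re-estimation of (mu_t, sigma_t^2) for a numerical modality.

        x    : (D,)   one numeric feature per document (e.g. a transaction amount)
        p_td : (D, T) responsibilities p(t | d), rows summing to one
        """
        weight = p_td.sum(axis=0) + eps                      # total responsibility per topic
        mu = (p_td * x[:, None]).sum(axis=0) / weight        # weighted means, shape (T,)
        var = (p_td * (x[:, None] - mu) ** 2).sum(axis=0) / weight
        return mu, np.maximum(var, min_var)                  # floor the variance for stability

The 90-day delinquency itself would be predicted by a separate downstream classifier trained on the resulting topic representations; that step is not sketched here.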

Notes

  1. A matrix \(F \in \mathbb{R}^{m \times n}\) is said to be stochastic if \(F_{ij} \geqslant 0\) and \(\sum\nolimits_{i=1}^{m} F_{ij} = 1\), so that its columns form probability distributions (a small check of this property is sketched after these notes).

  2. Unimodal representations are obtained by applying the M-step of a topic model with a single modality, using the same values of \(p_{tdw}\); the corresponding formulas are given after these notes.

  3. The cardinality of a modality is the number of tokens of that modality in the document (see the counting example after these notes).
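
A small, purely illustrative check of the column-stochastic property defined in Note 1:

    import numpy as np

    # A 3-word x 2-topic matrix; each column is a probability distribution over words.
    F = np.array([[0.5, 0.1],
                  [0.3, 0.2],
                  [0.2, 0.7]])
    assert (F >= 0).all() and np.allclose(F.sum(axis=0), 1.0)  # column-stochastic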
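
The unimodal M-step referred to in Note 2 has, in its standard unregularized PLSA/ARTM form, the updates

\[
\phi_{wt} = \frac{\sum_{d} n_{dw}\, p_{tdw}}{\sum_{w'} \sum_{d} n_{dw'}\, p_{tdw'}},
\qquad
\theta_{td} = \frac{\sum_{w} n_{dw}\, p_{tdw}}{\sum_{t'} \sum_{w} n_{dw}\, p_{t'dw}},
\]

where \(n_{dw}\) is the number of occurrences of token \(w\) in document \(d\) and \(p_{tdw} = p(t \mid d, w)\) is taken from the E-step.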
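
And a toy computation of modality cardinalities as defined in Note 3; the modality and token names are invented for the example:

    from collections import Counter

    # (modality, token) pairs of a single document.
    doc = [("mcc", "5411"), ("mcc", "5411"), ("mcc", "6011"),
           ("counterparty", "acme_llc"), ("counterparty", "acme_llc")]
    cardinality = Counter(modality for modality, _ in doc)
    print(cardinality)  # Counter({'mcc': 3, 'counterparty': 2})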

Funding

This work was supported by the Russian Foundation for Basic Research, project no. 20-07-00936.

Author information

Correspondence to K. Ya. Khrylchenko or K. V. Vorontsov.

Additional information

Translated by V. Potapchouck

About this article

Cite this article

Khrylchenko, K.Y., Vorontsov, K.V. Optimizing Modality Weights in Topic Models of Transactional Data. Autom Remote Control 83, 1908–1922 (2022). https://doi.org/10.1134/S00051179220120050
