Diagnostics of the Topic Model for a Collection of Text Messages Based on Hierarchical Clustering of Terms

Lobachevskii Journal of Mathematics

Abstract

Constructing a correct topic model is an important problem in the automatic processing of large collections of text messages. This paper considers an approach to assessing the topic categorization of a collection of short text messages (labeled by experts) based on clustering the terms that make up the message texts. The results of a computer experiment on clustering a set of terms are presented and discussed. As part of the experiment, a graph was constructed that reflects the relations among the terms forming the topic model for the collection of messages, along with some of their numerical characteristics. Analysis of the graph's structure yields practical recommendations for reorganizing the topic model represented in the expert-labeled text collection.

Author information

Correspondence to A. V. Sychev.

Additional information

(Submitted by E. K. Lipachev)

Cite this article

Sychev, A.V. Diagnostics of the Topic Model for a Collection of Text Messages Based on Hierarchical Clustering of Terms. Lobachevskii J Math 44, 219–226 (2023). https://doi.org/10.1134/S1995080223010390

  • DOI: https://doi.org/10.1134/S1995080223010390
