Assessing the Effectiveness of Topic Modeling Algorithms in Discovering Generic Label with Description

Rahman, Shadikur; Hossain, Syeda Sumbul; Arman, Md. Shohel; Rawshan, Lamisha; Toma, Tapushe Rabaya; Rafiq, Fatama Binta; Badruzzaman, Khalid Been Md.

doi:10.1007/978-3-030-39442-4_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1130)

Included in the following conference series:

Future of Information and Communication Conference

Abstract

Analyzing short texts or documents with topic modeling has become a popular solution for handling the increasing number of documents produced in everyday life. Many topic modeling algorithms are used to handle large document collections, e.g., LDA, LSI, pLSI, and NMF. In this study, we use LDA, LSI, and NMF, together with synsets from the lexical database WordNet as candidate labels for topic labeling, and then compare the effectiveness of these topic modeling algorithms on short documents. Among them, LDA gives the best result in terms of WUP similarity. This study will help in selecting the proper algorithm for labeling topics and in easily identifying the meaning of topics.
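To make the WUP (Wu-Palmer) similarity mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-made is-a taxonomy (`PARENT`) as a hypothetical stand-in for WordNet's hypernym hierarchy, counts depth in nodes so that the root has depth 1, and uses a hypothetical helper `best_label` that picks the candidate label with the highest average similarity to a topic's top words. The score itself follows the usual definition: twice the depth of the lowest common subsumer, divided by the sum of the depths of the two concepts.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical stand-in for WordNet
# hypernym links; the paper itself uses WordNet synsets.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes (assumption of this sketch: root has depth 1)."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks from b upward, so first hit is deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def wup_similarity(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def best_label(topic_words, candidates):
    """Pick the candidate with the highest mean WUP score against the topic words."""
    def mean_score(label):
        return sum(wup_similarity(label, w) for w in topic_words) / len(topic_words)
    return max(candidates, key=mean_score)


if __name__ == "__main__":
    print(wup_similarity("dog", "cat"))
    print(wup_similarity("dog", "car"))
    print(best_label(["dog", "cat"], ["animal", "vehicle"]))
```

In this toy hierarchy, "dog" and "cat" share the subsumer "animal" and therefore score higher than "dog" and "car", which only meet at the root. With a real lexical database the same comparison would be made between synsets rather than plain strings.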

Notes

1.
https://github.com/sadirahman/Effectiveness-of-Topic-ModelingAlgorithms-in-Discovering-Generic-Label-withDescription.

Author information

Authors and Affiliations

Daffodil International University, Dhaka, Bangladesh
Shadikur Rahman, Syeda Sumbul Hossain, Md. Shohel Arman, Lamisha Rawshan, Tapushe Rabaya Toma, Fatama Binta Rafiq & Khalid Been Md. Badruzzaman

Corresponding author

Correspondence to Syeda Sumbul Hossain.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

About this paper

Cite this paper

Rahman, S. et al. (2020). Assessing the Effectiveness of Topic Modeling Algorithms in Discovering Generic Label with Description. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication. FICC 2020. Advances in Intelligent Systems and Computing, vol 1130. Springer, Cham. https://doi.org/10.1007/978-3-030-39442-4_18

DOI: https://doi.org/10.1007/978-3-030-39442-4_18
Published: 13 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39441-7
Online ISBN: 978-3-030-39442-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
