A decade of research in statistics: a topic model approach
Topic models are a well-known clustering approach for textual data, with promising applications in bibliometrics for discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide poorly descriptive metadata for the discovered clusters of publications, and they do not relate these clusters to the other important metadata usually available with publications, such as author affiliation, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and to the post-processing of topic model results in order to describe a field of research in depth over time. In particular, we work on a selection of publications from the international statistical literature, we propose an approach that allows us to identify sophisticated topic descriptors, and we analyze the links between topics and their temporal evolution.
Keywords: Probabilistic topic models; Scientometrics; Clustering; Text mining