Volume 103, Issue 2, pp 413–433

A decade of research in statistics: a topic model approach

  • Francesca De Battisti
  • Alfio Ferrara
  • Silvia Salini


Topic models are a well-known clustering approach for textual data, with promising applications in bibliometrics for discovering scientific topics and trends in a corpus of scientific publications. However, topic models by themselves describe the discovered clusters of publications only poorly, and they do not relate those clusters to the other important metadata usually available for publications, such as author affiliation, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and to the post-processing of topic model results, with the aim of describing a field of research in depth and over time. In particular, we work on a selection of publications from the international statistical literature, propose an approach that identifies sophisticated topic descriptors, and analyze the links between topics and their temporal evolution.


  • Probabilistic topic models
  • Scientometrics
  • Clustering
  • Text mining



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2015

Authors and Affiliations

  • Francesca De Battisti (1)
  • Alfio Ferrara (2)
  • Silvia Salini (1)

  1. DEMM, Università degli Studi di Milano, Milan, Italy
  2. DI, Università degli Studi di Milano, Milan, Italy
