Scientometrics, Volume 111, Issue 2, pp 1169–1221

Comparison of topic extraction approaches and their results

  • Theresa Velden
  • Kevin W. Boyack
  • Jochen Gläser
  • Rob Koopman
  • Andrea Scharnhorst
  • Shenghui Wang

Abstract

This is the last paper in the Synthesis section of this special issue on ‘Same Data, Different Results’. We first provide a framework of how to describe and distinguish approaches to topic extraction from bibliographic data of scientific publications. We then compare solutions delivered by the different topic extraction approaches in this special issue, and explore where they agree and differ. This is achieved without reference to a ground truth, since we have to assume the existence of multiple, equally important, valid perspectives and want to avoid bias through the adoption of an ad-hoc yardstick. Instead, we apply different ways to quantitatively and visually compare solutions to explore their commonalities and differences and develop hypotheses about the origin of these differences. We conclude with a discussion of future work needed to develop methods for comparison and validation of topic extraction results, and express our concern about the lack of access to non-proprietary benchmark data sets to support method development in the field of scientometrics.

Keywords

Topic extraction; Comparative methods; Astrophysics; Data modeling; Clustering; Topic labeling; Science mapping

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  • Theresa Velden (1, 2)
  • Kevin W. Boyack (3)
  • Jochen Gläser (2)
  • Rob Koopman (4)
  • Andrea Scharnhorst (5)
  • Shenghui Wang (4)

  1. University of Michigan School of Information, Ann Arbor, USA
  2. ZTG, Technical University Berlin, Berlin, Germany
  3. SciTech Strategies, Inc., Albuquerque, USA
  4. OCLC Research, Leiden, The Netherlands
  5. DANS-KNAW, The Hague, The Netherlands