Scientometrics, Volume 111, Issue 2, pp 1119–1139

Contextualization of topics: browsing through the universe of bibliographic information

Article

Abstract

This paper describes how semantic indexing can help to generate a contextual overview of topics and to visually compare clusters of articles. The method was originally developed for an innovative information exploration tool, called Ariadne, which operates on bibliographic databases with tens of millions of records (Koopman et al. in Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. doi:10.1145/2702613.2732781, 2015b). In this paper, the method behind Ariadne is further developed and applied to the research question of the special issue “Same data, different results”: a better understanding of topic (re-)construction by different bibliometric approaches. For the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploration tool, LittleAriadne, has been created. This paper contributes in two ways to the overall challenge of delineating and defining topics. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built on semantic indexing of entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. By applying the principle of semantic indexing to this extended list of entities of the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms, and thus complements other possible means of comparison.
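
To make the idea of article vectors in a lexical space more concrete, the following minimal Python sketch shows one way such vectors could be built, under a deliberately simplified scheme: each entity receives a sparse random index vector in the style of Achlioptas (2003), an article vector is the sum of the vectors of its entities, and two articles are compared by cosine similarity. This is not the Ariadne or LittleAriadne implementation. The function names, the dimensionality of 256 and the example entities are hypothetical choices made for illustration only.

```python
import math
import random

DIM = 256  # reduced dimensionality; an assumed value, not taken from the paper


def entity_vector(entity, dim=DIM):
    """Return a reproducible sparse random vector for one entity.

    Components are +1 or -1 with probability 1/6 each and 0 with
    probability 2/3 (the sparse distribution proposed by Achlioptas).
    The constant scaling factor is omitted because cosine similarity
    does not depend on it.
    """
    rng = random.Random(entity)  # seeding with the string keeps vectors stable
    return [rng.choice((1.0, 0.0, 0.0, 0.0, 0.0, -1.0)) for _ in range(dim)]


def article_vector(entities, dim=DIM):
    """Sum the entity vectors of all entities attached to an article."""
    vec = [0.0] * dim
    for entity in entities:
        for i, x in enumerate(entity_vector(entity, dim)):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical articles, each described by a few entities (terms, journals, ...).
article_a = article_vector(["galaxy", "redshift", "quasar"])
article_b = article_vector(["galaxy", "redshift", "supernova"])
article_c = article_vector(["asteroid", "orbit", "comet"])

print("a vs b:", round(cosine(article_a, article_b), 3))
print("a vs c:", round(cosine(article_a, article_c), 3))
```

In this sketch, articles that share entities receive a clearly higher cosine similarity than articles that share none, which is the property a subsequent clustering step can build on.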

Keywords

Random projection; Clustering; Visualization; Topical modelling; Interactive search interface; Semantic map; Knowledge map

References

  1. Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. doi:10.1016/S0022-0000(03)00025-4. http://www.sciencedirect.com/science/article/pii/S0022000003000254.
  2. Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, (pp. 245–250). ACM, New York. doi:10.1145/502512.502546. http://doi.acm.org/10.1145/502512.502546
  3. Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008 (12 pp).
  4. Börner, K. (2011). Plug-and-play macroscopes. Communications of the ACM, 54(3), 60–69.
  5. Boyack, K., & Klavans, R. (2010). Weaving the fabric of science. In K. Börner & E. F. Hardy (Eds.), 6th Iteration (2009): Science Maps for Scholars, Places and Spaces: Mapping Science. http://scimaps.org/.
  6. Boyack, K. W. (2017a). Investigating the effect of global data on topic detection. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  7. Boyack, K. W. (2017b). Thesaurus-based methods for mapping contents of publication sets. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  8. de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515. doi:10.1126/science.149.3683.510. http://www.sciencemag.org/content/149/3683/510.short.
  9. Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago: University of Chicago Press.
  10. Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. In H. F. Moed, W. Glänzel, & U. Schmoch (Eds.), Handbook of quantitative science and technology research (pp. 257–276). Berlin: Springer. doi:10.1007/1-4020-2755-9_12.
  11. Glänzel, W., & Thijs, B. (2017). Using hybrid methods and ‘core documents’ for the representation of clusters and topics: The astronomy dataset. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  12. Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Introduction to the special issue “Same data, different results?”. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  13. Havemann, F., Gläser, J., & Heinz, M. (2017). Memetic search for overlapping topics. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  14. Havemann, F., & Scharnhorst, A. (2012). Bibliometric networks. CoRR arXiv:1212.5211.
  15. Janssens, F., Zhang, L., Moor, B. D., & Glänzel, W. (2009). Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing and Management, 45(6), 683–702. doi:10.1016/j.ipm.2009.06.003. http://www.sciencedirect.com/science/article/pii/S0306457309000673.
  16. Johnson, W., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.
  17. Koopman, R., & Wang, S. (2017). Mutual information based labelling and comparing clusters. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  18. Koopman, R., Wang, S., & Scharnhorst, A. (2015a). Contextualization of topics—Browsing through terms, authors, journals and cluster allocations. In A. A. Salah, Y. Tonta, A. A. A. Salah, C. R. Sugimoto, & U. Al (Eds.), Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July, 2015.
  19. Koopman, R., Wang, S., Scharnhorst, A., & Englebienne, G. (2015b). Ariadne’s thread: Interactive navigation in a world of networked information. In B. Begole, J. Kim, K. Inkpen, & W. Woo (Eds.), Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, Seoul, CHI 2015 Extended Abstracts, Republic of Korea, April 18–23, 2015, (pp. 1833–1838). ACM. doi:10.1145/2702613.2732781. http://doi.acm.org/10.1145/2702613.2732781.
  20. Kouw, M., Heuvel, C. V. D., & Scharnhorst, A. (2013). Exploring uncertainty in knowledge representations: Classifications, simulations, and models of the world. In P. Wouters, A. Beaulieu, A. Scharnhorst, & S. Wyatt (Eds.), Virtual knowledge. Experimenting in the humanities and the social sciences (pp. 89–126). Cambridge: MIT Press.
  21. Leydesdorff, L., & Welbers, K. (2011). The semantic mapping of words and co-words in contexts. Journal of Informetrics, 5(3), 469–475. doi:10.1016/j.joi.2011.01.008.
  22. Lu, K., & Wolfram, D. (2012). Measuring author research relatedness: A comparison of word-based, topic-based, and author cocitation approaches. Journal of the American Society for Information Science and Technology, 63(10), 1973–1986. doi:10.1002/asi.22628.
  23. Mahalanobis, P. C. (1936). On the generalised distance in statistics. Proceedings National Institute of Science, India, 2(1), 49–55.
  24. Mali, F., Kronegger, L., Doreian, P., & Ferligoj, A. (2012). Dynamic scientific co-authorship networks. In A. Scharnhorst, K. Börner & P. van den Besselaar (Eds.), Models of Science Dynamics, Understanding Complex Systems (pp. 195–232). Springer, Berlin. doi:10.1007/978-3-642-23068-4_6.
  25. Mayr, P., & Scharnhorst, A. (2015). Scientometrics and information retrieval: Weak-links revitalized. Scientometrics, 102(3), 2193–2199. doi:10.1007/s11192-014-1484-3.
  26. Mutschke, P., & Mayr, P. (2014). Science models for search: A study on combining scholarly information retrieval and scientometrics. Scientometrics 1–23. doi:10.1007/s11192-014-1485-2.
  27. Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (2000). Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences, 61(2), 217–235. doi:10.1006/jcss.2000.1711. http://www.sciencedirect.com/science/article/pii/S0022000000917112.
  28. Petersen, A. (2006). Simulating nature: A philosophical study of computer-simulation uncertainties and their role in climate science and policy advice. Apeldoorn: Het Spinhuis.
  29. Radicchi, F., Fortunato, S., & Vespignani, A. (2012). Citation networks. In A. Scharnhorst, K. Börner, & P. Besselaar (Eds.), Models of Science Dynamics, Understanding Complex Systems, vol. 69, chap. 7, (pp. 233–257). Springer, Berlin. doi:10.1007/978-3-642-23068-4_7.
  30. Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York: McGraw-Hill Inc.
  31. Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  32. Van Heur, B., Leydesdorff, L., & Wyatt, S. (2013). Turning to ontology in STS? Turning to STS through “ontology”. Social Studies of Science, 43(3), 341–362. doi:10.1177/030631271245814.
  33. Velden, T., Boyack, K., van Eck, N., Glänzel, W., Gläser, J., & Havemann, F., et al. (2017). Comparison of topic extraction approaches and their results. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  34. Velden, T., Yan, S., & Lagoze, C. (2017). Mapping the cognitive structure of astrophysics by infomap. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  35. Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11, 2837–2854.
  36. Wang, S., & Koopman, R. (2017). Clustering articles based on semantic similarity. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? (pp. 234–556). Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.
  37. Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531. doi:10.1016/j.ipm.2006.03.016. http://www.sciencedirect.com/science/article/pii/S0306457306000379. Special Issue on Informetrics.
  38. Zitt, M., Lelu, A., & Bassecoulard, E. (2011). Hybrid citation-word representations in science mapping: Portolan charts of research fields? Journal of the American Society for Information Science and Technology, 62, 19–39.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  • Rob Koopman (1)
  • Shenghui Wang (1)
  • Andrea Scharnhorst (2)

  1. OCLC Research, Leiden, The Netherlands
  2. DANS-KNAW, The Hague, The Netherlands
