Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset

Glänzel, Wolfgang; Thijs, Bart

doi:10.1007/s11192-017-2301-6

Published: 21 February 2017

Volume 111, pages 1071–1087, (2017)

Scientometrics

Abstract

Based on a dataset on Astronomy and Astrophysics, hybrid cluster analyses have been conducted. In order to obtain an optimum solution and to analyse possible issues resulting from the bibliometric methodologies used, we have systematically studied three models and, within each of these models, two scenarios. The hybrid clustering was based on a combination of bibliographic coupling and textual similarities using the Louvain method at two resolution levels. The procedure resulted in three clearly hierarchical structures with six and thirteen, seven and thirteen, and five and eleven clusters, respectively. These structures are analysed with the help of a concordance table. The statistics reflect a high quality of classification. The results of these three models are presented, discussed and compared with each other. For labelling and interpreting clusters, core documents representing the obtained clusters are used. Furthermore, these core documents help depict the internal structure of the complete network and the clusters. This work has been done as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science organised in Berlin in August 2014.
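To make the idea of a hybrid similarity concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract only states that bibliographic coupling and textual similarities were combined, so the linear weighting, the `weight` parameter, the whitespace tokenisation and the dictionary layout of a document (`refs`, `text`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def coupling_cosine(refs_a, refs_b):
    """Cosine (Salton) measure over two reference lists: shared references
    divided by the geometric mean of the two list sizes."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def text_cosine(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (assumed tokenisation:
    lower-casing and splitting on whitespace)."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(doc_a, doc_b, weight=0.5):
    """Hypothetical hybrid measure: a weighted mean of the citation-based
    and the text-based component. The weighting scheme is an assumption."""
    bc = coupling_cosine(doc_a["refs"], doc_b["refs"])
    tx = text_cosine(doc_a["text"], doc_b["text"])
    return weight * bc + (1 - weight) * tx


# Toy documents, invented for illustration only.
doc1 = {"refs": ["r1", "r2"], "text": "dark energy model"}
doc2 = {"refs": ["r2", "r3"], "text": "dark energy survey"}
similarity = hybrid_similarity(doc1, doc2)
```

The pairwise values produced this way could serve as edge weights of a document network, which a community detection routine such as the Louvain method would then partition at a chosen resolution level.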

Acknowledgements

This work has been done as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science organised in Berlin in August 2014. The project and workshop series were jointly organised by the Humboldt Universität and Technische Universität Berlin. We would like to acknowledge their support of our study. We also thank all project members for their comments and discussion. Above all, we would like to thank the internal reviewers Kevin Boyack and Shenghui Wang as well as the anonymous external referees for their valuable comments and suggestions that resulted in a substantial improvement of the manuscript.

Author information

Authors and Affiliations

ECOOM and Department of MSI, KU Leuven, Louvain, Belgium
Wolfgang Glänzel & Bart Thijs
Department of Science Policy and Scientometrics, Library of the Hungarian Academy of Sciences, Budapest, Hungary
Wolfgang Glänzel

Corresponding author

Correspondence to Wolfgang Glänzel.

Appendix

The first 20 core documents of Cluster #3 (‘General Theory of Cosmology’) in scenario 1 of model 1 (Bibliographic Coupling)

WoS accession code	Degree	Title
000264174700027	432	An introduction to the dark energy problem
000225317900001	324	Sudden future singularities in FLRW cosmologies
000223638500012	309	Supernova constraints on a holographic dark energy model
000228261200001	299	Quantum fields and ‘big rip’ expansion singularities
000245928000021	253	Exploring the properties of dark energy using type-Ia supernovae and other datasets
000230889600014	224	Parametrization of quintessence and its potential
000220801900003	219	Constraints on a Cardassian model from Type Ia supernova data, revisited
000250363000014	210	Measuring the baryon acoustic oscillation scale using the Sloan Digital Sky Survey and 2dF Galaxy Redshift Survey
000245827600007	205	A modified Chaplygin gas model with interaction
000244080700016	205	Statefinder parameters for interacting phantom energy with dark matter
000245405900001	199	Constraints on the generalized Chaplygin gas model from recent supernova data and baryonic acoustic oscillations
000240874500033	194	High redshift detection of the integrated Sachs–Wolfe effect
000183786100002	187	The coincidence of Friedmann integrals
000186983100013	186	Generalized Chaplygin gas with alpha = 0 and the Lambda CDM cosmological model
000241963800007	184	Gravitational collapse due to dark matter and dark energy in the braneworld scenario
000234274900033	182	Comparison of the legacy and gold type Ia supernovae dataset constraints on dark energy models
000229888900007	179	Escaping the big rip?
000228112400010	179	Cosmology with interaction between phantom dark energy and dark matter and the coincidence problem
000185229300023	178	k-essence and the coincidence problem
000244535200025	175	Lemaitre–Tolman–Bondi universes as alternatives to dark energy: Does positive averaged acceleration imply positive cosmic acceleration?
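The table ranks documents by degree. As a rough illustration of how such a ranking could be produced from a similarity network, the sketch below counts, for each document, the links whose similarity reaches a threshold and returns the highest-degree documents. The threshold `min_similarity`, the edge-list layout and the function name `top_by_degree` are hypothetical; the page does not state how the Degree column was computed, so this should not be read as the authors' definition of core documents.

```python
from collections import Counter


def top_by_degree(edges, min_similarity, n):
    """Return the n documents with the most links at or above min_similarity.

    edges: iterable of (doc_a, doc_b, similarity) tuples, one per document pair.
    Ties are broken alphabetically by document identifier.
    """
    degree = Counter()
    for doc_a, doc_b, sim in edges:
        if sim >= min_similarity:
            degree[doc_a] += 1
            degree[doc_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Toy edge list, invented for illustration only.
toy_edges = [
    ("A", "B", 0.4),
    ("A", "C", 0.3),
    ("B", "C", 0.1),
    ("A", "D", 0.5),
]
ranking = top_by_degree(toy_edges, min_similarity=0.25, n=2)
```

Applied to a real cluster, the identifiers would be WoS accession codes and the result would correspond to the first rows of a table like the one above.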

About this article

Cite this article

Glänzel, W., Thijs, B. Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset. Scientometrics 111, 1071–1087 (2017). https://doi.org/10.1007/s11192-017-2301-6

Received: 06 June 2016
Published: 21 February 2017
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11192-017-2301-6
