Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset

Scientometrics

Abstract

We conducted hybrid cluster analyses on a dataset on Astronomy and Astrophysics. To obtain an optimum solution and to analyse possible issues resulting from the bibliometric methodologies used, we systematically studied three models and, within each model, two scenarios. The hybrid clustering was based on a combination of bibliographic coupling and textual similarities, using the Louvain method at two resolution levels. The procedure resulted in three clearly hierarchical structures with six and thirteen, seven and thirteen, and finally five and eleven clusters, respectively. These structures are analysed with the help of a concordance table. The statistics reflect a high quality of classification. The results of the three models are presented, discussed and compared with each other. Core documents representing the obtained clusters are used for labelling and interpreting the clusters. Furthermore, these core documents help depict the internal structure of the complete network and of the clusters. This work was carried out as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science, organised in Berlin in August 2014.
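As an illustration of the kind of combination described in the abstract, the following Python sketch computes a citation-based and a text-based cosine similarity for a pair of documents and merges them into a single hybrid value. The combination rule (a weighted mean of the two angles), the default weight and all function names are assumptions made for this example; the abstract does not specify them, and the clustering step itself is not shown.

```python
import math


def coupling_cosine(refs_a, refs_b):
    """Salton's cosine on two sets of cited references (bibliographic coupling)."""
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / math.sqrt(len(refs_a) * len(refs_b))


def text_cosine(terms_a, terms_b):
    """Cosine between two term-frequency vectors given as dicts (term -> count)."""
    shared = terms_a.keys() & terms_b.keys()
    dot = sum(terms_a[t] * terms_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in terms_a.values()))
    norm_b = math.sqrt(sum(v * v for v in terms_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(citation_sim, text_sim, weight=0.5):
    """Combine two cosine similarities into one value in [0, 1].

    Assumption for this sketch: the cosines are converted to angles,
    the angles are averaged with `weight`, and the result is converted
    back to a cosine. `weight` is a hypothetical tuning parameter.
    """
    # Clamp to the valid domain of acos to guard against rounding noise.
    c = min(1.0, max(0.0, citation_sim))
    t = min(1.0, max(0.0, text_sim))
    angle = weight * math.acos(c) + (1.0 - weight) * math.acos(t)
    return math.cos(angle)
```

A pairwise similarity matrix built this way could then be passed to a community detection routine such as the Louvain method at different resolution settings.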

Figs. 1–3  Data sourced from Thomson Reuters Web of Science Core Collection

Acknowledgements

This work was carried out as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science, organised in Berlin in August 2014. The project and workshop series were jointly organised by the Humboldt Universität and Technische Universität Berlin. We would like to acknowledge their support of our study. We also thank all project members for their comments and discussion. Above all, we would like to thank the internal reviewers Kevin Boyack and Shenghui Wang as well as the anonymous external referees for their valuable comments and suggestions, which resulted in a substantial improvement of the manuscript.

Author information

Corresponding author

Correspondence to Wolfgang Glänzel.

Appendix

The first 20 core documents of Cluster #3 (‘General Theory of Cosmology’) in scenario 1 of model 1 (Bibliographic Coupling)

WoS accession code  Degree  Title
000264174700027     432     An introduction to the dark energy problem
000225317900001     324     Sudden future singularities in FLRW cosmologies
000223638500012     309     Supernova constraints on a holographic dark energy model
000228261200001     299     Quantum fields and ‘big rip’ expansion singularities
000245928000021     253     Exploring the properties of dark energy using type-Ia supernovae and other datasets
000230889600014     224     Parametrization of quintessence and its potential
000220801900003     219     Constraints on a Cardassian model from Type Ia supernova data, revisited
000250363000014     210     Measuring the baryon acoustic oscillation scale using the Sloan Digital Sky Survey and 2dF Galaxy Redshift Survey
000245827600007     205     A modified Chaplygin gas model with interaction
000244080700016     205     Statefinder parameters for interacting phantom energy with dark matter
000245405900001     199     Constraints on the generalized Chaplygin gas model from recent supernova data and baryonic acoustic oscillations
000240874500033     194     High redshift detection of the integrated Sachs–Wolfe effect
000183786100002     187     The coincidence of Friedmann integrals
000186983100013     186     Generalized Chaplygin gas with alpha = 0 and the Lambda CDM cosmological model
000241963800007     184     Gravitational collapse due to dark matter and dark energy in the braneworld scenario
000234274900033     182     Comparison of the legacy and gold type Ia supernovae dataset constraints on dark energy models
000229888900007     179     Escaping the big rip?
000228112400010     179     Cosmology with interaction between phantom dark energy and dark matter and the coincidence problem
000185229300023     178     k-essence and the coincidence problem
000244535200025     175     Lemaitre–Tolman–Bondi universes as alternatives to dark energy: Does positive averaged acceleration imply positive cosmic acceleration?
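
The Degree column above ranks the core documents of one cluster. As a rough illustration of how such a ranking could be produced, the Python sketch below counts, for every document, its links above a similarity threshold and keeps the documents that reach a minimum count. The function name, the edge format and both thresholds are hypothetical choices for this example and are not taken from the article.

```python
from collections import defaultdict


def core_documents(edges, min_strength, min_degree):
    """Rank documents by their number of strong links.

    edges        -- iterable of (doc_a, doc_b, similarity) tuples
    min_strength -- hypothetical threshold: weaker links are ignored
    min_degree   -- hypothetical threshold: minimum number of strong links
    Returns a list of (doc, degree) sorted by descending degree.
    """
    degree = defaultdict(int)
    for doc_a, doc_b, similarity in edges:
        if similarity >= min_strength:
            # An undirected link counts once for each endpoint.
            degree[doc_a] += 1
            degree[doc_b] += 1
    selected = [(doc, d) for doc, d in degree.items() if d >= min_degree]
    # Sort by degree (highest first), then by identifier for stable output.
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))
```

Taking the first 20 entries of the returned list would give a table shaped like the one above, with the document identifier in the first column and the degree in the second.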

Cite this article

Glänzel, W., Thijs, B. Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset. Scientometrics 111, 1071–1087 (2017). https://doi.org/10.1007/s11192-017-2301-6

  • DOI: https://doi.org/10.1007/s11192-017-2301-6
