Abstract
To explore the suitability of a fine-grained classification of journal articles that exploits multiple sources of information, articles are organized in a two-layer multiplex. The first layer conveys similarities based on the full text of articles, and the second conveys similarities based on cited references. The information in the two layers is only weakly associated. The Similarity Network Fusion process is adopted to combine the two layers into a new single-layer network. A clustering algorithm is applied to the fused network to obtain a classification of the articles. To evaluate its coherence, this classification is compared with those obtained by applying the same algorithm to each of the two layers. Moreover, the classification obtained for the fused network is compared with the classifications obtained when the layers of information are integrated using different methods available in the literature. In the case of the Cambridge Journal of Economics, Similarity Network Fusion appears to be the best option. Moreover, the achieved classification appears to be fine-grained enough to represent the extreme heterogeneity characterizing the contributions published in the journal.
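The fusion step summarized above can be illustrated with a minimal sketch. It assumes two precomputed, symmetric article-by-article similarity matrices (one for full text, one for cited references) and follows the general cross-diffusion idea of Similarity Network Fusion in a simplified form. The function names, the toy matrices, and the values of `k` and `n_iter` are illustrative assumptions, not the settings or the code used in the article.

```python
import numpy as np


def full_kernel(W):
    """Row-normalize similarities: off-diagonal entries of each row sum to 1/2, diagonal is 1/2."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    P = W / (2.0 * row_sums)
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, k):
    """Keep the k largest off-diagonal similarities in each row, then normalize rows to sum to 1."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nearest = np.argsort(W[i])[::-1][:k]
        S[i, nearest] = W[i, nearest]
    row_sums = S.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return S / row_sums


def fuse_two_layers(W_text, W_refs, k=3, n_iter=20):
    """Simplified two-layer fusion: each layer is diffused through the other's global kernel."""
    P1, P2 = full_kernel(W_text), full_kernel(W_refs)
    S1, S2 = local_kernel(W_text, k), local_kernel(W_refs, k)
    for _ in range(n_iter):
        P1_next = S1 @ P2 @ S1.T
        P2_next = S2 @ P1 @ S2.T
        # Re-normalize after every step so rows stay comparable across iterations.
        P1, P2 = full_kernel(P1_next), full_kernel(P2_next)
    fused = (P1 + P2) / 2.0
    return (fused + fused.T) / 2.0  # enforce symmetry of the fused network


# Toy data (hypothetical): six articles forming two groups of three.
block = np.kron(np.eye(2), np.ones((3, 3)))  # 1 within a group, 0 across groups
W_text = 0.1 + 0.8 * block   # full-text similarities
W_refs = 0.05 + 0.75 * block  # cited-reference similarities
fused = fuse_two_layers(W_text, W_refs, k=2, n_iter=10)
```

The resulting `fused` matrix can be read as the weighted adjacency matrix of a single-layer network, to which a community-detection algorithm of choice may then be applied.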
Data availability
After acceptance, raw data will be available at https://doi.org/10.5281/zenodo.7876691. Preprint: the article is available at https://arxiv.org/pdf/2305.00026.pdf.
Acknowledgements
We thank Alessandra Durio, who contributed to the work by carrying out all the processing for the construction of the similarity matrices based on bags of words and topic modeling. We also thank two anonymous referees for their insightful comments, which enabled substantial improvement of the article. This article is available as a preprint at https://arxiv.org/pdf/2305.00026.pdf.
Funding
The research is funded by the Italian Ministry of University, PRIN project: 2017MPXW98, PI: Alberto Baccini.
Author information
Contributions
AB and LB contributed to the study conception and design. Material preparation, data collection and analysis were performed by AB, LB, MC and EP; FB supervised the methods of matrix integration and their comparison; DP interpreted the data from the perspective of the methodology of economics. All authors participated in the writing of the manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Supplementary figures
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Baccini, A., Baccini, F., Barabesi, L. et al. Fine-grained classification of journal articles based on multiple layers of information through similarity network fusion: The case of the Cambridge Journal of Economics. Scientometrics 129, 373–400 (2024). https://doi.org/10.1007/s11192-023-04884-2
DOI: https://doi.org/10.1007/s11192-023-04884-2
Keywords
- Similarity network fusion
- Generalized distance correlation
- Partial distance correlation
- Multilayer social networks
- Communities in networks
- Topic modeling