Improving co-authorship network structures by combining multiple data sources: evidence from Italian academic statisticians

Fuccella, Vittorio; De Stefano, Domenico; Vitale, Maria Prosperina; Zaccarin, Susanna

doi:10.1007/s11192-016-1872-y

Published: 09 February 2016

Volume 107, pages 167–184, (2016)

Scientometrics

Abstract

The aim of the present contribution is to merge bibliographic data for members of a bounded scientific community in order to derive a complete unified archive, covering both top international and nationally oriented production, as a new basis for carrying out network analysis on a unified co-authorship network. A two-step procedure is used to deal with the identification of duplicate records and with author name disambiguation. Specifically, for the second step we drew strongly on a well-established unsupervised disambiguation method proposed in the literature, which follows a network-based approach and requires a restricted set of record attributes. Evidence from Italian academic statisticians is provided by merging data from three bibliographic archives. Non-negligible differences were observed in network results when comparing disambiguated and non-disambiguated data sets, especially in network measures at the individual level.

Notes

Two international databases, one general (WoS) and one thematic (Current Index to Statistics, CIS), were considered, together with bibliographic information retrieved from the Italian Ministry of University and Research (MIUR) database of nationally funded research projects (PRIN).
As of December 2014, the size of the population was 722.
Although PRIN funding was launched in 1996, information on funded projects has been released only since the year 2000.
For a more detailed treatment of this problem, the reader can refer to Bilenko et al. (2003).
For instance, Lee et al. (2005) and Santana et al. (2015) assumed different weights according to the discriminative capability of the attributes.
A co-authorship network is derived from the matrix product \(\mathbf {Y}=\mathbf {A}\mathbf {A}'\), where \(\mathbf {A}\) is an \(n \times p\) affiliation matrix, with elements \(a_{ik} = 1\) if author \(i \in \mathcal {N}\) authored the publication \(k \in {\mathcal {P}}\), and 0 otherwise. The matrix \(\mathbf {Y}\) is the undirected and valued \(n \times n\) adjacency matrix, with element \(y_{ij}\) greater than 0 if \(i,\,j \in \mathcal {N}\) co-authored one or more publications in \(\mathcal {P}\), and 0 otherwise. The binary version of \(\mathbf {Y}\), obtained by setting all entries of the valued adjacency matrix greater than zero to 1, was used in our analysis.
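The construction in this note can be illustrated with a minimal Python sketch using NumPy. The toy affiliation matrix, the function name `coauthorship_matrices`, and the choice to zero the diagonal before binarizing (so that authors are not linked to themselves) are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def coauthorship_matrices(A):
    """Return the valued matrix Y = A A' and a binary adjacency matrix.

    A is an n x p 0/1 affiliation matrix (authors in rows, publications
    in columns). Zeroing the diagonal before binarizing is an assumption
    of this sketch: the diagonal of Y counts each author's own papers.
    """
    A = np.asarray(A, dtype=int)
    Y = A @ A.T                    # y_ij = number of papers shared by i and j
    off_diag = Y.copy()
    np.fill_diagonal(off_diag, 0)  # drop self-ties (illustrative assumption)
    B = (off_diag > 0).astype(int) # binary version: any co-authorship -> 1
    return Y, B


# Hypothetical example: 4 authors (rows) and 4 publications (columns).
A = np.array([
    [1, 1, 0, 1],  # author 0 wrote publications 0, 1 and 3
    [1, 0, 1, 1],  # author 1 wrote publications 0, 2 and 3
    [0, 0, 1, 0],  # author 2 wrote publication 2
    [0, 1, 0, 0],  # author 3 wrote publication 1
])

Y, B = coauthorship_matrices(A)
print(Y)
print(B)
```

In this toy example authors 0 and 1 share two publications, so the valued entry is 2 while the binary entry is 1; authors 2 and 3 never co-authored, so both entries are 0.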

References

Albert, R., & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47.
Baxter, R., Christen, P., & Churches, T. (2003). A comparison of fast blocking methods for record linkage. In ACM KDD Workshops (Vol. 3, pp. 25–27).
Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P., & Fienberg, S. (2003). Adaptive name matching in information integration. IEEE Intelligent Systems, 5, 16–23.
Christen, P. (2008). Febrl: An open source data cleaning, deduplication and record linkage system with a graphical user interface. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1065–1068. ACM.
Christen, P. (2012). A survey of indexing techniques for scalable record linkage and deduplication. IEEE Transactions on Knowledge and Data Engineering, 24(9), 1537–1555.
Cota, R. G., Ferreira, A. A., Nascimento, C., Gonçalves, M. A., & Laender, A. H. (2010). An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations. Journal of the American Society for Information Science and Technology, 61(9), 1853–1870.
Criminisi, A., Shotton, J., & Konukoglu, E. (2012). Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Foundations and Trends in Computer Graphics and Vision, 7(2–3), 81–227.
Cuxac, P., Lamirel, J.-C., & Bonvallot, V. (2013). Efficient supervised and semi-supervised approaches for affiliations disambiguation. Scientometrics, 97(1), 47–58.
de Carvalho, A. P., Ferreira, A. A., Laender, A. H., & Gonçalves, M. A. (2011). Incremental unsupervised name disambiguation in cleaned digital libraries. Journal of Information and Data Management, 2(3), 289.
De Stefano, D., Fuccella, V., Vitale, M. P., & Zaccarin, S. (2013). The use of different data sources in the analysis of co-authorship networks and scientific performance. Social Networks, 35(3), 370–381.
De Stefano, D., & Zaccarin, S. (2016). Co-authorship networks and scientific performance: An empirical analysis using the generalized extreme value distribution. Journal of Applied Statistics, 43(1), 262–279.
Domingo-Ferrer, J., & Torra, V. (2003). Disclosure risk assessment in statistical microdata protection via advanced record linkage. Statistics and Computing, 13(4), 343–354.
Dong, X., Halevy, A., & Madhavan, J. (2005). Reference reconciliation in complex information spaces. In Proceedings of the 2005 ACM SIGMOD international conference on management of data, pp. 85–96. ACM.
Durham, E., Xue, Y., Kantarcioglu, M., & Malin, B. (2012). Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage. Information Fusion, 13(4), 245–259.
Fellegi, I. P., & Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64(328), 1183–1210.
Ferreira, A. A., Gonçalves, M. A., & Laender, A. H. (2012). A brief survey of automatic methods for author name disambiguation. ACM Sigmod Record, 41(2), 15–26.
Goyal, S., Van Der Leij, M. J., & Moraga-González, J. L. (2006). Economics: An emerging small world. Journal of Political Economy, 114(2), 403–412.
Gurney, T., Horlings, E., & Van Den Besselaar, P. (2011). Author disambiguation using multi-aspect similarity indicators. Scientometrics, 91(2), 435–449.
Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two supervised learning approaches for name disambiguation in author citations. In Digital Libraries, 2004. Proceedings of the 2004 joint ACM/IEEE conference on, pp. 296–305. IEEE.
Han, H., Zha, H., & Giles, C. L. (2005). Name disambiguation in author citations using a k-way spectral clustering method. In Digital Libraries, 2005. JCDL’05. Proceedings of the 5th ACM/IEEE-CS joint conference on, pp. 334–343. IEEE.
Hernandez, M. A., & Stolfo, S. J. (1995). The merge/purge problem for large databases. ACM Sigmod Record, 24(2), 127–138.
Hicks, D. (1999). The difficulty of achieving full coverage of international social science literature and the bibliometric consequences. Scientometrics, 44(2), 193–215.
Imran, M., Gillani, S., & Marchese, M. (2013). A real-time heuristic-based unsupervised method for name disambiguation in digital libraries. D-Lib Magazine, 19(9), 1.
Kang, I.-S., Na, S.-H., Lee, S., Jung, H., Kim, P., Sung, W.-K., et al. (2009). On co-authorship for author disambiguation. Information Processing and Management, 45(1), 84–97.
Lee, D., On, B.-W., Kang, J., & Park, S. (2005). Effective and scalable solutions for mixed and split citation problems in digital libraries. In Proceedings of the 2nd international workshop on Information quality in information systems, pp. 69–76. ACM.
Li, G.-C., Lai, R., D’Amour, A., Doolin, D. M., Sun, Y., Torvik, V. I., et al. (2014). Disambiguation and co-authorship networks of the US patent inventor database (1975–2010). Research Policy, 43(6), 941–955.
Liseo, B., Montanari, G. E., & Torelli, N. (2006). Metodi statistici per l’integrazione di dati da fonti diverse (Vol. 412). Milan: FrancoAngeli.
Milojević, S. (2013). Accuracy of simple, initials-based methods for author name disambiguation. Journal of Informetrics, 7(4), 767–773.
Moody, J. (2004). The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213–238.
Newman, M. E. (2004). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences, 101(suppl 1), 5200–5205.
Robertson, S. (2004). Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation, 60(5), 503–520.
Sadinle, M., Hall, R., & Fienberg, S. E. (2011). Approaches to multiple record linkage. In Proceedings of International Statistical Institute (Vol. 260).
Santana, A. F., Gonçalves, M. A., Laender, A. H., & Ferreira, A. A. (2015). On the combination of domain-specific heuristics for author name disambiguation: The nearest cluster method. International Journal on Digital Libraries, 16(3–4), 229–246.
Smalheiser, N. R., & Torvik, V. I. (2009). Author name disambiguation. Annual Review of Information Science and Technology, 43(1), 1–43.
Strotmann, A., Zhao, D., & Bubela, T. (2009). Author name disambiguation for collaboration network analysis and visualization. Proceedings of the American Society for Information Science and Technology, 46(1), 1–20.
Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2005). A probabilistic similarity metric for medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158.
Veloso, A., Ferreira, A. A., Gonçalves, M. A., Laender, A. H., & Meira, W. (2012). Cost-effective on-demand associative author name disambiguation. Information Processing and Management, 48(4), 680–697.
Ventura, S. L., Nugent, R., & Fuchs, E. R. (2015). Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records. Research Policy, 44(9), 1672–1701.
Wu, H., Li, B., Pei, Y., & He, J. (2014). Unsupervised author disambiguation using Dempster–Shafer theory. Scientometrics, 101(3), 1955–1972.
Wu, J., & Ding, X.-H. (2013). Author name disambiguation in scientific collaboration and mobility cases. Scientometrics, 96(3), 683–697.
Yan, S., Lee, D., Kan, M. -Y., & Giles, L. C. (2007). Adaptive sorted neighborhood methods for efficient record linkage. In Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries (pp. 185–194). ACM.

Acknowledgments

The authors would like to thank Andreas Strotmann for providing details on the algorithm code adopted in Strotmann et al. (2009).

Author information

Authors and Affiliations

Department of Informatics, University of Salerno, Via Giovanni Paolo II, 132, 84084, Fisciano, SA, Italy
Vittorio Fuccella
Department of Political and Social Sciences, University of Trieste, Piazzale Europa, 1, 34127, Trieste, Italy
Domenico De Stefano
Department of Economics and Statistics, University of Salerno, Via Giovanni Paolo II, 132, 84084, Fisciano, SA, Italy
Maria Prosperina Vitale
Department of Economics, Business, Mathematics and Statistics “B. de Finetti”, University of Trieste, Piazzale Europa, 1, 34127, Trieste, Italy
Susanna Zaccarin

Corresponding author

Correspondence to Vittorio Fuccella.

About this article

Cite this article

Fuccella, V., De Stefano, D., Vitale, M.P. et al. Improving co-authorship network structures by combining multiple data sources: evidence from Italian academic statisticians. Scientometrics 107, 167–184 (2016). https://doi.org/10.1007/s11192-016-1872-y

Received: 28 July 2015
Published: 09 February 2016
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11192-016-1872-y
