Abstract
The advent of large data repositories and the need for distributed skillsets make it necessary to study the scientific collaboration networks emerging around cyberinfrastructure-enabled repositories. To explore the impact of scientific collaboration and large-scale repositories in the field of genomics, we analyze coauthorship patterns in GenBank, NCBI's big data repository, using trace metadata from the coauthorship of traditional publications and the coauthorship of datasets. We demonstrate that using complex network analysis to explore the two networks both independently and jointly provides a much richer description of the community, and addresses some of the methodological concerns raised in previous literature about using coauthorship data to study scientific collaboration.
Notes
In an undirected network, these calculations should only be done for the upper or lower triangle, not both.
These totals differ slightly from 5 because datasets and publications with no dates are included in the figures.
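The first note above can be illustrated with a minimal Python sketch. It assumes the coauthorship network is stored as a symmetric weighted adjacency matrix, which is an assumption made here for illustration; the function name `undirected_totals` and the toy matrix are hypothetical and not taken from the study.

```python
def undirected_totals(adj):
    """Count edges and sum edge weights using only the strict upper triangle.

    `adj` is assumed to be a symmetric square matrix (list of lists) in which
    adj[i][j] == adj[j][i] is the number of works coauthored by i and j.
    """
    n = len(adj)
    edges = 0
    weight = 0
    for i in range(n):
        for j in range(i + 1, n):  # j > i, so each pair is visited once
            if adj[i][j]:
                edges += 1
                weight += adj[i][j]
    return edges, weight


# Hypothetical toy network: three authors with symmetric coauthorship counts.
toy = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]
edges, weight = undirected_totals(toy)
```

Summing over the full matrix instead would visit every pair twice and double both totals, which is the error the note warns against.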
Acknowledgments
This research is sponsored by the NSF's Science of Science Policy Program, Grant Number 1262535. The authors thank Jun Wang and Qianqian Chen for their technical assistance in data processing and analysis.
Cite this article
Costa, M.R., Qin, J. & Bratt, S. Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank. Scientometrics 108, 21–40 (2016). https://doi.org/10.1007/s11192-016-1954-x
DOI: https://doi.org/10.1007/s11192-016-1954-x