Volume 108, Issue 1, pp 21–40

Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank

  • Mark R. Costa
  • Jian Qin
  • Sarah Bratt


The advent of large data repositories and the need for distributed skill sets make it necessary to study the scientific collaboration networks emerging around cyber-infrastructure-enabled repositories. To explore the impact of scientific collaboration and large-scale repositories in the field of genomics, we analyze coauthorship patterns in GenBank, NCBI's big data repository, using trace metadata from the coauthorship of both traditional publications and datasets. We demonstrate that using complex network analysis to explore both networks, independently and jointly, provides a much richer description of the community and addresses some of the methodological concerns raised in previous literature about using coauthorship data to study scientific collaboration.
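To make the idea of analyzing the two coauthorship networks "independently and jointly" concrete, here is a minimal sketch. It assumes, hypothetically, that each publication or dataset record has already been reduced to a list of disambiguated author names; real GenBank metadata would need parsing and name disambiguation first. The function names (`coauthorship_network`, `joint_network`, `components`) and the toy author lists are illustrative inventions, not the authors' pipeline, and the joint network here is simply an edge union with summed weights, which is only one possible way to combine the two networks.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_network(records):
    """Return (nodes, weights) for a weighted, undirected coauthorship network.

    Hypothetical input: each record is a list of author names. Every unordered
    pair of distinct authors on the same record adds 1 to that pair's edge
    weight; edge keys are stored as sorted (a, b) tuples.
    """
    nodes = set()
    weights = defaultdict(int)
    for authors in records:
        unique = sorted(set(authors))
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return nodes, dict(weights)


def joint_network(net_a, net_b):
    """Union of two networks over the same author names; shared edges sum their weights."""
    nodes_a, weights_a = net_a
    nodes_b, weights_b = net_b
    weights = dict(weights_a)
    for edge, w in weights_b.items():
        weights[edge] = weights.get(edge, 0) + w
    return nodes_a | nodes_b, weights


def components(nodes, weights):
    """Connected components found with an iterative depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in weights:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(comp)
    return result


def summarize(name, net):
    """Print author, tie, and component counts for one network."""
    nodes, weights = net
    print(f"{name}: {len(nodes)} authors, {len(weights)} ties, "
          f"{len(components(nodes, weights))} component(s)")


# Toy author lists, invented for illustration (not GenBank data).
publications = [["A", "B", "C"], ["A", "B"], ["D", "E"]]
datasets = [["A", "B", "F"], ["C", "D"]]

pub_net = coauthorship_network(publications)
data_net = coauthorship_network(datasets)
joint = joint_network(pub_net, data_net)

summarize("publications", pub_net)
summarize("datasets", data_net)
summarize("joint", joint)
print("ties present in both:", sorted(set(pub_net[1]) & set(data_net[1])))
```

On this toy input, the publication and dataset networks each split into two components, while the joint network is a single connected component: ties that appear only in dataset coauthorship (here C–D and A–F) link groups that the publication network alone keeps apart. This is the kind of structure a single-network analysis would miss.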


Keywords: Team science; Big data repository; Scientific collaboration; Complex network analysis; Cyber-infrastructure-enabled science



This research is sponsored by the NSF’s Science of Science Policy Program, Grant Number 1262535. The authors thank Jun Wang and Qianqian Chen for their technical assistance with data processing and analysis.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2016

Authors and Affiliations

  1. School of Information Studies, Syracuse University, Syracuse, USA