Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank

Scientometrics

Abstract

The advent of large data repositories and the necessity of distributed skillsets have led to a need to study the scientific collaboration network emerging around cyber-infrastructure-enabled repositories. To explore the impact of scientific collaboration and large-scale repositories in the field of genomics, we analyze coauthorship patterns in NCBI's big data repository GenBank, using trace metadata from coauthorship of traditional publications and coauthorship of datasets. We demonstrate that using complex network analysis to explore both networks independently and jointly provides a much richer description of the community and addresses some of the methodological concerns discussed in previous literature regarding the use of coauthorship data to study scientific collaboration.
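The idea of analyzing the two coauthorship networks independently and jointly can be illustrated with a minimal sketch. It is not the authors' pipeline: the author lists, the function name `coauthorship_edges`, and the choice to weight each tie by the number of shared records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists; names and records are illustrative only.
publications = [["A", "B", "C"], ["A", "B"]]
datasets = [["B", "C"], ["C", "D"]]

def coauthorship_edges(records):
    """Count how many records each unordered author pair shares."""
    edges = Counter()
    for authors in records:
        # sorted(set(...)) gives each pair one canonical, duplicate-free form
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges

publication_net = coauthorship_edges(publications)  # publication coauthorship
dataset_net = coauthorship_edges(datasets)          # dataset coauthorship
joint_net = publication_net + dataset_net           # Counter addition sums tie weights

# Ties that are visible only when dataset coauthorship is considered.
only_in_datasets = set(dataset_net) - set(publication_net)
```

In this toy example the tie between C and D appears only in the dataset network, which is the kind of relationship a publication-only analysis would miss.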

Notes

  1. In an undirected network, these calculations should only be done for the upper or lower triangle, not both.

  2. These totals differ slightly from 5 because datasets and publications with no dates are included in the figures.
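Note 1 can be illustrated with a minimal sketch. The preview does not say which calculations the note refers to, so the example assumes a simple case: counting ties and summing tie weights in a symmetric adjacency matrix. The matrix values and the function name `upper_triangle_totals` are hypothetical.

```python
# Hypothetical symmetric, weighted adjacency matrix for four authors.
# Entry [i][j] is the number of works that authors i and j share.
adjacency = [
    [0, 2, 1, 0],
    [2, 0, 0, 3],
    [1, 0, 0, 0],
    [0, 3, 0, 0],
]

def upper_triangle_totals(matrix):
    """Return (tie count, total tie weight) using only the upper triangle."""
    n = len(matrix)
    edge_count = 0
    total_weight = 0
    for i in range(n):
        for j in range(i + 1, n):  # j > i: strictly above the diagonal
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("matrix is not symmetric")
            if matrix[i][j]:
                edge_count += 1
                total_weight += matrix[i][j]
    return edge_count, total_weight

edge_count, total_weight = upper_triangle_totals(adjacency)

# Summing the full matrix counts every undirected tie twice.
full_matrix_weight = sum(sum(row) for row in adjacency)
```

Here the full-matrix sum is exactly twice the upper-triangle total, which is the double counting the note warns against.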

Acknowledgments

This research is sponsored by the NSF’s Science of Science Policy Program, Grant Number 1262535. The authors thank Jun Wang and Qianqian Chen for their technical assistance in data processing and analysis.

Author information

Corresponding author

Correspondence to Mark R. Costa.

About this article

Cite this article

Costa, M.R., Qin, J. & Bratt, S. Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank. Scientometrics 108, 21–40 (2016). https://doi.org/10.1007/s11192-016-1954-x

  • DOI: https://doi.org/10.1007/s11192-016-1954-x
