Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank
The advent of large data repositories and the necessity of distributed skillsets have led to a need to study the scientific collaboration network emerging around cyber-infrastructure-enabled repositories. To explore the impact of scientific collaboration and large-scale repositories in the field of genomics, we analyze coauthorship patterns in NCBI's big data repository GenBank using trace metadata from coauthorship of traditional publications and coauthorship of datasets. We demonstrate that using complex network analysis to explore both networks independently and jointly provides a much richer description of the community, and addresses some of the methodological concerns discussed in previous literature regarding the use of coauthorship data to study scientific collaboration.
Keywords: Team science; Big data repository; Scientific collaboration; Complex network analysis; Cyber-infrastructure enabled science
This research is sponsored by the NSF’s Science of Science Policy Program, Grant Number 1262535. The authors thank Jun Wang and Qianqian Chen for their technical assistance in data processing and analysis.