Rete-netzwerk-red: analyzing and visualizing scholarly networks using the Network Workbench Tool
The enormous increase in digital scholarly data and computing power, combined with recent advances in text mining, linguistics, network science, and scientometrics, makes it possible to scientifically study the structure and evolution of science on a large scale. This paper discusses the challenges of this ‘BIG science of science’ (also called ‘computational scientometrics’) research in terms of data access, algorithm scalability, repeatability, and result communication and interpretation. It then introduces two infrastructures: (1) the Scholarly Database (SDB) (http://sdb.slis.indiana.edu), which provides free online access to 22 million scholarly records (papers, patents, and funding awards) that can be cross-searched and downloaded as dumps, and (2) scientometrics-relevant plug-ins of the open-source Network Workbench (NWB) Tool (http://nwb.slis.indiana.edu). The utility of these infrastructures is then demonstrated in three studies: a comparison of the funding portfolios and co-investigator networks of different universities, an examination of the paper-citation and co-author networks of major network science researchers, and an analysis of topic bursts in streams of text. The article concludes with a discussion of related work that aims to provide practically useful and theoretically grounded cyberinfrastructure in support of computational scientometrics research, education, and practice.
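The co-investigator and co-author studies mentioned above both start from the same step: turning a list of records into a weighted co-occurrence network in which two people are linked once for every record they share. The sketch below illustrates that step in Python under stated assumptions: each record is taken to be a plain list of author (or investigator) names, and the function name `coauthor_network` and this input format are hypothetical. It is not the NWB Tool's implementation or API, only an illustration of the general technique.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(records):
    """Build a weighted, undirected co-occurrence (co-author) network.

    Hypothetical input format: an iterable of name lists, one list per
    paper (or per funding award, for co-investigator networks).

    Returns (node_counts, edge_weights) where
      node_counts[name]      = number of records the name appears on
      edge_weights[(a, b)]   = number of records shared by a and b, with a < b
    """
    node_counts = Counter()
    edge_weights = Counter()
    for names in records:
        # Strip whitespace, drop empty entries, deduplicate, and sort so that
        # every pair gets one canonical (a, b) key regardless of author order.
        unique = sorted({n.strip() for n in names if n.strip()})
        node_counts.update(unique)
        # Each unordered pair of co-occurring names adds 1 to that edge's weight.
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
    return node_counts, edge_weights


# Toy example with placeholder names (not real data).
papers = [
    ["Author A", "Author B"],
    ["Author B", "Author A", "Author C"],
    ["Author C"],
]
nodes, edges = coauthor_network(papers)
print(dict(nodes))
print(dict(edges))
```

Sorting the names before pairing keeps the network undirected without storing each edge twice; single-author records still count toward node totals but contribute no edges.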
Keywords: Scientometrics; Science of science; Evolution of science; Computational scientometrics; Data access; Algorithm scalability; Cyberinfrastructure; Scholarly Database; Network Workbench; Related tools; Open source; Open access
We would like to acknowledge the contributions and support of the NWB team and advisory board. This work is funded by the School of Library and Information Science and the Cyberinfrastructure for Network Science Center at Indiana University, the James S. McDonnell Foundation, and the National Science Foundation under Grants No. IIS-0715303, IIS-0534909, and IIS-0513650. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.