Warehousing and Studying Open Source Versioning Metadata
In this paper, we describe the downloading and warehousing of Open Source Software (OSS) versioning metadata from SourceForge, BerliOS Developer, and GNU Savannah. This data enables and supports research in areas such as software engineering, open source phenomena, social network analysis, data mining, and project management. The newly formed database, which contains Concurrent Versions System (CVS) and Subversion (SVN) metadata, offers new research opportunities for large-scale analysis of OSS development. The CVS and SVN data is juxtaposed with the SourceForge.net Research Data Archive to support more powerful and interesting queries. We also present an initial statistical analysis of some of the most active projects.
Keywords: Virtual Machine, Open Source Software, Social Network Analysis, Version Control, Open Source Software Project