Warehousing and Studying Open Source Versioning Metadata

  • Matthew Van Antwerp
  • Greg Madey
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 319)


In this paper, we describe the downloading and warehousing of Open Source Software (OSS) versioning metadata from SourceForge, BerliOS Developer, and GNU Savannah. This data enables and supports research in areas such as software engineering, open source phenomena, social network analysis, data mining, and project management. The newly formed database, which contains Concurrent Versions System (CVS) and Subversion (SVN) metadata, offers new research opportunities for large-scale analysis of OSS development. The CVS and SVN data is juxtaposed with the Research Data Archive [5] so that more powerful and interesting queries can be performed. We also present an initial statistical analysis of some of the most active projects.
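As a rough illustration of what warehousing versioning metadata can involve, the following minimal sketch parses Subversion log output in the XML form produced by `svn log --xml` and loads it into a relational table that can then be queried. It is not the authors' pipeline: the table layout, the function names, the project name `demo`, and the sample log are all assumptions made for this example, and SQLite stands in for whatever database the warehouse actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample in the format emitted by `svn log --xml`.
SAMPLE_LOG = """<log>
<logentry revision="3">
<author>alice</author>
<date>2009-03-01T12:00:00.000000Z</date>
<msg>Refactor parser</msg>
</logentry>
<logentry revision="2">
<author>bob</author>
<date>2009-02-15T09:30:00.000000Z</date>
<msg>Add tests</msg>
</logentry>
<logentry revision="1">
<author>alice</author>
<date>2009-02-01T08:00:00.000000Z</date>
<msg>Initial import</msg>
</logentry>
</log>"""


def parse_svn_log(xml_text):
    """Return (revision, author, date, message) tuples from SVN log XML."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("logentry"):
        rows.append((
            int(entry.get("revision")),
            entry.findtext("author", default=""),
            entry.findtext("date", default=""),
            entry.findtext("msg", default=""),
        ))
    return rows


def load_commits(conn, project, rows):
    """Store parsed commits in an assumed `commits` table, keyed by project and revision."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        "project TEXT, revision INTEGER, author TEXT, "
        "committed_at TEXT, message TEXT, "
        "PRIMARY KEY (project, revision))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO commits VALUES (?, ?, ?, ?, ?)",
        [(project, *row) for row in rows],
    )
    conn.commit()


def commits_per_author(conn, project):
    """Count commits per author for one project, most active first."""
    return conn.execute(
        "SELECT author, COUNT(*) FROM commits WHERE project = ? "
        "GROUP BY author ORDER BY COUNT(*) DESC, author",
        (project,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_commits(conn, "demo", parse_svn_log(SAMPLE_LOG))
    for author, count in commits_per_author(conn, "demo"):
        print(author, count)
```

Keying the table on project and revision means a repeated download of the same log overwrites existing rows instead of duplicating them, which is one simple way to keep such a warehouse refreshable.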


Keywords: Virtual Machine; Open Source Software; Social Network Analysis; Version Control; Open Source Software Project
These keywords were added by machine and not by the authors.


References

  1. Cederqvist, P.: Version management with CVS (2002)
  2. Fischer, M., Pinzger, M., Gall, H.: Populating a release history database from version control and bug tracking systems. In: Proceedings of the International Conference on Software Maintenance, pp. 23–32. IEEE Computer Society Press, Los Alamitos (2003)
  3. Rundensteiner, E.A., Koeller, A., Zhang, X.: Maintaining data warehouses over changing information sources. Commun. ACM 43(6), 57–62 (2000)
  4. Tichy, W.F.: RCS—a system for version control. Softw. Pract. Exper. 15(7), 637–654 (1985)
  5. Van Antwerp, M., Madey, G.: Advances in the SourceForge Research Data Archive. In: Workshop on Public Data about Software Development (WoPDaSD) at The 4th International Conference on Open Source Systems, Milan, Italy (2008)
  6. Xu, J., Huang, Y., Madey, G.: A research support system framework for web data mining research: Workshop on applications, products and services of web-based support systems. In: The Joint International Conference on Web Intelligence (2003 IEEE/WIC) and Intelligent Agent Technology, Halifax, Canada, October 2003, pp. 37–41 (2003)

Copyright information

© IFIP 2010

Authors and Affiliations

  • Matthew Van Antwerp (1)
  • Greg Madey (1)
  1. University of Notre Dame
