Empirical Software Engineering, Volume 23, Issue 2, pp 1123–1152

Data sets describing the circle of life in Ruby hosting, 2003–2016

  • Megan Squire


Studying software repositories and hosting services can provide valuable insights into the behaviors of large groups of software developers and their projects. Traditionally, most analyses of metadata collected from software project hosting services have been restricted to a short window of time, typically just a few years. To date, few studies, if any, have been built from data covering the entirety of a hosting facility’s lifespan: from its birth to its death and its rebirth in another form. Thus, the first contribution of this paper is to present two data sets that support the historical analysis of over ten years of collected metadata from the now-defunct RubyForge project hosting site and from its successor, the RubyGems package (“gem”) hosting facility. The data sets and sample usages demonstrated in this paper include: analyses of overall forge growth over time; data on and analyses of project-level characteristics on both forges and their changes over time (for example, licenses and languages); and demonstrations of how to use developer-level metadata (for example, counts of new developers and calculations of developer-project density) to assess changes in person-level activity on both sites over time. Finally, because RubyForge was phased out and its gem-hosting portion was replaced by RubyGems, all the gems within RubyForge projects were transferred by project owners, and by the site owners themselves, into the RubyGems hosting facility. The data sets in this paper therefore represent a unique opportunity to study projects as they moved from one ecosystem to another. Accordingly, we show several methods for locating related projects between the two forges and for building a cross-forge, longitudinal project history using information from both.
The data sets and sample analyses in this paper will be relevant to researchers studying long-term software evolution and distributed, hosted, or collaborative software development environments.


Open source software, Forge, Repository, Project hosting, RubyForge, RubyGems, Ruby, Software evolution, Project metadata, Developer metadata, Data, Dataset



We gratefully acknowledge the National Science Foundation (grant number NSF-14-05643) for supporting this work.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Elon University, Elon, USA
