Empirical Software Engineering, Volume 14, Issue 3, pp 262-285

Open Access: This content is freely available online to anyone, anywhere, at any time.

Macro-level software evolution: a case study of a large software compilation

  • Jesus M. Gonzalez-Barahona, Universidad Rey Juan Carlos
  • Gregorio Robles, Universidad Rey Juan Carlos
  • Martin Michlmayr, Open Source Program Office, HP
  • Juan José Amor, Universidad Rey Juan Carlos
  • Daniel M. German, University of Victoria


Software evolution studies have traditionally focused on individual products. In this study we scale up the idea of software evolution by considering software compilations composed of a large number of independently developed products, engineered to work together. With the success of libre (free, open source) software, these compilations have become common in the form of ‘software distributions’, which group hundreds or thousands of software applications and libraries into an integrated system. We have performed an exploratory case study on one of them, Debian GNU/Linux, and report several significant results. First, Debian has been doubling in size every 2 years, totalling about 300 million lines of code as of 2007. Second, the mean size of packages has remained stable over time. Third, the number of dependencies between packages has been growing quickly. Finally, while C is still by far the most commonly used programming language for applications, use of the C++, Java, and Python languages has increased significantly. The study not only helps to understand the evolution of Debian, but also yields insights into the evolution of mature libre software systems in general.


Keywords: Mining software repositories; Large software collections; Software evolution; Software integrators