Abstract
As software evolves, data types have to be extended, possibly with new data variants or new operations. Object-oriented design is well known to support data extensions well. In fact, most popular books showcase data extensions to illustrate how objects adequately support software evolution. Conversely, operation extensions are typically better supported by a functional design. A large body of programming language research has been devoted to the challenge of properly supporting both kinds of extensions. While this challenge is well known from a language design standpoint, it has not been studied empirically. We perform such a study on a large sample of Smalltalk projects (over half a billion lines of code) and their evolution over more than 130,000 committed changes. Our study of extensions during software evolution finds that extensions are indeed prevalent evolution tasks, and that both kinds of extensions are equally common in object-oriented software. We also discuss findings about: the evolution of the kinds of extensions over time; the viability of the Visitor pattern as an object-oriented solution to operation extensions; the change-proneness of extensions; and the prevalence of extensions by third parties. This study suggests that object-oriented design alone is not sufficient, and that practical support for both kinds of program decomposition is in fact needed, either from the programming language or from the development environment.
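To make the two kinds of extensions concrete, here is a minimal sketch in Python rather than Smalltalk. The expression hierarchy (`Expr`, `Num`, `Add`, `Neg`) and the operations `evaluate` and `show` are hypothetical names chosen for illustration; they are not taken from the study. Attaching methods to existing classes is used here as a rough analogue of adding methods to existing classes in Smalltalk.

```python
class Expr:
    """Root of a small, hypothetical expression hierarchy."""

    def evaluate(self):
        raise NotImplementedError


class Num(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


# Data extension: a new variant is one new subclass.
# No existing class has to change.
class Neg(Expr):
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self):
        return -self.operand.evaluate()


# Operation extension: a new operation (`show`) needs one method per
# variant, so the change is spread over every class of the hierarchy.
def _show_num(self):
    return str(self.value)


def _show_add(self):
    return f"({self.left.show()} + {self.right.show()})"


def _show_neg(self):
    return f"-{self.operand.show()}"


Num.show = _show_num
Add.show = _show_add
Neg.show = _show_neg

expr = Add(Num(1), Neg(Num(2)))
```

In this sketch the data extension is local to one new class, whereas the operation extension touches all three variants, which is the asymmetry the abstract describes for object-oriented decomposition.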
Notes
This paper extends our previous conference publication (Robbes et al. 2012b) with two new research questions, one related to the stability of extensions (Q5) and the other related to the analysis of third-party extensions (Q6).
Anticipating the fact that we study Smalltalk code, we present the example in a dynamically-typed class-based setting, using inheritance to define data variants.
Adding a new subclass of Object is not considered a data extension.
We discuss the relative prevalence of both kinds of extensions in Section 5.
Cases where both kinds of extensions overlap in the same hierarchy are especially interesting because they correspond to scenarios that no single data abstraction mechanism would be able to handle properly.
Cohen’s d is unbounded in principle; the commonly accepted thresholds for effect size are 0.2 (small), 0.5 (medium), and 0.8 (strong). Negative values of d indicate an effect in the opposite direction, and the same thresholds apply to their magnitude.
We contemplated splitting the sets of changes into equal time periods instead of an equal number of commits per period. However, determining the time periods involves computing the time interval based on the first and the last change of the hierarchies. This introduces a bias in the earlier and later periods (more changes are found in the very first and very last periods), hence we discarded that idea.
The discrepancy in number of hierarchies is because there may not be a one-to-one mapping between visitors and visited hierarchies.
Note that because data extensions are by definition modular in an object-oriented decomposition, it is unnecessary to study the distribution of their changes.
Unlike earlier, we are not able to do the analysis at the level of hierarchies, as it is not always clear to which hierarchy a third-party operation extension belongs. This issue is discussed in more detail in Section 10.
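As an illustration of the effect-size note above, the following sketch computes Cohen’s d for two samples and maps its magnitude to the thresholds quoted there. It assumes the common pooled-standard-deviation form of d; the notes do not say which variant the study used. The function names and the "negligible" label for magnitudes below 0.2 are assumptions made for this example.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| to the thresholds 0.2 / 0.5 / 0.8 quoted in the note."""
    magnitude = abs(d)  # the sign only encodes the direction of the effect
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not named in the note


d = cohens_d([2, 4, 6], [1, 3, 5])
```

Swapping the two samples flips the sign of d but leaves its label unchanged, which matches the statement that negative values use the same thresholds.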
References
Arcuri A, Briand LC (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd international conference on software engineering, (ICSE 2011). pp 1–10
Aversano L, Canfora G, Cerulo L, Del Grosso C, Di Penta M (2007) An empirical study on the evolution of design patterns. In: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT international symposium on foundations of software engineering (ESEC/SIGSOFT FSE 2007). pp 385–394
Baxter G, Frean MR, Noble J, Rickerby M, Smith H, Visser M, Melton H, Tempero ED (2006) Understanding the shape of Java software. In: Proceedings of the 21st annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications (OOPSLA 2006). pp 397–412
Booch G (1994) Object-oriented analysis and design with applications, 2nd edn. Addison-Wesley, Reading
Callaú O, Robbes R, Tanter É, Roethlisberger D (2012) How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering. Available Online: doi:10.1007/s10664-012-9203-2
Clifton C, Leavens GT, Chambers C, Millstein T (2000) MultiJava: modular open classes and symmetric multiple dispatch in Java. In: Proceedings of the 15th international conference on object-oriented programming systems, languages and applications (OOPSLA 2000). ACM SIGPLAN notices, 35(11). ACM Press, Minneapolis, pp 130–145
Cook WR (1990) Object-oriented programming versus abstract data types. In: Proceedings of the REX workshop/school on the foundations of object-oriented languages, volume 73 of Lecture Notes in Computer Science. Springer
Cook WR (2009) On understanding data abstraction, revisited. ACM SIGPLAN Not 44(10):557–572
Erlikh L (2000) Leveraging legacy system dollars for e-business. IT Prof 2(3):17–23
Gamma E, Helm R, Johnson R, Vlissides J (1994) Design patterns: elements of reusable object-oriented software. Professional computing series. Addison-Wesley, Reading
Gîrba T, Lanza M, Ducasse S (2005) Characterizing the evolution of class hierarchies. In: Proceedings of the 9th European conference on software maintenance and reengineering (CSMR 2005). pp 2–11
Gorschek T, Tempero ED, Angelis L (2010) A large-scale empirical study of practitioners’ use of object-oriented concepts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering (ICSE 2010). pp 115–124
Grechanik M, McMillan C, DeFerrari L, Comi M, Crespi S, Poshyvanyk D, Fu C, Xie Q, Ghezzi C (2010) An empirical investigation into a large-scale Java open source code repository. In: Proceedings of the 4th international symposium on empirical software engineering and measurement (ESEM 2010). pp 11:1–11:10
Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, Washington DC, pp 78–88
Kapser CJ, Godfrey MW (2006) Supporting the analysis of clones in software systems: a case study. J Softw Maint Evol Res Pract 18(2):61–82
Kiczales G, Hilsdale E, Hugunin J, Kersten M, Palm J, Griswold W (2001) An overview of AspectJ. In: Knudsen JL (ed) Proceedings of the 15th European conference on object-oriented programming (ECOOP 2001), number 2072 of Lecture Notes in Computer Science. Springer, Budapest, pp 327–353
Krishnamurthi S, Felleisen M, Friedman DP (1998) Synthesizing object-oriented and functional design to promote re-use. In: Jul E (ed) Proceedings of the 12th European conference on object-oriented programming (ECOOP 98), volume 1445 of Lecture Notes in Computer Science. Springer, Brussels, pp 91–113
Lehman M, Belady L (1985) Program evolution: processes of software change. London Academic Press, London
Louridas P, Spinellis D, Vlachos V (2008) Power laws in software. ACM Trans Softw Eng Methodol 18(1). Article No. 2
Mayrand J, Leblanc C, Merlo EM (1996) Experiment on the automatic detection of function clones in a software system using metrics. In: Proceedings of the 1996 international conference on software maintenance. pp 244–253
Meyer B (2009) Software architecture: functional vs. object-oriented design. In: Spinellis D, Gousios G (eds) Beautiful Architecture. O’Reilly, pp 315–348
Oliveira BCDS (2009) Modular visitor components: a practical solution to the expression families problem. In: Drossopoulou S (ed) Proceedings of the 23rd European conference on object-oriented programming (ECOOP 2009), number 5653 in Lecture Notes in Computer Science. Springer, Genova, pp 269–293
Parnin C, Bird C, Murphy-Hill E (2012) Adoption and use of Java generics. Empir Softw Eng 18(6):1047–1089. http://link.springer.com/article/10.1007%2Fs10664-012-9236-6
Posnett D, Filkov V, Devanbu P (2011) Ecological inference in empirical software engineering. In: Proceedings of the 26th ACM/IEEE international conference on automated software engineering (ASE 2011). pp 362–371
Reynolds JC (1975) User-defined types and procedural data structures as complementary approaches to data abstraction. In: Proceedings of the conference on new directions in algorithmic languages. Munich, pp 157–168
Robbes R, Lanza M (2005) Versioning systems for evolution research. In: IWPSE 2005: proceedings of the 8th international workshop on principles of software evolution. pp 155–164
Robbes R, Lungu M (2011) A study of ripple effects in software ecosystems. In: Proceedings of the 33rd ACM/IEEE international conference on software engineering (ICSE 2011), new ideas and emerging results track. ACM Press, Honolulu, pp 904–907
Robbes R, Lungu M, Röthlisberger D (2012a) How do developers react to API deprecation? The case of a Smalltalk ecosystem. In: FSE-20: proceedings of the symposium on the foundations of software engineering. p 56
Robbes R, Röthlisberger D, Tanter É (2012b) Extensions during software evolution: do objects meet their promise? In: Noble J (ed) Proceedings of the 26th European conference on object-oriented programming (ECOOP 2012), volume 7313 of Lecture Notes in Computer Science. Springer, Beijing, pp 28–52
Schwarz N, Lungu M, Robbes R (2012) On how often code is cloned across repositories. In: Proceedings of the 34th ACM/IEEE international conference on software engineering (ICSE 2012, NIER Track)
Shalloway A, Trott JR (2004) Design patterns explained: a new perspective on object-oriented design, 2nd edn. Addison-Wesley, Reading
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27
Tempero ED, Noble J, Melton H (2008) How do Java programs use inheritance? An empirical study of inheritance in Java software. In: Proceedings of the 22nd European conference on object-oriented programming (ECOOP 2008). pp 667–691
Torgersen M (2004) The expression problem revisited (four new solutions using generics). In: Odersky M (ed) Proceedings of the 18th European conference on object-oriented programming (ECOOP 2004), number 3086 in Lecture Notes in Computer Science. Springer, Oslo, pp 123–146
Van Rysselberghe F, Demeyer S (2007) Studying versioning information to understand inheritance hierarchy changes. In: Proceedings of the 4th international workshop on mining software repositories (MSR 2007). p 16
Vargha A, Delaney HD (2000) A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101–132
Wadler P (1998) The expression problem. Mail to the java-genericity mailing list
Zenger M, Odersky M (2005) Independently extensible solutions to the expression problem. In: Workshop on foundations of object-oriented languages (FOOL). Long Beach
Zimmermann T, Weißgerber P, Diehl S, Zeller A (2005) Mining version histories to guide software changes. IEEE Trans Softw Eng 31(6):429–445
Acknowledgments
We thank the ECOOP and EMSE reviewers for their thorough and helpful comments. R. Robbes and É. Tanter are partially funded by FONDECYT Projects 11110463 and 1110051, respectively.
Additional information
Communicated by: Arie van Deursen
Cite this article
Robbes, R., Röthlisberger, D. & Tanter, É. Object-oriented software extensions in practice. Empir Software Eng 20, 745–782 (2015). https://doi.org/10.1007/s10664-013-9298-0
DOI: https://doi.org/10.1007/s10664-013-9298-0