Using structural and semantic measures to improve software modularization

Empirical Software Engineering

Abstract

Changes during software evolution and poor design decisions often lead to packages that are hard to understand and maintain, because they usually group together classes with unrelated responsibilities. One way to improve such packages is to decompose them into smaller, more cohesive packages. The difficulty lies in the fact that most definitions and interpretations of cohesion are rather vague and the multitude of measures proposed by researchers usually capture only one aspect of cohesion. We propose a new technique for automatic re-modularization of packages, which uses structural and semantic measures to decompose a package into smaller, more cohesive ones. The paper presents the new approach as well as an empirical study, which evaluates the decompositions proposed by the new technique. The results of the evaluation indicate that the decomposed packages have better cohesion without a deterioration of coupling and the re-modularizations proposed by the tool are also meaningful from a functional point of view.
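
The idea of combining structural and semantic evidence can be illustrated with a minimal sketch. It is not the technique evaluated in the paper: the call-based structural score, the cosine similarity over identifier terms, the `weight` and `threshold` parameters, and the connected-components grouping are all assumptions chosen for brevity, and the class names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decompose(classes, calls, terms, weight=0.5, threshold=0.4):
    """Group classes whose combined similarity reaches the threshold.

    calls maps a class to the set of classes it calls (structural evidence).
    terms maps a class to a Counter of its identifier terms (semantic evidence).
    Both measures and both parameters are illustrative assumptions.
    """
    parent = {c: c for c in classes}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            calls_each_other = b in calls.get(a, set()) or a in calls.get(b, set())
            structural = 1.0 if calls_each_other else 0.0
            semantic = cosine(terms[a], terms[b])
            if weight * structural + (1 - weight) * semantic >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in classes:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical package mixing drawing classes with XML I/O classes.
classes = ["Circle", "Square", "XmlReader", "XmlWriter"]
calls = {"Circle": {"Square"}, "XmlReader": {"XmlWriter"}}
terms = {
    "Circle": Counter(["draw", "shape", "radius"]),
    "Square": Counter(["draw", "shape", "side"]),
    "XmlReader": Counter(["xml", "parse", "stream"]),
    "XmlWriter": Counter(["xml", "emit", "stream"]),
}
packages = decompose(classes, calls, terms)
```

In this toy input the drawing classes and the XML classes share neither calls nor terms, so the sketch separates them into two candidate packages.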

Notes

  1. Method calls are captured through static analysis of the source code.

  2. http://www.jhotdraw.org

  3. The complete results achieved with all the possible combinations of parameters can be found in Bavota et al. (2011a).

  4. The complete results achieved with the other systems can be found in Bavota et al. (2011a).

References

  • Abdeen H, Ducasse S, Sahraoui HA, Alloui I (2009) Automatic package coupling and cycle minimization. In: WCRE, pp 103–112

  • Anquetil N, Lethbridge T (1999) Experiments with clustering as a software remodularization method. In: WCRE, pp 235–255

  • Antoniol G, Penta MD, Casazza G, Merlo E (2001) A method to re-organize legacy systems via concept analysis. In: IWPC, pp 281–292

  • Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading, MA

  • Basili V, Caldiera G, Rombach DH (1994) The goal question metric paradigm. Wiley, Inc., New York

  • Bavota G, De Lucia A, Marcus A, Oliveto R (2010a) Software re-modularization based on structural and semantic metrics. In: Proceedings of the 17th working conference on reverse engineering. Beverly, MA, USA, pp 195–204

  • Bavota G, De Lucia A, Marcus A, Oliveto R (2010b) A two-step technique for extract class refactoring. In: Proceedings of 25th IEEE international conference on automated software engineering, pp 151–154

  • Bavota G, Oliveto R, De Lucia A, Antoniol G, Guéhéneuc YG (2010c) Playing with refactoring: identifying extract class opportunities through game theory. In: Proceedings of the 26th IEEE international conference on software maintenance

  • Bavota G, De Lucia A, Marcus A, Oliveto R (2011a) Software re-modularization based on structural and semantic metrics. Tech. rep., University of Salerno. http://www.sesa.dmi.unisa.it/TR2011_EMSE.pdf

  • Bavota G, De Lucia A, Oliveto R (2011b) Identifying extract class refactoring opportunities using structural and semantic cohesion measures. J Syst Softw 84(3):397–414

  • Bittencourt RA, Guerrero DDS (2009) Comparison of graph clustering algorithms for recovering software architecture module views. In: Proceedings of the 2009 European conference on software maintenance and reengineering. IEEE Computer Society, Washington, DC, USA, pp 251–254

  • Canfora G, Cimitile A, De Lucia A, Di Lucca GA (2001) Decomposing legacy systems into objects: an eclectic approach. Inf Softw Technol 43(6):401–412

  • Cimitile A, Visaggio G (1995) Software salvaging and the call dominance tree. J Syst Softw 28(2):117–127

  • Corazza A, Martino SD, Scanniello G (2010) A probabilistic based approach towards software system clustering. In: CSMR, pp 88–96

  • Corazza A, Martino SD, Maggio V, Scanniello G (2011) Investigating the use of lexical information for software system clustering. In: CSMR, pp 35–44

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • De Lucia A, Oliveto R, Vorraro L (2008) Using structural and semantic metrics to improve class cohesion. In: Proceedings of international conference on software maintenance. Beijing, China, pp 27–36

  • De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2011) Improving IR-based traceability recovery using smoothing filters. In: Proceedings of the 19th IEEE international conference on program comprehension. Kingston, ON, Canada, pp 21–30

  • Ducasse S, Pollet D, Suen M, Abdeen H, Alloui I (2007) Package surface blueprints: visually supporting the understanding of package relationships. In: Proceedings of international conference on software maintenance. Paris, France, pp 94–103

  • Harman M, Hierons RM, Proctor M (2002) A new representation and crossover operator for search-based optimization of software modularization. In: Proceedings of the 2002 conference on genetic and evolutionary computation, pp 1351–1358

  • Harman M, Swift S, Mahdavi K (2005) An empirical study of the robustness of two module clustering fitness functions. In: Proceedings of the 2005 conference on genetic and evolutionary computation. ACM Press, Washington DC, USA, pp 1029–1036

  • Hartigan JA (1975) Clustering algorithms. Wiley, New York

  • Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM 46(5):604–632

  • Koschke R, Canfora G, Czeranski J (2006) Revisiting the Delta IC approach to component recovery. Sci Comput Program 60(2):171–188

  • Kuhn A, Ducasse S, Gîrba T (2007) Semantic clustering: identifying topics in source code. Inf Softw Technol 49(3):230–243

  • Lee Y, Liang B, Wu S, Wang F (1995) Measuring the coupling and cohesion of an object-oriented program based on information flow. In: International conference on software quality

  • Lehman MM (1980) On understanding laws, evolution, and conservation in the large-program life cycle. J Syst Softw 1:213–221

  • Maletic JI, Marcus A (2001) Supporting program comprehension using semantic and structural information. In: Proceedings of 23rd international conference on software engineering. IEEE CS Press, Toronto, Ontario, Canada, pp 103–112

  • Mancoridis S, Mitchell BS, Rorres C, Chen YF, Gansner ER (1998) Using automatic clustering to produce high-level system organizations of source code. In: IWPC, p 45

  • Maqbool O, Babri HA (2007) Hierarchical clustering for software architecture recovery. IEEE Trans Softw Eng 33(11):759–780

  • Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng 34(2):287–300

  • Mitchell BS, Mancoridis S (2001) Comparing the decompositions produced by software clustering algorithms using similarity measurements. In: Proceedings of 17th international conference of software maintenance. IEEE CS Press, Florence, Italy, pp 744–753

  • Mitchell BS, Mancoridis S (2006) On the automatic modularization of software systems using the Bunch tool. IEEE Trans Softw Eng 32(3):193–208

  • O’Keeffe M, O’Cinneide M (2006) Search-based software maintenance. In: Proceedings of 10th European conference on software maintenance and reengineering. IEEE CS Press, Bari, Italy, pp 249–260

  • Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter Publishers

  • Poshyvanyk D, Marcus A, Ferenc R, Gyimóthy T (2009) Using information retrieval based coupling measures for impact analysis. Empir Software Eng 14(1):5–32

  • Praditwong K, Harman M, Yao X (2011) Software module clustering as a multi-objective search problem. IEEE Trans Softw Eng 37(2):264–282

  • Ricca F, Pianta E, Tonella P, Girardi C (2008) Improving web site understanding with keyword-based clustering. J Softw Maint Evol 20(1):1–29. doi:10.1002/smr.v20:1

  • Scanniello G, D’Amico A, D’Amico C, D’Amico T (2010) Using the Kleinberg algorithm and vector space model for software system clustering. In: ICPC, pp 180–189

  • Seng O, Bauer M, Biehl M, Pache G (2005) Search-based improvement of subsystem decompositions. In: GECCO, pp 1045–1051

  • Seng O, Stammel J, Burkhart D (2006) Search-based determination of refactorings for improving the class structure of object-oriented systems. In: Genetic and evolutionary computation conference, pp 1909–1916

  • Shaw SC, Goldstein M, Munro M, Burd E (2003) Moral dominance relations for program comprehension. IEEE Trans Softw Eng 29(9):851–863

  • Shtern M, Tzerpos V (2009) Methods for selecting and improving software clustering algorithms. In: Proceedings of 17th IEEE international conference on program comprehension. IEEE CS Press, Vancouver, Canada, pp 248–252

  • Stevens WP, Myers GJ, Constantine LL (1974) Structured design. IBM Syst J 13(2):115–139

  • Tonella P (2001) Concept analysis for module restructuring. IEEE Trans Softw Eng 27(4):351–363

  • van Deursen A, Kuipers T (1999) Identifying objects using cluster and concept analysis. In: Proceedings of 21st international conference on software engineering. ACM Press, Los Angeles, California, USA, pp 246–255

  • Wiggerts TA (1997) Using clustering algorithms in legacy systems remodularization. In: WCRE ’97: proceedings of the fourth working conference on reverse engineering (WCRE ’97). IEEE Computer Society, p 33

  • Wu J, Hassan AE, Holt RC (2005) Comparison of clustering algorithms in the context of software evolution. In: ICSM, pp 525–535

  • Yin RK (2003) Case study research: design and methods, 3rd edn. SAGE Publications

Acknowledgements

We would like to thank all the students who participated in our study. We would also like to thank the anonymous reviewers for their careful reading of our manuscript and useful comments. Andrian Marcus was supported in part by grants from the US National Science Foundation: CCF-1017263 and CCF-0845706.

Author information

Corresponding author

Correspondence to Rocco Oliveto.

Additional information

Editors: Giuliano Antoniol and Martin Pinzger

About this article

Cite this article

Bavota, G., De Lucia, A., Marcus, A. et al. Using structural and semantic measures to improve software modularization. Empir Software Eng 18, 901–932 (2013). https://doi.org/10.1007/s10664-012-9226-8

  • DOI: https://doi.org/10.1007/s10664-012-9226-8
