Empirical Software Engineering, Volume 18, Issue 5, pp 901–932

Using structural and semantic measures to improve software modularization

  • Gabriele Bavota
  • Andrea De Lucia
  • Andrian Marcus
  • Rocco Oliveto


Changes during software evolution and poor design decisions often lead to packages that are hard to understand and maintain, because they usually group together classes with unrelated responsibilities. One way to improve such packages is to decompose them into smaller, more cohesive packages. The difficulty lies in the fact that most definitions and interpretations of cohesion are rather vague and the multitude of measures proposed by researchers usually capture only one aspect of cohesion. We propose a new technique for automatic re-modularization of packages, which uses structural and semantic measures to decompose a package into smaller, more cohesive ones. The paper presents the new approach as well as an empirical study, which evaluates the decompositions proposed by the new technique. The results of the evaluation indicate that the decomposed packages have better cohesion without a deterioration of coupling and the re-modularizations proposed by the tool are also meaningful from a functional point of view.


Software re-modularization; Information-flow-based coupling; Conceptual coupling between classes; Empirical studies



We would like to thank all the students who participated in our study. We would also like to thank the anonymous reviewers for their careful reading of our manuscript and their useful comments. Andrian Marcus was supported in part by grants from the US National Science Foundation: CCF-1017263 and CCF-0845706.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Gabriele Bavota (1)
  • Andrea De Lucia (1)
  • Andrian Marcus (2)
  • Rocco Oliveto (3, corresponding author)

  1. University of Salerno, Fisciano, Italy
  2. Wayne State University, Detroit, USA
  3. University of Molise, Pesche, Italy
