Multi-label learning for identifying co-occurring class code smells

  • Regular Paper
  • Computing

Abstract

Code smell identification is crucial in software maintenance. The existing literature mostly focuses on identifying a single code smell at a time. In practice, however, a software artefact often exhibits several code smells simultaneously: an earlier assessment of their diffuseness suggests that 59% of smelly classes are affected by more than one smell. To address this complexity found in real-world projects, we propose a multi-label learning-based approach to identify eight code smells at the class level, i.e. in the most severe software artefacts, which need to be prioritized in the refactoring process. In our experiments, we evaluate 12 algorithms drawn from different multi-label learning methods on 30 open-source Java projects. We explore co-occurrences between class code smells and examine the impact of these correlations on prediction results. Additionally, we assess multi-label learning methods to compare data adaptation with algorithm adaptation. Our findings highlight the effectiveness of the Ensemble of Classifier Chains and Binary Relevance in achieving high-performance results.
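To make the notion of co-occurrence concrete, the following minimal sketch computes two quantities of the kind discussed above: the share of smelly classes affected by more than one smell, and how often each pair of smells appears in the same class. The class names, smell names, toy data and helper functions are all hypothetical and chosen for illustration only; they are not the paper's dataset, its eight smells, or its tooling.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each class maps to the set of smells it exhibits.
# Class and smell names are illustrative, not taken from the paper's dataset.
labels = {
    "OrderManager": {"GodClass", "ComplexClass"},
    "ReportBuilder": {"GodClass", "ComplexClass", "SpaghettiCode"},
    "Config": {"ClassDataShouldBePrivate"},
    "Utils": set(),
    "Parser": {"SpaghettiCode"},
}


def multi_smell_ratio(labels):
    """Fraction of smelly classes (at least one smell) with more than one smell."""
    smelly = [smells for smells in labels.values() if smells]
    if not smelly:
        return 0.0
    return sum(len(smells) > 1 for smells in smelly) / len(smelly)


def cooccurrence_counts(labels):
    """Count, for every unordered pair of smells, the classes exhibiting both."""
    counts = Counter()
    for smells in labels.values():
        # Sorting gives each pair a canonical (alphabetical) order.
        for pair in combinations(sorted(smells), 2):
            counts[pair] += 1
    return counts


ratio = multi_smell_ratio(labels)
pairs = cooccurrence_counts(labels)
print(f"multi-smell ratio: {ratio:.2f}")
print(pairs.most_common(1))
```

On a real label matrix, pairs with high counts are the correlations that a chain-based method could in principle exploit, whereas Binary Relevance treats each smell as an independent binary problem.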

Data availability

The dataset used in this study is publicly available and was obtained from the research study referenced in [3].

Notes

  1. https://github.com/mauricioaniche/ck

  2. https://github.com

  3. https://sourceforge.net/

References

  1. Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Pearson Education India

  2. Kaur A (2020) A systematic literature review on empirical analysis of the relationship between code smells and software quality attributes. Arch Comput Methods Eng 27(4):1267–1296

  3. Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221. https://doi.org/10.1007/s10664-017-9535-z

  4. Soh Z, Yamashita A, Khomh F, Guéhéneuc YG (2016) Do code smells impact the effort of different maintenance programming activities? In: IEEE 23rd international conference on software analysis, evolution, and reengineering, vol 1, pp 393–402

  5. Abbes M, Khomh F, Gueheneuc Y-G, Antoniol G (2011) An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension. In: 2011 15th European conference on software maintenance and reengineering, pp 181–190. IEEE

  6. Politowski C, Khomh F, Romano S, Scanniello G, Petrillo F, Guéhéneuc Y-G, Maiga A (2020) A large scale empirical study of the impact of spaghetti code and blob anti-patterns on program comprehension. Inf Softw Technol 122:106278

  7. Sjøberg DI, Yamashita A, Anda BC, Mockus A, Dybå T (2012) Quantifying the effect of code smells on maintenance effort. IEEE Trans Softw Eng 39(8):1144–1156

  8. Khomh F, Di Penta M, Gueheneuc Y-G (2009) An exploratory study of the impact of code smells on software change-proneness. In: 2009 16th working conference on reverse engineering, pp 75–84. IEEE

  9. Cunningham W (1992) The wycash portfolio management system. ACM SIGPLAN OOPS Messenger 4(2):29–30

  10. Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). IEEE Trans Softw Eng 43(11):1063–1088

  11. Dewangan S, Rao RS, Chowdhuri SR, Gupta M (2023) Severity classification of code smells using machine-learning methods. SN Comput Sci 4(5):564

  12. Fontana FA, Zanoni M (2017) Code smell severity classification using machine learning techniques. Knowl-Based Syst 128:43–58

  13. Moha N, Gueheneuc YG, Duchien L, Meur AFL (2010) DECOR: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36. https://doi.org/10.1109/TSE.2009.50

  14. Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering, pp 268–278. IEEE Press

  15. Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191

  16. Hadj-Kacem M, Bouassida N (2018) A hybrid approach to detect code smells using deep learning. In: Proceedings of the 13th international conference on evaluation of novel approaches to software engineering, pp 137–146. SciTePress

  17. Sharma T, Efstathiou V, Louridas P, Spinellis D (2021) Code smell detection by deep direct-learning and transfer-learning. J Syst Softw 176:110936

  18. Mens T, Tourwe T (2004) A survey of software refactoring. IEEE Trans Softw Eng 30(2):126–139. https://doi.org/10.1109/TSE.2004.1265817

  19. Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) A large-scale empirical study on the lifecycle of code smell co-occurrences. Inf Softw Technol 99:1–10

  20. Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Wareh Min (IJDWM) 3(3):1–13

  21. Kreimer J (2005) Adaptive detection of design flaws. Electron Not Theor Comput Sci 141(4):117–136

  22. Khomh F, Vaucher S, Guéhéneuc YG, Sahraoui H (2009) A Bayesian approach for the detection of code and design smells. In: Ninth international conference on quality software, pp 305–314. https://doi.org/10.1109/QSIC.2009.47

  23. Khomh F, Vaucher S, Yann-Gaël G, Sahraoui H (2011) BDTEX: a GQM-based Bayesian approach for the detection of antipatterns. J Syst Softw 84(4):559–572

  24. Hassaine S, Khomh F, Gueheneuc YG, Hamel S (2010) IDS: an immune-inspired approach for the detection of software design smells. In: Seventh international conference on the quality of information and communications technology, pp 343–348. https://doi.org/10.1109/QUATIC.2010.61

  25. Oliveto R, Khomh F, Antoniol G, Gueheneuc YG (2010) Numerical signatures of antipatterns: an approach based on B-splines. In: 14th European conference on software maintenance and reengineering, pp 248–251. https://doi.org/10.1109/CSMR.2010.47

  26. Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Aimeur E (2012) SMURF: a SVM-based incremental anti-pattern detection approach. In: 19th working conference on reverse engineering, pp 466–475. https://doi.org/10.1109/WCRE.2012.56

  27. Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Antoniol G, Aïmeur E (2012) Support vector machines for anti-pattern detection. In: Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pp 278–281. https://doi.org/10.1145/2351676.2351723

  28. Dewangan S, Rao RS, Mishra A, Gupta M (2021) A novel approach for code smell detection: an empirical study. IEEE Access 9:162869–162883

  29. Barbez A, Khomh F, Guéhéneuc Y-G (2020) A machine-learning based ensemble method for anti-patterns detection. J Syst Softw 161:110486

  30. Guggulothu T, Moiz SA (2020) Code smell detection using multi-label classification approach. Softw Qual J 28(3):1063–1086

  31. Kiyak EO, Birant D, Birant KU (2019) Comparison of multi-label classification algorithms for code smell detection. In: 2019 3rd international symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–6. IEEE

  32. Boutaib S, Elarbi M, Bechikh S, Palomba F, Said LB (2022) A bi-level evolutionary approach for the multi-label detection of smelly classes. In: Proceedings of the genetic and evolutionary computation conference companion, pp 782–785

  33. Li Y, Zhang X (2022) Multi-label code smell detection with hybrid model based on deep learning. In: SEKE, pp 42–47

  34. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  35. Azeem MI, Palomba F, Shi L, Wang Q (2019) Machine learning techniques for code smell detection: a systematic literature review and meta-analysis. Inf Softw Technol 108:115–138

  36. Aniche M (2015) Java code metrics calculator (ck). https://github.com/mauricioaniche

  37. Trindade RPF, Silva Bigonha MA, Ferreira KAM (2020) Oracles of bad smells: a systematic literature review. In: Proceedings of the 34th Brazilian symposium on software engineering, pp 62–71. Association for Computing Machinery

  38. Zakeri-Nasrabadi M, Parsa S, Esmaili E, Palomba F (2023) A systematic literature review on the code smells datasets and validation mechanisms. ACM Comput Surv 55(13s):1–48

  39. Madeyski L, Lewowski T (2020) MLCQ: Industry-relevant code smell data set. In: Proceedings of the evaluation and assessment in software engineering. EASE ’20, pp 342–347. Association for Computing Machinery. https://doi.org/10.1145/3383219.3383264

  40. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333

  41. Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. Data mining and knowledge discovery handbook, pp 667–685

  42. Read J (2008) A pruned problem transformation method for multi-label classification. In: Proceedings of 2008 New Zealand computer science research student conference (NZCSRS 2008), vol 143150, p 41

  43. Tsoumakas G, Katakis I, Vlahavas I (2008) Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD 2008 workshop on mining multidimensional data (MMD’08)

  44. Tsoumakas G, Katakis I, Vlahavas I (2011) Random k-labelsets for multi-label classification. IEEE Trans Knowl Data Eng 23(7):1079–1089

  45. Read J, Pfahringer B, Holmes G (2008) Multi-label classification using ensembles of pruned sets. In: 2008 Eighth IEEE international conference on data mining, pp 995–1000. IEEE

  46. Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2/3):135–168

  47. Zhang ML, Zhou ZH (2006) Multi-label neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18:1338–1351

  48. Spyromitros E, Tsoumakas G, Vlahavas I (2008) An empirical study of lazy multilabel classification algorithms. In: Proceedings of 5th hellenic conference on artificial intelligence (SETN 2008)

  49. Cheng W, Hullermeier E (2009) Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76(2–3):211–225

  50. Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  51. Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163:3–16. https://doi.org/10.1016/j.neucom.2014.08.091

  52. Charte F, Charte D (2015) Working with multilabel datasets in R: the mldr package. R J 7(2):149–162

  53. Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation. Knowl-Based Syst 89:385–397

  54. Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414

  55. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18

  56. Zhang M-L, Zhou Z-H (2013) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837

  57. Gibaja E, Ventura S (2014) Multi-label learning: a review of the state of the art and ongoing research. Wiley Interdiscip Rev Data Min Knowl Discov 4(6):411–444

  58. García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf Sci 180(10):2044–2064

Funding

This study was not funded.

Author information

Corresponding author

Correspondence to Mouna Hadj-Kacem.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hadj-Kacem, M., Bouassida, N. Multi-label learning for identifying co-occurring class code smells. Computing (2024). https://doi.org/10.1007/s00607-024-01294-x

  • DOI: https://doi.org/10.1007/s00607-024-01294-x
