Multi-label learning for identifying co-occurring class code smells

Hadj-Kacem, Mouna; Bouassida, Nadia

doi:10.1007/s00607-024-01294-x

Regular Paper
Published: 27 May 2024

Computing (2024)

Mouna Hadj-Kacem¹ &
Nadia Bouassida¹

Abstract

Code smell identification is crucial in software maintenance. The existing literature mostly focuses on identifying a single code smell at a time. In practice, however, a software artefact typically exhibits multiple code smells simultaneously: an assessment of their diffuseness suggests that 59% of smelly classes are affected by more than one smell. To address this complexity found in real-world projects, we propose a multi-label learning-based approach to identify eight code smells at the class level, i.e. in the most severe software artefacts, which need to be prioritized in the refactoring process. In our experiments, we used 12 algorithms from different multi-label learning methods across 30 open-source Java projects and report significant findings. We explored co-occurrences between class code smells and examined the impact of correlations on prediction results. Additionally, we assessed multi-label learning methods to compare data adaptation versus algorithm adaptation. Our findings highlight the effectiveness of the Ensemble of Classifier Chains and Binary Relevance in achieving high-performance results.
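
To make the multi-label setting concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy binary label matrix in which each row is a class and each column a smell; the label names (`smell_A`, `smell_B`, `smell_C`), the data, and all function names are hypothetical. It counts pairwise co-occurrences, computes the share of smelly classes carrying more than one smell, and applies the Binary Relevance transformation, which splits the problem into one independent binary target per label.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one row per class, one column per smell label.
LABELS = ["smell_A", "smell_B", "smell_C"]
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]


def cooccurrence_counts(rows, labels):
    """Count how often each pair of labels appears on the same class."""
    counts = Counter()
    for row in rows:
        present = [name for name, flag in zip(labels, row) if flag]
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def multi_smell_share(rows):
    """Fraction of smelly classes (at least one label) with more than one label."""
    smelly = [row for row in rows if sum(row) > 0]
    if not smelly:
        return 0.0
    multi = sum(1 for row in smelly if sum(row) > 1)
    return multi / len(smelly)


def binary_relevance_targets(rows, labels):
    """Binary Relevance transformation: one binary target vector per label."""
    return {name: [row[i] for row in rows] for i, name in enumerate(labels)}


pairs = cooccurrence_counts(Y, LABELS)
share = multi_smell_share(Y)
targets = binary_relevance_targets(Y, LABELS)

print(dict(pairs))
print(share)
print(targets["smell_C"])
```

Binary Relevance trains each per-label classifier independently, so it ignores the co-occurrence counts computed here. Classifier chains instead pass the predictions for earlier labels as extra features to later ones, which is one way label correlations can influence the prediction.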

Data availability

The dataset used in this study is publicly available and was obtained from the research study referenced in [3].

References

[1] Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Pearson Education India
[2] Kaur A (2020) A systematic literature review on empirical analysis of the relationship between code smells and software quality attributes. Arch Comput Methods Eng 27(4):1267–1296
[3] Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221. https://doi.org/10.1007/s10664-017-9535-z
[4] Soh Z, Yamashita A, Khomh F, Guéhéneuc YG (2016) Do code smells impact the effort of different maintenance programming activities? In: IEEE 23rd international conference on software analysis, evolution, and reengineering, vol 1, pp 393–402
[5] Abbes M, Khomh F, Gueheneuc Y-G, Antoniol G (2011) An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension. In: 2011 15th European conference on software maintenance and reengineering, pp 181–190. IEEE
[6] Politowski C, Khomh F, Romano S, Scanniello G, Petrillo F, Guéhéneuc Y-G, Maiga A (2020) A large scale empirical study of the impact of spaghetti code and blob anti-patterns on program comprehension. Inf Softw Technol 122:106278
[7] Sjøberg DI, Yamashita A, Anda BC, Mockus A, Dybå T (2012) Quantifying the effect of code smells on maintenance effort. IEEE Trans Softw Eng 39(8):1144–1156
[8] Khomh F, Di Penta M, Gueheneuc Y-G (2009) An exploratory study of the impact of code smells on software change-proneness. In: 2009 16th working conference on reverse engineering, pp 75–84. IEEE
[9] Cunningham W (1992) The wycash portfolio management system. ACM SIGPLAN OOPS Messenger 4(2):29–30
[10] Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). IEEE Trans Softw Eng 43(11):1063–1088
[11] Dewangan S, Rao RS, Chowdhuri SR, Gupta M (2023) Severity classification of code smells using machine-learning methods. SN Comput Sci 4(5):564
[12] Fontana FA, Zanoni M (2017) Code smell severity classification using machine learning techniques. Knowl-Based Syst 128:43–58
[13] Moha N, Gueheneuc YG, Duchien L, Meur AFL (2010) DECOR: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36. https://doi.org/10.1109/TSE.2009.50
[14] Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering, pp 268–278. IEEE Press
[15] Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191
[16] Hadj-Kacem M, Bouassida N (2018) A hybrid approach to detect code smells using deep learning. In: Proceedings of the 13th international conference on evaluation of novel approaches to software engineering, pp 137–146. SciTePress
[17] Sharma T, Efstathiou V, Louridas P, Spinellis D (2021) Code smell detection by deep direct-learning and transfer-learning. J Syst Softw 176:110936
[18] Mens T, Tourwe T (2004) A survey of software refactoring. IEEE Trans Softw Eng 30(2):126–139. https://doi.org/10.1109/TSE.2004.1265817
[19] Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) A large-scale empirical study on the lifecycle of code smell co-occurrences. Inf Softw Technol 99:1–10
[20] Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Wareh Min (IJDWM) 3(3):1–13
[21] Kreimer J (2005) Adaptive detection of design flaws. Electron Notes Theor Comput Sci 141(4):117–136
[22] Khomh F, Vaucher S, Guéhéneuc YG, Sahraoui H (2009) A Bayesian approach for the detection of code and design smells. In: Ninth international conference on quality software, pp 305–314. https://doi.org/10.1109/QSIC.2009.47
[23] Khomh F, Vaucher S, Guéhéneuc Y-G, Sahraoui H (2011) BDTEX: a GQM-based Bayesian approach for the detection of antipatterns. J Syst Softw 84(4):559–572
[24] Hassaine S, Khomh F, Gueheneuc YG, Hamel S (2010) IDS: an immune-inspired approach for the detection of software design smells. In: Seventh international conference on the quality of information and communications technology, pp 343–348. https://doi.org/10.1109/QUATIC.2010.61
[25] Oliveto R, Khomh F, Antoniol G, Gueheneuc YG (2010) Numerical signatures of antipatterns: an approach based on B-splines. In: 14th European conference on software maintenance and reengineering, pp 248–251. https://doi.org/10.1109/CSMR.2010.47
[26] Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Aimeur E (2012) SMURF: a SVM-based incremental anti-pattern detection approach. In: 19th working conference on reverse engineering, pp 466–475. https://doi.org/10.1109/WCRE.2012.56
[27] Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Antoniol G, Aïmeur E (2012) Support vector machines for anti-pattern detection. In: Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pp 278–281. https://doi.org/10.1145/2351676.2351723
[28] Dewangan S, Rao RS, Mishra A, Gupta M (2021) A novel approach for code smell detection: an empirical study. IEEE Access 9:162869–162883
[29] Barbez A, Khomh F, Guéhéneuc Y-G (2020) A machine-learning based ensemble method for anti-patterns detection. J Syst Softw 161:110486
[30] Guggulothu T, Moiz SA (2020) Code smell detection using multi-label classification approach. Softw Qual J 28(3):1063–1086
[31] Kiyak EO, Birant D, Birant KU (2019) Comparison of multi-label classification algorithms for code smell detection. In: 2019 3rd international symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–6. IEEE
[32] Boutaib S, Elarbi M, Bechikh S, Palomba F, Said LB (2022) A bi-level evolutionary approach for the multi-label detection of smelly classes. In: Proceedings of the genetic and evolutionary computation conference companion, pp 782–785
[33] Li Y, Zhang X (2022) Multi-label code smell detection with hybrid model based on deep learning. In: SEKE, pp 42–47
[34] Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
[35] Azeem MI, Palomba F, Shi L, Wang Q (2019) Machine learning techniques for code smell detection: a systematic literature review and meta-analysis. Inf Softw Technol 108:115–138
[36] Aniche M (2015) Java code metrics calculator (ck). https://github.com/mauricioaniche
[37] Trindade RPF, Silva Bigonha MA, Ferreira KAM (2020) Oracles of bad smells: a systematic literature review. In: Proceedings of the 34th Brazilian symposium on software engineering, pp 62–71. Association for Computing Machinery
[38] Zakeri-Nasrabadi M, Parsa S, Esmaili E, Palomba F (2023) A systematic literature review on the code smells datasets and validation mechanisms. ACM Comput Surv 55(13s):1–48
[39] Madeyski L, Lewowski T (2020) MLCQ: Industry-relevant code smell data set. In: Proceedings of the evaluation and assessment in software engineering. EASE ’20, pp 342–347. Association for Computing Machinery. https://doi.org/10.1145/3383219.3383264
[40] Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333
[41] Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. Data mining and knowledge discovery handbook, pp 667–685
[42] Read J (2008) A pruned problem transformation method for multi-label classification. In: Proceedings of 2008 New Zealand computer science research student conference (NZCSRS 2008), vol 143150, p 41
[43] Tsoumakas G, Katakis I, Vlahavas I (2008) Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD 2008 workshop on mining multidimensional data (MMD’08)
[44] Tsoumakas G, Katakis I, Vlahavas I (2011) Random k-labelsets for multi-label classification. IEEE Trans Knowl Data Eng 23(7):1079–1089
[45] Read J, Pfahringer B, Holmes G (2008) Multi-label classification using ensembles of pruned sets. In: 2008 Eighth IEEE international conference on data mining, pp 995–1000. IEEE
[46] Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2/3):135–168
[47] Zhang ML, Zhou ZH (2006) Multi-label neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18:1338–1351
[48] Spyromitros E, Tsoumakas G, Vlahavas I (2008) An empirical study of lazy multilabel classification algorithms. In: Proceedings of 5th hellenic conference on artificial intelligence (SETN 2008)
[49] Cheng W, Hüllermeier E (2009) Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76(2–3):211–225
[50] Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048
[51] Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163:3–16. https://doi.org/10.1016/j.neucom.2014.08.091
[52] Charte F, Charte D (2015) Working with multilabel datasets in R: the mldr package. R J 7(2):149–162
[53] Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation. Knowl-Based Syst 89:385–397
[54] Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414
[55] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18
[56] Zhang M-L, Zhou Z-H (2013) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837
[57] Gibaja E, Ventura S (2014) Multi-label learning: a review of the state of the art and ongoing research. Wiley Interdiscip Rev Data Min Knowl Discov 4(6):411–444
[58] García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf Sci 180(10):2044–2064

Funding

This study was not funded.

Author information

Authors and Affiliations

Mir@cl Laboratory, Sfax University, Sfax, Tunisia
Mouna Hadj-Kacem & Nadia Bouassida

Authors

Mouna Hadj-Kacem
Nadia Bouassida

Corresponding author

Correspondence to Mouna Hadj-Kacem.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hadj-Kacem, M., Bouassida, N. Multi-label learning for identifying co-occurring class code smells. Computing (2024). https://doi.org/10.1007/s00607-024-01294-x

Received: 04 December 2023
Accepted: 08 May 2024
Published: 27 May 2024
DOI: https://doi.org/10.1007/s00607-024-01294-x
