Abstract
Code smell identification is crucial in software maintenance. The existing literature mostly focuses on single code smell identification. However, in practice, a software artefact typically exhibits multiple code smells simultaneously where their diffuseness has been assessed, suggesting that 59% of smelly classes are affected by more than one smell. So to meet this complexity found in real-world projects, we propose a multi-label learning-based approach to identify eight code smells at the class-level, i.e. the most sever software artefacts that need to be prioritized in the refactoring process. In our experiments, we have used 12 algorithms from different multi-label learning methods across 30 open-source Java projects, where significant findings have been presented. We have explored co-occurrences between class code smells and examined the impact of correlations on prediction results. Additionally, we assess multi-label learning methods to compare data adaptation versus algorithm adaptation. Our findings highlight the effectiveness of the Ensemble of Classifier Chains and Binary Relevance in achieving high-performance results.
Similar content being viewed by others
Data availability
The dataset used in this study is publicly available and was obtained from the research study referenced in [3].
References
Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Pearson Education India
Kaur A (2020) A systematic literature review on empirical analysis of the relationship between code smells and software quality attributes. Arch Comput Methods Eng 27(4):1267–1296
Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221. https://doi.org/10.1007/s10664-017-9535-z
Soh Z, Yamashita A, Khomh F, Guéhéneuc YG (2016) Do code smells impact the effort of different maintenance programming activities? In: IEEE 23rd international conference on software analysis, evolution, and reengineering, vol 1, pp 393–402
Abbes M, Khomh F, Gueheneuc Y-G, Antoniol G (2011) An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension. In: 2011 15Th European conference on software maintenance and reengineering, pp 181–190. IEEE
Politowski C, Khomh F, Romano S, Scanniello G, Petrillo F, Guéhéneuc Y-G, Maiga A (2020) A large scale empirical study of the impact of spaghetti code and blob anti-patterns on program comprehension. Inf Softw Technol 122:106278
Sjøberg DI, Yamashita A, Anda BC, Mockus A, Dybå T (2012) Quantifying the effect of code smells on maintenance effort. IEEE Trans Softw Eng 39(8):1144–1156
Khomh F, Di Penta M, Gueheneuc Y-G (2009) An exploratory study of the impact of code smells on software change-proneness. In: 2009 16th working conference on reverse engineering, pp 75–84. IEEE
Cunningham W (1992) The wycash portfolio management system. ACM SIGPLAN OOPS Messenger 4(2):29–30
Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). IEEE Trans Softw Eng 43(11):1063–1088
Dewangan S, Rao RS, Chowdhuri SR, Gupta M (2023) Severity classification of code smells using machine-learning methods. SN Comput Sci 4(5):564
Fontana FA, Zanoni M (2017) Code smell severity classification using machine learning techniques. Knowl-Based Syst 128:43–58
Moha N, Gueheneuc YG, Duchien L, Meur AFL (2010) DECOR: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36. https://doi.org/10.1109/TSE.2009.50
Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering, pp 268–278. IEEE Press
Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191
Hadj-Kacem M, Bouassida N (2018) A hybrid approach to detect code smells using deep learning. In: Proceedings of the 13th international conference on evaluation of novel approaches to software engineering, pp 137–146. SciTePress
Sharma T, Efstathiou V, Louridas P, Spinellis D (2021) Code smell detection by deep direct-learning and transfer-learning. J Syst Softw 176:110936
Mens T, Tourwe T (2004) A survey of software refactoring. IEEE Trans Softw Eng 30(2):126–139. https://doi.org/10.1109/TSE.2004.1265817
Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018) A large-scale empirical study on the lifecycle of code smell co-occurrences. Inf Softw Technol 99:1–10
Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Wareh Min (IJDWM) 3(3):1–13
Kreimer J (2005) Adaptive detection of design flaws. Electron Not Theor Comput Sci 141(4):117–136
Khomh F, Vaucher S, Guéhéneuc YG, Sahraoui H (2009) A Bayesian approach for the detection of code and design smells. In: Ninth international conference on quality software, pp 305–314 https://doi.org/10.1109/QSIC.2009.47
Khomh F, Vaucher S, Yann-Gaël G, Sahraoui H (2011) BDTEX: a GQM-based Bayesian approach for the detection of antipatterns. J Syst Softw 84(4):559–572
Hassaine S, Khomh F, Gueheneuc YG, Hamel S (2010) IDS: an immune-inspired approach for the detection of software design smells. In: Seventh international conference on the quality of information and communications technology, pp 343–348 https://doi.org/10.1109/QUATIC.2010.61
Oliveto R, Khomh F, Antoniol G, Gueheneuc YG (2010) Numerical signatures of antipatterns: an approach based on B-splines. In: 14th European conference on software maintenance and reengineering, pp 248–251. https://doi.org/10.1109/CSMR.2010.47
Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Aimeur E (2012) SMURF: a SVM-based incremental anti-pattern detection approach. In: 19th working conference on reverse engineering, pp 466–475. https://doi.org/10.1109/WCRE.2012.56
Maiga A, Ali N, Bhattacharya N, Sabané A, Guéhéneuc YG, Antoniol G, Aïmeur E (2012) Support vector machines for anti-pattern detection. In: Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pp 278–281. https://doi.org/10.1145/2351676.2351723
Dewangan S, Rao RS, Mishra A, Gupta M (2021) A novel approach for code smell detection: an empirical study. IEEE Access 9:162869–162883
Barbez A, Khomh F, Guéhéneuc Y-G (2020) A machine-learning based ensemble method for anti-patterns detection. J Syst Softw 161:110486
Guggulothu T, Moiz SA (2020) Code smell detection using multi-label classification approach. Softw Qual J 28(3):1063–1086
Kiyak EO, Birant D, Birant KU (2019) Comparison of multi-label classification algorithms for code smell detection. In: 2019 3rd international symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–6. IEEE
Boutaib S, Elarbi M, Bechikh S, Palomba F, Said LB (2022) A bi-level evolutionary approach for the multi-label detection of smelly classes. In: Proceedings of the genetic and evolutionary computation conference companion, pp 782–785
Li Y, Zhang X (2022) Multi-label code smell detection with hybrid model based on deep learning. In: SEKE, pp 42–47
Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
Azeem MI, Palomba F, Shi L, Wang Q (2019) Machine learning techniques for code smell detection: a systematic literature review and meta-analysis. Inf Softw Technol 108:115–138
Aniche M (2015) Java code metrics calculator (ck). https://github.com/mauricioaniche
Trindade RPF, Silva Bigonha MA, Ferreira KAM (2020) Oracles of bad smells: a systematic literature review. In: Proceedings of the 34th Brazilian symposium on software engineering, pp 62–71. Association for Computing Machinery
Zakeri-Nasrabadi M, Parsa S, Esmaili E, Palomba F (2023) A systematic literature review on the code smells datasets and validation mechanisms. ACM J Comput Cult Herit 55(13s):1–48
Madeyski L, Lewowski T (2020) MLCQ: Industry-relevant code smell data set. In: Proceedings of the evaluation and assessment in software engineering. EASE ’20, pp 342–347. Association for Computing Machinery. https://doi.org/10.1145/3383219.3383264
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333
Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. Data mining and knowledge discovery handbook, pp 667–685
Read J (2008) A pruned problem transformation method for multi-label classification. In: Proceedings of 2008 New Zealand computer science research student conference (NZCSRS 2008), vol 143150, p 41
Tsoumakas G, Katakis I, Vlahavas I (2008) Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD 2008 workshop on mining multidimensional data (MMD’08)
Tsoumakas G, Katakis I, Vlahavas I (2011) Random k-labelsets for multi-label classification. IEEE Trans Knowl Data Eng 23(7):1079–1089
Read J, Pfahringer B, Holmes G (2008) Multi-label classification using ensembles of pruned sets. In: 2008 Eighth IEEE international conference on data mining, pp 995–1000. IEEE
Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2/3):135–168
Zhang ML, Zhou ZH (2006) Multi-label neural networks with applications to functional genomics and text categorization. IEEE Trans on Knowl Data Eng 18:1338–1351
Spyromitros E, Tsoumakas G, Vlahavas I (2008) An empirical study of lazy multilabel classification algorithms. In: Proceedings of 5th hellenic conference on artificial intelligence (SETN 2008)
Cheng W, Hullermeier E (2009) Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76(2–3):211–225
Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048
Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163:3–16. https://doi.org/10.1016/j.neucom.2014.08.091
Charte F, Charte D (2015) Working with multilabel datasets in R: the mldr package. R J 7(2):149–162
Charte F, Rivera AJ, Jesus MJ, Herrera F (2015) MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation. Knowl-Based Syst 89:385–397
Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18
Zhang M-L, Zhou Z-H (2013) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837
Gibaja E, Ventura S (2014) Multi-label learning: a review of the state of the art and ongoing research. Wiley Interdiscip Rev Data Min Knowl Discov 4(6):411–444
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf Sci 180(10):2044–2064
Funding
This study was not funded.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no Conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hadj-Kacem, M., Bouassida, N. Multi-label learning for identifying co-occurring class code smells. Computing (2024). https://doi.org/10.1007/s00607-024-01294-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00607-024-01294-x