Abstract
Design smells are symptoms of poorly designed solutions that may result in several maintenance issues. While various approaches, including traditional machine learning methods, have been proposed and shown to be effective in detecting design smells, they require extensive manually labeled data, which is expensive and challenging to scale. To leverage the vast amount of data that is now accessible, unsupervised semantic feature learning, or learning without requiring manual annotation labor, is essential. The goal of this paper is to propose a design smell detection method that is based on self-supervised learning. We propose Model Representation with Transformers (MoRT) to learn the UML class diagram features by training Transformers to recognize masked keywords. We empirically show how effective the defined proxy task is at learning semantic and structural properties. We thoroughly assess MoRT using four model smells: the Blob, Functional Decomposition, Spaghetti Code, and Swiss Army Knife. Furthermore, we compare our findings with supervised learning and feature-based methods. Finally, we ran a cross-project experiment to assess the generalizability of our approach. Results show that MoRT is highly effective in detecting design smells.
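The masked-keyword proxy task described above can be illustrated with a minimal sketch. It is not the authors' implementation: the textual serialization of the class diagram, the `KEYWORDS` vocabulary, the `[MASK]` token, and the `mask_keywords` helper are all assumptions introduced here for illustration. The sketch only shows how (input, target) pairs for such a task might be built without manual labels; the Transformer that would be trained to recover the targets is omitted.

```python
import random

# Hypothetical keyword vocabulary; the abstract does not list the actual keywords.
KEYWORDS = {"class", "abstract", "interface", "extends", "implements",
            "public", "private", "protected", "static"}
MASK = "[MASK]"  # assumed mask token, in the style of BERT


def mask_keywords(tokens, mask_prob=0.5, rng=None):
    """Replace keyword tokens with MASK and return (masked_tokens, labels).

    labels[i] holds the original keyword where a mask was applied and None
    elsewhere, so a model would be trained to predict only the masked positions.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    masked, labels = [], []
    for tok in tokens:
        if tok in KEYWORDS and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Assumed whitespace-tokenized serialization of one class from a class diagram.
diagram = ("public class Order extends Entity { private int id ; "
           "public void submit ( ) ; }")
tokens = diagram.split()

# mask_prob=1.0 masks every keyword, which makes the effect easy to inspect.
masked, labels = mask_keywords(tokens, mask_prob=1.0)
print(" ".join(masked))
```

Because the targets come from the diagram text itself, any unlabeled collection of class diagrams can supply training pairs; smell labels would be needed only for a later fine-tuning or classification stage.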
Acknowledgements
The authors acknowledge the support of King Fahd University of Petroleum and Minerals in the development of this work.
Author information
Contributions
AA wrote the main manuscript text. HA and MA edited and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Alazba, A., Aljamaan, H. & Alshayeb, M. Automated detection of class diagram smells using self-supervised learning. Autom Softw Eng 31, 29 (2024). https://doi.org/10.1007/s10515-024-00429-w
DOI: https://doi.org/10.1007/s10515-024-00429-w