Abstract
Technical debt (TD) arises when developers choose a compromise solution for short-term benefit during design or architecture selection. TD-related issues, such as code smells, can have a critical impact on important non-functional requirements, and TD issues of different severity call for different remediation measures. Existing studies mainly focus on detecting TD in software projects through source code or comments, but they usually ignore the severity of the TD issues they find. Identifying that severity matters because it clarifies which TD should be prioritized. In this paper, we propose an approach that combines the semantic and structural information of code snippets to identify the severity of TD issues at the method level. We first transform each method affected by TD issues into an abstract syntax tree (AST) and use the paths in the AST to represent its semantic information. We then extract code metrics that measure the size, coupling, and complexity of the affected methods to represent their structural information. Finally, we build a stacking ensemble model that identifies the severity of TD issues, using Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) as the base classifiers and a Support Vector Machine (SVM) as the meta-classifier. Evaluation on a real-world dataset shows that our approach achieves, on average, 65.77% precision, 68.18% recall, and a 65.84% F1-score. The experimental results also demonstrate that combining the semantic and structural information of code snippets improves the effectiveness of the approach.
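To make the idea of representing a method by the paths in its AST more concrete, the following is a minimal sketch of leaf-to-leaf path extraction. It rests on several assumptions that do not come from the paper: it parses Python source with the standard-library ast module as a stand-in for the Java methods the study targets, the function names (collect_leaves, path_between, extract_path_contexts) are hypothetical, and the path notation ("^" for a step up, "_" for a step down) and the max_length cut-off are illustrative choices rather than the authors' settings.

```python
import ast
from itertools import combinations


def _token(node):
    """Return the token of a terminal node, or None for non-terminals."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return None


def collect_leaves(tree):
    """Return [(token, trail)], where trail lists the nodes from root to leaf."""
    leaves = []

    def visit(node, trail):
        trail = trail + [node]
        token = _token(node)
        if token is not None:
            leaves.append((token, trail))
            return  # terminals are not descended into
        for child in ast.iter_child_nodes(node):
            visit(child, trail)

    visit(tree, [])
    return leaves


def path_between(trail_a, trail_b):
    """Node-type names from leaf A up to the lowest common ancestor, then down to leaf B."""
    k = 0  # number of shared ancestors
    while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
        k += 1
    up = [type(n).__name__ for n in reversed(trail_a[k:])]
    top = type(trail_a[k - 1]).__name__
    down = [type(n).__name__ for n in trail_b[k:]]
    return up + [top], down


def extract_path_contexts(source, max_length=8):
    """Return (start_token, path, end_token) triples for every pair of leaves.

    Paths with more than max_length nodes are dropped (an assumed cut-off).
    """
    leaves = collect_leaves(ast.parse(source))
    contexts = []
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaves, 2):
        up, down = path_between(trail_a, trail_b)
        if len(up) + len(down) <= max_length:
            contexts.append((tok_a, "^".join(up) + "_" + "_".join(down), tok_b))
    return contexts
```

Calling extract_path_contexts on a small function such as "def total(xs, n): return xs + n" yields triples like ("xs", "Name^BinOp_Name", "n"). A downstream model would typically embed such triples before combining them with structural metrics.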
Data availability
The data for this study are openly available on GitHub at https://github.com/HduDBSI/SQJ-TD-Severity.
Funding
This work was supported by the National Natural Science Foundation of China under Grants 62372145 and 61902096, the Natural Science Foundation of Zhejiang Province under Grant LY21F020020, and the Key Research and Development Program of Zhejiang Province under Grants 2023C03200 and 2023C03179.
Author information
Contributions
Dongjin Yu: conceptualization, methodology. Sicheng Li: data curation, methodology, writing (original draft), software. Xin Chen: validation, reviewing. Tian Sun: data curation, investigation.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Yu, D., Li, S., Chen, X. et al. Identifying the severity of technical debt issues based on semantic and structural information. Software Qual J 31, 1499–1526 (2023). https://doi.org/10.1007/s11219-023-09651-3
DOI: https://doi.org/10.1007/s11219-023-09651-3