Abstract
Developers sometimes choose design and implementation shortcuts under pressure from tight release schedules. However, shortcuts introduce technical debt that grows as the software evolves. The debt needs to be repaid as soon as possible to minimize its impact on software development and software quality. Sometimes, developers admit technical debt in comments and commit messages. Such debt is known as self-admitted technical debt (SATD). In data-intensive systems, where data manipulation is a critical functionality, the presence of SATD in the data access logic could seriously harm performance and maintainability. Understanding the composition and distribution of SATDs across software systems, and how they evolve, could provide insights into managing technical debt efficiently. We present a large-scale empirical study on the prevalence, composition, and evolution of SATD in data-intensive systems. We analyzed 83 open-source systems relying on relational databases as well as 19 systems relying on NoSQL databases. We detected SATD in source code comments obtained from different snapshots of the subject systems. To understand the evolution dynamics of SATDs, we conducted a survival analysis. Next, we manually analyzed a sample of 361 data-access SATDs, investigating their composition and the reasons behind their introduction and removal. We identified 15 new SATD categories, 11 of which are specific to database access operations. We found that most data-access SATDs are introduced in the later stages of change history rather than at the beginning. We also observed that bug fixing and refactoring are the main reasons behind the introduction of data-access SATDs.
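The abstract mentions a survival analysis of SATD instances. As a rough illustration only, and not the authors' implementation, the Kaplan–Meier product-limit estimator S(t) = ∏_{t_i ≤ t} (1 − d_i / n_i) can be computed as follows. Here d_i is the number of SATD instances removed at time t_i and n_i is the number still present just before t_i. The helper name `kaplan_meier`, the time unit (days), and the choice to treat SATD still present at the last analyzed snapshot as right-censored are assumptions made for this sketch.

```python
from collections import Counter


def kaplan_meier(durations, removed):
    """Kaplan-Meier estimate of SATD survival (hypothetical helper).

    durations: time each SATD instance was observed, e.g. days from its
               introduction to its removal, or to the last analyzed snapshot.
    removed:   True if the SATD was removed at that time (event observed),
               False if it was still present at the end (right-censored).
    Returns a list of (t, S(t)) pairs, one per distinct removal time.
    """
    if len(durations) != len(removed):
        raise ValueError("durations and removed must have the same length")

    events = Counter(t for t, e in zip(durations, removed) if e)  # d_i per time
    exits = Counter(durations)  # instances leaving the risk set at each time
    at_risk = len(durations)    # n_i: instances still observed just before t
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            # Product-limit step: only removals lower the survival estimate.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both removed and censored instances leave the risk set after t.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Made-up toy data: five SATD comments, two never removed (censored).
    days = [2, 3, 3, 5, 8]
    removed = [True, True, False, True, False]
    for t, s in kaplan_meier(days, removed):
        print(t, round(s, 3))
```

On the toy data, the estimated survival is 0.8 at day 2, 0.6 at day 3, and 0.3 at day 5. Censored instances stay in the risk set until they drop out, so the estimate does not treat SATD that is still present at the end of the history as if it had been removed.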
Notes
Blaze-Persistence, https://bit.ly/3qGaXbb
OpenL Tablets, https://bit.ly/3sioFkX
Snowstorm, https://bit.ly/3dEUMXN
Sqlg, https://bit.ly/3wxbAqW
GnuCash Android, https://bit.ly/37H1PeV
Sqlg, https://bit.ly/3pRCOnK
Foxtrot, https://bit.ly/3urWcey
UPortal, https://bit.ly/3qY50X2
Sqlg, https://bit.ly/3aIqEcc
Robolectric, https://bit.ly/3umvXpD
Carbon-apimgt, https://bit.ly/2NvDZvQ
Funding
This work is partly funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de Recherche du Québec (FRQ), the Swiss National Science Foundation (SNF) and the Fonds de la Recherche Scientifique (F.R.S.-FNRS) project “INSTINCT” (190113), and the F.R.S.-FNRS and FWO EOS project SECO-ASSIST (30446992).
Ethics declarations
Conflict of Interests
The authors declare no competing interests.
Additional information
Communicated by: Nachiappan Nagappan
About this article
Cite this article
Muse, B.A., Nagy, C., Cleve, A. et al. FIXME: synchronize with database! An empirical study of data access self-admitted technical debt. Empir Software Eng 27, 130 (2022). https://doi.org/10.1007/s10664-022-10119-4
DOI: https://doi.org/10.1007/s10664-022-10119-4