
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt

Empirical Software Engineering

Abstract

Developers sometimes choose design and implementation shortcuts due to the pressure from tight release schedules. However, shortcuts introduce technical debt that increases as the software evolves. The debt needs to be repaid as fast as possible to minimize its impact on software development and software quality. Sometimes, technical debt is admitted by developers in comments and commit messages. Such debt is known as self-admitted technical debt (SATD). In data-intensive systems, where data manipulation is a critical functionality, the presence of SATD in the data access logic could seriously harm performance and maintainability. Understanding the composition and distribution of the SATDs across software systems and their evolution could provide insights into managing technical debt efficiently. We present a large-scale empirical study on the prevalence, composition, and evolution of SATD in data-intensive systems. We analyzed 83 open-source systems relying on relational databases as well as 19 systems relying on NoSQL databases. We detected SATD in source code comments obtained from different snapshots of the subject systems. To understand the evolution dynamics of SATDs, we conducted a survival analysis. Next, we performed a manual analysis of 361 sample data-access SATDs, investigating the composition of data-access SATDs and the reasons behind their introduction and removal. We identified 15 new SATD categories, out of which 11 are specific to database access operations. We found that most of the data-access SATDs are introduced in the later stages of change history rather than at the beginning. We also observed that bug fixing and refactoring are the main reasons behind the introduction of data-access SATDs.
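As an illustration of the survival analysis mentioned above, the following is a minimal sketch of a Kaplan-Meier estimate of SATD comment lifetimes. The abstract does not say which estimator the study uses, so the choice of Kaplan-Meier is an assumption here. The function name `kaplan_meier` and the toy data are also hypothetical and do not come from the study. A comment that still exists in the last analyzed snapshot is treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, removed):
    """Return [(t, S(t))] for each time t at which at least one removal occurred.

    durations: lifetime of each SATD comment (e.g. in days or commits)
    removed:   True if the comment was removed (event observed),
               False if it still existed at the last snapshot (censored)
    """
    # Number of observed removals at each time point.
    events = Counter(t for t, r in zip(durations, removed) if r)
    # Number of comments leaving the risk set at each time point
    # (removals and censored observations together).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six SATD comments, three of them removed.
durations = [5, 8, 8, 12, 20, 20]
removed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, removed)

for t, s in curve:
    print(f"t={t}: estimated survival probability {s:.3f}")
```

Each tuple in `curve` gives the estimated probability that an SATD comment is still present after time `t`. Censored comments reduce the risk set but do not lower the curve.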

Notes

  1. https://bit.ly/2YLrLnU

  2. https://bit.ly/3jj5JAH

  3. http://www.jnosql.org/docs/introduction.html

  4. https://www.srcml.org/

  5. https://bit.ly/3siSWzX

  6. https://github.com/bbossgroups/bboss

  7. https://github.com/Blazebit/blaze-persistence

  8. https://github.com/SmartDataAnalytics/jena-sparql-api

  9. https://github.com/denimgroup/threadfix

  10. https://github.com/wordpress-mobile/WordPress-Android

  11. https://github.com/ControlSystemStudio/cs-studio

  12. Blaze-Persistence, https://bit.ly/3qGaXbb

  13. OpenL Tablets, https://bit.ly/3sioFkX

  14. Snowstorm, https://bit.ly/3dEUMXN

  15. Sqlg, https://bit.ly/3wxbAqW

  16. GnuCash Android, https://bit.ly/37H1PeV

  17. Sqlg, https://bit.ly/3pRCOnK

  18. Foxtrot, https://bit.ly/3urWcey

  19. UPortal, https://bit.ly/3qY50X2

  20. Sqlg, https://bit.ly/3aIqEcc

  21. Robolectric, https://bit.ly/3umvXpD

  22. Carbon-apimgt, https://bit.ly/2NvDZvQ

Funding

This work is partly funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de Recherche du Québec (FRQ), the Swiss National Science Foundation (SNF) and the Fonds de la Recherche Scientifique (F.R.S.-FNRS) project “INSTINCT” (190113), and the F.R.S.-FNRS and FWO EOS project SECO-ASSIST (30446992).

Author information

Corresponding author

Correspondence to Biruk Asmare Muse.

Ethics declarations

Conflict of Interests

The authors declare no competing interests.

Additional information

Communicated by: Nachiappan Nagappan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Muse, B.A., Nagy, C., Cleve, A. et al. FIXME: synchronize with database! An empirical study of data access self-admitted technical debt. Empir Software Eng 27, 130 (2022). https://doi.org/10.1007/s10664-022-10119-4

  • DOI: https://doi.org/10.1007/s10664-022-10119-4
