Comparing four approaches for technical debt identification
Software systems accumulate technical debt (TD) when short-term goals are traded against long-term goals during development (e.g., a quick-and-dirty implementation that meets a release date versus a well-refactored implementation that supports the long-term health of the project). Some forms of TD accumulate over time as source code that is difficult to work with and exhibits a variety of anomalies. A number of source code analysis techniques and tools have been proposed to identify this code-level debt. What has not yet been studied is whether using multiple tools to detect TD is beneficial, that is, whether different tools flag the same or different source code components. Further, little is known about the symptoms of TD “interest” that the problems flagged by these techniques lead to. To address this latter question, we also investigated whether TD, as identified by the source code analysis techniques, correlates with interest payments in the form of increased defect- and change-proneness.
Our objective is to compare the results of different TD identification approaches in order to understand their commonalities and differences, and to evaluate their relationship to indicators of future TD “interest.”
We selected four TD identification techniques (code smells, automatic static analysis (ASA) issues, grime buildup, and modularity violations) and applied them to 13 versions of the Apache Hadoop open source project. We collected and aggregated statistical measures to investigate whether the different techniques identified TD indicators in the same or different classes, and whether those classes in turn exhibited high interest (in the form of a large number of defects and higher change-proneness).
The outputs of the four approaches have very little overlap and therefore point to different problems in the source code. Dispersed Coupling and modularity violations were co-located in classes with higher defect-proneness. We also observed a strong relationship between modularity violations and change-proneness. Our main contribution is an initial overview of the TD landscape, showing that different TD techniques are loosely coupled and therefore indicate problems in different locations in the source code. Moreover, our proxy interest indicators (change- and defect-proneness) correlate with only a small subset of the TD indicators.
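The overlap analysis described above can be sketched as a pairwise comparison of the sets of classes each technique flags. The snippet below is a minimal illustration, not the study's actual tooling; the class names and flag sets are hypothetical placeholders, and Jaccard similarity is one plausible choice of overlap measure.

```python
from itertools import combinations

# Hypothetical example data: classes flagged by each TD identification
# technique (real inputs would come from the respective tools' reports).
flags = {
    "code_smells": {"JobTracker", "TaskTracker", "DFSClient"},
    "asa_issues": {"DFSClient", "FSNamesystem"},
    "grime": {"JobTracker"},
    "modularity_violations": {"FSNamesystem", "JobInProgress"},
}

def jaccard(a, b):
    """Share of flagged classes common to two techniques (0 = disjoint)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Pairwise overlap between all technique pairs.
overlap = {
    (t1, t2): jaccard(flags[t1], flags[t2])
    for t1, t2 in combinations(flags, 2)
}

for pair, j in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(pair, round(j, 2))
```

Low Jaccard values across all pairs would correspond to the paper's finding that the four approaches point to largely different locations in the source code.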
Keywords: Technical debt · Software maintenance · Software quality · Source code analysis · Modularity violations · Grime · Code smells · ASA