Software Quality Journal, Volume 22, Issue 3, pp 403–426

Comparing four approaches for technical debt identification

  • Nico Zazworka
  • Antonio Vetro’
  • Clemente Izurieta
  • Sunny Wong
  • Yuanfang Cai
  • Carolyn Seaman
  • Forrest Shull

Abstract

Software systems accumulate technical debt (TD) when short-term goals in software development are traded for long-term goals (e.g., a quick-and-dirty implementation to reach a release date versus a well-refactored implementation that supports the long-term health of the project). Some forms of TD accumulate over time as source code that is difficult to work with and exhibits a variety of anomalies. A number of source code analysis techniques and tools have been proposed to identify the code-level debt accumulated in a system. What has not yet been studied is whether using multiple tools to detect TD leads to benefits, that is, whether different tools flag the same or different source code components. Further, it has not been investigated whether the TD identified by these techniques leads to symptoms of TD “interest.” To address this latter question, we also investigated whether TD, as identified by the source code analysis techniques, correlates with interest payments in the form of increased defect- and change-proneness. Our objective is to compare the results of different TD identification approaches in order to understand their commonalities and differences and to evaluate their relationship to indicators of future TD “interest.” We selected four different TD identification techniques (code smells, automatic static analysis issues, grime buildup, and Modularity violations) and applied them to 13 versions of the Apache Hadoop open source software project. We collected and aggregated statistical measures to investigate whether the different techniques identified TD indicators in the same or different classes and whether those classes in turn exhibited high interest (in the form of a large number of defects and higher change-proneness). The outputs of the four approaches have very little overlap and therefore point to different problems in the source code. Dispersed Coupling and Modularity violations were co-located in classes with higher defect-proneness. We also observed a strong relationship between Modularity violations and change-proneness. Our main contribution is an initial overview of the TD landscape, showing that different TD techniques are loosely coupled and therefore indicate problems in different locations of the source code. Moreover, our proxy interest indicators (change- and defect-proneness) correlate with only a small subset of TD indicators.
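As a rough illustration of the kind of analysis the abstract describes, the following minimal Python sketch computes the pairwise overlap (here, Jaccard similarity) between the sets of classes flagged by each technique, and the Spearman rank correlation between a per-class TD indicator count and the number of defects in that class. The class names, counts, and the specific measures (Jaccard, Spearman) are illustrative assumptions, not the study's actual data or statistical procedures.

```python
# Minimal sketch on hypothetical data: overlap between TD techniques and
# correlation of a TD indicator with per-class defect counts.
from itertools import combinations
from scipy.stats import spearmanr

# Hypothetical output of four TD identification techniques: classes flagged.
flagged = {
    "code_smells":           {"JobTracker", "TaskTracker", "DFSClient"},
    "static_analysis":       {"DFSClient", "FSNamesystem"},
    "grime":                 {"JobTracker", "FSDirectory"},
    "modularity_violations": {"JobTracker", "FSNamesystem", "DFSClient"},
}

# Pairwise Jaccard overlap: how often two techniques flag the same classes.
for (a, sa), (b, sb) in combinations(flagged.items(), 2):
    jaccard = len(sa & sb) / len(sa | sb)
    print(f"{a} vs {b}: Jaccard = {jaccard:.2f}")

# Hypothetical per-class measures: number of TD indicators and defects.
classes    = ["JobTracker", "TaskTracker", "DFSClient", "FSNamesystem", "FSDirectory"]
violations = [7, 1, 4, 5, 0]
defects    = [12, 2, 6, 9, 1]

# Rank correlation between the TD indicator and defect-proneness.
rho, p = spearmanr(violations, defects)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A low Jaccard value for every pair would correspond to the paper's finding that the techniques point to different classes, while a high rank correlation for one indicator would correspond to its reported relationship with defect- or change-proneness; the study itself established these results on the 13 Hadoop versions, not on toy data like this.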

Keywords

Technical debt · Software maintenance · Software quality · Source code analysis · Modularity violations · Grime · Code smells · ASA

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Nico Zazworka (1, 2)
  • Antonio Vetro’ (2, 3)
  • Clemente Izurieta (4)
  • Sunny Wong (5)
  • Yuanfang Cai (6)
  • Carolyn Seaman (2, 7)
  • Forrest Shull (2)

  1. Elsevier Information Systems GmbH, Frankfurt am Main, Germany
  2. Fraunhofer CESE, College Park, USA
  3. Automatics and Informatics Department, Politecnico di Torino, Turin, Italy
  4. Department of Computer Science, Montana State University, Bozeman, USA
  5. Siemens Healthcare, Malvern, USA
  6. Department of Computer Science, Drexel University, Philadelphia, USA
  7. Department of Information Systems, UMBC, Baltimore, USA
