Software Quality Journal, Volume 21, Issue 2, pp 377–416

Influence of confirmation biases of developers on software quality: an empirical study

  • Gül Çalıklı
  • Ayşe Başar Bener

Abstract

The thought processes of people have a significant impact on software quality, since software is designed, developed and tested by people. Cognitive biases, defined as patterned deviations of human thought from the laws of logic and mathematics, are a likely cause of software defects; however, there is little empirical evidence to date to substantiate this assertion. We focus on a specific cognitive bias, confirmation bias, the tendency of people to seek evidence that verifies a hypothesis rather than evidence that falsifies it. Owing to confirmation bias, developers tend to perform unit tests that make their program work rather than tests that break their code. Confirmation bias is therefore believed to be one of the factors that lead to increased software defect density. In this research, we present a metric scheme that explores the impact of developers’ confirmation bias on software defect density. To estimate the effectiveness of our metric scheme in quantifying confirmation bias within the context of software development, we performed an empirical study on the prediction of the defective parts of software, applying confirmation bias metrics to five datasets obtained from two companies. Our results provide empirical evidence that human thought processes and cognitive aspects deserve further investigation, in order to improve decision making in software development for effective process management and resource allocation.
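
The metric scheme itself is defined in the body of the paper; as a rough illustration of how bias-derived developer measures can feed a defect predictor, the sketch below builds hypothetical file-level features from made-up per-developer bias scores and trains a Naive Bayes classifier on them. All names, scores, and labels are invented for illustration and do not reproduce the authors’ actual metrics or datasets.

```python
# A minimal, hypothetical sketch: aggregate per-developer confirmation bias
# scores into file-level features and feed them to a defect predictor.
# Metric names and values are illustrative, not the paper's metric scheme.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical bias score per developer, e.g. the fraction of hypothesis-testing
# items answered with a purely confirmatory (verification-only) strategy.
developer_bias = {"dev_a": 0.8, "dev_b": 0.3, "dev_c": 0.6}

# Hypothetical file -> contributing-developer mapping, mined from version control.
file_owners = {
    "ui/login.c":       ["dev_a"],
    "core/parser.c":    ["dev_a", "dev_b"],
    "core/scheduler.c": ["dev_b", "dev_c"],
    "net/session.c":    ["dev_c"],
}

def bias_features(devs):
    """Aggregate developer bias scores into simple file-level features."""
    scores = np.array([developer_bias[d] for d in devs])
    return [scores.mean(), scores.max(), float(len(devs))]

X = np.array([bias_features(devs) for devs in file_owners.values()])

# Defect labels would normally come from issue/commit history; synthetic here
# so that the example runs end to end.
y = np.array([1, 1, 0, 0])

# Naive Bayes is a common learner in the defect prediction literature.
model = GaussianNB().fit(X, y)
for name, p in zip(file_owners, model.predict_proba(X)[:, 1]):
    print(f"{name}: predicted defect probability {p:.2f}")
```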

Keywords

Human factors · Software psychology · Defect prediction · Confirmation bias

Notes

Acknowledgments

We would like to thank Turkcell A. Ş., and Turgay Aytaç and Ayhan Inal from Logo Business Solutions for their support in sharing data. We would like to extend our gratitude to Dr. Michelle Mattern for editing the final manuscript.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Engineering, Boğaziçi University, Bebek, Turkey
  2. Ted Rogers School of Information Technology Management, Ryerson University, Toronto, Canada
