
Influence of confirmation biases of developers on software quality: an empirical study


Abstract

The thought processes of people have a significant impact on software quality, since software is designed, developed, and tested by people. Cognitive biases, defined as patterned deviations of human thought from the laws of logic and mathematics, are a likely cause of software defects, yet there is little empirical evidence to date to substantiate this assertion. In this research, we focus on a specific cognitive bias, confirmation bias: the tendency of people to seek evidence that verifies a hypothesis rather than evidence that falsifies it. Because of this bias, developers tend to perform unit tests to make their program work rather than to break their code, so confirmation bias is believed to be one of the factors that lead to increased software defect density. We present a metric scheme that quantifies developers' confirmation bias and explores its impact on software defect density. To estimate the effectiveness of this metric scheme in quantifying confirmation bias within the context of software development, we performed an empirical study on the prediction of the defective parts of software, applying our confirmation bias metrics to five datasets obtained from two companies. Our results provide empirical evidence that human thought processes and cognitive aspects deserve further investigation in order to improve decision making in software development for effective process management and resource allocation.
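
As a concrete illustration of the pipeline the abstract describes, the following minimal sketch shows how a per-developer confirmation bias score might be combined with a conventional static metric to flag defect-prone modules. This is not the paper's actual metric scheme: the bias scores, module data, feature choices, and the Naive Bayes classifier are all illustrative assumptions.

    # Minimal, hypothetical sketch (Python, scikit-learn): a per-developer
    # confirmation bias score used as a feature in a defect predictor.
    # All data and names below are illustrative, not the paper's scheme.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Hypothetical bias score per developer, e.g. the fraction of confirming
    # (rather than falsifying) probes in a hypothesis-testing exercise.
    bias_score = {"dev_a": 0.92, "dev_b": 0.41, "dev_c": 0.73}

    # Hypothetical module history: (owning developer, code churn, defective?).
    history = [
        ("dev_a", 120, 1), ("dev_a", 30, 1), ("dev_b", 95, 0),
        ("dev_b", 15, 0), ("dev_c", 200, 1), ("dev_c", 10, 0),
    ]

    # One feature vector per module: the owner's bias score plus a churn metric.
    X = np.array([[bias_score[dev], churn] for dev, churn, _ in history], dtype=float)
    y = np.array([label for _, _, label in history])

    model = GaussianNB().fit(X, y)

    # Score an unseen module owned by a high-bias developer.
    candidate = np.array([[bias_score["dev_a"], 80.0]])
    print("P(defective) =", model.predict_proba(candidate)[0, 1])

The shape of the pipeline is the point: bias is quantified per developer, propagated to the code that developer touches, and then used alongside conventional metrics such as churn when predicting which parts of the software are defective.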



Acknowledgments

We would like to thank Turkcell A.Ş., as well as Turgay Aytaç and Ayhan Inal from Logo Business Solutions, for their support in sharing data. We would also like to extend our gratitude to Dr. Michelle Mattern for editing the final manuscript.

Author information

Correspondence to Gül Çalıklı.


Cite this article

Çalıklı, G., Bener, A.B. Influence of confirmation biases of developers on software quality: an empirical study. Software Qual J 21, 377–416 (2013). https://doi.org/10.1007/s11219-012-9180-0

