
Influence of confirmation biases of developers on software quality: an empirical study


Abstract

The thought processes of people have a significant impact on software quality, since software is designed, developed, and tested by people. Cognitive biases, defined as patterned deviations of human thought from the laws of logic and mathematics, are a likely cause of software defects, yet there is little empirical evidence to date to substantiate this assertion. In this research, we focus on a specific cognitive bias, confirmation bias: the tendency of people to seek evidence that verifies a hypothesis rather than evidence that falsifies it. Because of this bias, developers tend to perform unit tests to make their program work rather than to break their code, so confirmation bias is believed to be one of the factors that lead to increased software defect density. We present a metric scheme that quantifies developers' confirmation bias and explores its impact on software defect density. To estimate the effectiveness of this metric scheme in quantifying confirmation bias within the context of software development, we performed an empirical study on the prediction of the defective parts of software, applying our confirmation bias metrics to five datasets obtained from two companies. Our results provide empirical evidence that human thought processes and cognitive aspects deserve further investigation in order to improve decision making in software development for effective process management and resource allocation.
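
As a concrete illustration of the pipeline the abstract describes, the following minimal sketch shows how a per-developer confirmation bias score might be combined with a conventional static metric to flag defect-prone modules. This is not the paper's actual metric scheme: the bias scores, module data, feature choices, and the Naive Bayes classifier are all illustrative assumptions.

    # Minimal, hypothetical sketch (Python, scikit-learn): a per-developer
    # confirmation bias score used as a feature in a defect predictor.
    # All data and names below are illustrative, not the paper's scheme.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Hypothetical bias score per developer, e.g. the fraction of confirming
    # (rather than falsifying) probes in a hypothesis-testing exercise.
    bias_score = {"dev_a": 0.92, "dev_b": 0.41, "dev_c": 0.73}

    # Hypothetical module history: (owning developer, code churn, defective?).
    history = [
        ("dev_a", 120, 1), ("dev_a", 30, 1), ("dev_b", 95, 0),
        ("dev_b", 15, 0), ("dev_c", 200, 1), ("dev_c", 10, 0),
    ]

    # One feature vector per module: the owner's bias score plus a churn metric.
    X = np.array([[bias_score[dev], churn] for dev, churn, _ in history], dtype=float)
    y = np.array([label for _, _, label in history])

    model = GaussianNB().fit(X, y)

    # Score an unseen module owned by a high-bias developer.
    candidate = np.array([[bias_score["dev_a"], 80.0]])
    print("P(defective) =", model.predict_proba(candidate)[0, 1])

The shape of the pipeline is the point: bias is quantified per developer, propagated to the code that developer touches, and then used alongside conventional metrics such as churn when predicting which parts of the software are defective.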



Acknowledgments

We would like to thank Turkcell A.Ş., as well as Turgay Aytaç and Ayhan Inal from Logo Business Solutions, for their support in sharing data. We would also like to extend our gratitude to Dr. Michelle Mattern for editing the final manuscript.

Author information

Correspondence to Gül Çalıklı.


Cite this article

Çalıklı, G., Bener, A.B. Influence of confirmation biases of developers on software quality: an empirical study. Software Qual J 21, 377–416 (2013). https://doi.org/10.1007/s11219-012-9180-0

