Empirical Software Engineering, Volume 20, Issue 3, pp 844–878

An experiment on the effectiveness and efficiency of exploratory testing

  • Wasif Afzal
  • Ahmad Nauman Ghazi
  • Juha Itkonen
  • Richard Torkar
  • Anneliese Andrews
  • Khurram Bhatti


Abstract

The exploratory testing (ET) approach is widely applied in industry but has received little scientific study. The research community needs quantitative results on the performance of ET obtained in realistic experimental settings. The objective of this paper is to quantify the effectiveness and efficiency of ET compared to testing with documented test cases (test case based testing, TCT). We performed four controlled experiments in which a total of 24 practitioners and 46 students performed manual functional testing using ET and TCT. We measured the number of defects identified in 90-minute testing sessions, the detection difficulty, severity and type of the detected defects, and the number of false defect reports. The results show that ET found a significantly greater number of defects. ET also found significantly more defects across levels of detection difficulty, defect types and severity levels. However, the two testing approaches did not differ significantly in the number of false defect reports submitted. We conclude that ET was more efficient than TCT in our experiment. ET was also more effective than TCT when detection difficulty, defect type and severity level are considered. The two approaches are comparable with respect to the number of false defect reports submitted.


Keywords: Software testing · Experiment · Exploratory testing · Efficiency · Effectiveness
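The abstract compares ET and TCT on effectiveness (defects found per fixed-length session) and efficiency (defects found per unit of testing time). A minimal sketch of how such session data might be summarized, using entirely hypothetical defect counts (not the paper's data) and the Vargha–Delaney A12 effect size as one common way to compare two samples of counts:

```python
# Hypothetical per-session defect counts; NOT the experiment's actual data.
et_defects = [12, 9, 14, 11]   # sessions using exploratory testing (ET)
tct_defects = [7, 8, 6, 9]     # sessions using test case based testing (TCT)

SESSION_HOURS = 1.5  # the experiment used 90-minute testing sessions

def efficiency(counts):
    """Mean number of defects found per hour of testing."""
    return sum(counts) / len(counts) / SESSION_HOURS

def vargha_delaney_a12(a, b):
    """A12 effect size: probability that a randomly chosen value from `a`
    exceeds a randomly chosen value from `b` (ties count as 0.5)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)
    return wins / (len(a) * len(b))

print(f"ET efficiency:  {efficiency(et_defects):.2f} defects/hour")
print(f"TCT efficiency: {efficiency(tct_defects):.2f} defects/hour")
print(f"A12(ET, TCT) =  {vargha_delaney_a12(et_defects, tct_defects):.2f}")
```

An A12 value near 0.5 would indicate no difference between the approaches; values near 1.0 indicate the first sample stochastically dominates the second.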



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Wasif Afzal (1)
  • Ahmad Nauman Ghazi (2)
  • Juha Itkonen (3)
  • Richard Torkar (4)
  • Anneliese Andrews (5)
  • Khurram Bhatti (2)

  1. School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden
  2. Blekinge Institute of Technology, Karlskrona, Sweden
  3. Department of Computer Science and Engineering, Aalto University, Espoo, Finland
  4. Department of Computer Science and Engineering, Chalmers University of Technology | University of Gothenburg, Gothenburg, Sweden
  5. University of Denver, Denver, USA
