Empirical Software Engineering, Volume 19, Issue 2, pp 303–342

Are test cases needed? Replicated comparison between exploratory and test-case-based software testing

  • Juha Itkonen
  • Mika V. Mäntylä


Abstract

Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in the defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on the small differences in the experimental design, we also put forward a hypothesis that the effectiveness of the TCT approach would suffer more than ET from time pressure. We also found that both approaches had distinctive issues: in TCT, the problems were related to correct abstraction levels of test cases, and the problems in ET were related to test design and logging of the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.


Keywords: Software testing · Manual testing · Test cases · Exploratory testing · Experiment · Effectiveness · Efficiency



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science and Engineering, Aalto University, Aalto, Finland
  2. Department of Computer Science, Lund University, Lund, Sweden
