Are test cases needed? Replicated comparison between exploratory and test-case-based software testing

Abstract

Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in the defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on the small differences in the experimental design, we also put forward a hypothesis that the effectiveness of the TCT approach would suffer more than ET from time pressure. We also found that both approaches had distinctive issues: in TCT, the problems were related to correct abstraction levels of test cases, and the problems in ET were related to test design and logging of the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.

Notes

  1. Available at http://www.soberit.hut.fi/jitkonen/Publications/Juha_Itkonen_Licentiate_Thesis_2008.pdf

  2. jEdit, http://www.jedit.org/

  3. http://www.cs.ua.edu/~carver/ReplicationGuidelines.htm

  4. False defect reports refer to reported defects that cannot be understood, are duplicates, or report non-existing defects.

  5. The original study reports an average of 107.9. However, during re-analysis we found that three missing data points (students who had not answered this question) had been recorded as zeros. Changing these zeros to missing values increased the average slightly.

  6. Burnstein I (2003) Practical Software Testing. Springer-Verlag, New York. (selected chapters)

  7. http://www.bugzilla.org/

  8. http://figshare.com/

  9. If a previously unknown, valid defect was reported, a new known defect and ID were created.

  10. www.r-project.org

  11. This is the number of false defect reports divided by all findings (both real and false defect reports).

  12. The authors are aware of the controversy over calculating the mean of ordinal data. However, we felt that using the mean would be more accurate than the median. For example, if an individual respondent’s median coverage estimates for ET and TCT are both three, this could be the result of ET coverage having a mean of 2.6 and TCT coverage a mean of 3.4. In such a case, the respondent’s intention would clearly be that TCT provided better coverage, but this would not be visible in the perceived coverage measure when using the median.

  13. Windows-Icons-Menus-Pointer
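
The measures described in notes 11 and 12 can be illustrated with a minimal Python sketch. The function name and the per-feature ratings below are hypothetical; the ratings were chosen only so that the means come out as 2.6 and 3.4, as in the example of note 12, and they are not data from the study.

```python
from statistics import mean, median


def false_report_ratio(false_reports: int, real_reports: int) -> float:
    """Share of false defect reports among all findings (note 11)."""
    total = false_reports + real_reports
    if total == 0:
        raise ValueError("no findings reported")
    return false_reports / total


# Hypothetical per-feature coverage ratings from one respondent on the
# 4-step ordinal scale (1 = not covered at all ... 4 = covered thoroughly).
# Illustrative values only, not taken from the experiment.
et_coverage = [1, 3, 3, 3, 3]   # mean 2.6, median 3
tct_coverage = [3, 3, 3, 4, 4]  # mean 3.4, median 3

# The medians are identical, so they hide the difference between approaches.
print("median ET/TCT:", median(et_coverage), median(tct_coverage))
# The means differ, which is the argument made in note 12.
print("mean ET/TCT:", mean(et_coverage), mean(tct_coverage))
# Hypothetical counts: 2 false reports and 8 real defect reports.
print("false report ratio:", false_report_ratio(2, 8))
```

The sketch shows only why the mean can separate two sets of ratings that share a median; it does not settle the question of whether averaging ordinal data is appropriate.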

Author information

Corresponding author

Correspondence to Juha Itkonen.

Additional information

Communicated by: Jeffrey C. Carver, Natalia Juristo, Teresa Baldassarre, Sira Vegas

Appendices

Appendix A: Summary of the Survey Questions

Background

  • Years of university studies

  • Study credits

  • How many years of experience do you have in the following areas?

    • Professional software development (any kind of role in development)

    • Professional programming (as a developer, programmer or equivalent)

    • Professional software testing (as a tester, developer, or equivalent)

    • Other kind of experience in software development

  • Have you had any training in software testing before this course? (Yes or No)

    • What kind of training?

Coverage

  • Assess the coverage of your testing on the following features

    • 4-step ordinal scale: not covered at all—covered superficially—basic functions well covered—covered thoroughly

Exploratory Approach

  • How easy was the exploratory testing approach to apply in practice?

    • 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy

  • How useful was the provided test charter for structuring and guiding your testing?

    • 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful

  • How useful was the exploratory testing approach for finding defects?

    • 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful

  • What problems or shortcomings did you experience in the exploratory testing approach?

Test-case Based Approach

  • How easy were your own test cases to execute in practice?

    • 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy

  • How useful were your own test cases for structuring and guiding your testing?

    • 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful

  • How useful were your own test cases for finding defects?

    • 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful

  • What problems or shortcomings did your test cases have?

  • Which one of the two testing approaches (ET or TCT) gave you better confidence in the quality of your testing, and why?

Appendix B: Contents of the ET Charter

  1. What: tested areas

    Select the correct description of tested features for your exploratory testing and remove the other one:

    • Feature Set B1

      Search and replace (User’s Guide chapter 5)

      • Searching For Text

      • Replacing Text

        • Text Replace

      • HyperSearch

      • Multiple File Search

      • The Search Bar

      + Applicable shortcuts in Appendix A

    • Feature Set B2

      Editing source code (User’s Guide chapter 6)

      (test for one edit mode, e.g. java-mode)

      • Tabbing and Indentation

        • Soft Tabs

        • Automatic Indent

      • Commenting Out Code

      • Bracket Matching

      • Folding

        • Collapsing and Expanding Folds

        • Navigating Around With Folds

        • Miscellaneous Folding Commands

        • Narrowing

      + Applicable shortcuts in Appendix A

  2. Why: goal and focus

    Perform testing from the viewpoint of a typical user and pay attention to the following issues:

    • Does the function work as described in the user manual?

    • Does the function do anything that it should not do?

    • From the viewpoint of a typical user, does the function work as the user would expect and want?

    • What interactions does the function have, or might it have, with other functions, settings, data, or configuration of the application? Do these interactions work correctly and as the user would expect and want them to work?

    Focus on functionality in your testing. Try to test exceptional cases, invalid as well as valid inputs, things that the user could do wrong, and typical error situations. However, do not test external and environment-related (e.g., hardware) errors and exceptions (such as very low memory, a broken hard drive, or corrupted files).

  3. How: approach

    Use the jEdit User’s Guide as the specification for the features, and also draw on your own knowledge and experience, since the User’s Guide is neither comprehensive nor unambiguous. Use the following testing strategies for functional testing.

    Domain testing

    • equivalence partitioning

    • boundary value analysis

    Combination testing

    • Base choice strategy

    • Pair-wise (all-pairs) strategy

  4. Exploration log

    SESSION START TIME: 2006-mm-dd hh:mm

    TESTER: _

    VERSION: jEdit 4.2 variant for T-76.5613 exercise

    ENVIRONMENT: _

    4.1 Task breakdown

      DURATION (hh:mm): __:__

      TEST DESIGN AND EXECUTION (percent): _%

      BUG INVESTIGATION AND REPORTING (percent): _%

      SESSION SETUP (percent): _%

    4.2 Test Data and Tools

      What data files and tools were used in testing?

    4.3 Test notes

      • Test notes that describe what was done, and how.

      • Detailed enough to be able to use in briefing the test session with other persons.

      • Detailed enough to be able to reproduce failures.

    4.4 Defects

      Time stamp, short note, Bugzilla bug ID

    4.5 Issues

      Any observations, issues, new feature requests and questions that came up during testing but were not reported as bugs.
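
Two of the strategies named in the approach section of the charter, boundary value analysis and the base choice strategy, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not material from the experiment: the function names, the search options, and the 1..8 range are all hypothetical, and the pair-wise strategy is not shown.

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Boundary value analysis for an integer range [low, high]:
    values just outside, on, and just inside each boundary."""
    candidates = [low - 1, low, low + 1, high - 1, high, high + 1]
    # Drop duplicates (they occur for narrow ranges) while keeping order.
    return list(dict.fromkeys(candidates))


def base_choice(params: dict[str, list]) -> list[dict]:
    """Base choice strategy: the first value of each parameter is its base
    choice; every further test changes exactly one parameter from the base."""
    base = {name: values[0] for name, values in params.items()}
    tests = [base]
    for name, values in params.items():
        for value in values[1:]:
            tests.append({**base, name: value})
    return tests


# Hypothetical search options, loosely modelled on a search dialog.
search_params = {
    "ignore_case": [False, True],
    "regex": [False, True],
    "scope": ["current buffer", "selection", "all buffers"],
}

for test in base_choice(search_params):
    print(test)

# Hypothetical numeric setting assumed to accept values from 1 to 8.
print(boundary_values(1, 8))
```

With three parameters of two, two, and three values, base choice yields five tests (the base test plus one per non-base value), whereas exhaustive combination would need twelve.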

Appendix C: The Target Application and Features

The target of testing in both experiments was the jEdit open source text editor, version 4.2, with seeded defects in the tested features.

The official version and documentation of the target software can be accessed at: http://sourceforge.net/projects/jedit/files/jedit/4.2/

The user’s guide used as the source documentation for testing can be accessed at:

http://sourceforge.net/projects/jedit/files/jedit/4.2/jedit42manual-a4.pdf/download

The target feature sets used in the experiments were the following:

  • Feature Set A (Used in the original experiment)

    • Working with files (User’s Guide chapter 4, pp. 11–12, 17)

      • Creating new files

      • Opening files (excluding GZipped files)

      • Saving files

      • Closing Files and Exiting jEdit

    • Editing text (User’s Guide chapter 5, pp. 18–23)

      • Moving The Caret

      • Selecting Text

        • Range Selection

        • Rectangular Selection

        • Multiple Selection

      • Inserting and Deleting Text

      • Working With Words

        • What’s a Word?

      • Working With Lines

      • Working With Paragraphs

      • Wrapping Long Lines

        • Soft Wrap

        • Hard Wrap

      And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)

  • Feature Set B1 (Used in the original and replicated experiment)

    • Search and replace (User’s Guide chapter 5, pp. 26–29)

      • Searching For Text

      • Replacing Text

        • Text Replace

      • HyperSearch

      • Multiple File Search

      • The Search Bar

      And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)

  • Feature Set B2 (Used in the original and replicated experiment)

    • Editing source code (User’s Guide chapter 6, pp. 30–36)

      • Tabbing and Indentation

        • Soft Tabs

        • Automatic Indent

      • Commenting Out Code

      • Bracket Matching

      • Folding

        • Collapsing and Expanding Folds

        • Navigating Around With Folds

        • Miscellaneous Folding Commands

        • Narrowing

      And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)

About this article

Cite this article

Itkonen, J., Mäntylä, M.V. Are test cases needed? Replicated comparison between exploratory and test-case-based software testing. Empir Software Eng 19, 303–342 (2014). https://doi.org/10.1007/s10664-013-9266-8

Keywords

  • Software testing
  • Manual testing
  • Test cases
  • Exploratory testing
  • Experiment
  • Effectiveness
  • Efficiency