Are test cases needed? Replicated comparison between exploratory and test-case-based software testing

Itkonen, Juha; Mäntylä, Mika V.

doi:10.1007/s10664-013-9266-8

Published: 11 July 2013

Volume 19, pages 303–342, (2014)

Empirical Software Engineering

Juha Itkonen¹ & Mika V. Mäntylä¹,²

Abstract

Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in the defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on the small differences in the experimental design, we also put forward a hypothesis that the effectiveness of the TCT approach would suffer more than ET from time pressure. We also found that both approaches had distinctive issues: in TCT, the problems were related to correct abstraction levels of test cases, and the problems in ET were related to test design and logging of the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.

Notes

Available at http://www.soberit.hut.fi/jitkonen/Publications/Juha_Itkonen_Licentiate_Thesis_2008.pdf
jEdit, http://www.jedit.org/
http://www.cs.ua.edu/~carver/ReplicationGuidelines.htm
False defect reports refer to reported defects that cannot be understood, are duplicates, or report non-existing defects.
The original study reports the average as 107.9. However, during re-analysis we found that three missing data points (students who had not answered this question) had been turned into zeros. Changing these zeros to missing values increased the average slightly.
Burnstein I (2003) Practical Software Testing. Springer-Verlag, New York. (selected chapters)
http://www.bugzilla.org/
http://figshare.com/
If a previously unknown, valid defect was reported, a new known defect and ID were created.
www.r-project.org
This is the number of false defect reports divided by all findings (both real and false defect reports).
The authors are aware of the controversy over calculating a mean value from ordinal data. However, we felt that using the mean would be more accurate than the median. For example, if an individual respondent’s median coverage estimates for ET and TCT are both three, this could be the result of ET coverage having a mean of 2.6 while TCT coverage has a mean of 3.4. In such a case, the respondent clearly intended that TCT provided better coverage, but this would not be visible in a perceived coverage measure based on the median.
Windows-Icons-Menus-Pointer
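
Three of the measurement notes above (the zero-versus-missing correction, the false-report ratio, and the choice of mean over median) can be illustrated with a small Python sketch. All numbers below are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
from statistics import mean, median


def mean_ignoring_missing(values):
    """Average the answers, skipping missing ones (None) instead of counting them as 0."""
    answered = [v for v in values if v is not None]
    return mean(answered)


def false_report_ratio(real_reports, false_reports):
    """False defect reports divided by all findings (real plus false reports)."""
    total = real_reports + false_reports
    return false_reports / total if total else 0.0


# Invented answers; None marks a student who did not answer the question.
answers = [100, 120, None, 110, None]
as_zeros = [0 if v is None else v for v in answers]
print(mean(as_zeros))                  # 66: missing answers stored as zeros drag the average down
print(mean_ignoring_missing(answers))  # 110: missing answers excluded

# Invented 4-step coverage ratings (1..4) by one respondent, one rating per feature.
et_coverage = [2, 2, 3, 3, 3]
tct_coverage = [3, 3, 3, 4, 4]
print(median(et_coverage), median(tct_coverage))  # both medians are 3
print(mean(et_coverage), mean(tct_coverage))      # means differ: 2.6 versus 3.4

# Invented counts: 8 real and 2 false defect reports.
print(false_report_ratio(8, 2))  # 0.2
```

The ratings were chosen so that they reproduce the 2.6 and 3.4 means mentioned in the note while both medians stay at three.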

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Aalto University, FI-00076, Aalto, Finland
Juha Itkonen & Mika V. Mäntylä
Department of Computer Science, Lund University, Lund, Sweden
Mika V. Mäntylä

Corresponding author

Correspondence to Juha Itkonen.

Additional information

Communicated by: Jeffrey C. Carver, Natalia Juristo, Teresa Baldassarre, Sira Vegas

Appendices

Appendix A: Summary of the Survey Questions

1.1 Background

Years of university studies
Study credits
How many years of experience do you have in the following areas?
- Professional software development (any kind of role in development)
- Professional programming (as a developer, programmer or equivalent)
- Professional software testing (as a tester, developer, or equivalent)
- Other kind of experience in software development
Have you had any training in software testing before this course? (Yes or No)
- What kind of training?

1.2 Coverage

Assess the coverage of your testing on the following features
- 4-step ordinal scale: not covered at all—covered superficially—basic functions well covered—covered thoroughly

1.3 Exploratory Approach

How easy was the exploratory testing approach to apply in practice?
- 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy
How useful was the provided test charter for structuring and guiding your testing?
- 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
How useful was the exploratory testing approach for finding defects?
- 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
What problems or shortcomings did you experience in the exploratory testing approach?

1.4 Test-case Based Approach

How easy were your own test cases to execute in practice?
- 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy
How useful were your own test cases for structuring and guiding your testing?
- 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
How useful were your own test cases for finding defects?
- 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
What problems or shortcomings did your test cases have?
Which one of the two testing approaches (ET or TCT) gave you better confidence in the quality of your testing, and why?

Appendix B: Contents of the ET Charter

1. What: tested areas

Select the correct description of tested features for your exploratory testing and remove the other one:
- Feature Set B1
  
  Search and replace (User’s Guide chapter 5)
  - Searching For Text
  - Replacing Text
    - Text Replace
  - HyperSearch
  - Multiple File Search
  - The Search Bar
  + Applicable shortcuts in Appendix A
- Feature Set B2
  
  Editing source code (User’s Guide chapter 6)
  
  (test for one edit mode, e.g. java-mode)
  - Tabbing and Indentation
    - Soft Tabs
    - Automatic Indent
  - Commenting Out Code
  - Bracket Matching
  - Folding
    - Collapsing and Expanding Folds
    - Navigating Around With Folds
    - Miscellaneous Folding Commands
    - Narrowing
  + Applicable shortcuts in Appendix A
2. Why: goal and focus

Perform testing from the viewpoint of a typical user and pay attention to the following issues:
- Does the function work as described in the user manual?
- Does the function do anything that it should not do?
- From the viewpoint of a typical user, does the function work as the user would expect and want?
- What interactions does the function have, or might it have, with other functions, settings, data, or the configuration of the application? Do these interactions work correctly and as the user would expect and want them to work?
Focus on functionality in your testing. Try to test exceptional cases, invalid as well as valid inputs, things that the user could do wrong, and typical error situations. However, do not test external and environment-related (e.g. hardware) errors and exceptions (such as very low memory, a broken hard drive, corrupted files, etc.).
3. How: approach

Use the jEdit User’s Guide as the specification for the features, and also use your own knowledge and experience, since the User’s Guide is neither comprehensive nor unambiguous. Use the following testing strategies for functional testing.

Domain testing
- equivalence partitioning
- boundary value analysis
Combination testing
- Base choice strategy
- Pair-wise (all-pairs) strategy
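
The strategies named in the charter can be sketched in Python as follows. This is an illustrative sketch only: the search options, their values, and the numeric range are hypothetical and are not taken from the experiment material. It shows boundary value analysis for an integer range, base choice test generation, and a check of which value pairs a test set leaves uncovered (a coverage check, not an all-pairs generator).

```python
from itertools import combinations, product


def boundary_values(low, high):
    """Boundary value analysis for the integer range [low, high]:
    values just outside, on, and just inside each boundary."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def base_choice(params, base):
    """Base choice strategy: the base test plus one variant per non-base value,
    changing a single parameter at a time."""
    tests = [dict(base)]
    for name, values in params.items():
        for value in values:
            if value != base[name]:
                variant = dict(base)
                variant[name] = value
                tests.append(variant)
    return tests


def uncovered_pairs(params, tests):
    """Value pairs (taken from two different parameters) that no test exercises."""
    missing = []
    for a, b in combinations(params, 2):
        for va, vb in product(params[a], params[b]):
            if not any(t[a] == va and t[b] == vb for t in tests):
                missing.append(((a, va), (b, vb)))
    return missing


# Hypothetical search options, invented for this example.
params = {
    "ignore_case": [True, False],
    "regex": [True, False],
    "scope": ["buffer", "all_buffers", "directory"],
}
base = {"ignore_case": True, "regex": False, "scope": "buffer"}

tests = base_choice(params, base)
print(len(tests))                           # 5 tests: the base plus 1 + 1 + 2 variants
print(len(uncovered_pairs(params, tests)))  # 5 of the 16 value pairs remain uncovered

# Exhaustive combination (12 tests) covers every pair by construction.
all_tests = [dict(zip(params, combo)) for combo in product(*params.values())]
print(len(uncovered_pairs(params, all_tests)))  # 0

print(boundary_values(1, 10))  # [0, 1, 2, 9, 10, 11]
```

Base choice keeps the test count small but leaves some pairs untested; a pair-wise test set would sit between base choice and the exhaustive 12 combinations.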
4. Exploration log

SESSION START TIME: 2006-mm-dd hh:mm

TESTER: _

VERSION: jEdit 4.2 variant for T-76.5613 exercise

ENVIRONMENT: _
4.1 Task breakdown
  
  DURATION (hh:mm): __:__
  
  TEST DESIGN AND EXECUTION (percent): _%
  
  BUG INVESTIGATION AND REPORTING (percent): _%
  
  SESSION SETUP (percent): _%
4.2 Test Data and Tools
  
  What data files and tools were used in testing?
4.3 Test notes
  - Test notes that describe what was done, and how.
  - Detailed enough to be able to use in briefing the test session with other persons.
  - Detailed enough to be able to reproduce failures.
4.4 Defects
  
  Time stamp, short note, Bugzilla bug ID
4.5 Issues
  
  Any observations, issues, new feature requests and questions that came up during testing but were not reported as bugs.
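
The exploration log template above can be modeled as a small data structure. The sketch below is one possible representation, not the format used in the experiment: the class and field names are hypothetical, the example values are invented, and the check that the three percentages add up to 100 is an assumption that the charter does not state.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBreakdown:
    duration_minutes: int
    design_and_execution_pct: int
    bug_investigation_pct: int
    session_setup_pct: int

    def is_consistent(self):
        """Assumption: the three shares should account for the whole session."""
        total = (self.design_and_execution_pct
                 + self.bug_investigation_pct
                 + self.session_setup_pct)
        return total == 100


@dataclass
class DefectNote:
    timestamp: str
    note: str
    bugzilla_id: int


@dataclass
class ExplorationLog:
    start_time: str
    tester: str
    version: str
    environment: str
    breakdown: TaskBreakdown
    data_and_tools: list = field(default_factory=list)
    test_notes: list = field(default_factory=list)
    defects: list = field(default_factory=list)
    issues: list = field(default_factory=list)


# Invented example session.
log = ExplorationLog(
    start_time="2006-01-01 10:00",
    tester="example tester",
    version="jEdit 4.2 variant",
    environment="example environment",
    breakdown=TaskBreakdown(90, 70, 20, 10),
)
log.test_notes.append("Searched with an empty search string in the search bar.")
log.defects.append(DefectNote("10:25", "Example defect note", 1234))

print(log.breakdown.is_consistent())  # True
print(len(log.defects))               # 1
```

Keeping defects as structured entries with a time stamp and a Bugzilla ID mirrors section 4.4 of the template, while the free-text notes and issues stay as plain lists.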

Appendix C: The Target Application and Features

The target of testing in both experiments was the jEdit open source text editor, version 4.2, with seeded defects in the tested features.

The official version and documentation of the target software can be accessed at: http://sourceforge.net/projects/jedit/files/jedit/4.2/

The user’s guide used as the source documentation for testing can be accessed at:

http://sourceforge.net/projects/jedit/files/jedit/4.2/jedit42manual-a4.pdf/download

The target feature sets used in the experiments were the following:

Feature Set A (Used in the original experiment)
- Working with files (User’s Guide chapter 4, pp. 11–12, 17)
  - Creating new files
  - Opening files (excluding GZipped files)
  - Saving files
  - Closing Files and Exiting jEdit
- Editing text (User’s Guide chapter 5, pp. 18–23)
  - Moving The Caret
  - Selecting Text
    - Range Selection
    - Rectangular Selection
    - Multiple Selection
  - Inserting and Deleting Text
  - Working With Words
    - What’s a Word?
  - Working With Lines
  - Working With Paragraphs
  - Wrapping Long Lines
    - Soft Wrap
    - Hard Wrap
  And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)
Feature Set B1 (Used in the original and replicated experiment)
- Search and replace (User’s Guide chapter 5, pp. 26–29)
  - Searching For Text
  - Replacing Text
    - Text Replace
  - HyperSearch
  - Multiple File Search
  - The Search Bar
  And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)
Feature Set B2 (Used in the original and replicated experiment)
- Editing source code (User’s Guide chapter 6, pp. 30–36)
  - Tabbing and Indentation
    - Soft Tabs
    - Automatic Indent
  - Commenting Out Code
  - Bracket Matching
  - Folding
    - Collapsing and Expanding Folds
    - Navigating Around With Folds
    - Miscellaneous Folding Commands
    - Narrowing
  And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)

About this article

Cite this article

Itkonen, J., Mäntylä, M.V. Are test cases needed? Replicated comparison between exploratory and test-case-based software testing. Empir Software Eng 19, 303–342 (2014). https://doi.org/10.1007/s10664-013-9266-8

Published: 11 July 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10664-013-9266-8
