Empirical Software Engineering, Volume 10, Issue 4, pp 405–435

Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact

  • Hyunsook Do
  • Sebastian Elbaum
  • Gregg Rothermel

Abstract

Where the creation, understanding, and assessment of software testing and regression testing techniques are concerned, controlled experimentation is an indispensable research methodology. Obtaining the infrastructure necessary to support such experimentation, however, is difficult and expensive. As a result, progress in experimentation with testing techniques has been slow, and empirical data on the costs and effectiveness of techniques remains relatively scarce. To help address this problem, we have been designing and constructing infrastructure to support controlled experimentation with testing and regression testing techniques. This paper reports on the challenges faced by researchers experimenting with testing techniques, including those that inform the design of our infrastructure. It then describes the infrastructure we are creating in response to these challenges and now making available to other researchers, and discusses the impact that this infrastructure has had and can be expected to have.
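
As an illustration of the kind of measurement such infrastructure supports, the minimal sketch below computes APFD (Average Percentage of Faults Detected), the effectiveness metric used in the authors' test case prioritization studies, from a fault matrix recording which tests expose which seeded faults. This is an illustrative assumption rather than code from the paper or its infrastructure; all function and data names here are hypothetical.

    # Hypothetical sketch: APFD for one prioritized ordering of a test suite.
    # 'fault_matrix' maps each test name to the set of (seeded) faults it
    # exposes; every fault is assumed to be exposed by at least one test.

    def apfd(test_order, fault_matrix, num_faults):
        """APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is
        the number of tests, m the number of faults, and TF_i the 1-based
        position of the first test in the ordering that exposes fault i."""
        n = len(test_order)
        total = 0
        for fault in range(num_faults):
            # Position of the first test in the ordering exposing this fault.
            total += next(pos for pos, test in enumerate(test_order, start=1)
                          if fault in fault_matrix.get(test, set()))
        return 1 - total / (n * num_faults) + 1 / (2 * n)

    # Example: three tests, two seeded faults; t3 exposes both.
    faults_exposed = {"t1": {0}, "t2": {1}, "t3": {0, 1}}
    print(apfd(["t3", "t1", "t2"], faults_exposed, 2))  # ~0.8333

Running the broad test first is rewarded: the alternative ordering ["t1", "t2", "t3"] scores only ~0.6667, since the second fault goes undetected until the second test. Computing such metrics reliably across many subject programs, versions, and test suites is precisely the sort of task that shared experiment infrastructure makes repeatable.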

Keywords

Software testing, regression testing, controlled experimentation, experiment infrastructure



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Hyunsook Do (1)
  • Sebastian Elbaum (1)
  • Gregg Rothermel (1)

  1. Department of Computer Science and Engineering, University of Nebraska–Lincoln, Lincoln, USA
