Automating regression testing using web-based application similarities


Web-based applications are among the most widely used types of software and have become the backbone of many e-commerce and communications businesses. They are mission-critical for many organizations, which motivates their precise validation. Although regression testing has been widely used to gain confidence in the reliability of software by providing information about the quality of an application, it has seen limited use in this domain because websites are updated frequently and test case output is difficult to compare automatically. We present techniques that address these challenges in regression testing web-based applications. Without precise comparators, test cases that fail due to benign program evolution must be inspected manually. Our approach harnesses the inherent similarities between unrelated web-based applications to provide fully automated solutions that reduce the number of such false positives while still reporting true faults. By applying a model derived from regression testing other programs, our approach can predict which test cases merit human inspection. Our method is 2.5 to 50 times as accurate as current industrial practice and requires no user annotations.


Author information

Corresponding author

Correspondence to Kinga Dobolyi.

About this article

Cite this article

Dobolyi, K., Soechting, E. & Weimer, W. Automating regression testing using web-based application similarities. Int J Softw Tools Technol Transfer 13, 111–129 (2011).

  • Testing
  • Web
  • Oracle
  • Comparator
  • Model