Continuous validation of performance test workloads

Automated Software Engineering

Abstract

The rise of large-scale software systems poses many new challenges for the software performance engineering field. Failures in these systems are often associated with performance issues rather than with feature bugs. Performance testing has therefore become essential to ensuring the problem-free operation of these systems. However, the performance testing process faces a major challenge: evolving field workloads, in terms of evolving feature sets and usage patterns, often lead to “outdated” tests that no longer reflect the field. Performance analysts must therefore continually validate whether their tests still reflect the field. Such validation may be performed by comparing execution logs from the test and the field. However, the size and unstructured nature of execution logs make such a comparison infeasible without automated support. In this paper, we propose an automated approach that validates whether a performance test resembles the field workload and, if not, determines how they differ. Performance analysts can then update their tests to eliminate such differences, creating more realistic tests. We perform six case studies on two large systems: one open-source system and one enterprise system. Our approach identifies differences between performance tests and the field with a precision of 92%, compared to only 61% for the state of the practice and 19% for a conventional statistical comparison.
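To make the idea of comparing test and field execution logs concrete, the following is a minimal sketch and not the approach proposed in the paper. It assumes a hypothetical log format in which the first whitespace-separated token of each line names the event type, and it flags event types whose relative frequency differs between the two logs by at least an arbitrary threshold. The function names, the log format, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def event_frequencies(log_lines):
    """Return the relative frequency of each event type in a log.

    Assumption: the first whitespace-separated token of a line is the event type.
    """
    counts = Counter(line.split(maxsplit=1)[0] for line in log_lines if line.strip())
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def workload_differences(test_log, field_log, threshold=0.05):
    """Map each event type to (field share - test share) when the gap is large.

    The threshold is an arbitrary illustrative cut-off, not a value from the paper.
    """
    test = event_frequencies(test_log)
    field = event_frequencies(field_log)
    diffs = {}
    for event in sorted(set(test) | set(field)):
        delta = field.get(event, 0.0) - test.get(event, 0.0)
        if abs(delta) >= threshold:
            diffs[event] = delta
    return diffs


# Hypothetical logs: the test exercises SEARCH, while the field uses UPLOAD.
test_log = ["LOGIN user=1", "SEARCH q=a", "SEARCH q=b", "LOGOUT user=1"]
field_log = ["LOGIN user=7", "UPLOAD file=x", "UPLOAD file=y", "LOGOUT user=7"]
differences = workload_differences(test_log, field_log)
```

In this toy input, a positive value marks an event that is more common in the field than in the test, and a negative value marks the reverse. An analyst could use such a report as a starting point for deciding which parts of a test workload to revise.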

Acknowledgments

We would like to thank BlackBerry for providing access to the enterprise system used in our case study. The findings and opinions expressed in this paper are those of the authors and do not necessarily represent or reflect those of BlackBerry and/or its subsidiaries and affiliates. Moreover, our results do not reflect the quality of BlackBerry’s products. We would also like to thank Microsoft Azure for (1) providing us access to a large-scale deployment and (2) working closely with us to set up and troubleshoot our deployment.

Author information

Corresponding author

Correspondence to Mark D. Syer.

About this article

Cite this article

Syer, M.D., Shang, W., Jiang, Z.M. et al. Continuous validation of performance test workloads. Autom Softw Eng 24, 189–231 (2017). https://doi.org/10.1007/s10515-016-0196-8

  • DOI: https://doi.org/10.1007/s10515-016-0196-8
