Assessing and optimizing the performance impact of the just-in-time configuration parameters - a case study on PyPy

Empirical Software Engineering

Abstract

Many modern programming languages (e.g., Python, Java, and JavaScript) support just-in-time (JIT) compilation to speed up the execution of a software system. At runtime, the JIT compiler translates the frequently executed parts of the system into efficient machine code, which runs much faster than the default interpreted mode. There are many JIT configuration parameters, which vary with the programming language and the JIT strategy (method-based vs. trace-based). Although many existing works aim to improve various aspects of the JIT compilation process, very few study the performance impact of the JIT configuration settings. In this paper, we performed an empirical study on the performance impact of the JIT configuration settings of PyPy, a popular implementation of the Python programming language. Thanks to PyPy's efficient JIT compiler, running Python programs under PyPy is usually much faster than under alternative implementations of Python (e.g., CPython, Jython, and IronPython). To motivate the need for tuning PyPy's JIT configuration settings, we first performed an exploratory study on two microbenchmark suites. Our findings show that systems executed under PyPy's default JIT configuration setting may not yield the best performance: optimal JIT configuration settings vary from system to system, and having a larger portion of the code JIT-compiled does not necessarily lead to better performance. Motivated by these findings, we developed ESM-MOGA, an automated approach to tuning the JIT configuration settings. ESM-MOGA, which stands for effect-size-measure-based multi-objective genetic algorithm, automatically explores PyPy's JIT configuration settings for optimal solutions. Case studies on three open source systems show that systems running under the resulting configuration settings significantly outperform the default configuration settings (5% - 60% improvement in average peak performance).
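
To make the tuning knobs concrete, the sketch below (ours, not the paper's) shows how PyPy exposes its JIT configuration parameters through the --jit command-line option and how a workload could be timed under a few candidate settings. The parameter names (threshold, function_threshold, trace_limit) are taken from `pypy --jit help`; the specific values, the benchmark.py workload, and the brute-force comparison loop are illustrative assumptions and are not the ESM-MOGA procedure evaluated in the paper.

    #!/usr/bin/env python
    # Illustrative sketch (not the paper's ESM-MOGA): time a workload under a few
    # candidate PyPy JIT configuration settings passed via the --jit option.
    import statistics
    import subprocess
    import time

    # A few of PyPy's tunable JIT parameters (run `pypy --jit help` for the full
    # list).  The concrete values below are arbitrary examples; the middle entry
    # mirrors the defaults reported by `pypy --jit help`, which may differ
    # across PyPy versions.
    CANDIDATE_SETTINGS = [
        "off",                                     # disable the JIT entirely
        "threshold=1039,function_threshold=1619",  # default-like setting
        "threshold=200,trace_limit=10000",         # jit code earlier, allow longer traces
    ]

    WORKLOAD = ["benchmark.py"]   # hypothetical benchmark script (an assumption)
    REPETITIONS = 5               # repeat runs to smooth out measurement noise


    def run_once(jit_setting):
        """Run the workload once under the given --jit setting; return seconds."""
        start = time.time()
        subprocess.check_call(["pypy", "--jit", jit_setting] + WORKLOAD)
        return time.time() - start


    def main():
        # Note: end-to-end process time includes interpreter start-up and JIT
        # warm-up, so it is only a rough proxy for the peak performance
        # measured in the paper.
        for setting in CANDIDATE_SETTINGS:
            timings = [run_once(setting) for _ in range(REPETITIONS)]
            print("--jit %-42s mean=%.2fs stdev=%.2fs"
                  % (setting, statistics.mean(timings), statistics.stdev(timings)))


    if __name__ == "__main__":
        main()

ESM-MOGA replaces the exhaustive comparison above with a multi-objective genetic algorithm that searches the much larger configuration space, guided by effect-size measures.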

Notes

  1. To ease explanation, we will call this the TechEmpower benchmark in the rest of this paper.

Author information

Corresponding author

Correspondence to Yangguang Li.

Additional information

Communicated by: Sven Apel

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Jiang, Z.M. Assessing and optimizing the performance impact of the just-in-time configuration parameters - a case study on PyPy. Empir Software Eng 24, 2323–2363 (2019). https://doi.org/10.1007/s10664-019-09691-z
