Abstract
Many software systems today are configurable, offering customization of functionality by feature selection. Understanding how performance varies in terms of feature selection is key for selecting appropriate configurations that meet a set of given requirements. Due to a huge configuration space and the possibly high cost of performance measurement, it is usually not feasible to explore the entire configuration space of a configurable system exhaustively. It is thus a major challenge to accurately predict performance based on a small sample of measured system variants. To address this challenge, we propose a data-efficient learning approach, called DECART, that combines several techniques of machine learning and statistics for performance prediction of configurable systems. DECART builds, validates, and determines a prediction model based on an available sample of measured system variants. Empirical results on 10 real-world configurable systems demonstrate the effectiveness and practicality of DECART. In particular, DECART achieves a prediction accuracy of 90% or higher based on a small sample, whose size is linear in the number of features. In addition, we propose a sample quality metric and introduce a quantitative analysis of the quality of a sample for performance prediction.
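The abstract does not spell out DECART's internals, so the following is only a minimal sketch of the general idea: learn a regression tree from a small sample of measured configurations, use a hold-out split to pick one tuning parameter, and report accuracy on the whole configuration space. Everything specific here is an assumption for illustration and not the authors' implementation: the toy performance function `toy_performance`, the sample size of 3N for N features, the single tuning parameter `min_leaf`, the hold-out split, and the definition of accuracy as 100% minus the mean relative error.

```python
import itertools
import random
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean of ys."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, min_leaf):
    """Grow a regression tree over binary features (min_leaf must be >= 1).

    rows is a list of (feature_tuple, performance) pairs. A leaf is a number;
    an inner node is a tuple (feature_index, subtree_off, subtree_on).
    """
    best = None
    for f in range(len(rows[0][0])):
        off = [y for x, y in rows if x[f] == 0]
        on = [y for x, y in rows if x[f] == 1]
        if len(off) < min_leaf or len(on) < min_leaf:
            continue  # this split would leave one side too small
        cost = sse(off) + sse(on)
        if best is None or cost < best[0]:
            best = (cost, f)
    if best is None:
        return mean([y for _, y in rows])  # no admissible split: make a leaf
    f = best[1]
    return (f,
            build_tree([r for r in rows if r[0][f] == 0], min_leaf),
            build_tree([r for r in rows if r[0][f] == 1], min_leaf))


def predict(tree, x):
    """Follow the splits for configuration x until a leaf value is reached."""
    while isinstance(tree, tuple):
        f, off, on = tree
        tree = on if x[f] else off
    return tree


def accuracy(tree, rows):
    """Assumed metric: 100% minus the mean relative error, in percent."""
    mre = mean(abs(predict(tree, x) - y) / y for x, y in rows)
    return 100.0 * (1.0 - mre)


def toy_performance(x):
    """Hypothetical system: two feature effects plus one interaction."""
    return 100 + 50 * x[0] + 20 * x[1] + 30 * x[0] * x[2]


def demo(n_features=6, seed=0):
    """Sample 3N configurations, tune min_leaf on a hold-out, then evaluate."""
    configs = itertools.product([0, 1], repeat=n_features)
    population = [(x, toy_performance(x)) for x in configs]
    sample = random.Random(seed).sample(population, 3 * n_features)
    cut = (2 * len(sample)) // 3
    train, valid = sample[:cut], sample[cut:]
    # Validate each candidate on the held-out third of the sample.
    best_leaf = max((1, 2, 3),
                    key=lambda k: accuracy(build_tree(train, k), valid))
    # Retrain on the full sample with the chosen parameter.
    model = build_tree(sample, best_leaf)
    return best_leaf, accuracy(model, population)


if __name__ == "__main__":
    leaf, acc = demo()
    print(f"chosen min_leaf={leaf}, accuracy on all configurations={acc:.1f}%")
```

In this sketch only 18 of the 64 toy configurations are measured; the remaining ones are predicted by the tree. A real study would replace `toy_performance` with actual measurements and would likely use a more robust resampling scheme than a single hold-out split.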
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments. This research was partially supported by the National Natural Science Foundation of China (No. 61772200, 61702320), the Shanghai Pujiang Talent Program (No. 17PJ1401900), the Shanghai Municipal Natural Science Foundation (No. 17ZR1406900), the Shanghai Municipal Education Commission Funds of Young Teacher Training Program (No. ZZSDJ17021), the Specialized Fund of Shanghai Municipal Commission of Economy and Informatization (No. 201602008), the Specialized Research Fund for Doctoral Program of Higher Education (No. 20130074110015), the DFG grants (AP 206/4, AP 206/5, AP 206/7, SI 2171/2, and SI 2171/3), the Natural Sciences and Engineering Research Council of Canada, and Pratt & Whitney Canada.
Additional information
Communicated by: Vittorio Cortellessa
Jianmei Guo and Dingyu Yang contributed equally to this work.
Cite this article
Guo, J., Yang, D., Siegmund, N. et al. Data-efficient performance learning for configurable systems. Empir Software Eng 23, 1826–1867 (2018). https://doi.org/10.1007/s10664-017-9573-6
DOI: https://doi.org/10.1007/s10664-017-9573-6
Keywords
- Performance prediction
- Configurable systems
- Regression
- Model selection
- Parameter tuning