Automated Software Engineering, Volume 25, Issue 2, pp 247–277

Faster discovery of faster system configurations with spectral learning

  • Vivek Nair
  • Tim Menzies
  • Norbert Siegmund
  • Sven Apel

Abstract

Despite the wide spread and economic importance of configurable software systems, support for utilizing the full potential of these systems with respect to finding performance-optimal configurations is unsatisfactory. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) large variances in its predictions. Both problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors—less than 10% prediction error, with a standard deviation of less than 2%. Compared to the state of the art, WHAT (a) requires 2–10 times fewer samples to achieve similar prediction accuracies, and (b) produces more stable predictions (i.e., with lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.
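The core idea the abstract describes—using the spectrum of the distance matrix between configurations for dimensionality reduction—can be sketched with a classical spectral embedding (MDS-style eigendecomposition of the double-centered distance matrix). This is a minimal illustration only, assuming binary configuration options and Euclidean distance; all names and sizes here are hypothetical, and the paper's actual WHAT algorithm additionally splits the reduced space recursively and benchmarks only a few configurations per region.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical configuration space: 40 configurations of 12 binary options.
configs = rng.integers(0, 2, size=(40, 12)).astype(float)

# Pairwise Euclidean distance matrix between configurations.
diff = configs[:, None, :] - configs[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

# Double-center the squared distances (classical MDS construction).
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Spectrum of the centered distance matrix: the top eigenvectors, scaled
# by the square roots of their eigenvalues, give a low-dimensional
# embedding in which nearby configurations can be grouped and represented
# by a single sampled benchmark run.
eigvals, eigvecs = np.linalg.eigh(B)       # symmetric matrix: eigh is exact
order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
top = order[:2]
embedding = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

print(embedding.shape)  # (40, 2): each configuration mapped to 2-D
```

The design choice to embed via the distance spectrum, rather than raw features, is what lets closely associated configurations cluster together so that only a few of them need to be executed.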

Keywords

Performance prediction · Spectral learning · Decision trees · Search-based software engineering · Sampling

Acknowledgements

The work is partially funded by NSF award #1506586. Sven Apel’s work has been supported by the German Research Foundation (AP 206/4 and AP 206/6). Norbert Siegmund’s work has been supported by the German Research Foundation (SI 2171/2).

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. North Carolina State University, Raleigh, USA
  2. Bauhaus University of Weimar, Weimar, Germany
  3. University of Passau, Passau, Germany