Using a Genetic Algorithm to Optimize Configurations in a Data-Driven Application

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12420)


Users of highly-configurable software systems often want to optimize a particular objective, such as improving a functional outcome or increasing system performance. One approach is to use an evolutionary algorithm. However, many applications today are data-driven, meaning they depend on inputs or data that can be complex and varied. Hence, a search needs to be run (and re-run) for every input, making optimization a heavy-weight and potentially impractical process. In this paper, we explore this issue on a data-driven, highly-configurable scientific application. We build an exhaustive database containing 3,000 configurations and 10,000 inputs, leading to 30 million records as our oracle, and then run a genetic algorithm individually on each of the 10,000 inputs. We ask (1) whether a genetic algorithm can find configurations that improve functional objectives; (2) whether patterns of best configurations emerge across all input data; and (3) whether we can use sampling to approximate the results. We find that the original (default) configuration is best only 34% of the time, while clear patterns emerge of other best configurations. Out of 3,000 possible configurations, only 112 distinct configurations achieve the optimal result at least once across all 10,000 inputs, suggesting the potential for lighter-weight optimization approaches. We show that sampling of the input data finds similar patterns at a lower cost.
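The workflow summarized in the abstract can be sketched as follows: an exhaustive table of (input, configuration) scores acts as the oracle, a genetic algorithm is run once per input, and the winning configurations are tallied over all inputs and over a random sample of them. This is only a minimal illustration under stated assumptions, not the authors' implementation. The scaled-down sizes, the 6-bit encoding, the random synthetic fitness values, the GA operators and parameters, and every function name are hypothetical. Because the synthetic scores are random, no real pattern of winners should be expected here.

```python
import random
from collections import Counter

# Hypothetical, scaled-down stand-ins for the 3,000 configurations and 10,000 inputs.
N_CONFIGS = 64
N_INPUTS = 200
GENES = 6  # assumed encoding: a 6-bit vector, so 2**6 == N_CONFIGS


def build_oracle(rng):
    """Exhaustive table: oracle[input_id][config_id] -> synthetic fitness (assumed data)."""
    return [[rng.random() for _ in range(N_CONFIGS)] for _ in range(N_INPUTS)]


def decode(bits):
    """Map a bit vector to a configuration index."""
    return int("".join(map(str, bits)), 2)


def run_ga(scores, rng, pop_size=12, generations=15, mutation_rate=0.1):
    """Generational GA: tournament selection, one-point crossover,
    bit-flip mutation, and one elite carried into each new generation."""
    def fitness(individual):
        return scores[decode(individual)]  # fitness is a lookup in the oracle

    pop = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)[:]]  # elitism
        while len(next_pop) < pop_size:
            parent_a = max(rng.sample(pop, 2), key=fitness)
            parent_b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, GENES)
            child = parent_a[:cut] + parent_b[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return decode(max(pop, key=fitness))


def best_config_counts(oracle, input_ids, rng):
    """Run the GA once per input and count how often each configuration wins."""
    return Counter(run_ga(oracle[i], rng) for i in input_ids)


rng = random.Random(0)
oracle = build_oracle(rng)

# One GA run per input, over every input.
all_counts = best_config_counts(oracle, range(N_INPUTS), rng)

# The same tally over a random sample of inputs, as a cheaper approximation.
sample_ids = rng.sample(range(N_INPUTS), 40)
sample_counts = best_config_counts(oracle, sample_ids, rng)

print(len(all_counts), "distinct winning configurations over all inputs")
print(len(sample_counts), "distinct winning configurations over the sample")
```

Comparing the most common entries of `all_counts` and `sample_counts` (for example with `Counter.most_common`) is one way to check whether a sample recovers the same dominant configurations as the full run.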


Genetic algorithm · Data-driven · SSBSE



This work is supported in part by NSF Grant CCF-1901543 and by the Center for Bioenergy Innovation (CBI), which is supported by the Office of Biological and Environmental Research in the DOE Office of Science.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Iowa State University, Ames, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
