Analyzing randomness effects on the reliability of exploratory landscape analysis


The inherent difficulty of solving a continuous, static, bound-constrained and single-objective black-box optimization problem depends on the characteristics of the problem’s fitness landscape and the algorithm being used. Exploratory landscape analysis (ELA) uses numerical features, generated by sampling the search space, to describe such characteristics. Despite their success in a number of applications, these features are limited by the computational cost of generating accurate results. Consequently, only approximations are available in practice, and these may be unreliable, leading to systemic errors. The overarching aim of this paper is to evaluate the reliability of five well-known ELA feature sets across multiple dimensions and sample sizes. For this purpose, we propose a comprehensive experimental methodology combining exploratory and statistical validation stages, which uses resampling techniques to minimize the sampling cost, and statistical significance tests to identify strengths and weaknesses of individual features. The data resulting from the methodology is collected and made available in the LEarning and OPtimization Archive of Research Data v1.0. The results show that instances of the same function can have significantly different feature values, which are therefore not generalizable across instances, because of the effects produced by the boundary constraints. In addition, some landscape features under evaluation are highly volatile and strongly susceptible to changes in sample size. Finally, the results show evidence of a curse of modality, meaning that the sample size should increase with the number of local optima.






We express our gratitude to the two reviewers and the guest editors for their thorough and valuable suggestions, which significantly improved this paper. We also acknowledge Saman K. Halgamuge for his feedback on earlier versions of this work.


Funding was provided by the Australian Research Council through the Australian Laureate Fellowship FL140100012, and The University of Melbourne through MIRS/MIFRS scholarships.

Author information



Corresponding author

Correspondence to Mario Andrés Muñoz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Validation of the assumptions behind the experimental methodology

Our experimental methodology makes assumptions that can be summarized in two questions: (a) Since Latin Hypercube Sampling (LHS) is a type of stratified sampling, is the independence assumption still valid? (b) What are the differences between multiple uniformly distributed random samples, bootstrapping a single uniformly distributed random sample, and bootstrapping a single LHS, when calculating the variance of an estimate? To answer these questions, we carried out two simple experiments, which demonstrate that there is no practical difference in the results between taking multiple uniformly distributed random samples and bootstrapping an LHS. In the first experiment, we address the independence assumption by calculating the magnitude of the auto-correlation with lags in the \(\left[ 1,\ 50\right]\) range, for data drawn from the \(\left[ 0,\ 1\right]\) interval. For this assumption to hold for LHS, the magnitudes of the auto-correlation should follow the same trend as for a uniformly distributed random sample and be close to zero, indicating that it is not possible to estimate the value of one point from another. We repeat this experiment 1000 times and average the results, which are presented in Fig. 16a for samples with \(\left\{ 200,600,1000\right\}\) points. Other than the descending trend for a sample of 200 points, which can be explained by the decrease in the number of points in the sample for which the auto-correlation can be calculated, the results demonstrate that the independence assumption holds for an LHS in practice.
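
The first experiment can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used for the paper: the helper names `lhs_1d`, `uniform_1d` and `mean_abs_autocorr` are hypothetical, the LHS is the basic one-point-per-stratum construction in one dimension, and the auto-correlation estimator normalizes by the full-sample sum of squares, which may differ from the estimator behind Fig. 16a.

```python
import numpy as np


def lhs_1d(n, rng):
    """Basic 1-D Latin hypercube sample on [0, 1]: one point in each of
    the n equal-width strata, returned in random order (assumed construction)."""
    return (rng.permutation(n) + rng.random(n)) / n


def uniform_1d(n, rng):
    """Plain uniformly distributed random sample on [0, 1]."""
    return rng.random(n)


def mean_abs_autocorr(sampler, n, max_lag=50, repeats=1000, seed=0):
    """Average magnitude of the sample auto-correlation at lags 1..max_lag,
    taken over `repeats` independently drawn samples of n points."""
    rng = np.random.default_rng(seed)
    total = np.zeros(max_lag)
    for _ in range(repeats):
        x = sampler(n, rng)
        x = x - x.mean()          # centre the series
        denom = np.dot(x, x)      # lag-0 sum of squares
        for lag in range(1, max_lag + 1):
            total[lag - 1] += abs(np.dot(x[:-lag], x[lag:]) / denom)
    return total / repeats


if __name__ == "__main__":
    for n in (200, 600, 1000):
        iid = mean_abs_autocorr(uniform_1d, n, repeats=200)
        lhs = mean_abs_autocorr(lhs_1d, n, repeats=200)
        print(n, iid[:3].round(4), lhs[:3].round(4))
```

If the independence assumption holds in practice, the two printed curves should be of similar magnitude and close to zero for each sample size.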

Fig. 16

Validation of the assumptions behind our experimental methodology. a Average magnitude of the auto-correlation with lags in the \(\left[ 1,\ 50\right]\) range, for data drawn from the \(\left[ 0,\ 1\right]\) interval. b Distribution of the variance of the mean from N samples of n points, \(IID\left( n,N\right)\), bootstrapping N times a sample of n points, \(IID+B\left( n,N\right)\), and bootstrapping N times an LHS of n points, \(LHS+B\left( n,N\right)\). The results confirm that there is no practical difference between taking N uniformly distributed random samples and bootstrapping N times a single LHS.

In the second experiment, we address the second question by estimating the variance of the mean under three different sampling regimes, using data drawn from the \(\left[ 0,\ 1\right]\) interval. In the first regime, called \(IID\left( n,N\right)\), we took N uniformly distributed random samples of n points. In the second, called \(IID+B\left( n,N\right)\), we took one uniformly distributed sample of n points and bootstrapped it N times. In the third, called \(LHS+B\left( n,N\right)\), we took one LHS of n points and bootstrapped it N times. Each sampling regime produced N mean estimates, from which the variance is calculated. The experiments are repeated 1000 times for all the combinations of \(n,N\in \left\{ 200,600,1000\right\}\). The results are shown in Fig. 16b as box-plots, which demonstrate that there is no practical difference between taking N uniformly distributed random samples and bootstrapping N times a single LHS.
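
The three sampling regimes can be illustrated with the short sketch below. It is written under our own assumptions rather than taken from the paper: `lhs_1d`, `bootstrap_means` and `variance_of_mean` are hypothetical names, the LHS is the basic one-dimensional one-point-per-stratum construction, and the bootstrap resamples n points with replacement.

```python
import numpy as np


def lhs_1d(n, rng):
    """Basic 1-D Latin hypercube sample on [0, 1] (assumed construction)."""
    return (rng.permutation(n) + rng.random(n)) / n


def bootstrap_means(sample, N, rng):
    """Means of N bootstrap resamples (with replacement) of `sample`."""
    n = sample.size
    idx = rng.integers(0, n, size=(N, n))
    return sample[idx].mean(axis=1)


def variance_of_mean(n, N, rng):
    """Variance of N mean estimates under the three regimes."""
    iid = rng.random((N, n)).mean(axis=1)            # IID(n, N)
    iid_b = bootstrap_means(rng.random(n), N, rng)   # IID+B(n, N)
    lhs_b = bootstrap_means(lhs_1d(n, rng), N, rng)  # LHS+B(n, N)
    return {
        "IID": iid.var(ddof=1),
        "IID+B": iid_b.var(ddof=1),
        "LHS+B": lhs_b.var(ddof=1),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (200, 600, 1000):
        for N in (200, 600, 1000):
            print(n, N, variance_of_mean(n, N, rng))
```

For a uniform variable on \(\left[ 0,\ 1\right]\), the variance of the mean of n independent points is \(1/(12n)\), which gives a reference value against which the three estimates can be compared.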

About this article

Cite this article

Muñoz, M.A., Kirley, M. & Smith-Miles, K. Analyzing randomness effects on the reliability of exploratory landscape analysis. Nat Comput (2021).


  • Black-box optimization
  • Bound-constrained optimization
  • Continuous optimization
  • Exploratory landscape analysis
  • Single-objective optimization