Understanding measure-driven algorithms solving irreversibly ill-conditioned problems

The paper helps to understand the essence of stochastic population-based searches that solve ill-conditioned global optimization problems. This ill-conditioning manifests itself in the presence of lowlands, i.e., connected subsets of minimizers of positive measure, and in the inability to regularize the problem. We show a convenient way to analyze such search strategies as dynamic systems that transform the sampling measure. We can draw informative conclusions for a class of strategies with a focusing heuristic. For this class we can evaluate the amount of information about the problem that can be gathered and suggest ways to verify stopping conditions. Next, we present the Hierarchic Memetic Strategy coupled with the Multi-Winner Evolutionary Algorithm (HMS/MWEA), which follows the ideas from the first part of the paper. We introduce a complex, ergodic Markov chain of its dynamics and prove an asymptotic guarantee of success. Finally, we present numerical solutions to ill-conditioned problems: two benchmarks and a real-life engineering one, which show the strategy in action. The paper recalls and synthesizes some results already published by the authors, drawing new qualitative conclusions. The entirely new parts are the Markov chain models of the HMS structure of demes and of the MWEA component, as well as the theorem on their ergodicity.


Ill-conditioned global optimization problems
Many problems in machine learning, optimal control, medical diagnostics, optimal design, geophysics, etc. are formulated as global optimization problems. They are frequently irreversibly ill-conditioned and possess many solutions that can form uncountable, continuous subsets of the admissible domain.
A substantial part of such computational tasks are called ''inverse problems'' (IPs), in which parameters of a predefined mathematical model have to be identified. A general framework for handling inverse problems whose mathematical model is given by a system of Partial Differential Equations (PDEs) can be found in [22,59]. The ill-conditioning of IPs mentioned above is caused mainly by the unavailability of complete and accurate measurements, e.g. an insufficient set of data used for Artificial Neural Network (ANN) learning, or the pointwise measurement of the electric field called a ''logging curve'' used in the investigation of oil and gas resources (see [28,31] and [9]). Ambiguity and lack of correctness of the mathematical model (e.g. due to some symmetries, see [7]) can also contribute.
We may also refer to representative examples of engineering ill-conditioned IPs: regression solved by Deep Neural Networks (DNNs) [18], ambiguity in lens design [23], calibration of conceptual rainfall-runoff models [12], investigation of oil and gas resources [56], and diagnosis of tumor tissue [37].
The work presented in this paper has been partially supported by the AGH statutory research Grant No. 11.11.230.124.

Deterministic and stochastic strategies of solving ill-conditioned problems
Traditional methods of local, convex optimization are hardly relevant for solving the ill-conditioned problems mentioned above. If the problem is formulated as finding all global or ''almost global'' minimizers of the objective function (e.g. the misfit between measured and simulated data for IPs, or the loss function in the case of ANN learning), then any single steepest-descent method can converge to at most a single minimizer. Even if a multistart of steepest-descent processes is applied, only a finite number of minimizers can be approximated. Moreover, a remarkable search redundancy may appear when many processes focus on the same minimizer or on different points of the same connected component of a continuous set of minimizers.
A more advantageous way is to apply a stochastic population-based strategy, which generally ''breeds'' a ''flock'' of candidate solutions in such a way that they tend to concentrate in regions containing solutions (both isolated and continuous clusters).
There are many stochastic AI techniques inspired by nature, such as evolutionary computation, ant colony optimization, and simulated annealing, as well as their recent specialized instances (CMA-ES, IM, HGS, HMS, etc.), that follow this idea. Generally, they perform consecutive resampling of a more or less structured ''flock'' of candidate solutions, where the sampling measure is successively updated according to the information gathered during the search process.
A broad review of stochastic, population-based methods for solving global optimization problems in continuous domains involving multiple local minimizers is reported in the Pardalos and Romeijn handbook [36], while the population-based methods dedicated to ill-conditioned multimodal global optimization problems are discussed in the Preuss book [39]. Information about some specialized strategies (HGS, HMS, CGS and EMAS) can be found in our former papers [4,6,13,17,47,50,53,55-57,64].

State-of-the-art approaches to the analysis of solving strategies
The most popular technique of analyzing stochastic global searches consists in evaluating the First Hitting Time (FHT), after which at least one point from the sample hits the set of solutions. A chapter devoted to such methods, written by Wood and Zabinsky, can be found in the monograph [36]. Yao and He delivered a survey of FHT evaluation based on Markov process theory, applied to the analysis of Evolutionary Algorithms (EAs) [20].
Similar results for the Island Model were obtained by Rudolph [45].
Another approach to studying population-based algorithm dynamics starts from the features of the elementary stochastic operations changing the population members at each step, and then generalizes them to rules of sampling measure modification. Such results were obtained by Arabas [3], Ghosh [19], as well as by Qi and Palmieri [40,41].
The sampling measure dynamics of a whole population for small population instances were studied by numerous researchers, e.g. Rudolph [45], Sudholt [58], Beume [5] and their collaborators. The dynamics of a population composed of two individuals encoded by real numbers and processed by an EA was deeply investigated by Karcz-Dulęba [25,26].
The last approach, most interesting from our point of view, consists in analyzing the sampling measure dynamics as a stochastic dynamic system, especially as a stationary Markov chain. Such a model, for a population composed of individuals encoded by real numbers and processed by an EA, was introduced by Rudolph [43,44]. Our further considerations will refer mainly to the model developed by Vose and his collaborators [33,62], which is suitable for expressing the dynamics of the sampling measure of a stochastic population-based search in a finite but very large admissible set, and which introduced the so-called ''heuristic operator''.
Modeling measure-driven stochastic algorithms inspired by nature

Reconsidering the definition of ill-conditioned global optimization problems
We intend to describe more precisely the class of ill-conditioned problems and their sets of solutions under consideration. The solutions of such problems will be searched for in a bounded, connected, closed admissible set $D \subset \mathbb{R}^l$, $l \geq 1$, with a Lipschitz boundary [65], having a positive measure. Potential solutions will be evaluated by the objective function $f \in C(\mathbb{R}^l \to \mathbb{R})$, so that the lower the value $f(x)$, the better the candidate solution $x$. Of course, there exists at least one pair $x_{min}, x_{max} \in D$ such that $-\infty < f(x_{min}) \leq f(x) \leq f(x_{max}) < +\infty$, $\forall x \in D$.
2. Each local minimizer $y$ of $f$ in $D$ will be called a global minimizer if $f(y) = f(x_{min})$.
3. The set $B_y$ will be called the minimum manifold of $f$ in $D$. We will denote by $\mathrm{Mmanifolds}_{f,D}$ the family of all minimum manifolds of $f$ in $D$.
4. Each minimizer $y \in D$ of $f$ in $D$ will be called isolated if $B_y = \{y\}$.
It can be proven (see Theorem 5 in [30]) that if $y$ is a local minimizer of $f$ in $D$, then $B_y$ is a connected component of $(f|_D)^{-1}(f(y))$ that contains $y$. Moreover, $B_{y'} = B_y$ $\forall y' \in B_y$ and $\overline{B_y} = B_y$. Additionally, if $f$ is differentiable, then $\nabla f|_x = 0$ $\forall x \in B_y$ (see Observations 6 and 11 in [30]).
Definition 2 (see Definition 9 in [30]) The lowland $P_z$ of $f$ in $D$ associated with a local minimizer $z \in B_y$ is the maximal (in the sense of inclusion) open connected set such that $P_z \subset B_y$ and $z \in P_z$, or the empty set. Let us denote by $\mathrm{Lowlands}_{f,D}$ the family of all non-empty lowlands of $f$ in $D$.
It can be proven that a non-empty lowland $P_z$ is unambiguously defined for $z \in B_y$ and satisfies $\mathrm{meas}(P_z) > 0$. Moreover, $\forall z' \in P_z$: $P_z = P_{z'}$ and $f(z) = f(z') = f(y)$ (see Remark 10 in [30]).
The further constructs will be based on strictly-steepest-descent local optimization methods, denoted by $loc$. Roughly speaking, such a method generates a minimizing sequence $\{x_i\}_{i=1,2,\dots} \subset D$, starting from an arbitrary $x_0 \in D$, which tends to the ''nearest'' stationary point of $f$ (possibly the nearest local minimizer), called $loc(x_0)$. Moreover, the value of $f(x)$ has to decrease strictly along its path. The methods mentioned above were introduced in [11,42]. The paper [30] contains a broad discussion of such methods and also proposes their approximation called the $\alpha$-strictly-steepest-descent method. In the particular case when $f$ is continuously differentiable, we may replace the strictly-steepest-descent method by the antigradient flow for $f$, i.e. the family of solutions of the equation $\frac{dc}{dt}(t) = -\nabla f(c(t))$, $c(0) = x_0$, for which $loc(x_0)$ will be set as $\lim_{t \to +\infty} c(t)$.

Definition 3 (see Definitions 28, 32 and 36 in [30]) Let $loc$ be a strictly-steepest-descent local optimization method on $D$ and $y$ a local minimizer of $f$ in $D$.
1. The set $R^{loc}_y = \{x \in D : y = loc(x)\}$ will be called the set of attraction of $y$ with respect to the method $loc$. We will further simplify its notation to $R_y$.

2. The sets $R_{B_y} = \bigcup_{z \in B_y} R_z$ and $R_{P_y} = \bigcup_{z \in P_y} R_z$ will be called the set of attraction of the minimum manifold $B_y$ and the set of attraction of the lowland $P_y$, respectively.

Notice that both of the above problems lead to finding a subset (possibly not connected) of the admissible domain $D \subset \mathbb{R}^l$, $l \geq 1$, such that each of its connected parts has a positive measure in $\mathbb{R}^l$. Such tasks are suitable for stochastic search methods working with finite samples in the admissible domain.
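As a toy illustration of the method $loc$ and the resulting sets of attraction, the antigradient flow can be integrated numerically with an explicit Euler scheme. The objective $f(x) = (x^2-1)^2$ and all numerical parameters below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def loc(x0, grad_f, step=1e-2, tol=1e-8, max_iter=100_000):
    """Approximate loc(x0) by explicit-Euler integration of the
    antigradient flow dc/dt = -grad f(c(t)), c(0) = x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

# Toy bimodal objective f(x) = (x^2 - 1)^2 with isolated minimizers -1 and +1.
grad = lambda x: 4.0 * x * (x * x - 1.0)

# Starting points on either side of the stationary point at 0 flow to
# different minimizers, illustrating the sets of attraction R_{-1} and R_{+1}.
left = loc(np.array([-0.3]), grad)
right = loc(np.array([0.3]), grad)
```

Here both minimizers are isolated, so the sets of attraction are those of singleton minimum manifolds; the stationary point $0$ belongs to neither of them.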

The principles of the Vose model and its extensions for more advanced strategies
Perhaps the first result on modeling population-based searches as stochastic dynamic systems was obtained by Michael Vose and his collaborators in the 1990s [33,62].
Let $U$ be the finite genetic universum, i.e. the set of codes of the finite number of chosen potential solutions to the global optimization problem. The universum $U$ will be identified (labeled) with the set of integers $\{0, \dots, r-1\}$, where $r < +\infty$ is the number of codes. Such labeling introduces a total linear order on $U$: we will write $i < j$, $i, j \in U$, if the label of the code $i$ is smaller than the label of $j$, when it does not lead to ambiguity.
We assume that $U$ is injectively mapped into the admissible set $D$ by a function $code : U \to D$, and we denote by $D_r = code(U)$ the set of ''phenotypes''. Moreover, we denote by $d_{i,j} = d(code(i), code(j))$, $i, j \in U$, the distance between the points $code(i), code(j) \in D$ corresponding to the codes $i, j \in U$. Finally, we introduce the fitness function on $U$ induced by the objective and the coding; for simplicity we keep the same symbol, writing $f(i) = f(code(i))$, $i \in U$. The random sample of size $l$ processed at each step of a stochastic algorithm is a multiset $P_t = (U, g_t)$, where $g_t$ is its occurrence function; it can also be represented by its frequency vector $x_t = (x_t^0, \dots, x_t^{r-1})$, $x_t^j = \frac{1}{l} g_t(j)$, $j = 0, \dots, r-1$. The population $P_t$ also represents the sample $(D_r, g_t)$ in $D$, so that $g_t(x) = g_t(i)$ if $code(i) = x$.
Let us denote by $X_l$ the set of all frequency vectors associated with populations of size $l$. Its cardinality equals

$s(r,l) = \binom{l+r-1}{r-1}.$   (2)

When we want to highlight the size of the genetic universum, we shall use the double-script notation $X_l^r$. Further, it can be proven that

$X_l^r \subset \Lambda^{r-1} \;\; \forall l \in \mathbb{Z}_+ \quad \text{and} \quad \overline{\bigcup_{l \in \mathbb{Z}_+} X_l^r} = \Lambda^{r-1},$   (3)

where the simplex $\Lambda^{r-1} \subset \mathbb{R}^r$ is a universal set of representations of all populations of arbitrary size $l \in \mathbb{Z}_+$ containing codes from $U$ [62].
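The space of states $X_l^r$ and its cardinality can be enumerated directly for small $r$ and $l$; the sketch below (with illustrative sizes $r=3$, $l=4$) checks the stars-and-bars count:

```python
from math import comb

def frequency_vectors(r, l):
    """Enumerate X_l^r: vectors (x^0,...,x^{r-1}) with x^j = n_j / l,
    where (n_0,...,n_{r-1}) runs over weak compositions of l into r parts."""
    def comps(total, parts):
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in comps(total - first, parts - 1):
                yield (first,) + rest
    return [tuple(n / l for n in c) for c in comps(l, r)]

r, l = 3, 4
X = frequency_vectors(r, l)
s = comb(l + r - 1, r - 1)   # s(r, l): number of populations of size l
```

Every element of `X` has non-negative coordinates summing to one, i.e. it lies on the simplex of stochastic vectors.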
The stochastic, population-based strategy with a constant population size $l$ generates a random sequence of populations or, equivalently, a sequence of their frequency vectors $\{x_t\} \subset X_l$, $t = 0, 1, 2, \dots$. If the following condition holds:

the transition probability $P(x_{t+1} = x' \mid x_t = x)$ depends only on $x$ and $x'$, and neither on the step number $t$ nor on the earlier history of the process,   (4)

then the strategy is modeled by a stationary Markov chain with the space of states $X_l$. Let us denote by $\pi_l^t \in M(X_l)$ the probability distribution of the random variable $x_t$ at step $t$, where $M(X_l)$ is the space of probabilistic measures on the set $X_l$. The stochastic dynamics of such an algorithm are then determined by the initial probability distribution $\pi_l^0$ and the Kolmogorov equation

$\pi_l^{t+1} = Q^\top \pi_l^t,$   (5)

where $Q \in [0,1]^{s(r,l) \times s(r,l)}$, $Q_{x,x'} = P(x_{t+1} = x' \mid x_t = x)$, denotes the transition probability matrix of this process. Further, we will use, for various $X_l \subset \Lambda^{r-1}$, the polymorphic notation for the probability transition mappings $\tau : X_l \to M(X_l)$ that return the probability distributions over the states in the next step according to the current state, i.e. $\tau(x) = \{Q_{x,x'}\}_{x' \in X_l}$ (see e.g. [35]).
The essence of the model presented above is shown in the upper part of the diagram in Fig. 1. The real operations applied to pass from the population frequency vector $x_t$ to the next one $x_{t+1}$, called the implementation, can be replaced by extracting the probability distribution $\pi_l^{t+1}$ over the set of all population vectors $X_l$, followed by one-time sampling of $x_{t+1}$ from $X_l$. Both ways are stochastically equivalent.
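A minimal numerical sketch of this Markov-chain view (with an invented 3-state transition matrix, not one derived from any concrete algorithm) iterates the Kolmogorov equation and exhibits convergence of the distributions:

```python
import numpy as np

# Illustrative row-stochastic transition matrix over three population states:
# Q[i, j] = P(x_{t+1} = state j | x_t = state i).
Q = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

pi = np.array([1.0, 0.0, 0.0])   # initial distribution pi^0
for _ in range(200):              # Kolmogorov equation: pi^{t+1} = Q^T pi^t
    pi = Q.T @ pi

# All entries of Q are positive, so the chain is ergodic and the iterates
# approach the unique stationary distribution regardless of pi^0.
residual = np.linalg.norm(Q.T @ pi - pi)
```

The limiting vector is the left eigenvector of $Q$ for eigenvalue 1, i.e. the stationary distribution of the chain.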
Let us now consider the family $F_{U,f}$ of population-based stochastic algorithms such that all of them operate on populations $P = (U, g)$ of clones from the same finite set $U$, all of them use the same stochastic operations transforming a population $P_t$ into the consecutive one $P_{t+1}$, and such operations do not depend on the step $t$ of the algorithm. In fact, all of these algorithms solve the same optimization problem imposed by the same fitness function $f$, and they differ only in the population size $l$ they process. Of course, the frequency vectors of all such populations belong to the simplex $\Lambda^{r-1}$.

Fig. 1 Heuristic diagram
Notice that each $x \in \Lambda^{r-1}$ can also be interpreted as a stochastic vector that belongs to $M(U)$. In order to avoid ambiguity, we introduce the mapping $\Theta : \Lambda^{r-1} \to M(U)$, being numerically the identity and changing only the interpretation of its argument.
Definition 6 We will say that the family $F_{U,f}$ has a heuristic if there exists a continuous mapping $H : \Lambda^{r-1} \to \Lambda^{r-1}$ such that:

1. $\forall x \in \Lambda^{r-1}$, $H(x)$ is the expected population vector in the step following $x$;
2. $\forall x \in \Lambda^{r-1}$, $H(x)$, interpreted as an element of $M(U)$, is the measure used for sampling from the set of codes $U$ the members of the population immediately following $x$.
Moreover, we will say that the heuristic is focusing if there exists a non-empty set $K \subset \Lambda^{r-1}$ of fixed points of the operator $H$ such that:

3. $\forall x \in \Lambda^{r-1}$ $\exists! z \in K$: $H^p(x) \to z$ as $p \to +\infty$, where $H^p$ denotes the $p$-fold composition of the mapping $H$.
Remark 2
1. Because $\Lambda^{r-1}$ is a bounded, convex set in $\mathbb{R}^r$ and $H$ is continuous, the Schauder theorem (see [54]) implies that $H$ has at least one fixed point.
2. Assuming an arbitrary population size $l$, the coefficients of the transition probability matrix $Q$ can be computed from the formula (see [62])

$Q_{x,x'} = l! \prod_{j=0}^{r-1} \frac{\left(H(x)^j\right)^{l (x')^j}}{(l (x')^j)!}.$   (6)

The lower part of the diagram shown in Fig. 1 illustrates the idea of modeling the dynamics of a search with a heuristic. The implementation can be replaced by taking the value of the heuristic on the current population frequency vector, followed by sampling $l$ individuals from $U$ according to the probability distribution imposed by $H(x_t)$.
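Formula (6) is the multinomial law of sampling $l$ individuals independently from the distribution $H(x)$; a small sketch (with illustrative $r$, $l$ and heuristic value) confirms that each row of $Q$ built this way is a probability distribution over $X_l$:

```python
from math import factorial

def compositions(total, parts):
    """Weak compositions: occurrence vectors of all populations in X_l."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def transition_row(h, l):
    """Q_{x,x'} for all x' in X_l, given h = H(x): the probability that
    l independent draws from h produce the occurrence vector of x'."""
    row = {}
    for counts in compositions(l, len(h)):
        p = float(factorial(l))
        for hj, nj in zip(h, counts):
            p *= hj ** nj / factorial(nj)
        row[counts] = p
    return row

h = [0.5, 0.3, 0.2]            # illustrative heuristic value over r = 3 codes
row = transition_row(h, l=4)
total = sum(row.values())      # rows of Q sum to one
```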
The above model was introduced and successfully applied to the Simple Genetic Algorithm (SGA) by Vose and his collaborators. In particular, they effectively computed the SGA heuristic, called the genetic operator (see e.g. [62]). During the last two decades, the authors of this contribution have partially extended the model to several more complex stochastic searches interesting from the application point of view, among them the Island Model (IM) [52]. In what follows we will use the assumptions: ($H_1$) the family $F_{U,f}$ has a heuristic $H$; ($H_2$) the heuristic is focusing, and there is a finite number of fixed points ($\mathrm{card}(K) < +\infty$). Computing experience shows that typically a unique fixed point of the heuristic exists, and more fixed points appear only occasionally [1].
Remark 3 If the assumptions $H_1$, $H_2$ hold for a family of stochastic searches $F_{U,f}$, then:

1. Each search from $F_{U,f}$ processing a population of arbitrary size $l$ can be modeled by an ergodic Markov chain, and the associated sequence of measures $\{\pi_l^t\}_{t=0,1,2,\dots}$ has a weak limit $\pi_l$ independent of the starting population $x_0$ and of the initial distribution $\pi_l^0$ (see [33,62]).
2. The probability distribution $\pi_l$ can be computed as a solution of an algebraic system if we know the transition probability matrix $Q$ of the process (see [35]). The computational cost of this task for real-world problems is huge because of the huge matrix dimension $s(r,l)^2$ (see (2)), so this way of asymptotic analysis is applicable only to searches in small universa $U$.
3. The sequence $\{\pi_l\}$ contains at least one sub-sequence $\{\pi_{l_n}\}$ converging in distribution to some $\pi^* \in M(\Lambda^{r-1})$ as $l_n \to +\infty$; moreover, $\pi^*(K) = 1$ (see [33,62]).
4. Each trajectory of the heuristic iterates $x_0, H(x_0), H^2(x_0), \dots, H^K(x_0)$ can be arbitrarily closely approximated by the trajectory $x_0, x_1, x_2, \dots, x_K$ of a finite, $l$-sized population for arbitrary $x_0 \in X_l$, $K > 1$, with arbitrarily large probability, if the population size $l > N$ is sufficiently large (see Theorem 13.2 in [62]).
5. The fixed points of the heuristic can be well approximated by the finite population vector $x_t \in X_l$ in the stochastic sense, i.e. $x_t$ will be sampled from an arbitrarily close metric neighborhood $K_\varepsilon \supset K$ with arbitrarily large probability, if the population size $l$ and the number of steps $t$ are sufficiently large (see [8]).

Remark 4
The above considerations lead us to the following qualitative conclusions:

1. The stochastic population-based searches with dynamic sampling measure adaptation that belong to $F_{U,f}$ are in fact machine learning processes, which gather more and more information about the problem characterized by the fitness $f$ as the number of iterations grows.
2. We may conjecture that the maximum information about the problem that can be gathered by the family $F_{U,f}$ is contained in the fixed points of the heuristic $z \in K$, which are the frequency vectors of the limit populations representing the most exhaustive searches (an infinite sample after an infinite number of steps).
3. Roughly speaking, the family $F_{U,f}$ may be assessed as ''well tuned'' to the problem being solved if its members are effective learning processes, i.e. the information about the solutions they are able to gather is satisfactory for the user.
The crucial questions that remain are: what could be the validation criteria for the family $F_{U,f}$ to be ''well tuned'', and how could such criteria be verified?
One possible answer to the first question requires the additional construction of a special family of probabilistic measures over the admissible domain $D$. For each $y \in D_r$ we select the set $\vartheta_y \subset D$ of points located closer to $y$ than to any other phenotype $w \in D_r$, $w \neq y$. If $D$ is sufficiently regular, then the family of subsets $\{\vartheta_y\}_{y \in D_r}$ is the Voronoi tessellation associated with the phenotypes (see e.g. [34]). We can now introduce a new measure over $D$ with the following density function:

$q_x(\xi) = \frac{x^i}{\mathrm{meas}(\vartheta_{code(i)})} \quad \text{for } \xi \in \vartheta_{code(i)}, \; i \in U.$   (7)

Although $q_x$ is only a partial function (it is not defined for $\xi \in D$ equally distant from at least two phenotypes), it satisfies $\mathrm{meas}(\mathrm{dom}(q_x)) = \mathrm{meas}(D)$, and for all $x \in \Lambda^{r-1}$ we have $q_x \in L^p(D)$, $p \geq 1$ (see [49]). We will further call $q_x$ the ''brightness'' of a population represented by the frequency vector $x$.
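The brightness $q_x$ can be sketched numerically in one dimension; the phenotype grid, the frequency vector and the quadrature below are illustrative assumptions. As a probability density, $q_x$ integrates to one over $D$:

```python
import numpy as np

# Hypothetical setting: r = 10 phenotypes in D = [0, 10] and a frequency
# vector x; q_x is constant on each Voronoi cell theta_y with value
# x^i / meas(theta_y), where i is the code of the phenotype y.
phenotypes = np.linspace(0.5, 9.5, 10)
x = np.full(10, 0.1)                      # uniform population, sums to 1

grid = np.linspace(0.0, 10.0, 100_001)    # dense grid over D
h = 10.0 / 100_000                        # grid spacing
cell = np.argmin(np.abs(grid[:, None] - phenotypes[None, :]), axis=1)
meas = np.array([(cell == i).sum() for i in range(10)]) * h

q = x[cell] / meas[cell]                  # brightness sampled on the grid
integral = q.sum() * h                    # Riemann sum over D
```

Plotting `q` against `grid` gives exactly the kind of monochrome ''brightness'' chart discussed below: regions where the population concentrates ''shine'' over a darker background.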
The idea of ''well tuning'' criteria was introduced in [49, Def. 4.63] for a family of SGA searches and a finite set of local minimizers. Its extended version, proposed below, consists in imposing numerical conditions on the brightness $q_z$ of all fixed points $z \in K$ of the heuristic $H$ of the family of searches $F_{U,f}$, leading to finding continuous subsets of the admissible domain (lowlands and minimum manifolds).
Definition 7 We will say that the family of stochastic, population-based searches $F_{U,f}$ with a focusing heuristic $H$ possessing a finite set of fixed points $K$ is well tuned to the set of lowlands if, for each $P_y \in \mathrm{Lowlands}_{f,D}$ and each $z \in K$, the brightness $q_z$ exceeds a prescribed threshold on the central part of the set of attraction of $P_y$ and stays below it away from the sets of attraction.

The intuition behind the above definition is as follows. If we represent the chart of the ''brightness'' $q_z$ as an $l$-dimensional monochrome graphic, then the sets we are looking for (lowlands, minimum manifolds) should ''shine'' over a darker background. Because all stochastic global searches allow for some degree of ''blurring'', it is more convenient to require only that the central parts of the interesting basins of attraction ''shine''.

Remark 5
1. Recalling Remark 1, we may observe that all the sets $C(P_y)$, $C(B_w)$, $P_y \in \mathrm{Lowlands}_{f,D}$, $B_w \in \mathrm{Mmanifolds}_{f,D}$, are pairwise disjoint, which ensures that each set we are looking for can ''shine'' separately (see Fig. 2).
2. It can be proven (see [49, Th. 4.67]) that each ''brightness'' $q_z$ associated with a fixed point of the heuristic $z \in K$ can be well approximated in the $L^p(D)$ norm ($p \geq 1$) in the stochastic sense, i.e. with arbitrarily large probability, by the sequence of ''brightnesses'' $q_{x_t}$, $x_t \in X_l$, if the population size $l$ and the number of steps $t$ are sufficiently large.

We are now able to gather the main qualitative conclusions and suggestions derived in the first part of this paper.

Remark 6
1. The formal model presented in the above sections shows that if the applied family of stochastic searches $F_{U,f}$ is ''well tuned'' to the ill-conditioned problems $P_1$, $P_2$ (see Definition 5), then we can draw the information about lowlands and minimum manifolds by proper post-processing of the limit population $x_K$ or of the cumulative population $\frac{1}{K}\sum_{t=1}^{K} x_t$, where $K$ is a sufficiently large number of steps and $x_K \in X_l$ for sufficiently large $l$ (see Remark 5.2). Such post-processing may consist of clustering, cluster separation analysis, local fitness approximation, etc.
2. The above reasoning shows that in order to solve $P_1$, $P_2$ we really expect to obtain a random sample with a probability distribution sufficiently close to at least one fixed point of the heuristic. It is then much more reasonable to analyze the dynamics and asymptotic features of the sampling measures $\{q_{x_t}\}$, $t = 0, 1, 2, 3, \dots$, than the dynamics of a single individual in the consecutive populations, as is done in the classical approaches.

3. The assumed ergodicity of the Markov chain modeling each search from the family $F_{U,f}$ ensures the asymptotic guarantee of success of each search for which $X_l \cap K_\varepsilon \neq \emptyset$, i.e. a good approximation of at least one fixed point of the heuristic can be reached in a finite number of steps, starting from an arbitrary $x_0 \in X_l$.
4. A possible concept of stopping a stochastic strategy solving one of the problems $P_1$, $P_2$ is to recognize whether at least one population vector $x_t$ falls into the set of states $K_\varepsilon$, arbitrarily close to the fixed points of the heuristic. Remark 5.2 guarantees that the associated measure $q_{x_t}$ will ''shine'' over the basins of attraction of the lowlands or minimum manifolds to be found, if the family $F_{U,f}$ is ''well tuned''.
5. More formally, we can define the random variable $H_l^\varepsilon = \inf\{t \geq 0 : x_t \in K_\varepsilon, \, x_t \in X_l\}$, being the first hitting time (FHT) of the set $K_\varepsilon \cap X_l$ by the Markov chain modeling the stochastic search. It can be proven that the expected hitting time $E_{x_0}(H_l^\varepsilon)$ starting from $x_0 \in X_l$ is the unique non-negative solution of the linear system [14,35]:

$E_x(H_l^\varepsilon) = 0 \;\; \text{for } x \in K_\varepsilon \cap X_l; \qquad E_x(H_l^\varepsilon) = 1 + \sum_{x' \in X_l} Q_{x,x'} E_{x'}(H_l^\varepsilon) \;\; \text{for } x \notin K_\varepsilon.$   (8)

Notice that $E_{x_0}(H_l^\varepsilon)$ is wholly determined by the heuristic, because $K_\varepsilon$ depends only on its fixed points, and the matrix $Q$ can be computed for arbitrary $l$ from formula (6). Although the above system (8) allows for a qualitative study of the mean complexity of solving the problems under consideration, its practical application is restricted to problems with a moderate set of codes $U$, because of the huge dimension of the system matrix $Q$.
6. Another, more practical possibility of verifying the stopping condition is to check whether the consecutive samples form clusters of sufficiently high quality, i.e. sufficiently dense and well separated from each other.
7. A third possibility is to couple the stochastic searches with fitness deterioration, which consecutively ''fills'' the parts of the basins of attraction recognized in each step (or every several steps) of the algorithm. At the end of this strategy the resulting fitness becomes flat, so the new heuristic has only one fixed point: the center of the simplex $\Lambda^{r-1}$ [49,62]. Such a fitness imposes chaotic behavior on the searching process, which can be recognized by analyzing $H(x_t)$ in several consecutive steps (see e.g. [50,60,64]).
8. Assessing whether a particular family of searches is ''well tuned'' is difficult in computational practice. Typically, algorithms with a stronger selection pressure are more likely to be ''well tuned''. Unfortunately, such algorithms are ineffective in a global search. A possible solution is to use a cascade of stochastic searches, in which the upper ones are designated for the global search, while the lowest ones deliver samples concentrated in the basins of attraction of lowlands or minimum manifolds. Such a proposition, called HMS, will be presented later in this paper.
9. Fig. 2 shows two examples of finding fixed points of the heuristic for a family of SGAs equipped with proportional selection and mutation only. The 1D domain was encoded using single-byte binary strings representing uniformly distributed phenotypes. The fixed points were computed analytically, see [27]. Example 1 shows that the utilized SGA family is ''well tuned'' for a wide range of the threshold parameter, so the brightness will ''shine'' on the central parts of the basins of attraction of both minimizers. The ''brightness'' of the fixed point in Example 2 does not ''shine'' on the whole lowland areas, leaving ''dark'' parts close to their borders, so the family of searches is not ''well tuned'' in this case. Such behavior is observed for most stochastic searches using classical evolutionary mechanisms such as selection and mutation. The complex stochastic search including the MWEA component is recommended in Sect. 3 to avoid this obstacle.
10. A ''well tuned'' stochastic search can also be used as the first phase of solving particular problems if only a finite number of isolated local minimizers have to be found (all minimum manifolds are singletons). In this phase, the number of solutions and the central parts of their basins of attraction are recognized. The precise approximation of the minimizers is performed in the second phase by steepest-descent local methods started in parallel in each basin already recognized (see e.g. [60]).

Fig. 2 The two graphs show the fitness functions (values on the left axes) and the density of a unique fixed point of a heuristic (right axes). The results were obtained for an SGA-type heuristic with proportional selection and a mutation rate of 0.05. No crossover was utilized. The universum $U$ collects all single-byte binary codes, which represent 256 uniformly distributed phenotypes in the domain $D = [0, 10]$. The fixed points were computed analytically, see [27]
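The linear system (8) for the expected first hitting time can be solved directly when the state space is small; the 4-state chain below is invented for illustration, with the last state standing in for $K_\varepsilon \cap X_l$:

```python
import numpy as np

# Illustrative 4-state row-stochastic transition matrix; state 3 stands for
# the target set of populations close to a fixed point of the heuristic.
Q = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.2, 0.5, 0.2, 0.1],
              [0.1, 0.2, 0.4, 0.3],
              [0.0, 0.0, 0.0, 1.0]])

transient = [0, 1, 2]                  # states outside the target set
Q_TT = Q[np.ix_(transient, transient)]

# System (8) restricted to the transient states: (I - Q_TT) E = 1,
# with E = 0 imposed on the target states.
E = np.linalg.solve(np.eye(len(transient)) - Q_TT, np.ones(len(transient)))
```

The solution gives the expected number of steps to reach the target from each transient state; it is finite here because the transient block of $Q$ is strictly substochastic.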

Multi-Winner evolutionary algorithm
Multi-Winner Evolutionary Algorithm (MWEA) is a population-based stochastic search with Multi-Winner Selection (MWS), which mimics the rules originally used for electing boards of directors in large corporations [13]. It significantly increases the capability of identifying the shape components of the insensitivity set. Here we introduce the MWEA Markov model and derive the formula for its probability transition, i.e. the transition probability matrix. We will study an artificial genetic system which, at each genetic epoch $t = 0, 1, 2, \dots$, takes the parental population $P_t$ to create an intermediate offspring population $O_t$ and a resulting population $P_{t+1}$, which will be the input to the next epoch. Both $P_t$ and $P_{t+1}$ are multisets of codes from $U$ of cardinality $l$, represented by the frequency vectors $x_t$, $x_{t+1}$ respectively. The offspring $O_t$, of cardinality $k$, can also be represented by a frequency vector $y_t \in X_k$. Using the notation of frequency vectors we obtain $g_t(i) = l x_t^i$, $c_t(i) = k y_t^i$, $\forall i \in U$, where $g_t$ and $c_t$ denote the occurrence functions of $P_t$ and $O_t$. We can compute the union of multisets containing clones of codes from $U$ (see [49]). Let $A = (U, g)$ and $B = (U, w)$ be two arbitrary multisets such that $\mathrm{card}(A) = \sum_{i \in U} g(i) = \varrho$, $\mathrm{card}(B) = \sum_{i \in U} w(i) = \nu$, with $\varrho, \nu < +\infty$; then $A \cup B = (U, g + w)$.

Remark 7 If the parental and offspring populations are represented by the frequency vectors $x_t \in X_l$ and $y_t \in X_k$, then the union $P_t \cup O_t$ is represented by the frequency vector $\frac{l x_t + k y_t}{l + k} \in X_{l+k}$.

Apart from the frequency vector notation, we will use the notation of multisets based on the permutational power of a set (see [49, Def. 2.13]). We can introduce an equivalence $eqp \subset U^l \times U^l$ such that two strings $d, f \in U^l$ satisfy $(d, f) \in eqp$ if there exists a permutation from the symmetric group $S_l$ that maps $d$ to $f$. Each multiset $(U, lx)$ associated with a frequency vector $x \in X_l$ can be represented by the abstraction class of a sequence

$(U, lx) \equiv [n^x]_{eqp}, \quad n^x = (n^x_1, \dots, n^x_l) \in U^l,$   (10)

where $n^x$ takes the canonical form

$n^x = (\underbrace{b_1, \dots, b_1}_{l x^{b_1}}, \dots, \underbrace{b_q, \dots, b_q}_{l x^{b_q}}),$   (11)

with $\{b_1 < \dots < b_q\} = \mathrm{supp}(lx) \subset U$ the set of codes represented in the multiset $(U, lx)$ and $q = \mathrm{card}(\mathrm{supp}(lx)) \leq l$. The main advantage of this representation over the representation using the occurrence function is the possibility to distinguish between two individuals $n^x_i$, $n^x_j$, $i \neq j$, which represent the same code $b_p \in \mathrm{supp}(lx)$. All individuals are unambiguously labeled and linearly ordered by their indices in the chosen sequence $n^x$, which takes the form (11), while all permutations of $n^x$ represent the same multiset.

However, this representation alone is not mathematically precise and makes Boolean operations difficult to formalize. It can be proven that both representations are equivalent if $l < +\infty$ (see [49, Rem. 2.15]).
In Algorithm 1 we show the scheme of the evolutionary strategy that performs the $(l + k)$ succession using MWS [13].
The stochastic, population-based strategy following the above scheme (Algorithm 1) generates a random sequence of populations or, equivalently, a sequence of their frequency vectors $\{x_t\}$. Line 5 in Algorithm 1 represents the evolutionary search, which can be modeled by a stationary Markov chain with the space of states $X_l$ (i.e., it satisfies (4)) with a transition probability matrix $Q$. We will also use the stochastic operation $mutate : X_l \to M(X_l)$, characterized by a strictly positive transition probability matrix $M \in [0,1]^{s(r,l) \times s(r,l)}$. Now we intend to derive the transition probability matrix for the Markov model of Algorithm 1, assuming that we know the transition probability matrices $Q$ and $M$.

Proposition 1 Assuming $l = k$, the probability distribution of the frequency vector $y_t \in X_l$ of the offspring $O_t$ at the $t$-th epoch of Algorithm 1 is given by the product $Q^\top \pi_l^t$, where $Q$ is the transition probability matrix of the SGA.
Let us now study the probability distribution of the union of the current population at epoch $t$ and its direct offspring, $P_t \cup O_t$. The frequency vector of such a multiset will be denoted by $z_t \in X_{2l}$; by Remark 7 we have $z_t = \frac{1}{2}(x_t + y_t)$. We will denote by $\pi_{2l}^t \in M(X_{2l})$ the probability distribution of the frequency vector $z_t$, so that:

$\pi_{2l}^t(z) = P(z_t = z), \quad z \in X_{2l}.$   (12)

Proposition 2 The probability distribution $\pi_{2l}^t \in M(X_{2l})$ of the frequency vector $z_t$ can be obtained as the product $\pi_{2l}^t = A^\top \pi_l^t$, where $A \in [0,1]^{s(r,l) \times s(r,2l)}$ is given by:

$A_{x,z} = \begin{cases} Q_{x,\,2z-x} & \text{if } 2z - x \in X_l, \\ 0 & \text{otherwise,} \end{cases}$   (13)

where $x \in X_l$ and $z \in X_{2l}$. For each combination $(x, z)$, the condition $2z - x \in X_l$ ensures that it is possible to reach $z$ from $x$ by adding a member $y = 2z - x$ of $X_l$. Only for such combinations is the probability $Q_{x,y}$ copied to $A$. In that way, it is certain that for each possible $z$ (given $x$) one has $x \subset z$, and that the additional individuals are added according to $Q_{x,y}$.
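The matrix $A$ of Proposition 2 can be assembled mechanically from $Q$ once the frequency vectors are listed; the tiny instance below ($r = 2$, $l = 2$, an invented row-stochastic $Q$) checks that every row of $A$ remains stochastic:

```python
import numpy as np

def compositions(total, parts):
    if parts == 1:
        return [(total,)]
    return [(f,) + rest for f in range(total + 1)
            for rest in compositions(total - f, parts - 1)]

r, l = 2, 2
X_l  = compositions(l, r)       # populations as occurrence vectors, size l
X_2l = compositions(2 * l, r)   # parent-offspring unions, size 2l

# Illustrative row-stochastic transition matrix over X_l (stands for Q).
Q = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.2, 0.7]])

# A[x, z] = Q[x, y] when the union occurrence vector z equals x + y for
# some offspring y in X_l, and 0 otherwise.
A = np.zeros((len(X_l), len(X_2l)))
for i, xv in enumerate(X_l):
    for j, zv in enumerate(X_2l):
        yv = tuple(zj - xj for zj, xj in zip(zv, xv))
        if all(v >= 0 for v in yv) and sum(yv) == l:
            A[i, j] = Q[i, X_l.index(yv)]
```

Each offspring vector $y$ yields a distinct union $z = x + y$, so the probability mass of a $Q$-row is merely redistributed over $X_{2l}$ and each row of $A$ still sums to one.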
Let us now define an operator $E : X_{2l} \to X_l$, which determines the outcome of the election carried out on the union of parents and children. Moreover, it allows us to define a probability transition matrix $B \in [0,1]^{s(r,2l) \times s(r,l)}$ associated with this step of evolution. The definition of $E$ presented below employs the greedy-CC election rule and the plus-1 proportional utility function. The CC rule chooses the highest-scoring committee among all possible ones, so it has exponential computation time. See Fig. 3 for an example election with the CC voting rule. The greedy version uses a simple heuristic which favors the individuals that contribute the most to the scoring function.
The election is based on a family of utility functions: Next, we use a representation of the population ðU; 2lzÞ by the sequence n z 2 U 2l (see (10)), which allows to distinguish the individuals containing the same genotype.
Using the utility functions (16) we can introduce a family of linear total orders ⪯_a, a ∈ supp(2μz), on the population ξ_z so that

i ⪯_a j ⟺ u_a(ξ_z(i)) > u_a(ξ_z(j)) ∨ (u_a(ξ_z(i)) = u_a(ξ_z(j)) ∧ i ≤ j). (17)

On the right-hand side, the first part (before the alternative operator) compares the utility values; the second part breaks the ties in the utility values using the ordering of ξ_z, which is ensured by the representation (10).

Next, we define a family of permutations pos_{z,a}, a ∈ supp(2μz), such that pos_{z,a} reorders the sequence ξ_z into a sequence ordered by the relation ⪯_a. The image pos_{z,a}(1,…,2μ) of the naturally ordered set of indices will be the preference list associated with the genotype a ∈ supp(2μz).
The resulting frequency vector w ∈ X_μ will be obtained in μ steps. In each step, the multi-winner procedure produces one element of a finite sequence of sets W_1,…,W_μ, called committees, such that card(W_ℓ) = ℓ, ℓ = 1,…,μ, and W_ℓ ⊂ W_{ℓ+1}, ℓ = 1,…,μ−1. The coordinates of the vector w will be given by (18). The first element of the sequence of committees will be obtained as (19). The score(·) function will be given by formula (20), where pos_{z,ξ_z^i}(j) stands for the position of the population member ξ_z^j in the preference list pos_{z,ξ_z^i}(1,…,2μ) associated with the genotype of the population member ξ_z^i. The next elements W_{i+1}, i = 1,…,μ−1, of the sequence will be defined by formula (21). This is the main formulation of the greedy algorithm, as each candidate added to the winning committee maximizes the score surplus. The invariants are maintained by enforcing the inequality in arg max.

Fig. 3 Example election with the Chamberlin-Courant voting rule. There are 4 candidates (represented by heads) and 4 voters, each of whom ranks the candidates by preference. We want to choose a winning committee of two candidates, so we consider all possible two-member committees and choose the one that maximizes the total score. On the bottom right we show the 6 possible committees intersected with the voters. In each cell, we show the Borda score of the best-ranking committee member for that voter. (For example, for voter 1 and the first committee: the two committee members have scores 3 and 2, respectively, so we note 3, the better score, in the cell.) Summing each column, we get the total scores of all possible committees. There is a tie (two scores of 11), so we arbitrarily choose one of them.

The function score(·) calculates the k-Borda score of a committee (20). For each voter, it finds the most preferred candidate from the committee and adds that candidate's k-Borda score to the sum. The k-Borda score depends linearly on the position of the committee member on the voter's preference list.
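To make the greedy-CC mechanics concrete, the following minimal Python sketch implements a greedy Chamberlin-Courant election with Borda utilities. It operates directly on the voters' preference lists; the function names and the data layout are ours, and the plus-1 proportional utilities and tie-breaking of (16)-(17) are simplified to plain Borda positions with first-come tie-breaking.

```python
# A minimal sketch of the greedy Chamberlin-Courant (greedy-CC) election.
# Voters rank candidates; the score of a committee is the sum, over the
# voters, of the Borda score of the voter's best-ranked committee member
# (cf. (20)).  The greedy variant grows the committees W_1 c ... c W_k,
# at each step adding the candidate with the largest score surplus
# (cf. (21)).

def borda(pref, cand):
    # Borda score: n-1 points for the top choice, 0 for the last one.
    return len(pref) - 1 - pref.index(cand)

def committee_score(committee, prefs):
    # For each voter, count only their best-ranked committee member.
    return sum(max(borda(p, c) for c in committee) for p in prefs)

def greedy_cc(prefs, k):
    candidates = list(prefs[0])
    committee = []
    for _ in range(k):  # builds W_1, W_2, ..., W_k
        rest = [c for c in candidates if c not in committee]
        best = max(rest, key=lambda c: committee_score(committee + [c], prefs))
        committee.append(best)
    return committee
```

For instance, for the preference profile [[0,1,2,3], [0,1,2,3], [3,2,1,0], [3,2,1,0]] and k = 2, the greedy rule returns the committee [0, 3] with score 12, which in this small instance coincides with the exact CC optimum.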
The transition matrix D ∈ [0,1]^{s(r,μ)×s(r,μ)} of the entire (μ + λ) scheme is then given by the product of the matrices introduced above. Finally, the Kolmogorov equation for the Markov process associated with Algorithm 1 takes the form π^{t+1}_μ = D π^t_μ.

Remark 8 The Markov process modeling the MWEA is ergodic, because D is a strictly positive stochastic matrix, being the product of the stochastic matrix C and the strictly positive stochastic matrix M.
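Remark 8 rests on the algebraic fact that multiplying a strictly positive stochastic matrix by any stochastic matrix yields a strictly positive stochastic matrix, which in turn implies ergodicity. A small numerical check of this mechanism (under the column-stochastic convention used in Q π^t_μ; the concrete matrices are illustrative, not those of the MWEA):

```python
import numpy as np

# Distributions are column vectors and transition matrices act on the
# left (as in Q @ pi), so each column of a transition matrix sums to 1.
# C may contain zeros; M is strictly positive (e.g., positive mutation).
C = np.array([[0.5, 1.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0]])   # stochastic, with zero entries
M = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # strictly positive, stochastic

D = M @ C                         # strictly positive stochastic matrix

# Ergodicity: powers of D converge to a rank-one matrix whose columns
# all equal the unique stationary distribution.
P = np.linalg.matrix_power(D, 128)
```

Every entry of D is positive because each column of C contains a positive entry and every entry of M is positive; the columns of P are numerically identical, exhibiting the unique limit distribution.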

A formal model of the dynamics of HMS enhanced with MWEA
In this contribution we concentrate on our recent strategy HMS/MWEA, devoted to solving the most challenging global search problems. This complex memetic strategy is equipped with a new component, the MWEA: an evolutionary algorithm with the Multi-Winner Selection (MWS). The whole strategy turns out to be a global search tool especially well suited to solving problems with many local solutions, possibly surrounded by thick objective insensitivity sets. The strategy aims at providing information about all the global solutions, even when these form an uncountable set. The algorithmic details of the described strategy are covered in our previous works. In this paper we include a formal model of the dynamics of the strategy, in which the strategy is represented by a homogeneous (stationary) Markov chain. We provide the details of the construction of the state space and the transition matrix.

After the model formulation we prove the ergodicity of the obtained Markov chain, which implies the asymptotic guarantee of success. Other properties, such as the well-tuning of the whole strategy or a concept of stopping conditions, are the subject of further studies.

HMS extended with insensitivity region approximation
The HMS is a complex stochastic strategy consisting of a multi-deme evolutionary algorithm and other accuracy-boosting, time-saving and knowledge-extracting techniques, such as gradient-based local optimization methods, dynamic accuracy adjustment, sample clustering and additional evolutionary components equipped with an MWS operator aimed at the discovery of insensitivity regions in the objective function landscape (see e.g. [47,57] and the references therein).

The HMS sub-populations (demes) are organized in a parent-child tree hierarchy. The number of hierarchy levels is fixed, but the degree of the internal nodes is not. Each deme is evolved by means of a separate single-population evolutionary engine with a finite genetic universum, such as the SGA or the MWEA.

In a single HMS global step (a metaepoch), each deme runs a prescribed number of local steps (genetic epochs). After each metaepoch, a change in the deme-tree structure can happen: some of the demes that are not located at the maximal level of the tree can produce child demes through an operation called sprouting. It consists in sampling a set of points around the parent deme's current best point using a prescribed probability distribution; here we use the normal distribution. The sprouting is conditional: we do not allow new children to be sprouted too close to other demes at the target HMS tree level. The HMS typically starts with a single parent-less root deme. The maximal-level child-less demes are called leaves. The evolutionary search performed by the root population is the most chaotic and inaccurate; the search becomes more and more focused and accurate with increasing tree level. The general idea is that the higher-level populations discover promising areas in the search domain and those areas are explored thoroughly by the child populations. It is then the leaves that find the actual solutions.
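The conditional sprouting described above can be sketched as follows; the function name, the rejection rule based on the distance between the parent's best point and the existing deme centers, and the parameters are illustrative assumptions, not the exact HMS implementation:

```python
import math
import random

def sprout(best_point, existing_centers, min_dist, pop_size, sigma):
    """Conditionally sprout a child deme around the parent's best point.

    Returns a list of sampled individuals, or None when a deme at the
    target tree level is already closer than `min_dist`, in which case
    the sprout is suppressed (the conditional part of the operation).
    """
    for center in existing_centers:
        if math.dist(best_point, center) < min_dist:
            return None  # too close to an existing deme: no sprout
    # Sample the child population from N(best_point, sigma^2 I).
    return [[random.gauss(m, sigma) for m in best_point]
            for _ in range(pop_size)]
```

The rejection test implements the "not too close" condition; lowering `sigma` at deeper tree levels would mirror the increasing focus of the search.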
The well-tuning of the entire strategy is achieved by having well-tuned lower-level demes. The high-level demes, on the other hand, need not and even should not be well-tuned, so that they maintain good exploratory characteristics.
The hierarchic structure of the HMS search is especially effective if the computational cost of an objective evaluation drops strongly when its accuracy is lowered, which is typically the case when solving Inverse Parametric Problems (IPPs) [14].
The results of such a global phase are then transferred to the MWEA. Each deme from the highest level of the HMS hierarchy is treated as a cluster. We merge neighboring clusters using the hill-valley rule, i.e., we ascertain that there is no hill separating the two clusters to be merged [61]. The merged clusters are input to the MWEA, which raises their local diversity.

In this case, the MWEA is well-tuned not to obtain the best objective values, but to concentrate its sampling measure on the lowlands. Such concentration allows the next stage, the local approximation, to determine the boundaries of the lowlands.

The next stages, not included in the formal model, consist in designing the local objective approximation for each set of individuals Q̃_i, where i is the identifier of an MWEA population, with the methods described in [47]. We prepare 1st-order Lagrange splines on a tetrahedral grid spanned over the Q̃_i points with the Delaunay algorithm. Next, this function is mapped onto the space of 2nd-order B-splines spanned over a regular polyhedral grid using either the L² or the H¹ projection. Both types of projections result in C¹ smoothness of the local objective representation. We compare the two projections to the kriging approximation [24]. Let us denote by f̃_i the local approximation of the objective associated with the set of individuals Q̃_i. Finally, the level set of f̃_i, taken at a sufficiently low level with respect to the local minimum encountered, is taken as the approximation of the insensitivity set component associated with each set of individuals Q̃_i.

HMS model basic notions
Now, let us recall some notions from [51] used to build the HMS formal model. The model was formulated for the HGS, but the HMS inherits its main structural features from that predecessor. The model uses:

• a family of m ∈ ℕ genetic universa U_i with card(U_i) = r_i ∈ ℕ for i = 1,…,m;
• an associated family of encoding operators featuring the progressive increase of the search accuracy, with some d_i ∈ ℕ describing the increment rate of the search accuracy between U_i and U_{i+1};
• a sequence of inheritance mappings and sets, where we assume that card(U_i|_ξ) = d_{i−1} for ξ ∈ U_{i−1}, i = 2,…,m;
• a family of probability distributions, σ_0 ∈ M(U_1) and the seed distributions σ^seed_i, used for sampling the initial populations in the demes;
• the family of fitness functions f_i; since U_i is finite, we can identify f_i with the vector of its values indexed by ξ ∈ U_i;
• deme state spaces X^1 = X^{r_1}_{μ_1},…, X^m = X^{r_m}_{μ_m} with population sizes μ_1,…,μ_m at HMS tree levels 1,…,m;
• lengths of ''metaepochs'' kstep_1,…,kstep_m ∈ ℕ;
• one-step transition matrices Q_i ∈ [0,1]^{s(r_i,μ_i)×s(r_i,μ_i)} governing the deme evolution, cf. (5), (22), (23); obviously, the respective metaepoch transition matrices are Q_i^{kstep_i};
• the probability p_prune ∈ [0,1] of pruning one of the stopped branches of the root;
• a family of local efficiency stopping conditions of the type described in [51], i.e., a family of positive thresholds lsc_i such that the probability of stopping a deme evolution after executing a metaepoch in a deme state x ∈ X^i is given by (30), where [·] is the Iverson bracket and (·,·) is the standard inner product in ℝ^{r_i};
• a family of proximity relations C_i (31), meaning that (ξ, x) ∈ C_i if an individual ξ ∈ U_i is close enough to a deme with population vector x ∈ X^{i+1}.

In the sequel we shall use the functions b_i: X^i → U_i (32) that select the best individual of the population x having the minimal genotype according to an arbitrary fixed ordering of U_i. We shall also use a kind of neighborhoods (proximity sets) related to the proximity relation. A particular example of a proximity relation (31) that is useful in practice can be found in [51].

HMS tree
The HMS populations form a hierarchy represented here by the m-level undirected graph HMSTREE = ⟨V, E, F⟩. The vertices V correspond to the HMS populations (demes) and the edges E follow the parent-child relation between populations. In our formal model we assume that this graph contains all the possible populations and does not change over time. Operations that in practice create new demes (i.e., the sprouting) here simply activate an available one.

Similarly, the deme destruction (in the pruning) is here replaced with deactivation. The number of children of a node can differ between levels, but at each level it is a constant k_i (i = 2,…,m); for uniformity we set k_1 = 1. The labeling F encodes the path from the root to a given node. Namely, taking the set of admissible deme numbers at tree level i, K_i = {1,…,k_i}, we define

K^1 = {j^0} = {(1, 0,…,0)} (m−1 zeros),
K^i = {(1, j_2,…,j_i, 0,…,0) : j_l ∈ K_l, l = 2,…,i} (m−i zeros) for i ≥ 2,
K^m = {(1, j_2,…,j_m) : j_l ∈ K_l, l = 2,…,m}.

Here, K is the domain of all labels (i.e., the image of F), K^i is the set of labels of the i-level demes, K^par is the set of parental deme labels and K^m is the set of leaf deme labels.

In the sequel we shall use two auxiliary functions: len: K → {1,…,m}, returning the length of a path j ∈ K, i.e., the level of the deme with label j, and prefix_i: K → K, returning the length-i ''prefix'' of j. Namely,

len(j) = max{l ∈ {1,…,m} : j_l > 0},
prefix_i(j) = (j_1,…,j_i, 0,…,0).

The root is obviously the unique node for which len(j) = 1. Furthermore, for each parental node j ∈ K^par (i.e., such that len(j) < m) we can introduce the set of its child node indices I_j. In the sequel we shall also make use of the set of all descendants of j together with j itself.
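The label arithmetic can be sketched directly; the helper names make_label, length, prefix and children are ours (length mirrors the paper's len, renamed to avoid shadowing the Python builtin):

```python
def make_label(path, m):
    # Pad the root-to-node path with zeros up to length m,
    # e.g. path (1, 3) in a 4-level tree -> (1, 3, 0, 0).
    return tuple(path) + (0,) * (m - len(path))

def length(label):
    # len(j) in the paper: the deepest position holding a non-zero
    # deme number, i.e. the tree level of the deme with label j.
    return max(l + 1 for l, jl in enumerate(label) if jl > 0)

def prefix(label, i):
    # prefix_i(j): keep the first i coordinates, zero out the rest.
    return label[:i] + (0,) * (len(label) - i)

def children(label, k):
    # I_j: the labels of the k child nodes of j, obtained by filling
    # the next coordinate with an admissible deme number 1..k.
    i = length(label)
    return [label[:i] + (c,) + label[i + 1:] for c in range(1, k + 1)]
```

For example, the level-2 deme (1, 2, 0, 0) in a 4-level tree with k_3 = 2 has the children (1, 2, 1, 0) and (1, 2, 2, 0).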

The HMS state space
First, note that the present model differs in some points from the one given in [51]. To make the structure more flexible, we have added the pruning operation. It consists in deactivating a stochastically selected sub-tree that was previously stopped, i.e., had exhausted its search capabilities. The pruning operation is already used in computational practice and provides a probabilistic way to stop ineffective computations. In this manner it also enables us to prove the ergodicity of the whole strategy.

The overall state of the algorithm is determined by the states of all active and potentially active demes. All such demes are in one-to-one correspondence with the nodes of the HMSTREE structure described in Sect. 4.3, hence they are also in one-to-one correspondence with the node indices, i.e., the elements of K. The state of a particular deme with label j ∈ K is determined by its frequency vector x ∈ X^{len(j)}. Moreover, each non-root deme has a status indicator s_j ∈ {inactive, new, active, stopped}; the root deme can only have status active or stopped.

At the beginning the only active deme is the root; all the others are set as inactive. The initial population of the root deme is generated by sampling with return from U_1 according to a given probability distribution.
Below we summarize the meaning of the status values.
• A deme is inactive if it has not yet been activated by the sprouting operation or was pruned previously. To make this entirely formal, we assume that the deme vector of each inactive deme at level i has a fixed arbitrary value from X^i. This assumption affects neither the formal analysis nor the computational results in any way.
• A deme j is new if it has just been sprouted by its parental deme. A new deme cannot sprout another deme. The population of a new deme is sampled according to a given distribution, hence the population vector is set to a specific value (the initial setting is discarded). The status changes from new to active or stopped after the deme's first metaepoch has been executed. In order to perform the sprouting, the parental deme has to have been active in at least one of the previous steps.
• A deme is active if it was new or active in the previous metaepoch and the stopping condition is not satisfied.
• A deme is stopped if the efficiency stopping condition is satisfied currently or was satisfied in the past, and the deme has not been pruned yet. In the case of a parental deme, the status is also set to stopped when all its child demes have been activated (i.e., they have status active or stopped); such a situation appears very rarely in computational practice. A deme marked stopped either stays stopped up to the end of the computation and does not change its deme vector, or can be pruned with some positive probability.

Note that there are some relations among the node status values and not all sequences of status values are possible. First, the HMS tree develops from the root towards the leaves, hence if a deme is inactive, then all its descendants must bear the same status. This condition is naturally preserved by the pruning operation (cf. Algorithms 2-4). Second, the root is stopped if and only if all its children are not inactive. Summing up, the HMS state space can be described in the following way:

X = {(x_j, s_j)_{j∈K} : x_j ∈ X^{len(j)}, s_{j^0} ∈ {active, stopped}, s_j ∈ {inactive, new, active, stopped} for j ≠ j^0}.

Note that an HMS state is a vector indexed by the elements of HMSTREE. Each component of this vector has in turn two sub-components: a deme population vector x_j and a deme status s_j. Note also that X is finite provided all the genetic universa U_i are finite.

In the sequel we shall also need the following subsets of the child node labels I_j for j ∈ K, computed in a strategy state x ∈ X:

I^in_j(x) = {ι ∈ I_j : s_ι = inactive}, I^asn_j(x) = I_j \ I^in_j(x), I^s_j(x) = {ι ∈ I_j : s_ι = stopped}.

Here I^in_j(x) is the set of labels of the child nodes that are inactive in the state x ∈ X, I^s_j(x) is the set of the stopped child node labels and I^asn_j(x) is the set of labels of the child nodes that are active, stopped or new. We shall also use the function returning the label of the inactive child of j that has the minimal number:

mind: X × K^par ∋ (x, j) ↦ arg min_{ι ∈ I^in_j(x)} ι_{len(j)+1} ∈ K.
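These structural constraints on a status assignment can be encoded as a simple validity check, following the two rules stated above (the root-status rule and the inactive-subtree rule); the dictionary-based tree encoding and the helper names are our illustrative assumptions:

```python
INACTIVE, NEW, ACTIVE, STOPPED = "inactive", "new", "active", "stopped"

def valid_state(tree, status):
    """Check the HMS state-space constraints.

    `tree` maps each node label to the list of its children's labels;
    `status` maps each label to a status value.  The root is the unique
    node that is nobody's child.
    """
    all_children = {c for cs in tree.values() for c in cs}
    (root,) = set(tree) - all_children
    if status[root] not in (ACTIVE, STOPPED):
        return False  # the root is only ever active or stopped

    # Rule 1: an inactive deme must have inactive descendants only.
    def subtree_inactive(j):
        return all(status[c] == INACTIVE and subtree_inactive(c)
                   for c in tree[j])
    for j, s in status.items():
        if s == INACTIVE and not subtree_inactive(j):
            return False

    # Rule 2: the root is stopped iff all its children are not inactive.
    children_started = all(status[c] != INACTIVE for c in tree[root])
    return (status[root] == STOPPED) == children_started
```

The initial state (active root, all other demes inactive) passes this check, while a state with an active child below an inactive deme is rejected.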

Algorithmic details
In this subsection we provide a detailed description of the particular algorithms used in the HMS. It is based on the description presented in [51], but here we provide some clarifications and noteworthy modifications. First of all, let us note that the strategy can be highly parallelized: the demes (sub-populations) can be evolved in parallel with some well-defined synchronization points. Moreover, the HMS structural operations (sprouting and pruning) can also be parallelized to some degree. Therefore, we formulate the overall strategy as three types of algorithms that are run concurrently: the root deme algorithm, the mid-level deme algorithm and the leaf deme algorithm. The latter two differ only in that the leaves do not perform actions related to child management.

We assume that in the initial state x^0 of the strategy all demes except the root are inactive and the root itself is active, i.e., s_{j^0} = active and s_j = inactive for j ≠ j^0. The following auxiliary functions will be used as primitive building blocks of the algorithms presented in the sequel. They are explained here without implementation details.
• a function that samples with return k times according to a probability distribution σ;
• a function that selects an element of a list with uniform probability;
• a function that returns the indices of the children of deme, i.e., the elements of I_deme, that have any of the provided statuses (see (36));
• a function that runs a metaepoch for the current deme, i.e., evolves its population according to the transition matrix Q_level^{kstep_level};
• a function that checks the global stopping condition;
• a function that checks the local stopping condition for deme.

Moreover, the algorithms use the functions b_i and mind introduced above, see (32) and (40), respectively.

The deme synchronization is based here on message-passing primitives. Namely, we use the two following operations:

• a non-blocking operation of sending a message to another deme;
• a blocking operation of receiving a message from another deme.

When there is a need to receive or send an object along with a message, we use the overloaded versions of these functions. The realization of the send and receive operations is a classical problem: their implementation can make use of, e.g., message queues.
Algorithm 2 shows the activity of the root deme. First, an initial population is sampled according to the distribution σ_0. Then, the main event loop is started. Its first stage is the execution of metaepochs in the admissible subtrees, followed by the execution of a metaepoch in the root itself. Note that during the first run of the loop there are no admissible subtrees, i.e., all children of the root are inactive. After receiving the messages signaling the metaepochs' completion, the root checks the proximity of the current best individual to the feasible children's populations, initiates the sprouting in the admissible subtrees and, if possible, sprouts a new branch from the current best individual. The completion of the sprouting in the subtrees is followed by the stochastic pruning of a stopped subtree. The decision to prune is taken with probability p_prune; note that Binom(n, p) denotes the binomial distribution with parameters n and p. If the decision is positive, we select one of the stopped children of the root and prune it, i.e., deactivate the child and all its successors. After the pruning has finished, the root checks whether there are any inactive children.

If not, the root status is set to stopped; otherwise it is set to active. The final stage of the loop is the evaluation of the global stopping condition of the whole strategy. If the latter is satisfied, the computations are finished and all the demes are halted in the appropriate order. Otherwise, the loop proceeds to the next run.
Next, let us consider the mid-level deme activity shown in Algorithm 3. After the initialization of the deme status and a loop control variable, the mid-level deme starts an event loop that consists in handling the orders sent by the parent deme. The following order types are handled.

• ACTIVATE - the deme is initialized, i.e., its population is sampled according to the distribution σ^seed_{len(j)−1} and the deme status is set to new.
• PRUNE - the deme gets deactivated along with all its children.
• FINISH - the deme halts its computations and passes the message to all its children.
• RUNMETA - the deme passes the message to all its children and then runs its own metaepoch if its status is new or active; afterwards, the deme waits for the READY responses from all its children, which signal the end of the child-deme metaepochs; subsequently, the local stopping condition is checked, which can change the deme status appropriately; finally, the READY message is sent to the parent.
• ISCLOSE - the deme checks whether an individual sent from the parent is close to the current population, cf. (31).
• RUNSPROUT - if the deme's children are not leaves, they are requested to perform the sprouting; then, if the deme is active, it performs the sprouting itself; next, it waits for the sprouting in the children to finish; if there are no inactive children, the deme status changes to stopped; finally, the deme acknowledges to the parent the completion of the sprouting.

Finally, the activity of the leaf demes is presented in Algorithm 4. Note that it is a simplified version of Algorithm 3 that omits all operations related to child management. To state it clearly, the event handling in the leaves looks as follows.

• ACTIVATE - the population is sampled according to the distribution σ^seed_{m−1} and the deme status is set to new.
• PRUNE - the deme simply becomes inactive.
• FINISH - the deme halts the evolution.
• RUNMETA - the deme runs its metaepoch if its status is new or active; afterwards, the local stopping condition is checked, which can change the deme status appropriately; finally, the READY acknowledgment is sent to the parent.
• ISCLOSE - the deme checks whether an individual sent from the parent is close to the current population, cf. (31).

Transition operators related to HMS steps
Most of this subsection is a simplified version of the description provided in [51]; for the full details we refer the reader there. Note that the model presented in [51] uses the agent-based framework, hence such notions as ''action'' arise there naturally. To highlight the correspondence between the current model and the older one, we preserve the basic terminology while not retaining the agents themselves. Therefore, in the sequel ''action'' has exactly the same meaning as ''operation''.

General structure of operators
An HMS action a is here represented by a pair of functions (d_a, ϑ_a). The first component, d_a: X → [0,1], is the decision function: it computes the probability of choosing the action a in a state x ∈ X, i.e., a is run with probability d_a(x) and rejected with probability 1 − d_a(x).

The state transition function ϑ_a: X → M(X) defines the non-deterministic state transition resulting from the execution of a: ϑ_a(x)(y) gives the probability of passing from the state x to the state y as the result of a. A trivial yet important example of such an action is the no-op (do-nothing action), denoted here null; its state transition function is the Kronecker delta, i.e., ϑ_null(x)(y) = δ(x, y). The overall Markov kernel τ_a related to any action a can be computed using the chain rule; it has the following form:

τ_a(x)(y) = d_a(x) ϑ_a(x)(y) + (1 − d_a(x)) δ(x, y).

The first term on the right-hand side corresponds to the state transition when the decision to execute a is positive. The second term represents the case of the rejection of a, when in fact no state transition is performed.
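On a finite state space this kernel is just a convex combination of the action's transition matrix and the identity, which can be sketched as follows (the row-stochastic convention is chosen here purely for readability; the helper name is ours):

```python
import numpy as np

def action_kernel(decision, transition):
    """Markov kernel of an action on a finite state space.

    tau_a(x)(y) = d_a(x) * theta_a(x)(y) + (1 - d_a(x)) * delta(x, y).
    `decision` is the vector of d_a values per state; `transition` is
    the row-stochastic matrix of theta_a.  With probability 1 - d_a(x)
    the action is rejected and the state stays put, which is exactly
    the kernel of the no-op action (the identity matrix).
    """
    d = np.asarray(decision, dtype=float)[:, None]
    theta = np.asarray(transition, dtype=float)
    return d * theta + (1.0 - d) * np.eye(theta.shape[0])
```

For example, on a two-state space with d = (0.5, 1.0) and a state-swapping theta, the kernel is [[0.5, 0.5], [1.0, 0.0]]; setting d ≡ 0 recovers the Kronecker-delta kernel of null.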
In the sequel we show the decision functions and state-transition functions for the following non-trivial action types:

• metaepoch actions meta_j, j ∈ K, available for all demes;
• sprouting actions sprout_j, j ∈ K^par, defined only for the parental demes;
• the pruning action prune, available for the root deme.

Metaepoch operators
Now let us recall the stochastic operators for the meta_j action. To this end, let us consider two consecutive states x, y ∈ X appearing during an HMS computation. We will denote by (s_j, x_j) the components of x and by (t_j, y_j) the components of y.

The decision function for the root deme j^0 is given by the following formula. For the lower-level demes j ∈ K \ K^1 the decision function has the form given below. Now let us proceed to the state transition function for meta_j. To this end, denote i = len(j). Then we have the following.

Sprouting operators

First, let us introduce the following family of stochastic functions T_i, where T_i(x) is the distribution of sampling μ_{i+1} times with return from U_{i+1} according to the distribution σ_i^{b_i(x)}. To simplify the formulae defining the operators, let us introduce another auxiliary function

a: X × K^par ∋ (x, j) ↦ card(I^in_j(x)) ∈ ℕ.

Using the above-defined notions we can formulate the decision function for the sprouting in the demes j ∈ K^par. Similarly, we obtain the form of the state-transition function for the sprouting.

Pruning operator
Now, let us proceed to the pruning operator. Note that its definition cannot be found in [51], because the model presented in that paper does not include the pruning. The decision function has in our case the following form.

The state transition function can be expressed as follows for some ι ∈ I^s_{j^0}(x).

The transition probability function for the whole HMS
First, let us recall that the superposition of two Markov kernels can be computed using the total probability law:

(τ_2 ∘ τ_1)(x)(y) = Σ_{w∈X} τ_1(x)(w) τ_2(w)(y).

In [51] we showed that if two actions a_1, a_2 are either both metaepoch actions or both sprouting actions, then their kernels commute, i.e., τ_{a_1} ∘ τ_{a_2} = τ_{a_2} ∘ τ_{a_1}, which means that the outcome of the actions' execution does not depend on their order. Therefore, as in Algorithms 2-4, they can safely be run in parallel. The Markov kernel for the whole metaepoch step, τ^meta: X → M(X), is the superposition of the single-deme metaepoch Markov kernels τ_{meta_j} over all demes j ∈ K (55). Note that, according to what has been written above, the superposition in (55) does not depend on the order of the labels in K. Similarly, the Markov kernel for the sprouting step, τ^sprout: X → M(X), is the superposition of the single-deme kernels τ_{sprout_j} over all parental demes j ∈ K^par. Here, again, the superposition is not sensitive to the deme ordering.

The synchronization scheme used in Algorithms 2-4 divides each global step of the HMS into three subsequent stages: the metaepoch stage, the sprouting stage and the pruning stage. Therefore, the transition function for the whole strategy, τ: X → M(X), is the superposition of the respective Markov kernels, i.e.,

τ = τ_prune ∘ τ^sprout ∘ τ^meta. (57)
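The commutativity of kernels acting on disjoint demes, and the stage-wise superposition, can be illustrated on a toy product state space; the construction via Kronecker products is our illustration, not the paper's formalism:

```python
import numpy as np

# States are pairs (s1, s2) of two independent deme states, flattened
# into a 4-element space.  Kernel A evolves deme 1 only, kernel B deme
# 2 only; their matrix representations are Kronecker products with the
# identity, so A @ B == B @ A: kernels touching disjoint demes can be
# applied in any order, which is why the superpositions over j in K
# (or K^par) are well defined.

P1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # row-stochastic, deme 1
P2 = np.array([[0.6, 0.4], [0.3, 0.7]])   # row-stochastic, deme 2
A = np.kron(P1, np.eye(2))                # acts on the first component
B = np.kron(np.eye(2), P2)                # acts on the second component

step = A @ B   # superposition of the two per-deme kernels
```

The total probability law for the superposition is exactly the matrix product, and the product of stochastic matrices is again stochastic, so the composed stage kernels in (57) remain valid Markov kernels.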

HMS asymptotic analysis
In this subsection we consider an important asymptotic property of the whole strategy, namely its ergodicity. It can be understood as a guarantee of success: no matter which state a complex dynamical system such as the HMS currently occupies, it can reach any other state, in particular any desired one, in a finite number of steps with positive probability.
Theorem 1 Assume that, for every i = 1,…,m, the assumptions (H^1_stop), (H^2_stop), (H_evo), (H^1_prox), (H^2_prox) and (H_prune) invoked in the proof below hold, where in particular

(H^1_stop): the stopping conditions (30) are not trivial, i.e., the genetic universa U_i, the fitness functions f_i and the thresholds lsc_i are such that for each x ∈ X^i the probability (30) of stopping the deme evolution is positive.

Then the Markov chain modeling the HMS/MWEA dynamics is ergodic.

Proof To prove the ergodicity of the considered chain we need to show its irreducibility and aperiodicity.
Recall that irreducibility means that any two states x, y ∈ X communicate, i.e., we can pass from x to y in a finite number of steps with positive probability. The proof proceeds by showing that all elements of X are reachable from x^0 and that x^0 is reachable from any other state.

Let us start with the second claim. Let x be an arbitrary state. First, observe that if the root has some children that are not inactive (i.e., they are new, active or stopped), any of them can be deactivated by means of the pruning operation in the following sequence of steps:

1. if the child is new or active, we execute a metaepoch (and possibly the sprouting) such that the local stopping condition gets satisfied and the status changes to stopped;
2. we prune the stopped child.
Step 1 has positive probability because it is a chain of positive-probability events. Namely, thanks to assumption (H_evo) we can pass between arbitrary elements of X^i in one metaepoch with positive probability and, thanks to assumption (H^1_stop), the probability that the reached population satisfies the local stopping condition is also positive. After the completion of step 1, the probability of running the pruning action is positive, see (51), and the probability of selecting the considered child is also positive, see (52). Therefore, the chain rule implies that the deactivation of the child deme in a finite number of steps has positive probability.

Next, observe that, as long as there are children of the root that are not inactive, with positive probability we can repeatedly prune stopped demes without sprouting any new children between two subsequent deactivations. Indeed, the event of not sprouting a new deme after pruning a stopped one requires a metaepoch in the root ending with the best individual falling close to one of the not-inactive children, and thanks to assumptions (H_evo) and (H^1_prox) this is a positive-probability event.

This way, with positive probability we can prune all but one of the root's children. In the next step we evolve the root so that it finishes in the root component of x^0, at the same time pruning the last child. Such an event has positive probability as well, due to assumptions (H_evo), (H_prune) and (H^2_stop). Therefore, we have shown a way to pass from an arbitrary state x ∈ X to x^0 in a finite number of steps with positive probability.
To proceed the opposite way, take an arbitrary x ∈ X and consider the following chain of possible events:

1. we start in x^0, evolving the root in such a way that we sprout the number of children equal to the maximal child number at level 2 in x;
2. we proceed the same way at the lower levels, at the same time pruning the branches that are inactive in x;
3. we evolve the not-inactive branches to reach the same statuses and populations as in x.
The above events have positive probability thanks to our assumptions, which shows that we can pass from x 0 to x with positive probability.
To conclude the proof, we need to show the aperiodicity of the chain, which, thanks to the irreducibility, is equivalent to the aperiodicity of any one of its states. To this end, take x ∈ X such that s_j = stopped for all j ∈ K. Then no sprouting is possible, because no deme is active, and no deme evolution occurs, because all demes are stopped. Hence, as long as the pruning does not happen, which has positive probability thanks to (H_prune), the state does not change, which means that x is aperiodic. □

Remark 9 Assumption (H_evo) is satisfied both for the SGA with positive mutation and for the MWEA (see Sects. 2 and 3).

Remark 10 For some natural proximity relations, assumptions (H^1_prox) and (H^2_prox) reduce to geometric constraints on the neighborhoods C_i with respect to the computational domain. This is, e.g., the case for the proximity relation defined in [51], where the neighborhoods are discretizations of balls in ℝ^n that must have sufficiently small diameters in order to satisfy (H^2_prox).

Illustrative examples
The essential, qualitative results of this paper are presented in the preceding sections.Here, we mean to present in action the strategies which are supported by these formal postulates.We have carefully chosen these simulations from our previous publications [47,48], refining their results.
In particular, we show two benchmark cases, followed by an engineering example. The first benchmark uses a fitness function with 4 non-convex regions of insensitivity; on this example we show how the particular stages of the algorithm, up to the local approximation, work. The second uses a fitness function with similar features but with more regions of insensitivity; for this case we present a comparison based on metrics. In both benchmarks the insensitivity regions and their approximations are constructed as level sets with cutoff 0.1, i.e., subsets of the domain where the objective function (or its approximation) assumes values less than 0.1. We finish with an engineering example from the domain of underground hydrocarbon prospecting.
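The level-set construction of the insensitivity regions can be sketched on a discretized domain; the grid resolution and the toy objective below are illustrative assumptions, not the paper's benchmark functions:

```python
import numpy as np

def sublevel_set(f, bounds, cutoff=0.1, resolution=200):
    """Approximate the sublevel set {x : f(x) < cutoff} on a 2D grid.

    Returns the grid points belonging to the set; connected components
    of this point cloud correspond to individual insensitivity regions.
    """
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    xs = np.linspace(x_lo, x_hi, resolution)
    ys = np.linspace(y_lo, y_hi, resolution)
    X, Y = np.meshgrid(xs, ys)
    mask = f(X, Y) < cutoff
    return np.column_stack([X[mask], Y[mask]])

# Toy objective: flat (zero) on an annulus around the unit circle,
# so its 0.1-sublevel set is a lowland of positive measure.
f = lambda x, y: np.maximum(0.0, (x**2 + y**2 - 1.0)**2 - 0.05)
pts = sublevel_set(f, bounds=((-2, 2), (-2, 2)), cutoff=0.1)
print(len(pts) > 0)  # True
```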

Benchmark with 4 regions of insensitivity
The first example consists in finding the shapes of the four insensitivity regions whose objective function is shown in Fig. 4 on the left; it is meant to present the mechanics of the strategy. The strategy is composed of the global phase and the local phase with MWEA. In the global phase we use HMS-CMA-ES [46] and compare it with NEA2 [38]. The former is supported by the Markov analysis we presented earlier in the paper, and the latter is a state-of-the-art algorithm dedicated to multimodal problems. For more details about the objective function and the configuration, see [48, Sect. 3.1].
We show the results of running both variants of the strategy in Fig. 5. The first row contains the clusters returned by the global phases; HMS-CMA-ES produces more focused samples, which better identify the lowlands. The second row shows the merging, based on the hill-valley rule, of clusters lying in the same basins of attraction. For NEA2, this operation results in clusters that do not separate the lowlands. In the final row, we show the MWEA deme populations. For the HMS-CMA-ES clusters, MWEA explores the neighborhoods of the lowlands, possibly enabling shape approximation, while maintaining separation and reducing the overall number of evaluations. However, MWEA does not handle the NEA2 clusters well: although the lowland neighborhoods are explored, the lowlands are not separated, which results in wasted objective evaluations.
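The hill-valley rule used for merging clusters can be sketched in its common formulation: two points are assigned to the same basin if no sampled point on the segment between them is worse than the worse of the two. The sampling density below is an illustrative parameter:

```python
import numpy as np

def hill_valley_same_basin(f, a, b, n_samples=10):
    """Hill-valley test: a and b are taken to lie in the same basin
    if no interior point of the segment [a, b] has a higher objective
    value than the worse of the two endpoints."""
    worst = max(f(a), f(b))
    for t in np.linspace(0.0, 1.0, n_samples + 2)[1:-1]:
        if f((1.0 - t) * a + t * b) > worst:
            return False  # a "hill" separates the two points
    return True

# Toy 1D objective with two symmetric minima at x = +1 and x = -1.
f = lambda x: float(np.sum((np.abs(x) - 1.0) ** 2))
a, b = np.array([1.0]), np.array([-1.0])
print(hill_valley_same_basin(f, a, b))                # False: hill at x = 0
print(hill_valley_same_basin(f, a, np.array([0.9])))  # True: same basin
```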

Benchmark with 25 regions of insensitivity
Here, we present how the two variants of the strategy (NEA2- and HMS-CMA-ES-based) perform on a benchmark with more lowlands than in the previous case. We show its objective function in Fig. 4 on the right. In this example, we perform calculations with an evaluation budget varying from 2000 to 10,000. After its exhaustion, the global phase is stopped and only the local phase continues; see [48, Sect. 3.2].
Figure 6 displays the results of the global and local phases. The graphs show three metrics as functions of the budget: the number of clusters after the global phase, the number of clusters after reduction, and the ratio of covered regions. For a well-tuned strategy, we expect the number of clusters to reflect the number of lowlands, especially after the reduction. The higher the region coverage ratio, the better. Unfortunately, the NEA2 features indicated in the previous example make it ineffective at identifying a larger number of lowlands: the non-separated clusters are merged into a single one. HMS-CMA-ES fares much better in that regard, yielding also a higher region coverage ratio.
We also compare the quality of the lowland shape approximation obtained with different methods: kriging, L^2-projection and H^1-projection. In Table 1 we collect the Hausdorff distances of the approximations from the exact lowlands. Kriging achieves the best accuracy, with the L^2-projection closely behind. The samples obtained with NEA2 prevent all of the methods from producing good approximations.
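The Hausdorff distance used in Table 1 can be computed between an exact lowland and its approximation once both are discretized into point clouds; a minimal sketch with illustrative example sets:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets:
    the largest distance from a point of one set to the other set."""
    # Pairwise Euclidean distances, shape (|A|, |B|).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.1], [1.0, 0.0], [2.0, 0.0]])
print(hausdorff(A, B))  # 1.0: the unmatched point (2, 0) dominates
```

SciPy's `scipy.spatial.distance.directed_hausdorff` offers an equivalent, more scalable routine for the two directed halves of this computation.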
For illustration, we present examples of the shape approximations for different approximation and global-phase algorithms in Fig. 7. The approximations resulting from HMS-CMA-ES are in general better than those from NEA2, and the deficiency of the H^1-projection is evident.

Engineering example
The magneto-telluric (MT) method [10,63] is a geophysical prospecting method used to determine the resistivity distribution in the Earth's crust. It exploits the telluric currents induced by the solar wind, modeling the electromagnetic field. The underground formation influences these currents, which in turn are measured indirectly by antennae placed on the Earth's surface or the seafloor. This feature makes it possible to invert such measurements to obtain the unknown resistivity distribution. Being clean and inexpensive, the method has been applied for different purposes, such as mapping of active faults [29], the study of volcanoes [21], and exploration of offshore hydrocarbons [66].
Figure 8 describes the selected Earth model for the MT problem. The computational domain consists of air and a 1D layered medium in which a 2D heterogeneity (grey rectangle) is embedded in one of the layers. The blue rectangle corresponds to the natural source in the ionosphere, while the red triangles correspond to the receivers at the Earth's surface. The physical domain is truncated with a Perfectly Matched Layer (PML) complemented with a homogeneous Dirichlet boundary condition imposed on its outer part, and hp-FEM solves the problem; see [2] and [57, Sect. 3] for more details.
The task is to identify the resistivities of the subsurface regions presented in Fig. 8. We take the computational domain of the problem D = [0, 6]^4, to be able to apply HMS. The points from the computational domain are mapped to the resistivities by a fixed function.
The resistivities depend on the type of subsurface region. Approximate example values of resistivities are: water 1 Ωm, rock 1000 Ωm, oil 20 Ωm, shale 5 Ωm.
We calculate the misfit for a single wave frequency ν_1 = 10^{-1.2} Hz. Such a choice ensures sufficient penetration of the waves into the subsurface (in terms of depth) while maintaining resolution; see, e.g., [63].
In this case, in the global phase we used only HMS. For the configuration and other details, see [47, Sect. 3.4].
The best misfit values in most of the demes fluctuate around 10^{-11} and the median is around 10^{-9}, a misfit level comparable to the results obtained when solving a similar problem in [57]. In Fig. 9, all points from the local phases are shown. An insensitive region in the x_4 dimension is clearly visible. Some of the demes returned from the global phase contained individuals within the range 0.3 (in the 4D space) of the real resistivity parameter vector, which was around the best achieved distance. In particular, the solution with the best approximation of the ρ_3 value, which also had a misfit of order 10^{-10}, was within 10^{-3} of the real ρ_3 value. So, the extracted clusters, which we created based on demes, inherited information about the subregion which may contain oil. However, such an evaluation is an a posteriori claim. Because of the ill-conditioning, we cannot distinguish which points with good misfit values correspond to the real resistivity parameters; however, the real parameters can be extracted with further informed analysis of the lowland regions. As a reference solution for the assessment of the local phase, we use an approximation based on evaluations gathered from multiple MT experiments (over 80,000 evaluations with different precision levels, 2000 of which were chosen to cover the domain with the highest available precision solutions). For this test, we have identified 30 individual lowland regions by computing a level set of such an approximation of the misfit, one of which contained the real resistivity parameter vector.

Fig. 6 Metric values for the NEA2 and HMS-CMA-ES variants in the second case. The data points are shown for budgets ranging from 2000 to 10,000 evaluations. Each point is an average of the metric value over 10 runs of each configuration. The error bars depict the maximum and the minimum value among these 10 runs.
The statistics of the approximation errors for all the LBAs are presented in Table 2. Moreover, for each LBA, we calculate two dedicated coverage metrics, shown in Table 3. For each LBA, we determine the points which lie within the obtained lowland approximation (with either the L^2-projection, the H^1 semi-projection, or kriging); let us call them approximation points.
Firstly, we verified whether each LBA covered the real resistivity parameter vector by checking if any of its approximation points lies within the distance 0.3 from the vector. Then, we calculated the percentage of runs that had such an LBA; we show its values under the name coverage of real parameters (column 2, Table 3).
For calculating the second metric we use the reference points, which lie inside a particular reference lowland. For each reference lowland we calculate how well the approximation points cover it, by checking whether a point from the reference lowland lies within the distance 0.1 from an approximation point. The lowland coverage equals the percentage of the reference points covered. The best lowland coverage is then the best coverage among the 30 reference lowlands (column 3, Table 3).
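Both coverage metrics can be sketched directly from the descriptions above. The thresholds 0.3 and 0.1 come from the text; the function names and example data are illustrative assumptions:

```python
import numpy as np

def covers_real_params(approx_pts, real_params, radius=0.3):
    """Coverage of real parameters: True if any approximation point
    lies within `radius` of the real resistivity parameter vector."""
    return bool(np.any(
        np.linalg.norm(approx_pts - real_params, axis=1) <= radius))

def lowland_coverage(reference_pts, approx_pts, radius=0.1):
    """Fraction of reference-lowland points that lie within `radius`
    of some approximation point."""
    d = np.linalg.norm(
        reference_pts[:, None, :] - approx_pts[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) <= radius))

approx = np.array([[0.0, 0.0], [1.0, 1.0]])
real = np.array([0.1, 0.2])
print(covers_real_params(approx, real))  # True: distance ~0.224 <= 0.3

ref = np.array([[0.0, 0.0], [0.05, 0.0], [1.0, 0.0]])
print(lowland_coverage(ref, approx))  # ~0.667: two of three points covered
```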
Kriging achieves the best lowland coverage in these results, but the results are too uncertain to make stronger claims.
6 General conclusions

1. The stochastic population-based search with a dynamic sampling-measure adaptation, which can be modeled as a stationary Markov chain, may be treated as a machine learning process that gathers more and more information about the problem with each iteration. If the family of such searches has a focusing heuristic (see Definition 6), then we may conjecture that the maximum information about the problem is contained in the set K of fixed points of the heuristic, which are the frequency vectors of the limit populations representing the most exhaustive searches (an infinite sample after an infinite number of steps) (see Remarks 4.1 and 4.2). Notice that typically K is a singleton.
2. The above reasoning shows that, in order to solve P_1, P_2 (see Definition 5), we really expect to obtain a random sample with a probability distribution sufficiently close to at least one fixed point of the heuristic, i.e., falling into the ε-convex envelope K_ε of the fixed points, for a sufficiently small ε. The analysis of the dynamic and asymptotic features of the sampling measures seems more adequate in such a case than the behavior analysis of single, selected individuals (e.g., the best fitted ones) in the consecutive populations, as performed in the classical approaches (see Remark 6.2).
3. If the Markov chain modeling the search is ergodic, then it ensures the asymptotic guarantee of success, i.e., a good approximation of a fixed point of the heuristic (the set K_ε) can be reached in a finite number of steps, starting from an arbitrary x_0 ∈ X_l, if l is sufficiently large, so that K_ε ∩ X_l ≠ ∅ (see Remark 6.3).
4. If we apply a family of ''well tuned'' searches to solve the ill-conditioned problems P_1, P_2, then we can draw the information about lowlands and minimum manifolds by proper post-processing of the limit population x_K or the cumulative population (1/K) Σ_{t=1}^{K} x_t, where K is a sufficiently large number of steps and x_K ∈ X_l for sufficiently large l. Such post-processing may consist in clustering, cluster separation analysis, local fitness approximation, etc. (see Remarks 4.3, 6.1 and 6.2).
5. A possible concept of stopping a stochastic strategy solving such problems is to recognize whether at least one population vector x_t ∈ X_l falls into the set of states K_ε. More formally, we may evaluate a proper statistic of the FHT of the set K_ε ∩ X_l. In particular, the FHT expectation might be computed from the linear system (8) (see Remarks 6.4, 6.5). The other, more practical possibility of verifying a stopping condition is to check whether the consecutive samples form clusters of a sufficiently high quality, i.e., sufficiently dense and well separated from each other (see Remark 6.6).
6. Assessing whether a particular family of searches is ''well tuned'' is difficult in computational practice (see Remark 6.8). Typically, algorithms with a stronger selection pressure are more likely to be ''well tuned''. Unfortunately, such algorithms are ineffective in a global search. A possible solution is to use a cascade of stochastic searches, in which the upper ones are designated for the global search, while the lowest ones deliver the sample concentrated in the basins of attraction of lowlands or minimum manifolds. Such a proposition, called HMS/MWEA, was presented in Sect. 4.
7. We have proven that the HMS/MWEA strategy has the asymptotic guarantee of success in the sense that it can reach any of its states in a finite number of steps with positive probability (see Sect. 4). Therefore, we can expect that, when run sufficiently long, the strategy will produce populations in leaf demes that occupy significant parts of the interesting regions. Although mostly theoretical, this result forms a solid foundation for the confidence in the HMS/MWEA search capabilities.
8. In Sect. 5, the computational examples display how the HMS/MWEA strategy performs on benchmarks and in an engineering case. The strategy behaves as expected from the preceding Markov chain analysis. The search is focused in the vicinity of the lowland regions; thanks to this, decent lowland shape approximations are then obtained. We also determine the lowland regions of the engineering example's misfit function. This case proves more challenging; nevertheless, the correctness of the obtained solution is retained.
9. Currently, it is impossible to provide a comprehensive comparison of the presented strategy with state-of-the-art strategies. The reason is that the aim of our strategy is the insensitivity region approximation, which is illustrated in Sect. 5, whereas possible competitors, like those described in [15,32,38], concentrate on finding the global minimizers or even a single global minimizer. Despite that, in Sect. 5 we compare the performance of our strategy with algorithms that provide partial coverage of its functionality. In particular, we compare its global phase with NEA2 [38] and its local-approximation phase with the kriging method [24] (see Figs. 6, 7 and Tables 1-3).
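Regarding the FHT expectation mentioned in conclusion 5: for a finite chain, the expected first hitting times E of a target set of states solve the standard linear system (I - Q)E = 1, where Q is the transition matrix restricted to the non-target states. A minimal sketch under this standard formulation (the 3-state chain is illustrative; the paper's system (8) is not reproduced here):

```python
import numpy as np

def expected_fht(P, target):
    """Expected first hitting times of the `target` states for a finite
    Markov chain with transition matrix P: solve (I - Q) E = 1, where
    Q is P restricted to the non-target states."""
    n = P.shape[0]
    rest = [i for i in range(n) if i not in set(target)]
    Q = P[np.ix_(rest, rest)]
    E = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))
    full = np.zeros(n)
    full[rest] = E
    return full  # zero for the target states themselves

# Illustrative 3-state chain; we hit state 2 from states 0 and 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])
print(expected_fht(P, target=[2]))  # [6. 4. 0.]
```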

Definition 5

2. All different minimum manifolds are pairwise disjoint.
3. Basins of attraction of two different minimum manifolds are disjoint.
4. If two lowlands have disjoint closures, then their basins of attraction are also disjoint.

Finally, we introduce the ill-conditioned problems which will be studied in the sequel of this paper. Given the admissible domain D and the objective function f, find:
P_1: an approximation of all lowlands;
P_2: an approximation of the central parts of the basins of attraction for all minimum manifolds.

Fig. 4 The plots of the first and the second benchmark functions in their domains [48, Fig. 1]

Fig. 7 Each graph presents contours of the insensitivity region generated by the approximation methods compared to the exact contour [48, Fig. 4]. We show the L^2-projection, H^1-projection and kriging methods.

Definition 4 (see Definitions 35 and 38 in [30])
1. The basin of attraction B_{B_y} of a minimum manifold B_y with respect to the function f is the connected part of the set {x ∈ D : f(x) < h_y} ∩ (R_{B_y} ∪ B_y) that includes B_y, where h_y = inf{f(z) : z ∈ ∂R_{B_y} \ ∂D}.
2. The basin of attraction B_{P_y} of a lowland P_y with respect to f is the connected part of the set {x ∈ D : f(x) < h_y}, where h_y = inf{f(z) : z ∈ ∂R_{P_y} \ ∂D}, that includes P_y.
It can be observed that B_y ⊂ R_{B_y} and P_y ⊂ R_{P_y}. Moreover, Definition 3 does not depend on the loc method if f is continuously differentiable. The definition of the basin of attraction for an isolated minimizer y can be obtained from Definition 4.1 for the case B_y = {y}.
simply connected, closed set such that P_y ⊂ C(P_y) ⊂ B_{P_y};
2. for all z ∈ K, ρ_z ≥ threshold a.e. in C(P_y) and ρ_z < threshold a.e. in D \ C(P_y).
Replacing Lowlands_{f,D} by Mmanifolds_{f,D}, P_y by B_y, and B_{P_y} by B_{B_y} in the above definition, we get a similar definition of well tuning with respect to the minimum manifolds.

Table 2 Local phase test, approximation errors [47, Tab. 6]

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.