A niching cross-entropy method for multimodal satellite layout optimization design

Satellite layout optimization design (SLOD) requires solving a high-dimensional, multimodal optimization problem with multiple global optimal solutions. Existing algorithms for SLOD focus on seeking only one approximate global optimum. However, finding multiple solutions simultaneously could provide designers with more design diversity. To address this limitation, this paper studies multimodal optimization for SLOD and proposes an improved niching-based cross-entropy method (INCE). INCE consists of an improved niching strategy, cross-entropy method-based offspring generation and a cross operator. The CEC2013 benchmarks and a satellite layout optimization design problem are investigated to verify the validity and feasibility of the proposed INCE. Compared with several state-of-the-art algorithms, the proposed algorithm performs better.


Introduction
Satellite overall design is a typical multidisciplinary design optimization (MDO) problem [1][2][3]. Satellite layout optimization design (SLOD) is considered a key step in satellite MDO [4]; it studies how to optimize the placement of satellite objects so that mission requirements and engineering constraints are met. Previous work usually takes the moment of inertia as the objective [4]. A smaller moment of inertia means that the whole satellite system consumes less energy. As the number of layout components increases and their shapes become more complex, solving SLOD remains challenging due to its high dimensionality and complicated constraints. Moreover, SLOD is a typical multimodal optimization problem, which has multiple optimal solutions simultaneously. How to obtain more and better near-optimal layout solutions in a single optimization run remains an open problem.
Recently, to obtain a near-global-optimal scheme for designers, much work has been devoted to applying meta-heuristic optimization methods to the SLOD problem [5]. For instance, Cagan et al. [6] applied the simulated annealing algorithm to solve the general three-dimensional component layout. Sun and Teng [7] proposed a two-staged layout method, where the ant colony optimization algorithm is adopted to optimize the detailed layout of the satellite module. Huo et al. [8] applied a human-guided genetic algorithm to solve the layout design. Zhang et al. [9] developed a hybrid algorithm integrating particle swarm optimization, genetic algorithm and quasi-principal component analysis. Teng et al. [10] adopted a dual-system cooperative coevolution algorithm based on genetic algorithm to overcome the shortcoming of premature convergence. Shi et al. [11] proposed a modified artificial bee colony algorithm that adds a Gaussian mutation to improve local search ability. A new energy landscape paving heuristic method, a kind of Monte-Carlo-based global optimization algorithm, is presented for satellite layout optimization design in [12]. Chen et al. [4] proposed a practical satellite layout optimization design approach based on the enhanced finite circle method. These studies obtain only one global optimal design scheme in a single optimization run. However, designers usually need multiple global optimal layout schemes or superior local optimal schemes for design reference. To meet this need, multimodal optimization for SLOD should be developed.
Unlike multi-objective optimization algorithms, which are intended to solve multi-objective problems [13][14][15], multimodal optimization algorithms aim to solve a single-objective problem with multiple global optima. Their solution process mainly includes two steps. The first is to divide the whole population into multiple subpopulations. The second is to efficiently search for the optimum of each subpopulation using an appropriate algorithm.
In the first step, there are mainly three types of methods to divide the population into multiple subpopulations. The first uses complicated models that include several sub-models to simulate the problem [16]. For instance, a Gaussian mixture model (GMM) is constructed to solve the multimodal optimization problem. However, estimating the GMM parameters with the expectation-maximization (EM) algorithm requires a large amount of computation [16]. Moreover, it is still difficult to construct a proper GMM because the number of Gaussian distributions needs to be pre-determined. The second type is based on a clustering strategy such as K-Means [17]. However, in this methodology, the number of clusters is difficult to determine adaptively according to the landscape of the problem. This may lead to two problems. One is that the individuals in one cluster may be located around multiple peaks. The other is that the individuals in multiple clusters may be located around one peak, which increases the calculation cost. Both reduce the efficiency of multimodal optimization. The last type is based on a niching strategy to divide the sampling population into subpopulations, which is intrinsically a clustering method based on a fitness sharing scheme. So far, many meta-heuristic algorithms have been combined with niching techniques to solve multimodal optimization problems [18]. However, niching strategies also suffer from various drawbacks, such as sensitivity to parameters including niche radius and cluster size, inferior performance on irregular multimodal surfaces and reliance on fitness landscapes.
In the second step, after the population has been divided into multiple groups using the methods above, multimodal optimization searches for the optimum belonging to each group. The efficiency of the adopted algorithm therefore plays a key role in multimodal optimization. Unlike genetic algorithm (GA) [19,20] and particle swarm optimization (PSO) [21,22], the cross-entropy method (CEM) proposed by Rubinstein in 1997 [23] is a probability-based meta-heuristic algorithm that selects elite samples to update the probabilistic model. Compared with other algorithms, it converges faster and has fewer pre-defined parameters. Recently, many improved strategies based on the traditional CEM have been proposed for optimization. Generally, current research mainly focuses on improving single optimization methods, aiming to enhance the global search ability of the algorithms and decrease the computing cost. For example, Murat and Onur [24] enhanced the efficiency of the algorithm in both exploration and exploitation by separating the samples into two parts to calculate the mean value and the standard deviation. P. Lopez and E. Onieva [25] presented a hybrid heuristic of GA and CEM to solve the continuous optimization problem, which shows excellent performance on high-dimensional instances. Dirk P et al. [26] demonstrated the effectiveness of the cross-entropy method for solving difficult continuous multi-extremal optimization problems, although it still aims to find only one global optimum. However, there exists no related work that conducts multimodal optimization based on CEM.
Aiming at the multimodal SLOD problem, this paper designs a novel multimodal optimization algorithm called improved niching-based cross-entropy method (INCE). First, aiming at the sensitivity of the parameters in current niching strategies in the first step of the multimodal optimization, an improved radius-adaptive niching strategy based on speciation and crowding clustering is utilized to divide the population. Second, to improve the efficiency of the optimization algorithm in the second step, the improved CEM with elitism strategy and local search is adopted. Third, to ensure the balance of the exploitation ability and the exploration ability of the algorithm, a cross operator between different subpopulations is utilized.
The rest of the paper is structured as follows. In section "Background", the satellite layout optimization design problem is presented, wherein the optimization formulation and the constraints are described in detail. Then, the basic concepts of related multimodal optimization are explained, including the niching strategy and classical CEM. In section "Improved niching-based cross-entropy method", the proposed INCE is elaborated with the niching strategy incorporated. In section "CEC2013 benchmark functions", the evaluation criteria of multimodal optimization are discussed. Then, the feasibility and validity of the proposed method are tested on the widely used CEC2013 test function set. In section "14-component satellite layout problem", the proposed method is adopted to solve a practical satellite layout optimization design problem. The conclusion is given in section "Conclusion".

Mathematical model for SLOD
In this section, a three-dimensional satellite layout optimization design model is constructed [5]. The factors including optimization objective, design variables and constraints are discussed.
The whole mathematical model of SLOD is given by Eq. 1, where (θ_x, θ_y, θ_z) are the inertia angles, N denotes the number of layout components, (δθ_x, δθ_y, δθ_z) are the corresponding allowable angle errors, and J_x(X), J_y(X), J_z(X) are the moments of inertia of the whole satellite about the x, y, z axes. (x_c, y_c) represents the real centroid coordinates of the system and (x_e, y_e) is the expected centroid. (δx_e, δy_e) is the maximum permissible centroid error. ΔV_ij stands for the overlap volume between the i-th object and the j-th object. More details can be found in [5]. According to Eq. 1, the SLOD is a single-objective optimization problem with complicated constraints. To handle these six constraints, the penalty function method is adopted to convert the layout optimization problem into an unconstrained optimization problem. The final objective fitness function is given by Eq. 2, where ω_i stands for the weight factor of the penalty function used to control the constraints.
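As an illustration of the penalty function idea described above, the following minimal Python sketch adds weighted constraint violations to an objective value. It is not the exact form of Eq. 2: the function name `penalized_fitness`, the convention that a constraint g_i(x) <= 0 means "satisfied", and the toy objective and constraint are all assumptions made for illustration only.

```python
from typing import Callable, Sequence


def penalized_fitness(
    objective: Callable[[Sequence[float]], float],
    constraints: Sequence[Callable[[Sequence[float]], float]],
    weights: Sequence[float],
    x: Sequence[float],
) -> float:
    """Objective plus weighted constraint violations (minimization).

    Assumption: each constraint g_i is written so that g_i(x) <= 0 means
    'satisfied'; only the positive part of g_i(x) is penalized.
    """
    if len(constraints) != len(weights):
        raise ValueError("one weight per constraint is required")
    penalty = sum(w * max(0.0, g(x)) for w, g in zip(weights, constraints))
    return objective(x) + penalty


# Hypothetical stand-ins, not the satellite model of [5].
def toy_objective(x: Sequence[float]) -> float:
    return x[0] ** 2 + x[1] ** 2


def toy_centroid_constraint(x: Sequence[float]) -> float:
    # |x_c - x_e| - delta <= 0 with x_e = 1.0 and delta = 0.1
    return abs(x[0] - 1.0) - 0.1


feasible = penalized_fitness(toy_objective, [toy_centroid_constraint], [100.0], [1.0, 0.0])
infeasible = penalized_fitness(toy_objective, [toy_centroid_constraint], [100.0], [0.0, 0.0])
```

In this toy case the feasible point keeps its raw objective value, while the infeasible point is dominated by the penalty term, so larger weights push the search toward the feasible region.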

Classical cross-entropy algorithm
The process of CEM as an optimization tool can be divided into two steps:
1) Generate random samples according to a given probability distribution.
2) Update the distribution parameters according to the elite samples to approach the optimum.
One advantage of the cross-entropy method is that it can utilize any type of probability distribution according to the specific problem. Generally, the normal distribution is adopted due to its fast convergence when the standard deviation is small. In detail, the solution process of CEM for optimization purposes is presented in Algorithm 1.

Algorithm 1
Input: Population P with N members, elite individuals percent ρ, tolerance error ε, distribution parameters (μ_0, σ_0)
Output: The global optimum y*
Randomly initialize the population P
while termination is not met do
    Generate N independent individuals and calculate their fitness values
    Sort the whole population in descending order and select ρN elite individuals
    Update distribution parameters according to Eq. 3 and Eq. 4
end
Return the global optimum y*

Here I(·) in Eq. 3 and Eq. 4 denotes the indicator function. Assume that a function S(y) is maximized over a given set Y, and denote its unique maximizer by y*, i.e., S(y*) = max_{y∈Y} S(y). CEM regards reaching the optimum as a rare probability event and approaches the global optimum by sampling from a pre-defined distribution. First, initial random samples are generated from the normal distribution with parameters (μ_0, σ_0), where μ_0 controls the center position of the samples and σ_0 controls the spread of sampling, representing the ability of exploitation and exploration, respectively. Then, the fitness values S(y_i) are calculated and sorted in descending order. γ is the pre-defined threshold that controls which elite samples are selected.
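The loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the exact update of Eq. 3 and Eq. 4: it uses an independent normal distribution per dimension, takes the mean and population standard deviation of the elite samples as the new parameters, and stops when every standard deviation falls below ε. The function name `cem_maximize` and its default arguments are hypothetical.

```python
import random
import statistics


def cem_maximize(score, mu0, sigma0, n_samples=100, rho=0.1, eps=1e-6,
                 max_iter=200, seed=0):
    """Maximize `score` with a basic cross-entropy loop (illustrative sketch)."""
    rng = random.Random(seed)
    mu, sigma = list(mu0), list(sigma0)
    n_elite = max(2, int(round(rho * n_samples)))
    best_x, best_s = list(mu), score(mu)
    for _ in range(max_iter):
        # Step 1: sample N individuals from the current normal distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                   for _ in range(n_samples)]
        # Sort in descending order of fitness and keep the elite fraction.
        scored = sorted(((score(x), x) for x in samples),
                        key=lambda pair: pair[0], reverse=True)
        elite = [x for _, x in scored[:n_elite]]
        if scored[0][0] > best_s:
            best_s, best_x = scored[0]
        # Step 2: refit the distribution parameters to the elite samples.
        mu = [statistics.fmean(col) for col in zip(*elite)]
        sigma = [statistics.pstdev(col) for col in zip(*elite)]
        if max(sigma) < eps:
            break
    return best_x, best_s


# Toy usage: a single-peak function with its maximum at (2, -1).
best_x, best_s = cem_maximize(
    lambda y: -(y[0] - 2.0) ** 2 - (y[1] + 1.0) ** 2,
    mu0=[0.0, 0.0], sigma0=[5.0, 5.0],
)
```

On a single-peak function such as this one the loop contracts onto the one maximizer, which is exactly the behaviour that must be modified when several optima are wanted.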

Niching strategy for multimodal optimization
Generally, to cope with multimodal optimization efficiently, various improved niching methods have been proposed and combined with evolutionary algorithms (EAs) [27]. Currently, speciation and crowding are the two main basic niching methods.
In speciation, the i-th niche S_i is formed according to Eq. 6, where P is the remaining population that excludes the individuals in the first i−1 niches, x_j stands for the individuals in P, x_seed is the best individual in the current population P and r is the niche radius. D is the dimensionality of the design space. dist(a, b) represents the Euclidean distance between a and b.
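The radius-based speciation described above can be sketched as follows. This is a minimal sketch assuming a maximization problem and assuming that a niche consists of all remaining individuals whose Euclidean distance to the current seed does not exceed r; the function name `radius_speciation` and the toy fitness table are hypothetical.

```python
import math


def radius_speciation(population, fitness, radius):
    """Split `population` into species around successive seeds (maximization)."""
    remaining = sorted(population, key=fitness, reverse=True)
    species = []
    while remaining:
        seed = remaining[0]  # best individual of the remaining population P
        members = [x for x in remaining if math.dist(x, seed) <= radius]
        species.append(members)
        # Remove the new niche from P before choosing the next seed.
        remaining = [x for x in remaining if math.dist(x, seed) > radius]
    return species


# Toy usage: two peaks near 0 and 1 on a one-dimensional design space.
fitness_table = {(0.0,): 1.0, (0.1,): 0.8, (1.0,): 0.9, (1.05,): 0.7}
niches = radius_speciation(list(fitness_table), fitness_table.get, radius=0.3)
```

With r = 0.3 the four toy points fall into two niches, one per peak; a much larger r would merge them into a single niche, which illustrates the parameter sensitivity discussed in this paper.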
In crowding, each generated child is compared with the nearest individual from a crowd formed by randomly selecting K parents in the population. Then if the child is better, it will replace the compared parent. This process is formulated as arg min
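The crowding replacement rule described above can be sketched as follows. This is a minimal sketch assuming maximization, Euclidean distance, and a crowd drawn without replacement; the function name `crowding_replace` and the toy data are hypothetical.

```python
import math
import random


def crowding_replace(population, child, fitness, crowd_size, rng):
    """Replace the nearest member of a random crowd if the child is better.

    Returns the index that was replaced, or None if the child was rejected.
    The population list is modified in place.
    """
    crowd = rng.sample(range(len(population)), crowd_size)
    nearest = min(crowd, key=lambda i: math.dist(population[i], child))
    if fitness(child) > fitness(population[nearest]):
        population[nearest] = child
        return nearest
    return None


# Toy usage: the child competes only with its nearest neighbour in the crowd.
parents = [(0.0,), (1.0,)]
replaced = crowding_replace(
    parents, (0.9,), lambda x: -abs(x[0] - 0.9), crowd_size=2,
    rng=random.Random(0),
)
```

Because the child only competes with its nearest neighbour, individuals located on other peaks are left untouched, which is how crowding preserves diversity.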

Improved niching-based cross-entropy method
Overall, realizing multimodal optimization faces many challenges in both of its key steps. In the first step, the key of the various niching methods is how to divide the population into multiple subpopulations reasonably. The main challenges are determining the parameters of the niching method, such as the niche radius, and maintaining the diversity of the population. Besides, some topology-based methods [28] [29] also bear a large computational burden. In the second step, the choice of an appropriate algorithm has a great effect on the performance of multimodal optimization. The challenge lies in adopting a more efficient algorithm that possesses fast convergence and high accuracy simultaneously. Confronted with these difficulties, and to improve the effect of CEM in multimodal optimization, we develop INCE with three improved strategies. First, differing from many multimodal optimization methods based on niching with randomly selected parameters, an improved two-staged niching method is proposed to realize adaptive-radius clustering. Second, to enhance the efficiency of CEM, a new distribution parameter estimation method with local search, such as sequential quadratic programming (SQP), is applied during the evolution process. Third, a cross operator is used to generate new individuals between different niches to enhance the exploration ability of the algorithm.

Algorithm 2
Input: Population P with N_P members, CEM tolerance error ε, CEM population size n_cem, CEM elite sample ratio ρ, elitism archive Ω, iteration number t
Output: Elitism archive Ω including multiple optima
1  Randomly initialize the population P, t = 0;
2  // Clustering the population into multiple niches
3  Use Algorithm 3 to partition the population into c groups;
4  // Evolution by CEM and find the promising areas
5  if t = 0 then
6      Estimate the distribution parameters σ_i^0 (i = 1, 2, ..., c) of each group according to Eq. 10;
7  end
8  t = t + 1;
9  for each group do
10     while σ_i^t > ε do
11         Calculate the fitness value of each individual;
12         Estimate the distribution parameters μ_i^t and σ_i^t according to Eq. 9 and Eq. 11, respectively;
13     end
14 end
15 // Local search
16 for each group do
17     Select the best individual s_i after the CEM evolution;
18     Apply SQP with s_i as the initial value;
19 end
20 // Elitism strategy and cross operator
21 Combine all the best individuals s_i into elitism archive Ω and calculate the size of best individuals n_best, then keep the N_P best ones;
22 Generate N_P − n_best individuals according to Eq. 12;
23 Stop if the termination criterion is met. Otherwise go to step 3;
24 Return elitism archive Ω including multiple optima

The flowchart of INCE is shown in Fig. 2. The entire procedure is summarized as Algorithm 2. Having outlined the primary idea behind this paper, we describe each component of the algorithm in the following sections.

The improved two-staged niching method
Generally, speciation-based niching includes two types of clustering methods, which rely on two clustering criteria. One clusters the population based on size; the other is based on niche radius. During clustering, the population is sorted according to fitness values. Size-based speciation then divides the individuals into multiple niches with equal numbers of individuals, whereas radius-based speciation divides them based on their similarities according to Eq. 6.
The first strategy guarantees a fixed number of individuals in each niche. However, it can cause a loss of diversity because two individuals located on different peaks might be assigned to the same niche due to similar fitness values. In the second strategy, the niche radius is the only parameter that influences the performance of the algorithm. If it is too small, the computational cost increases and the algorithm is easily trapped in a local optimum region. If it is too large, the performance of the multimodal optimization algorithm degrades and the niching strategy loses its advantage. Although some dynamic radius strategies have been adopted to address this [30], parameter sensitivity remains challenging.
In this section, an improved adaptive-radius niching method is proposed. The main idea includes two stages. In the first stage, a more rational adaptive niche radius strategy is proposed according to the landscape of the fitness function. In the second stage, to guarantee the diversity of the population, the number of individuals in each niche is adjusted by generating new individuals or eliminating poor individuals.
We take a simple function as an example. The first stage of the improved niching strategy is illustrated in Fig. 3, where A, B, C and D are points that represent the population (for a maximization problem). A represents the best individual in the current population, while C stands for the poorest one. Denote x_{i−1} and x_i as B and C, respectively; then the distance between A and C is selected as the niche radius.
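One possible reading of the first stage is sketched below. The exact criterion of the paper is not reproduced here; the sketch assumes that individuals are visited in order of increasing distance from the seed and that the radius is the distance to the last individual before the fitness starts to rise again (point C in the example above). The function name `adaptive_radius` and the toy fitness values are hypothetical.

```python
import math


def adaptive_radius(seed, others, fitness):
    """Distance from the seed to the first 'valley' individual (maximization).

    Assumption: walking outward from the seed, the niche ends at the
    individual after which the fitness increases again.
    """
    ordered = sorted(others, key=lambda x: math.dist(x, seed))
    for current, following in zip(ordered, ordered[1:]):
        if fitness(following) > fitness(current):
            return math.dist(current, seed)
    # No rise detected: the whole remaining population forms one niche.
    return math.dist(ordered[-1], seed) if ordered else 0.0


# Toy usage mirroring the A, B, C, D example: C is the valley between peaks.
A, B, C, D = (0.0,), (1.0,), (2.0,), (3.0,)
toy_fitness = {A: 1.0, B: 0.6, C: 0.2, D: 0.8}
radius = adaptive_radius(A, [B, C, D], toy_fitness.get)
```

Here the radius equals the distance between A and C, so D, which belongs to the neighbouring peak, is left for a later niche.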

-Second stage: Adjust population based on the equal members
After the population is divided into multiple groups, a population adjusting strategy is adopted to improve the balance between exploitation and exploration. If a group includes far more individuals than the others, the poor individuals in this group are eliminated. Conversely, new individuals are generated in small groups. The detailed process is given in Algorithm 3.

Algorithm 3
Input: Population P with N_P members
Output: The whole population
// First stage: Divide the population based on adaptive dynamic radius
Sort the individuals in descending order according to fitness values
while P is not empty do
    Sort the individuals in descending order according to fitness values
    Set the best individual as the seed and calculate the distances between it and all the remaining individuals
    Sort the corresponding distances in ascending order and take as r_i the distance to the seed of the first individual at which the trend changes (shakes)
    Select the individuals according to Eq. 6 as the species and delete them from P
end
// Second stage: Population adjusting based on the equal members
Calculate the number of species and the population size c of each species
for each species do
    if the population size is more than c then
        Sort all the individuals in order and delete the poor individuals
    end
    if the population size is less than c then
        Select the individual in this species that is farthest from the seed and denote it τ
        while the population size is less than c do
            Randomly generate new individuals within the radius of the species seed
            If a new individual is farther from the seed than τ is, delete it from the population
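The second stage of Algorithm 3 can be sketched for a single species as follows. This is a minimal sketch under several assumptions that are not taken from the paper: new individuals are drawn uniformly from a box of half-width `radius` around the seed, a candidate is rejected when it lies farther from the seed than τ, a species with a single member falls back to `radius` as the limit, and a cap on the number of tries guards against endless loops. The function name `adjust_species` is hypothetical.

```python
import math
import random


def adjust_species(species, fitness, target_size, radius, rng, max_tries=10000):
    """Trim or refill one species so that it holds `target_size` individuals."""
    members = sorted(species, key=fitness, reverse=True)
    seed = members[0]
    if len(members) > target_size:
        # Oversized species: delete the poorest individuals.
        return members[:target_size]
    # tau is the member farthest from the seed; its distance is the limit.
    limit = max(math.dist(x, seed) for x in members)
    if limit == 0.0:
        limit = radius  # assumption: single-member species use the radius
    tries = 0
    while len(members) < target_size and tries < max_tries:
        tries += 1
        candidate = tuple(s + rng.uniform(-radius, radius) for s in seed)
        if math.dist(candidate, seed) <= limit:
            members.append(candidate)
    return members


def toy_fitness(x):
    return -math.dist(x, (0.0, 0.0))


small = adjust_species([(0.0, 0.0), (0.5, 0.0)], toy_fitness, 5, 1.0, random.Random(1))
large = adjust_species([(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)],
                       toy_fitness, 2, 1.0, random.Random(1))
```

Rejecting candidates beyond τ keeps the refilled individuals inside the region already occupied by the species, so the refill does not leak into neighbouring niches.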

CEM-based offspring generation
After partitioning the population into multiple niches, INCE starts to estimate the probability distribution of each niche. Suppose the population size is N P, and the selected niche radius is r , the total number of niche n niche would be determined according to Eq. 6.
Among the methods of estimating parameters in CEM, the Gaussian mixture model (GMM) has been adopted to realize multimodal optimization because it includes multiple sub-models. However, the expectation-maximization (EM) algorithm, which is widely used to estimate the distribution parameters of the GMM for a more precise model, incurs a huge computation cost [16]. In our multimodal optimization, after the whole design space is partitioned into multiple subregions, the key for CEM in each niche is to enhance efficiency rather than to construct a complex model that approximates the landscape. Thus, a new estimation strategy of distribution parameters with elitism strategy and local search is proposed.
The estimation of the other parameters is discussed as follows in detail.
A) The estimation of μ_i
In most improved CEM strategies, μ_i is updated as the mean vector of the selected individuals. For complicated objective functions in single optimization, this scheme can have better global search ability because it merges information from different individuals, though it sometimes converges slowly. However, in multimodal optimization the search spaces obtained after dividing the population into many niches are relatively narrow. To enhance the convergence speed, the position of the best individual is selected as μ_i (Eq. 9), i.e., μ_i = x_{i,best}, where x_{i,best} denotes the best individual in the i-th niche. With this scheme, the center of sampling shifts rapidly toward a more promising area as soon as a better individual is found, even one far from the current μ_i.
B) The estimation of σ_i
In CEM, the standard deviation σ_i plays a vital role in the exploitation and exploration of the algorithm. If σ_i is large, the algorithm can search more of the design space. Conversely, a small σ_i provides strong exploitation ability.
-The estimation of the initial σ_i^0
After the first clustering by the improved niching method, each niche may contain only a few individuals, which cannot give enough valid standard deviation information. To solve this at the first iteration, a sigma coefficient α related to the upper bound and lower bound is adopted (Eq. 10) to control the distribution of the initial population so that enough individuals are sampled.
-The estimation of σ_i^t
CEM is a kind of evolutionary algorithm based on the Monte-Carlo technique, which samples in the whole design space. The population size of CEM is therefore usually set large, such as 2000. However, in INCE the feasible solution space is divided into many sub-spaces, whose corresponding search areas are relatively narrow. To save computational resources, generating fewer individuals, such as 50, is still sufficient for optimizing within each niche. To provide enough sample information, all the individuals in the niche are utilized to calculate the standard deviation, given by Eq. 11, where i represents all the individuals in the niche and n_i stands for the population size of the i-th niche.
C) Local search based on SQP
Existing studies show that combining local search with meta-heuristic algorithms can markedly enhance the performance of meta-heuristic algorithms. Due to the stochastic process in meta-heuristic algorithms, their robustness and accuracy are inferior to those of gradient-based methods such as SQP. Local search such as SQP, in turn, is sensitive to the position of the selected initial individual. Combining the gradient-based method and the meta-heuristic algorithm resolves both weaknesses, because the heuristic process can provide an appropriate initial individual for the local search.
The framework of INCE enhances the diversity of the population by sampling in multiple groups. In this way, multiple promising areas covering the global optima can be found. However, on one hand, because each group has a smaller population than in classical CEM, INCE does not have strong exploitation ability even after it has detected a global optimum region. On the other hand, pursuing high computational accuracy would result in a loss of population diversity. To solve this problem, SQP is utilized in INCE as a local search, which refines the local optimum once an appropriate initial point is given by INCE.
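The parameter estimates of parts A) and B) above can be sketched in Python as follows. Only the choice of μ_i as the best individual is stated explicitly in the text; the forms used here for the other two quantities are assumptions, namely an initial σ equal to α times the variable range in each dimension, and a later σ equal to the per-dimension population standard deviation over all individuals of the niche. All function names are hypothetical.

```python
import statistics


def initial_sigma(lower, upper, alpha):
    """Assumed form of the initial spread: alpha times the variable range."""
    return [alpha * (u - l) for l, u in zip(lower, upper)]


def estimate_mu(niche, fitness):
    """mu_i is the position of the best individual in the niche (maximization)."""
    return list(max(niche, key=fitness))


def estimate_sigma(niche):
    """Assumed form: per-dimension standard deviation over the whole niche."""
    return [statistics.pstdev(column) for column in zip(*niche)]


# Toy usage on a two-dimensional niche with its peak at (1, 1).
niche = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
mu = estimate_mu(niche, lambda x: -((x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2))
sigma0 = initial_sigma([0.0, -1.0], [10.0, 1.0], alpha=0.1)
sigma = estimate_sigma([(0.0,), (2.0,)])
```

Using the best individual as the sampling center trades some robustness for speed, which is acceptable here because each niche covers only a narrow region.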

Cross operator
The performance of the clustering strategy has a great influence on the number of global peaks found by the multimodal optimization algorithm. Moreover, the diversity of the population decreases greatly during the evolutionary process. To overcome this loss of diversity to some extent, a cross operator between the best individuals of the niches is applied after the evolutionary process by CEM. In detail, denote the number of best individuals collected from the niches as n_best. To keep the total population size N_P, the remaining N_P − n_best individuals are generated according to Eq. 12, where x_j^d and x_k^d represent the best individuals of the j-th and k-th niches, x_i^d stands for the new individual generated by the cross operator, and d = 1, ..., D indexes the design variables, with D the dimension of the design variables.
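To make the role of the cross operator concrete, the sketch below generates new individuals from pairs of niche-best individuals. It does not reproduce Eq. 12: the per-dimension arithmetic blend with a uniform random coefficient is a swapped-in, commonly used crossover chosen purely for illustration, and the function name `cross_operator` is hypothetical.

```python
import random


def cross_operator(bests, n_new, rng):
    """Create `n_new` individuals by blending two distinct niche-best points.

    Assumption: x_i^d = x_j^d + r * (x_k^d - x_j^d) with r uniform in [0, 1),
    drawn independently for every dimension d.
    """
    if len(bests) < 2:
        raise ValueError("need at least two niche-best individuals")
    offspring = []
    for _ in range(n_new):
        parent_j, parent_k = rng.sample(bests, 2)
        child = tuple(xj + rng.random() * (xk - xj)
                      for xj, xk in zip(parent_j, parent_k))
        offspring.append(child)
    return offspring


# Toy usage: refill a population of size 7 that has 2 niche-best individuals.
bests = [(0.0, 0.0), (1.0, 2.0)]
children = cross_operator(bests, n_new=7 - len(bests), rng=random.Random(0))
```

Children produced this way lie between existing niches, so the next clustering round can discover regions that no current niche covers.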

CEC2013 benchmark functions
To verify the validity and practicality of the INCE proposed in this paper, we compare it with speciation-based DE (SDE) [31], crowding-based DE (CDE) [32] and multimodal estimation of distribution algorithm (LMSEDA) [27] on the CEC'2013 multimodal function set containing 20 test functions. In these test functions, a mass of local optima exists around the global optima (Fig. 4). The peak information of the test function set can be found in [33]. Generally, the evaluation of multimodal algorithms mainly includes four criteria, namely the Average Number of Function evaluations (ANF), the Peak Rate (PR), the Success Rate (SR) and the Average Distance (ADC) [34].
PR measures the average percentage of all known global optima found over multiple runs.
where P R denotes the number of global optima obtained at the end of the i th run, N K P is the number of known global optima, and N R is the number of runs. SR measures the percentage of successful runs (a successful run is defined as a run where all known global optima are found) out of all runs.
where N_SR denotes the number of successful runs. ANF measures the average computational cost over multiple runs.
where NUM_i denotes the number of function evaluations at the end of the i-th run. ADC measures the average computational accuracy over multiple runs.
where f(x*) denotes the actual optimum and f(x_i^opt) denotes the optimum found by the optimization method at the end of the i-th run.
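The four criteria can be computed from per-run records as sketched below. PR and SR follow directly from their verbal definitions above; for ANF and ADC the sketch assumes a plain average over runs, and for ADC it assumes the absolute difference between the actual and the found objective values. All function names and the toy numbers are hypothetical.

```python
def peak_ratio(found_per_run, n_known):
    """Average fraction of known global optima found over all runs."""
    return sum(found_per_run) / (n_known * len(found_per_run))


def success_rate(found_per_run, n_known):
    """Fraction of runs in which every known global optimum was found."""
    successes = sum(1 for found in found_per_run if found == n_known)
    return successes / len(found_per_run)


def average_evaluations(evaluations_per_run):
    """Average number of function evaluations per run."""
    return sum(evaluations_per_run) / len(evaluations_per_run)


def average_distance(true_value, found_values):
    """Average absolute gap between the actual and the found optimum values."""
    return sum(abs(true_value - v) for v in found_values) / len(found_values)


# Toy usage: four runs on a problem with two known global optima.
pr = peak_ratio([2, 1, 2, 2], n_known=2)
sr = success_rate([2, 1, 2, 2], n_known=2)
anf = average_evaluations([100, 200])
adc = average_distance(1.0, [1.0, 0.5])
```

In this toy record one run misses a peak, so PR stays high while SR drops, which shows why the two criteria are reported together.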
To make fair comparisons, the initial population sizes are set the same for all four algorithms. The maximum number of fitness evaluations (Max_Fes) for all the algorithms is set the same as in [32,34]. In INCE, ρ is set to 0.1. When the procedure stops, the number of function evaluations is recorded. In this paper, 1E-5 is chosen as the accuracy level, which means that an optimum is considered found if a solution lies within a distance of 1E-5 of the actual optimum. The results are averaged over 30 independent runs of each method in our experiments. Other parameters are listed in Table 2.

Accuracy and multimodality
The statistical results obtained by the different multimodal optimization algorithms are presented in Table 1. The '+' represents the number of best results among the 4 algorithms on the 20 test functions. The best ADC, ANF, SR and PR are highlighted in bold. In addition, to give a better view of the performance of INCE, we visualize, on nine functions that can be plotted, the fitness landscape together with the multiple global optima selected from the final population when the algorithm terminates, as shown in Fig. 5. Red points stand for the optima found by INCE (Table 2).
Observing Table 1 and Fig. 5, we can draw the following conclusions: From the perspective of computational accuracy, we can see that the performance of INCE for all test functions is significantly superior to CDE and SDE. In detail, the average value of ADC over 30 runs could reach over E-14, while that of SDE and CDE could only reach E-4. It mainly results from the local search of SQP in INCE.
The PR value is the most important criterion for evaluating the performance of multimodal optimization algorithms. Overall, INCE achieves the best performance on 19 test functions, while LMSEDA, SDE and CDE reach only 10, 4 and 4, respectively. F1, F2, F3, F4 and F5 are all low-dimensional functions, and all four algorithms find all optima. However, as the number of global optima increases, other algorithms such as SDE and CDE are

Effect of parameters of CEM (population size, sigma coefficient and tolerance error)
To observe the sensitivity and robustness of INCE with respect to the parameters of CEM, we conduct numerical experiments on these 20 test functions over 30 runs by changing three parameters: population size, sigma coefficient and tolerance error. When one parameter is changed, the other parameters are set the same as in Table 1. Here, the PR value is selected as the only criterion. In detail, the population sizes are set as 20, 30, 50 and 100, respectively. The sigma coefficients are set as 1/10, 1/20, 1/30 and 1/50. The tolerance errors are set as 0.001, 0.01, 0.05 and 0.1. The results in Table 1 are used for comparison and are highlighted in italic in Table 3. The other statistical results are shown in Table 3.
From Table 3, we obtain the following conclusions:
-The effect of the population size of CEM
From the left part, it can be seen that the population size of CEM is important for the performance of INCE. For low-dimensional test functions, it is better to select a small population size. For high-dimensional test functions, it is better to select a large population size, because with more individuals INCE possesses better search ability due to its population-based characteristic.
-The effect of the sigma coefficient of CEM
From the middle part, we can see that the coefficient does not influence the performance of INCE greatly. The algorithms with different coefficients obtain similar PR values. The coefficient helps to balance the exploration ability and exploitation ability of INCE. If it is small, INCE possesses better exploitation ability. Conversely, it has better exploration ability if the coefficient is large.
-The effect of the tolerance error of CEM
From the right part, we can conclude that the tolerance error of CEM influences the performance of INCE greatly. As the dimension of the problem increases, a smaller tolerance error should be selected to improve the convergence process.

Effect of local search
Similarly, to observe the influence of SQP on the performance of the algorithm, we also conduct experiments on the 20 test functions using single CEM with and without SQP. The tolerance errors of F1-F17 and F17-F20 are set as 0.001 and 0.0001, respectively. The corresponding population sizes of CEM are set as 400 and 800, respectively. The statistical results over 20 runs are shown in Table 4. From Table 4, we can see that SQP helps to enhance the exploitation ability of the multimodal optimization algorithm, particularly on the low-dimensional objective functions. In addition, local search in multimodal optimization accelerates the evolutionary process, which means that more computation resources can be saved to detect other areas. Without SQP, enhancing the search accuracy of CEM requires a tighter tolerance error (ε), which incurs more computation cost than SQP. Though the effect of SQP is not apparent on the high-dimensional test functions, it still helps to improve the accuracy of the obtained solution.

Effect of cross operator for enhancing diversity
Furthermore, to study the influence of the cross operator on the performance of the algorithm, we also conduct numerical experiments without the cross operator for comparison with INCE. All other parameters are set the same as in section "CEC2013 benchmark functions". The statistical results over 30 runs are shown in Table 5.
From Table 5, except on F1-F5 and F9, the cross operator helps to enhance the exploration ability of the algorithm. On F1-F5, both variants obtain all global optima because these test functions are simple. However, on other test functions such as F12, the cross operator helps to explore other areas and locate more optima; without the cross operator, the PR value on F12 reaches only 0.688. Though the niching strategy is used to cluster the population again, the diversity of the population still decreases rapidly. The cross operator helps to enhance the diversity of the population, which is the key to multimodal optimization.

Discussion about the improved two-staged niching method
The method of estimating the radius r is important in multimodal optimization. Compared with previous work, the criterion for determining a peak in INCE is simple but efficient. It is preferable for the number of groups after clustering to be slightly larger than the real number of global optima of the optimization problem. In our experiments, the number of groups after clustering by our method is about 1.5-2 times the actual number of optima.
The proposed niching method can also be combined with other heuristic algorithms to achieve multimodal optimization. Apart from the adaptive clustering strategy, a second stage that adjusts the population size of each niche is also adopted. It helps to improve the balance between the exploitation and exploration abilities of the algorithm. After the first stage, some sharp areas that contain a global optimum may not have enough individuals. In that case, this method generates more individuals around such an area to seek the optimum. Conversely, if a broad area that contains a global optimum has too many individuals, some poor individuals are deleted to reduce the calculation cost. The method could still be improved, for example by merging some clusters according to suitable metrics, which may make it more efficient.
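
The second stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the INCE procedure itself: every niche is driven toward one fixed target size, new individuals are drawn from a Gaussian centred on the best individual of the niche, and the names `adjust_niche`, `target_size` and `spread` are hypothetical.

```python
import random


def adjust_niche(niche, f, target_size, spread=0.05, rng=None):
    """Resize one niche for a minimization problem (illustrative sketch).

    niche: list of design vectors (lists of floats) in the same cluster.
    f: objective function to minimize.
    """
    if rng is None:
        rng = random.Random(0)
    ranked = sorted(niche, key=f)            # best individual first
    if len(ranked) > target_size:
        # Over-populated niche: delete the poorest individuals.
        return ranked[:target_size]
    seed = ranked[0]
    while len(ranked) < target_size:
        # Under-populated niche: sample new individuals around the best
        # one (assumption: isotropic Gaussian with std `spread`).
        ranked.append([xi + rng.gauss(0.0, spread) for xi in seed])
    return ranked


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    sharp = [[0.1, 0.0], [0.3, 0.2]]
    broad = [[float(k), 0.0] for k in range(6)]
    print(len(adjust_niche(sharp, sphere, 4)), len(adjust_niche(broad, sphere, 4)))
```

Using a fixed target size keeps the sketch short; an adaptive target per niche would be equally compatible with the description in the text.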

14-component satellite layout problem
In this section, the performance of INCE is tested on a 14-component layout optimization problem. Based on the SLOD model introduced in section "Mathematical model for SLOD", the task is to optimize the positions of 14 cylindrical components placed in a simplified cylindrical satellite module. An enhanced DE was developed to solve this problem in [5]. However, it could not provide multiple optimal schemes simultaneously. To demonstrate the validity of the proposed INCE in SLOD, we conduct numerical experiments on this problem with the aim of offering more design schemes.

Problem description and parameters setting
In this problem, 14 cylindrical components need to be installed on the two surfaces of the bearing plate in the simplified satellite module (see Fig. 1). The detailed data are given in Table 8. The left 7 components are installed on the upper surface of the bearing plate, while the right 7 components are installed on its lower surface. All the components have the same height of 10 mm, and the radius of the satellite module is 50 mm. With reference to Eq. 1, the number of layout components is N = 14. Thus, the dimension of the design variable of the optimization problem is 28. Denoting the position of the i-th component as X_i = (x_i, y_i), the constraints −50 mm < x_i < 50 mm and −50 mm < y_i < 50 mm must be met. In addition, the non-overlap constraints between different layout components must also be met.
The allowable values of the constraint-related parameters are set as δx_e = δy_e = 3 mm and δθ_x = δθ_y = δθ_z = 0.3 rad. Any constraint violation in the engineering problem is added to the objective through a penalty function to handle the inequality and equality constraints, as illustrated in Sect. 2.1. The six weight factors of the penalty function are set as w_i = 1000 (i = 1, 2, ..., 6). The population size, elite sample percent, sigma coefficient and tolerance error are set as 400, 0.1, 1/10 and 0.0001, respectively.
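
To make the geometric constraints and the penalty handling concrete, the sketch below evaluates a penalized objective for a flat design vector (x_1, y_1, ..., x_N, y_N). It is an illustration under several assumptions rather than the model of Sect. 2.1: the components are treated as circles in the plane, only components on the same surface can overlap, each circle is required to lie inside the module circle of radius 50 mm, violations are measured as penetration depths, and the centroid and inertia-angle constraints are omitted. The weight 1000 follows the text; the function names, the placeholder objective and the radii used in the example are hypothetical and are not the data of Table 8.

```python
import math


def overlap_violation(centers, radii, surfaces):
    """Sum of penetration depths between circles on the same surface."""
    total = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if surfaces[i] != surfaces[j]:
                continue                     # different surfaces never collide
            dist = math.dist(centers[i], centers[j])
            total += max(0.0, radii[i] + radii[j] - dist)
    return total


def containment_violation(centers, radii, module_radius=50.0):
    """Sum of distances by which circles stick out of the module (assumed)."""
    return sum(max(0.0, math.hypot(x, y) + r - module_radius)
               for (x, y), r in zip(centers, radii))


def penalized(objective, x_flat, radii, surfaces, weight=1000.0):
    """Objective plus weighted geometric violations (illustrative only)."""
    centers = [(x_flat[2 * i], x_flat[2 * i + 1]) for i in range(len(radii))]
    violation = (overlap_violation(centers, radii, surfaces)
                 + containment_violation(centers, radii))
    return objective(centers) + weight * violation


if __name__ == "__main__":
    zero_objective = lambda centers: 0.0     # placeholder, not the inertia model
    print(penalized(zero_objective, [0.0, 0.0, 15.0, 0.0], [10.0, 10.0], [0, 0]))
```

A feasible layout leaves the objective unchanged, so with a large weight the optimizer is pushed toward layouts in which both violation terms vanish.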

Comparison with other existing global optimization algorithms
To verify the global search ability of INCE in SLOD, we compare it with other global optimization algorithms, namely PSO, PSOSQP, DESQP, DESQPDE, CEM and CEMSQP. Among them, DESQP, DESQPDE and PSOSQP have been adopted to solve SLOD in previous work [4,5]. We also include CEM and CEMSQP in the comparison to illustrate our improvement over CEM. The detailed parameter settings for DESQP can be found in [5]. The population size of pure CEM and CEMSQP is set as 1000. The elite sample percentages are all set as 0.1, the same as in INCE. The results of all these algorithms are averaged over 50 runs. Table 6 presents a statistical comparison between INCE and these algorithms, which include the methods that obtained the state-of-the-art optimal design among similar methods. The final four rows present the best near-optimal layout schemes of four algorithms. We also select three superior near-optimal layout schemes obtained by INCE, which are shown in the first three rows. The boxplot of the statistical results of these algorithms is presented in Fig. 6. From Fig. 6, we can clearly see that INCE is better than the other algorithms. The moment of inertia of the best near-optimal scheme found is below 2.2 kg·m². The current state-of-the-art result, reported in [5], is above 2.3 kg·m². Moreover, almost all runs of INCE obtain a result below 2.3 kg·m². The iteration histories of INCE, PSOSQP and DESQPDE in one single optimization are shown in Fig. 7. PSOSQP and DESQPDE are two superior algorithms for solving SLOD, which presented the state-of-the-art performance in previous work. In the single random experiment in Fig. 7, we can see that INCE converges faster. Furthermore, INCE finds a better layout scheme than these two algorithms. The final fitness values obtained by INCE, PSOSQP and DESQPDE are 2.223 kg·m², 2.451 kg·m² and 3.012 kg·m², respectively.
From the results, we can see that INCE outperforms DESQPDE not only in the number of obtained optima but also in the best layout scheme obtained. In addition, the corresponding best layout schemes of INCE and DESQPDE are shown in Table 9 and Figs. 10 and 11.
Apart from the best scheme obtained, the robustness of these algorithms is also important. Due to the strong and complex constraints of SLOD, the optimized layout scheme may not satisfy the constraints. In our statistical results, we therefore also record the success rate in Table 7. From Table 7, we also observe that the standard deviations of the centroid offset and the inertia angle offsets obtained by INCE are not smaller than those of the other algorithms. To some extent, this also demonstrates the diversity of the optimal schemes obtained by multimodal optimization (Table 10). To study the influence of hyper-parameters on the performance of INCE in SLOD, we also vary the population size of CEM and the tolerance error; the results are presented in Figs. 8 and 9. From Fig. 8, we can see that as the population size of CEM in INCE increases, both the global search ability and the robustness are enhanced. From Fig. 9, we can see that there exists an optimal tolerance error value that gives the best performance of INCE in SLOD. When applied to the engineering problem, INCE with tolerance errors of 0.01 and 0.0001 both show superior performance. This conclusion is consistent with that obtained on the multimodal test functions.

Comparison with two other existing multimodal optimization algorithms
To verify the multimodal optimization ability of the proposed INCE, we also compare it with SDE and CDE. In a real-world engineering problem, a level regarded as the optimum must be specified, which means that any objective value below this level is regarded as an optimum. This is because the landscape of SLOD, unlike that of the test functions, does not have multiple exactly equal optima. It is also why we need to test the performance of the proposed method on multimodal test functions before solving SLOD directly. First, the relationship between the population size of CEM in INCE and the multimodal performance is investigated; the results are presented in Table 10 and Fig. 12. Different from the study in Sect. 4.2.2, we run the algorithm only once and count the number of near-optimal layout schemes obtained. The population size of CEM in INCE is set as 50, 100, 200 and 400, respectively, to solve this 14-component layout optimization problem. Four level values, 2.6 kg·m², 2.7 kg·m², 2.8 kg·m² and 2.9 kg·m², are selected. From the results, we observe that INCE obtains competitive multimodal performance when the population size of CEM is set as 200 or 400. When 2.7 kg·m² is regarded as the level of the optimum, these two settings obtain 114 and 147 different layout schemes in one single optimization, respectively. Similar to the conclusion in Sect. 4.2.2, the multimodal optimization ability of INCE is also enhanced as the population size increases.
We also implement two other multimodal optimization algorithms, CDE and SDE, to solve SLOD for comparison with INCE. The population sizes of CDE and SDE are both set as 400. The number of objective function evaluations for each of the three algorithms is set as 100000. The r in SDE is set as 150. The K in CDE is set as 5. The results are listed in Table 11. From Table 11, we can see that INCE outperforms CDE and SDE not only in the best layout scheme obtained but also in the number of obtained optima. Though SDE and CDE can also obtain multiple near-optimal schemes simultaneously, they obtain fewer schemes than INCE. This is because suitable settings of r or K in these algorithms are difficult to determine. INCE obtains 147 layout schemes even when the level regarded as the optimum is set as only 2.5 kg·m². Thus, the proposed method can provide more and better layout schemes for designers in one single optimization.
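
The counting of different near-optimal schemes against a chosen level can be illustrated with the short sketch below. The criterion for calling two layouts "different" is not specified in this section, so the sketch assumes that two layouts are different when the Euclidean distance between their design vectors exceeds a threshold `min_dist`. This threshold, the greedy filtering order and the function name are hypothetical choices made only for illustration, and the values in the example are made up.

```python
import math


def count_schemes(solutions, level, min_dist=1.0):
    """Count distinct solutions whose objective value is below `level`.

    solutions: list of (objective_value, design_vector) pairs.
    Two schemes count as different only if their design vectors are
    more than `min_dist` apart (assumed criterion).
    """
    kept = []
    for value, x in sorted(solutions, key=lambda s: s[0]):
        if value >= level:
            break                            # remaining values are even larger
        if all(math.dist(x, y) > min_dist for _, y in kept):
            kept.append((value, x))
    return len(kept)


if __name__ == "__main__":
    found = [(2.40, [0.0, 0.0]), (2.45, [0.1, 0.0]),
             (2.60, [5.0, 5.0]), (2.80, [10.0, 10.0])]
    for level in (2.5, 2.7, 2.9):
        print(level, count_schemes(found, level))
```

Raising the level can only keep or increase the count, which matches the way the number of reported schemes depends on the level regarded as the optimum.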

Discussion about the performance of INCE in solving SLOD
As the number of layout components increases, solving SLOD becomes more complex due to the increasing dimensionality and the larger search space. Thus, existing global optimization algorithms for SLOD are easily trapped in local optima, as shown by the results in Figs. 6 and 7. The INCE proposed in this paper divides the search space using an improved clustering strategy. Then, combined with CEM-based offspring generation and local search, the algorithm iteratively searches for multiple optima in the sub-spaces. By clustering the population and reproducing new individuals through estimation of the distribution parameters, the whole algorithm achieves a better balance between exploitation and exploration. The above statistical results also show that INCE is far better than existing algorithms on the same layout optimization problem, in both global optimization ability and multimodal optimization ability.

Conclusion
In this paper, an improved niching-based cross-entropy method is proposed for multimodal satellite layout optimization design. Unlike traditional optimization methods in SLOD, which can obtain only one near-optimal scheme, the proposed INCE can obtain multiple near-optimal schemes simultaneously in a single optimization. In this method, three improved strategies are proposed for the two key steps in multimodal optimization. First, an improved two-staged niching scheme is proposed to conduct adaptive-radius clustering on the population. Second, to enhance the efficiency of CEM, a new parameter estimation method with an elite strategy and local search is adopted, which helps to find the optimum faster than traditional methods. Finally, a cross operator between different niches is proposed to enhance the diversity of the population. To verify the effectiveness of the proposed INCE, we conduct numerical experiments on the CEC2013 multimodal function test set and a practical layout optimization design problem. INCE achieves the best performance on 19 multimodal test functions compared with three multimodal algorithms. When applied to a 14-component layout problem, INCE obtains the state-of-the-art optimal design, with a moment of inertia as low as 2.1471 kg·m². INCE can still obtain nearly 150 different design schemes when the level regarded as the optimum is 2.5 kg·m². Thus, the feasibility and validity of the proposed INCE are verified on test functions and on a layout design engineering problem.