A hybrid multi-objective bi-level interactive fuzzy programming method for solving ECM-DWTA problem

Electronic countermeasures (ECM) have become one of the most significant factors in modern warfare. In the course of combat, electronic jamming allocation tasks need to be adjusted flexibly as the combat stage changes, which places higher requirements on the modeling and solution methods for this kind of problem. To solve the ECM dynamic weapon target assignment (ECM-DWTA) problem, a hybrid multi-objective bi-level programming model is established. The upper level takes the sum of the electronic jamming effects over the whole combat stage as its optimization objective and locally optimizes the ECM weapon (ECM-WP) assignment scheme in each stage. The lower level takes the importance expectation value of the targets subjected to interference and the combat consumption as two optimization objectives to globally optimize the ECM-WP assignment scheme. To solve this complex model, a hybrid multi-objective bi-level interactive fuzzy programming algorithm (HMOBIF) is proposed, in which an exponential membership function describes the satisfaction degree of each level. When solving the multi-objective optimization problem composed of the membership functions of the upper and lower levels, we use the MOEA/D algorithm to obtain the Pareto front (PF) solution set, and each solution in the PF is then evaluated and selected by the TOPSIS multi-criteria evaluation method. This local and global interactive optimization process of the bi-level model corresponds to executing the observe-orient-decide-act (OODA) loop in practical combat. Using the current example, we conduct numerical simulations on the model parameters and obtain parameter values suitable for solving the model. Computational experiments on ECM-DWTA problems of different scales show that HMOBIF outperforms four bi-level programming algorithms on the performance indices and solves ECM-DWTA problems better.


Introduction
The dynamic assignment of jamming tasks in ECM is an important problem that urgently needs to be solved and optimized in intelligent modern warfare. Dynamic allocation of electronic interference can be regarded as an extension of the DWTA problem. The main purpose of the DWTA problem is to allocate our limited weapon resources to different kinds of targets according to the change of combat stages, which differ between each two operational actions (stages). Models of this kind are generally built on the state of combat targets and the operational progress. By observing the state of combat targets, commanders can determine the distribution scheme of our army's weapons at different times. Hocaolu [1] established a nonlinear target programming model and solved it with the simplex method. With the development of intelligent algorithms, many scholars have applied a variety of intelligent algorithms to DWTA problems. Lai et al. [2] and Liu et al. [3] used improved particle swarm optimization (PSO) algorithms to solve the "shoot-look-shoot" problem model. Zhao et al. [4] and Chen et al. [5] used improved genetic algorithms (GA) to solve the first and second problems, respectively. However, in certain operational scenarios, for example when we need to carry out two attacks on enemy targets, it is necessary to judge the damage degree of the enemy targets after the first attack and then determine which targets need the second attack. 2. "Multi-stage" dynamic model. The biggest difference between this model and the "shoot-look-shoot" model lies in the handling of combat targets. The previous model observes all enemy targets repeatedly until the target damage reaches the expected level. Xu et al. [6] solve the multi-objective programming model using MOEA/D, taking into account the stochastic nature of the combat mission and the synergy effect of the combat objects when building the DWTA model.
To address the slow convergence rate and low search efficiency in solving the sensor-weapon-target assignment problem, the authors of [7,8] propose an improved genetic algorithm with a new initialization method that uses rule-based heuristic factors. Based on defense area analysis, Jia et al. [9] propose a network security situation awareness method using a stochastic game in a cloud computing environment, which uses the utility of both sides of the game to quantify the network security situation value. In this model, however, threat assessment can only be carried out on the number and type of enemy targets before each combat stage, without the observational decision. It is necessary to distribute our weapons according to the threat levels of enemy targets at different times. 3. "Dynamic game" model. At present, there are few studies on this kind of model. Such models treat the use of weapons by both sides as chips in game theory and then allocate weapons according to the course of battle. The main idea of the modeling is to calculate the operational benefits from both sides' weapons and various target information, and to solve for the Nash equilibrium according to the operational benefits at different operational moments, which gives the distribution of our weapons over targets in the equilibrium solution. For example, Zhang [10] models the game allocation during air-ground combat, Ma et al. [11] discuss how to use game theory to search the allocation solution space, and Zhang et al. [12] draw on grey theory to further discuss the game solution for DWTA.
To better solve the above-mentioned types of DWTA problem model, many optimization methods (e.g., approximation algorithms, heuristic algorithms and machine learning algorithms) have been widely applied. Operations research methods are the basic means of solving planning problems. Uhm et al. [13] modeled the weapon target assignment problem for missile intercept defense and improved the branch and bound method for integer nonlinear programming by using a simulated annealing algorithm, which solved the problem well. Shin et al. [14] used mixed-integer linear programming (MILP) to solve the weapon target assignment model of ground-based missiles striking ground targets, which reduces the operational time window. For the problem of weapon target allocation against air defense attack targets, Sonu [15] and Jiang et al. [16] used the modified crow search algorithm (MCSA) and a dynamic allocation auction algorithm to solve the "multi-stage" DWTA model in air defense operations. Compared with intelligent algorithms of all kinds, heuristic algorithms may obtain better results on some specific problems. Zhang et al. [17] take the statistical marginal revenue of both sides as the revenue value of decision-making and carry out the final dynamic weapon allocation through revenue rolling over the different operational stages. To solve the "multi-stage" problem, Wang et al. [18] proposed a knowledge-based incremental constructive heuristic algorithm and showed its superiority over a random sampling algorithm based on random arrangement. In addition, because the optimization objectives are not fully considered in the construction of many models, multi-objective evolutionary algorithms (MOEAs) such as MOEA/D [7,19], NSGA-II [20] and C-TAEA [21] have been adopted in many models, with good results.
In recent years, the great development of artificial intelligence algorithms has inspired many scholars to study DWTA. Qu et al. [22] established a combat task allocation network using network optimization theory and solved the problem with a distributed intelligent agent model. Shojaeifard et al. [23] transformed the constraints of the DWTA model appropriately and obtained the optimal weapon allocation scheme using recurrent neural network (RNN) models. Jiang et al. [24] established an Adaptive Weapon-to-Target Assignment (AWTA) model that can adaptively predict the hit probability of an intercept missile, and used a reinforcement learning (RL) framework to solve the model.
The above models and algorithms have made great contributions to solving DWTA problems, but they also have the following problems:
1. The operational command process is not considered in the modeling. The combat command process is an "observe-orient-decide-act" (OODA) loop [17,18]. In the "shoot-look-shoot" model only the "observe-orient" stages are considered, and further modifications to the model are required.
2. In the above models, the objective function is relatively simple when considering the combat effect, and the multi-objective function is not comprehensively considered from the viewpoint of the operational "efficiency-cost ratio".
3. ECM-DWTA, as one of the DWTA problems [25-27], has not been studied deeply or effectively enough with regard to its modeling and solution.
In this paper, after summarizing the advantages and disadvantages of the existing DWTA models and their solution methods, and considering the practical combat background of electronic interference task allocation, we formulate the dynamic allocation of electronic interference as a new DWTA model. A hybrid multi-objective bi-level interactive fuzzy programming method is established to solve this problem, and the simulation results are then discussed with some examples. The main motivations and contributions of this paper are as follows.
1. Existing studies of the ECM-DWTA problem have been modeled without considering the OODA loop of actual operations. Therefore, in modeling the dynamic allocation of electronic jamming, we fully consider the effect of the OODA loop on the operational process and the operational effect, and use bi-level programming to describe the command decision. The upper level model corresponds to the local optimization process of each combat stage, and the lower level model corresponds to the global optimization process of the whole combat.
2. Because operational efficiency and operational cost are seldom considered simultaneously in the existing ECM-DWTA problem, we fully consider the "efficiency-cost ratio" of electronic jamming and use MOEA/D to solve the model in each operational stage and over the whole operational process. To select individuals from the solution set of the multi-objective model, we propose a MOEA/D-TOPSIS method.
3. For the ECM-DWTA model that we rebuild, a hybrid multi-objective bi-level interactive fuzzy programming method (HMOBIF) is proposed, and we improve the traditional construction of the satisfaction score function. Through satisfaction iteration among the levels, the subjective judgment of the commander can be added to the combat assignment process to a certain extent.
4. In the proposed HMOBIF algorithm for solving the multi-level programming problem, we borrow the idea of the fuzzy programming method and use the decision fuzzy number of the decision-makers to define the satisfaction function values at each level, and then find the optimal solution that meets the requirements through the transfer of satisfaction values.
The remainder of this paper is arranged as follows. "Problem formulation" briefly introduces the background of the dynamic distribution of electronic countermeasure interference and formulates the mathematical model for ECM-DWTA. "Model solution based on hybrid multi-objective bi-level interactive fuzzy algorithm (HMOBIF)" discusses the detailed process of solving the model and proposes HMOBIF. The test problems are described and the numerical results are discussed in "Experiment results and discussion", and "Conclusions and future works" offers conclusions.

Background of problem
The ECM jamming combat scenario considered in this paper is as follows. We are the electronic attacker, and the enemy is the defender. The defender has $J$ weapon platforms to defend an asset. They are connected by communication networks and can cooperate to finish a task, and each communication link is regarded as a combat target. The attacker has a total of $I$ pieces of electronic jamming equipment of five types (i.e., air-based radar, land-based radar, land-based communication, air-based communication, and land-based electro-optical countermeasures equipment), which may interfere with the enemy electronic targets. As shown in Fig. 1, the combat starts at time $t = 0$, and the commander distributes weapons based on the degree of threat of each target. While the enemy carries out its combat, we have a time interval in which to interfere with these targets, which can be divided into several stages of fixed length. A stage is the minimum combat time unit. Assume that there are $T$ stages. In each stage, we need to allocate weapons according to the real-time threat levels of the enemy electronic targets and the war damage of our ECM equipment.
From this background, it can be seen that as the combat stages change, the assignment of electronic jamming is carried out within the OODA loop. This inspires our approach: if the dynamic jamming assignment of ECM (DJA-ECM) can be closely combined with the OODA loop, the solution set of this problem will be closer to practical combat. Therefore, in this paper, we build up the relationship between the different levels of the bi-level programming model and the OODA loop framework. The scheme of transfers between stages is denoted by $x_{ij}(t)$ $(i = 1, 2, \cdots, I,\ j = 1, 2, \cdots, J,\ t = 1, 2, \cdots, T)$, and the scheme of ECM-WPs is denoted by $x_{ij}(t + 1)$. For ease of reading, Table 1 summarizes all the notations that are used throughout this paper.

Upper level model
(1) Objective. The purpose of the upper level model is to complete the optimal distribution of weapons in each combat stage. Its optimization objective is a single-objective function that stands for the sum of the jamming effects of each combat stage under the optimal scheme, in which $p_{ij}(t)$ denotes the interference benefit of the $i$th ECM-WP interfering with the $j$th target at stage $t$. This benefit combines a 0-1 indicator with the jamming-to-signal ratio $K$.
The related notation is as follows:
the distance between the two sides, with the minimum and maximum distance at which effective interference can be implemented;
$angle_{\min(\max)}$: the angle between the two sides, with the minimum and maximum angle at which effective interference can be implemented;
$f_{Hi(j)}$, $f_{Li(j)}$: the minimum and maximum frequency of effective interference and of the operational working frequency band;
the adaptive parameter of space for the $i$th ECM-WP: 1 if the distance and angle lie within their effective ranges, with $angle \in [angle_{\min}, angle_{\max}]$, and 0 otherwise;
the adaptive parameter of frequency for the $i$th ECM-WP;
$x_{ij}(t) = 1$ if the distribution targets for the $i$th ECM-WP have no change, and 0 otherwise.
(2) Constraints.
Constraint (3) requires that the jamming effect of each combat stage be greater than the specified effective suppression coefficient $K_a$. Constraint (4) requires that the number of ECM-WPs be larger than the number of the opponent's jamming targets. Constraint (5) requires that the number of jamming targets per ECM-WP in a single operational phase be less than or equal to $a$. The following two constraints limit the maximum number of electronic targets that an ECM-WP can interfere with in a single stage; this number depends on the combat ability of the $i$th ECM-WP. A platform with multi-objective combat ability can be regarded as multiple platforms with single-objective combat ability. Constraint (6) represents the scheme of ECM-WPs.

Lower level model
(1) Objective. The purpose of the lower level model is to complete the optimal distribution of weapons over the whole set of combat stages. Its optimization objective consists of two functions, which stand for the importance expectation value of the targets subjected to interference and the combat consumption, respectively. The importance expectation value of the targets subjected to interference is expressed through $V(t)$, the total importance expectation value for stage $t$. In the expression of $V(t)$, $\alpha_j(t)$ represents the interference importance expectation value coefficient of the $j$th target at stage $t$, $S_j(t)$ represents the operating state coefficient of the $j$th target at stage $t$, and $Pr_j(S_j(t) = k)$ denotes the probability of effectively interfering with the $j$th target when $S_j(t) = k$. The second objective is the operating consumption during the whole operation process.
(2) Constraints.
Constraint (11) requires that the scheme of ECM-WPs meet the adaptive parameter of overall interference, $C^{sum}_{ij}(t)$, for the $i$th ECM-WP interfering with the $j$th target at stage $t$. This parameter is built from the adaptive parameters of time ($C^{time}$), space and frequency for the same ECM-WP, target and stage. The assignment $x_{ij}(t)$ is reasonable if and only if $C^{sum}_{ij}(t) = 1$.
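The adaptive parameters can be read as 0-1 feasibility indicators. The Python sketch below illustrates that reading under stated assumptions: all function names are hypothetical, the frequency check is assumed to be a band-overlap test, and the overall parameter is assumed to be the product of the time, space and frequency indicators. None of these choices is confirmed by the formulas of this paper.

```python
def space_indicator(distance, angle, d_min, d_max, angle_min, angle_max):
    """Return 1 if both distance and angle lie in their effective ranges, else 0."""
    in_distance = d_min <= distance <= d_max
    in_angle = angle_min <= angle <= angle_max
    return 1 if (in_distance and in_angle) else 0


def frequency_indicator(f_low_wp, f_high_wp, f_low_target, f_high_target):
    """Assumption: interference is possible when the two frequency bands overlap."""
    return 1 if max(f_low_wp, f_low_target) <= min(f_high_wp, f_high_target) else 0


def overall_indicator(c_time, c_space, c_freq):
    """Assumption: the overall parameter is the product of the three 0-1 indicators."""
    return c_time * c_space * c_freq


def is_reasonable(x_ij, c_sum):
    """Assumed reading: an assignment x_ij = 1 is kept only when c_sum equals 1."""
    return x_ij == 0 or c_sum == 1


# Illustrative values only; they are not taken from the paper's tables.
c_space = space_indicator(distance=40.0, angle=30.0, d_min=10.0, d_max=80.0,
                          angle_min=0.0, angle_max=60.0)
c_freq = frequency_indicator(2.0, 4.0, 3.5, 6.0)
c_sum = overall_indicator(1, c_space, c_freq)
print(c_sum, is_reasonable(1, c_sum))
```

With a product form, a single failed check (time, space or frequency) is enough to rule out the pairing, which matches the "if and only if" wording of the constraint.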

Hybrid multi-objective bi-level model
The hybrid multi-objective bi-level model combines the two formulations above: the upper level model and the lower level model.

Model solution based on hybrid multi-objective bi-level interactive fuzzy algorithm (HMOBIF)
The basic DWTA model and the multi-level programming model studied in this paper have been proved to be NP-hard [28,29], so no polynomial-time algorithm is known that obtains the exact solution. At present, there are several methods for solving bi-level programming problems, such as the Kth-Best algorithm [30], the branch and bound algorithm [31], the MPECs algorithm [32], the descent algorithm [33], the trust region algorithm [34], intelligent optimization algorithms [35] and the fuzzy interactive algorithm [36]. When solving a bi-level programming model, it is essential to find an equilibrium point in the conflict between the upper and lower levels, and the solution obtained is called the Stackelberg-Nash equilibrium solution [37]. In addition, transferring satisfaction values between the levels in bi-level programming requires a metric, and we adopt the idea of fuzzy logic and use an appropriate membership function to represent the satisfaction value, which leads to an optimal solution that satisfies the requirements. In current research, fuzzy logic has made considerable progress in solving many optimization problems [38,39], for example by combining type-2 fuzzy logic with heuristic algorithms to represent and solve optimization problems [40], or by combining fuzzy logic with neural networks to optimize and predict models [41]. Inspired by these approaches, we explore the use of bi-level programming for solving the ECM-DWTA problem. In the following, when solving the two-level model, the satisfaction function is used as the interaction factor between the two levels, and the final solution of the model is obtained through the satisfaction calculation between them. In addition, because the lower level is a multi-objective optimization model, we propose the MOEA/D-TOPSIS algorithm to find the optimal value in the solution set of the satisfaction.

Definition of satisfaction function
To better describe the satisfaction of each level and to combine the decision-maker's subjective judgment more closely with the objective calculation results, we borrow the idea of fuzzy programming [27,42] and use different membership functions to map the satisfaction values of the optimization objectives of each level. This adds the fuzzy programming method to the framework of bi-level programming and brings the whole algorithm more in line with the actual requirements of combat. When calculating the degree of satisfaction, the membership function corresponding to the model index should be determined first. In previous bi-level programming models, many scholars have used linear, hyperbolic and parabolic membership functions [43]. In the actual combat process, the combat factors related to ECM generally have a great impact on the combat results, so in this paper we use the exponential membership function to describe satisfaction. Let $X$ be the solution set of the ECM-DWTA model, let $\delta_U$ and $\delta_L$ be the satisfaction of the upper and lower models, let $Z_k(x)$ $(k = 1, 2, 3)$ be the objective functions of the ECM-DWTA model, and let the subscripts $l = 1, 2$ denote the upper and lower levels, respectively.
We use the degree of satisfaction given by the membership function to express the decision-maker's degree of acceptance. Inverting the membership function gives an inverse function whose satisfaction degree represents the decision-maker's degree of rejection. In these two formulas, $s$ is a constant, and $U_l$ and $L_l$ denote the maximum and minimum values at level $l$. In summary, the satisfaction calculation formula combines the two degrees through $\beta_l$, the fuzzy coefficient of satisfaction, which represents the decision preference, with $\beta_l \in [0, 1]$. If $\beta_l = 0$, the satisfaction is biased towards acceptance; if $\beta_l \in (0, 1)$, the satisfaction is biased towards rejection; $\beta_l = 1$ indicates that acceptance is as important as rejection. In this way, the satisfaction function of each level can be calculated concretely from the commander's decision preference.
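To make the role of the constant $s$ and the bounds $U_l$, $L_l$ more concrete, the Python sketch below evaluates one common exponential membership form for a maximized objective. The exact formula of this paper is not reproduced here, so the functional form, the function name `exponential_membership` and the clamping to [0, 1] are all assumptions made for illustration.

```python
import math


def exponential_membership(z, lower, upper, s=1.0):
    """Map an objective value z to a satisfaction degree in [0, 1].

    Assumed form: (1 - exp(-s * r)) / (1 - exp(-s)), where
    r = (z - lower) / (upper - lower) is the normalized objective value.
    `lower` and `upper` play the role of L_l and U_l; `s` is the shape constant.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if s == 0:
        raise ValueError("s must be non-zero")
    r = (z - lower) / (upper - lower)
    r = min(1.0, max(0.0, r))  # clamp so values outside [lower, upper] stay in [0, 1]
    return (1.0 - math.exp(-s * r)) / (1.0 - math.exp(-s))


# Satisfaction at the bounds and at the midpoint of an illustrative range.
for z in (0.0, 5.0, 10.0):
    print(z, round(exponential_membership(z, 0.0, 10.0, s=1.0), 4))
```

With $s > 0$ this form is concave, so satisfaction rises quickly near the lower bound; a linear membership function would give 0.5 at the midpoint instead.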

Sum approach of multi-objective satisfaction function
In the interaction process of the whole bi-level programming model, the optimization objective of the upper level is a single function, and its satisfaction function value is calculated directly according to Eq. (17). The optimization objective of the lower level consists of two functions, and the satisfaction degrees of the individual objective functions need to be aggregated by a sum approach. Inspired by the sum approach strategy in MOEA/D [44], the Tchebycheff approach [45] is used in this paper to aggregate the lower level satisfaction functions. Let the weight decomposition vector of the global solution space be $\lambda = (\lambda_1, \lambda_2, \cdots, \lambda_N)^{T}$, where $N$ denotes the population size, and let the reference point be $Z^* = \min\{Z_k(x) \mid x \in X\}$. The two kinds of membership functions of the model are then expressed with the Tchebycheff approach. Using MOEA/D to solve the multi-objective optimization model composed of these two kinds of membership functions, the solution set $X$ (the Pareto front (PF)) satisfying the requirements can be obtained. After the PF solution sets of each level have been obtained, the individuals in each PF must be selected further. The solution selected from the upper level solution set is transferred, according to the satisfaction function, as the basis for the lower level solution. The individual selected from the lower level solution set can be used as the final basis for the commander's decisions in the combat process.
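The Tchebycheff aggregation itself is compact enough to sketch. The Python example below shows the standard form, $\max_k \lambda_k |f_k(x) - z_k^*|$, applied to a small hypothetical population of objective vectors under minimization. The helper names `reference_point` and `tchebycheff` and the sample values are assumptions for illustration; the paper applies the aggregation to its membership functions rather than to raw objectives.

```python
def reference_point(objective_vectors):
    """Component-wise minimum over the population (ideal point for minimization)."""
    return [min(column) for column in zip(*objective_vectors)]


def tchebycheff(objectives, weights, reference):
    """Standard Tchebycheff scalarization: largest weighted distance to the reference."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, reference))


# Hypothetical population: each entry holds two objective values of one solution.
population = [[0.2, 0.9], [0.5, 0.5], [0.8, 0.3]]
z_star = reference_point(population)
weights = [0.5, 0.5]  # one weight vector; MOEA/D uses one per subproblem

scores = [tchebycheff(f, weights, z_star) for f in population]
best = min(range(len(population)), key=scores.__getitem__)
print(z_star, scores, best)
```

Each weight vector defines one scalar subproblem, so sweeping the weights over the simplex traces different parts of the Pareto front.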
Algorithm 1 MOEA/D-TOPSIS
1.1 Extend $\lambda$ to $T_\lambda$, marked as $NS_l = \{NS_l^1, NS_l^2, \cdots, NS_l^t, \cdots, NS_l^{T_\lambda}\}$;
1.2 Calculate the initial population $x$ in $s_1(Z_1(x)), s_1(Z_2(x)), \cdots, s_1(Z_m(x))$ and update the reference point $Z^*$.
3 Set $x_l$ as the sub-target population of $s_1(Z_i(x))$ (using the Tchebycheff approach).
4.3 Construct the weighted normalized decision matrix $V = (v_{ij})_{n \times m}$, with $v_{ij} = w_j r_{ij}$ and $w_j \in w$.
5 Determine the positive ideal solution $A^+$ and the negative ideal solution $A^-$, where the $j$th component of $A^+$ is the maximum of $v_{ij}$ in the $j$th column, $j = 1, 2, \cdots, m$.
6 Choose the satisfied solution.
6.1 Calculate the separation measures from $A^+$ and $A^-$ for each solution $i = 1, 2, \cdots, n$.
6.2 Calculate the relative closeness $C$ for each solution and select the most satisfied solution $x$. The bigger $C$ is, the more satisfied the solution is.
Two selection processes, transferring the upper level optimal solution to the lower level and choosing the final optimal solution from the lower level optimal solutions, both require a further selection within the PF set obtained by the multi-objective optimization. This problem can be regarded as a multi-attribute decision-making problem in which the objective functions of the multi-objective optimization model are the attribute values. Therefore, we use TOPSIS in combination with MOEA/D to obtain the final optimal solution. The framework of the MOEA/D-TOPSIS algorithm proposed in this paper is shown in Algorithm 1.
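The TOPSIS selection over a Pareto front can be sketched in a few lines of Python. The example below follows the standard TOPSIS procedure (vector normalization, weighting, ideal and negative ideal solutions, Euclidean separation, relative closeness). The function name `topsis`, the `benefit` flags for distinguishing benefit and cost attributes, and the sample front are assumptions added for illustration and are not taken from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) by relative closeness to the ideal solution.

    matrix[i][j] is attribute j of solution i; benefit[j] is True when a larger
    value of attribute j is better and False when a smaller value is better.
    """
    n_attr = len(matrix[0])
    # Vector normalization per column; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_attr)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_attr)] for row in matrix]

    columns = list(zip(*v))
    a_pos = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    a_neg = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    closeness = []
    for row in v:
        d_pos = math.dist(row, a_pos)  # separation from the positive ideal
        d_neg = math.dist(row, a_neg)  # separation from the negative ideal
        total = d_pos + d_neg
        closeness.append(d_neg / total if total > 0 else 0.0)

    best = max(range(len(matrix)), key=closeness.__getitem__)
    return best, closeness


# Hypothetical front: column 0 is a satisfaction value, column 1 is a cost.
front = [[0.9, 0.2], [0.5, 0.5], [0.2, 0.9]]
best, closeness = topsis(front, weights=[0.5, 0.5], benefit=[True, False])
print(best, closeness)
```

The weights express how much each attribute matters to the decision-maker, so changing them can move the selected solution along the front.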

HMOBIF
In the following, we use the hybrid multi-objective bi-level interactive fuzzy programming method (HMOBIF) proposed in this paper to solve the ECM-DWTA model. The steps are as follows.
Step 1 Initialize $\beta_l$ $(l = 1, 2)$, obtain the satisfaction membership functions $s_l(Z_k(x))$ $(k = 1, 2, 3)$ according to formulas (17)-(19), and determine the prior minimum satisfaction $\theta_l$ according to the prior knowledge of the decision-maker, where $\theta_l \in (0, 1)$. Both need to satisfy formula (20).
Step 2 MOEA/D is used to optimize the satisfaction of each objective function, which gives model (21), and the initial local optimal solution of the model is obtained. Then, the initial satisfaction solution that matches the preference of model (21) is selected from the PF optimal solution set of the model.
Step 3 Determine whether formula (20) holds.
If $s_1(Z_k(x^*)) \geq \theta_1$, the initial solution is selected as the global optimal solution, and the flow proceeds to Step 8. If $s_1(Z_k(x^*)) < \theta_1$, $\theta_1$ is reduced by the specified step, and the flow goes to Step 4.
Step 4 Check the equalization factor.
If the equalization factor is within the equalization interval, go to Step 8; if the equalization factor exceeds the equalization interval, go to Step 5.
Step 5 Continue to judge.
If $\delta_1 < \delta_1^L$, the set prior satisfaction degree is too high, and $\theta_1$ is decreased by the set step size; if $\delta_1 > \delta_1^U$, the set prior satisfaction degree is too low, and $\theta_1$ is increased by the set step size.
Repeat the above judgment until termination conditions (23) and (24) are both satisfied.
When both conditions are satisfied, the algorithm terminates and the flow goes to Step 8; otherwise, the flow goes to Step 6.
Step 6 Continue to judge.
If termination condition (23) is not satisfied, the lower level reduces its satisfaction degree by the set step size. If neither condition is met, go to Step 3. If termination condition (24) is not satisfied, continue to judge: (i) if $\delta_2 > \delta_2^U$, $\theta_2$ is increased by the set step size; (ii) if $\delta_2 < \delta_2^L$, $\theta_2$ is decreased by the set step size.
The resulting lower level satisfaction is $\theta_2$.
Step 7 Update model (21) to obtain model (25). MOEA/D is used to solve the multi-objective optimization model (25), which yields the locally optimal solution $x^{**}$; then go to Step 5.
To express the logical relationship of the above steps more intuitively, we use Fig. 4 to represent the flowchart of the hybrid multi-objective bi-level interactive fuzzy programming method for solving the ECM-DWTA model.
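The threshold adjustment used in Steps 5 and 6 can be sketched as a single update rule. The Python example below is a minimal illustration: the function name `adjust_threshold`, the clamping margin `eps` that keeps the threshold inside (0, 1), and the sample numbers are assumptions. The equalization factor `delta` is taken as an input, because its formula is not reproduced here.

```python
def adjust_threshold(theta, delta, delta_low, delta_high, step, eps=1e-6):
    """Move the prior minimum satisfaction theta by one step.

    If delta is below the equalization interval, theta is considered too high
    and is decreased; if delta is above the interval, theta is considered too
    low and is increased; inside the interval theta is left unchanged.
    """
    if delta < delta_low:
        theta -= step
    elif delta > delta_high:
        theta += step
    # Assumption: keep theta strictly inside (0, 1), as required in Step 1.
    return min(max(theta, eps), 1.0 - eps)


# Illustrative values: delta falls below the interval [0.6, 0.9], so theta drops.
theta_1 = adjust_threshold(theta=0.8, delta=0.5, delta_low=0.6,
                           delta_high=0.9, step=0.05)
print(theta_1)
```

In the full procedure this update would be followed by re-solving the model with the new threshold, which is why the flow returns to an earlier step after each adjustment.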

Experiment results and discussion
To evaluate the performance of HMOBIF for ECM-DWTA, four experiments, EX-1, EX-2, EX-3 and EX-4, were carried out. We compare simulation results for the model parameters and for the algorithms that solve the bi-level model, from which the optimal parameter values and the algorithm performance comparison are obtained. All compared methods were coded in MATLAB and executed on an Intel Core i7 2.6-GHz PC with 32 GB of memory. Runtimes are reported in CPU seconds.

Experimental problem set
The settings of the parameters of the ECM-DWTA model in all experiments are shown in Table 2. The maximum number of iterations $N_{\max}$ is set to 1000 and 2000, which will be discussed in the subsequent simulations.
In the simulation, we set up four comparative experiments of different scales (EX-1 to EX-4), which are shown in Table 3.
In actual combat, $K_{ij}(t)$ is generally determined by the performance of the ECM equipment and the electronic targets [47]. To facilitate the calculation, this paper randomly expands $K_{ij}(t)$ on the basis of a certain probability, where rand() is a function that generates a random number in the range [0, 1].
Besides the changes of the optimal solution, the objective function value and the membership function value, the jamming value-cost ratio (ROI) is also compared. The range of ROI is (0, 1), and a larger value means a better interference allocation scheme.

Parametric discussion and model solution
First, we discuss the choice of membership function type. Under the condition $\beta_1 = \beta_2 = 1$, the linear, hyperbolic and parabolic membership functions and the exponential membership function used in this paper are each used to solve the model, and each variant is run independently 30 times on the above cases. Throughout the simulation, we run the MOEA/D-TOPSIS algorithm with $N_{\max}$ set to 1000 and 2000, respectively. The results are shown in Table 4. In the table, the index values under each membership function are averages over the repeated runs, where $s_1^*$ and $s_2^*$ are the optimal values of the satisfaction degree of each level, i.e., $s_1^* = \arg\max\{s_1(Z_1(x)), s_1(Z_2(x)), s_1(Z_3(x))\}$ and $s_2^* = \arg\max\{s_2(Z_1(x)), s_2(Z_2(x)), s_2(Z_3(x))\}$. The optimal value of each metric is shown in bold. Table 4 shows that the mean values of $s_1^*$, $s_2^*$ and ROI under the exponential membership function are larger than those under the other three membership functions in almost all cases, which indicates that the exponential membership function is more suitable for the ECM-DWTA model than the other three. In addition, as the number of iterations increases from 1000 to 2000, all the indicators under the exponential membership function are greater than the rest and its advantage becomes more obvious, which further illustrates its adaptability. Next, we set $N_{\max} = 2000$ and select the exponential membership function to solve the ECM-DWTA problem at the four scales; Fig. 5 shows the solution results for the first two scenarios.
The times for successfully solving the four ECM-DWTA problems of different sizes are 23.78 s, 59.01 s, 92.26 s and 146.44 s, respectively.

Fig. 5 The optimal solutions of the first two scenarios among the four scale scenarios. (b) The best solution on the EX-2.

In Fig. 5, the grey, blue and orange squares represent three different types of ECM equipment, and the numbers in the squares are the serial numbers of the enemy electronic targets. It can be seen that the ECM-DWTA problem can be solved successfully using the algorithm proposed in this paper combined with the exponential membership function. Then, we make a comparative experiment on the fuzzy satisfaction coefficient $\beta_l$ $(l = 1, 2)$ in the algorithm. Let $\beta_1$ and $\beta_2$ each take the value 0 or 1, and test all four combinations. To make the simulation experiments general and comprehensive, we test every combination on all cases, separately for 1000 and 2000 algorithm iterations. When $\beta_1 = \beta_2 = 1$, the upper and lower decision-makers both consider acceptance and rejection. When $\beta_1 = \beta_2 = 0$, the upper and lower decision-makers both adopt an accepting attitude. When $\beta_1 = 1, \beta_2 = 0$, the upper decision-makers consider both acceptance and rejection, while the lower decision-makers take the attitude of acceptance. When $\beta_1 = 0, \beta_2 = 1$, the upper decision-makers take the attitude of acceptance, while the lower decision-makers consider both acceptance and rejection. The experimental results are shown in Table 5. The optimal value of each metric is shown in bold.
Comparing the four experimental settings, the model yields the worst solutions when $\beta_1 = 0$, $\beta_2 = 0$. This is because adopting only an attitude of complete acceptance in the bi-level model reduces the number of satisfaction-degree iterations, so the optimal solution is not found. In addition, when $N_{\max} = 1000$, as $(\beta_1, \beta_2)$ changes from $(1, 1)$ through $(1, 0)$ to $(0, 1)$, the settings perform better as the scale of the simulation experiment grows. This is because, at small scale, having the decision-makers at both levels consider acceptance and rejection factors at the same time is more conducive to determining satisfaction. However, as the scale of the example expands, excessive consideration of rejection factors reduces, to a certain extent, the solution efficiency of the upper-level model (single optimization objective). When $\beta_1 = 1$, $\beta_2 = 0$, that is, when the upper decision-makers consider both acceptance and rejection, the model solves the upper single-objective model more efficiently and obtains enough satisfaction iterations in the lower double-objective model. Therefore, this case gives the best results for large-scale problems. In addition, when the number of iterations is increased to $N_{\max} = 2000$, the setting $\beta_1 = 1$, $\beta_2 = 0$ obtains the optimal value in all cases, except that not all of the EX-4 metrics are optimal. Therefore, this parameter value is used in the discussion of the algorithm performance in the subsequent sections.

Comparison and simulation of bi-level programming model algorithms
In DWTA research, the formulation of the model strongly affects the choice of solution algorithm. At present, there is no research that constructs the DWTA model by bi-level programming. Therefore, when choosing comparison algorithms for solving the model, we select several representative algorithms for multi-level programming for the simulation experiments, i.e., the Kth-Best algorithm [30], the MPECs algorithm [32], the trust-region algorithm [34] and the fuzzy interactive algorithm [36]. The first three algorithms require no parameter settings, while the fuzzy interactive algorithm uses the same parameter settings as the fuzzy membership function in this paper.
Further, in HMOBIF, we use the MOEA/D-TOPSIS algorithm to solve the multi-objective programming model at each level and to select the best solution in each PF; the MOEA used to search the PFs is MOEA/D. To illustrate the applicability of this algorithm to the ECM-DWTA problem, we incorporate current classical high-performance MOEAs (i.e., NSGA-II [20], RVEA [48] and MOIBA/AD [49]) into the algorithmic framework shown in Fig. 4, conduct comparative experiments, and discuss the reasonableness and efficiency of the algorithm.
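The selection step mentioned above, in which TOPSIS picks one solution from a PF, can be sketched as follows. This is a generic TOPSIS implementation under stated assumptions: the function name `topsis_select`, the equal weights and the sample front are hypothetical, and the paper's actual criteria and weights may differ.

```python
import math


def topsis_select(front, weights, benefit):
    """Rank Pareto solutions by relative closeness to the ideal point.

    front   : list of objective vectors, one per Pareto solution
    weights : one non-negative weight per objective
    benefit : True where larger is better, False where smaller is better
    Returns (index of the best solution, list of closeness scores).
    """
    n_obj = len(weights)
    # Vector-normalize each objective column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in front)) or 1.0 for j in range(n_obj)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_obj)] for row in front]
    # Ideal and anti-ideal values for each objective.
    cols = list(zip(*scaled))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    # Relative closeness: 1 means at the ideal point, 0 at the anti-ideal point.
    scores = []
    for row in scaled:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, worst)
        total = d_pos + d_neg
        scores.append(d_neg / total if total > 0 else 0.0)
    best = max(range(len(front)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Hypothetical PF of (upper satisfaction, lower satisfaction) pairs.
    pf = [[0.9, 0.2], [0.6, 0.6], [0.2, 0.9]]
    print(topsis_select(pf, weights=[0.5, 0.5], benefit=[True, True]))
```

With equal weights, the balanced solution in the sample front is preferred over the two extreme ones, which is the kind of compromise a multi-criteria selection step is meant to produce.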
The five algorithms are used to solve the ECM-DWTA problem in the four combat examples, each running independently 40 times. Table 6 shows the statistics of the computational time (CT) and the ROI metric. Figure 6 shows boxplots of the statistical distributions of CT and ROI for the five algorithms.
Based on a comprehensive analysis of Table 6 and Fig. 6, the algorithm proposed in this paper achieves the best CT and ROI among the five bi-level programming algorithms in three of the four scenarios; the exception is EX-1, where its ROI is slightly lower than that of the four comparison algorithms. In addition, the Kth-Best algorithm and the MPECs algorithm adopt a linear solution strategy for the multi-level programming problem, which puts them at a considerable disadvantage in running time. The trust-region algorithm and the fuzzy interactive algorithm do not comprehensively consider the subjective judgment of the decision-makers at the two levels, so their ROI values do not reach the optimal result. The boxplots of the comparative experiment show that the algorithm proposed in this paper is more stable when solving the model at different scales and has a greater convergence advantage than the comparison algorithms (Fig. 7).

Table 5 The comparative experiments of the fuzzy satisfaction coefficient $\beta_l$
In Fig. 8, the results of the three metrics of all algorithms in solving the ECM-DWTA model of the four combat cases show a declining trend. This is because, as the combat stages advance, ECM weapons keep joining the combat, so the number of enemy electronic targets that are interfered with increases, the number of targets not yet interfered with decreases, and the number of ECM weapons allocated to each target also decreases.
In the following, we discuss the rationality of choosing the multi-objective evolutionary optimization algorithm MOEA/D for the model solving process. To better illustrate the performance of the algorithm, we solve model (21) using several comparative MOEAs, present the results, and compile statistics on the indicators that reflect algorithm performance in order to comprehensively analyze the suitability of the chosen MOEA/D.
As performance indicators, we choose inverted generational distance (IGD) and hypervolume (HV) [50], which reflect the quality of the solution set obtained by an MOEA, together with the time taken to solve model (21). IGD and HV reflect the distribution of the PF: a smaller IGD indicates better performance, and a larger HV indicates better performance. Table 7 and Fig. 8 show the performance statistics and the solution set frontiers obtained in solving model (21) for the four compared algorithms with 1000 and 2000 iterations, respectively. In Table 7, the optimal value of each indicator is marked in bold, the values in parentheses are standard deviations, and the last row counts, for each comparison algorithm, the results that are better than (+), worse than (−) and equal to (=) those of the benchmark RVEA. Among the four algorithms solving model (21) in the HMOBIF framework, MOEA/D obtains the majority of the optimal indicator values. Its advantage is not obvious in Ex-1, but it obtains the optimal results in Ex-3 and Ex-4 as the example size grows. MOEA/D also obtains better results in the final statistical row, indicating that it produces a better-distributed solution set (PF) when solving the model. Similarly, Fig. 8 shows the distribution of the final solution set frontier obtained by each comparison algorithm in solving model (21).
It can be seen that the solution sets obtained by MOEA/D are better distributed for both iteration counts, and that the PF it obtains yields better objective function values than those of the other MOEAs. In summary, the MOEA/D algorithm solves the multi-objective model within the HMOBIF algorithm well and has strong applicability.
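As an illustration of how the two quality indicators discussed above behave, the following minimal sketch computes IGD against a reference front and the hypervolume of a two-objective front. It assumes minimization of both objectives and a user-supplied reference point; the function names and the sample fronts are hypothetical and are not taken from the paper's experiments.

```python
import math


def igd(reference_front, approx_front):
    """Mean distance from each reference point to its nearest obtained point.

    Smaller is better: 0 means every reference point is matched exactly.
    """
    total = sum(min(math.dist(r, a) for a in approx_front) for r in reference_front)
    return total / len(reference_front)


def hypervolume_2d(front, ref_point):
    """Area dominated by a two-objective minimization front, bounded by ref_point.

    Larger is better. Points that do not dominate ref_point contribute nothing.
    """
    pts = sorted(p for p in front if p[0] < ref_point[0] and p[1] < ref_point[1])
    area, best_f2 = 0.0, ref_point[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # dominated points are skipped
            area += (ref_point[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    # Hypothetical reference front and two candidate approximations.
    reference = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    good = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    poor = [(2.5, 3.5), (3.5, 2.5)]
    for name, front in (("good", good), ("poor", poor)):
        print(name, igd(reference, front), hypervolume_2d(front, (4.0, 4.0)))
```

The two indicators move in opposite directions for the same improvement, which is why the text above reads a smaller IGD and a larger HV as better performance.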

Conclusions and future works
In this paper, the traditional DWTA problem is reasonably extended and the ECM-DWTA problem model is established according to the combat characteristics and combat background of ECM in modern warfare. Bi-level programming is used to describe the problem in this model, and the interaction between the two levels simulates the OODA loop of operational command decision-making, which brings the dynamic assignment of ECM weapons closer to the actual combat situation. We propose HMOBIF, which considers both the acceptance and rejection degrees of the decision-maker and effectively solves the ECM-DWTA model. Simulation experiments were carried out in two aspects. The first is the comparison and simulation of the membership function type selection and the satisfaction coefficient. The other is the comparison and simulation of several common algorithms for solving multi-level programming and the algorithm proposed in this paper. The simulation results show that: (i) The exponential membership function selected in this paper is more suitable for the construction of the ECM-DWTA model. (ii) The parameters $\beta_1 = 1$, $\beta_2 = 1$ are more suitable for solving small-scale problems, while the parameters $\beta_1 = 0$, $\beta_2 = 1$ are more suitable for solving large-scale problems. (iii) Compared with the four algorithms for multi-level programming, the algorithm proposed in this paper is more suitable for the ECM-DWTA model and gives better results and better stability.

Fig. 8 PF results of compared algorithms in solving model (21). Panels: (e) Ex-1, $N_{\max} = 2000$; (f) Ex-2, $N_{\max} = 2000$; (g) Ex-3, $N_{\max} = 2000$; (h) Ex-4, $N_{\max} = 2000$.
In future work, we will pursue three research directions. First, the ECM-DWTA model proposed in this paper still has a gap with actual combat scenarios, so it is necessary to extend the model to be more consistent with operational reality. Second, we will apply HMOBIF to multi-level programming problems in other fields and further study algorithms for solving multi-level programming problems to achieve better performance and stability. Finally, we will further explore the application of machine learning algorithms to ECM-DWTA problems, and compare currently popular machine learning methods for combinatorial optimization with classical optimization methods, in order to obtain faster and better solution algorithms that meet the needs of modern operations.