Introduction

Many real-world optimization problems involve multiple conflicting objectives and are known as multiobjective problems (MOPs). Because the objectives conflict with one another, optimizing one may degrade the others [1]; consequently, the best attainable outcome is a Pareto optimal set. To approximate such a tradeoff set, a variety of multiobjective evolutionary algorithms (MOEAs) have been proposed over the past decades [2, 3].

Recent evidence suggests that MOEAs can address common MOPs well but struggle with highly complex ones [4, 5]. On large-scale multiobjective problems (LSMOPs), traditional MOEAs often suffer a significant loss of convergence, a difficulty commonly known as the curse of dimensionality. This phenomenon stems from the exponential growth of the search space with the increasing number of decision variables in LSMOPs [6,7,8]. To overcome the problem, many researchers have studied LSMOPs and developed numerous specialized MOEAs. These endeavors can be classified into three categories: decision variable grouping-based MOEAs [9, 10], decision space reduction-based MOEAs [11, 12], and novel search strategy-based MOEAs [13].

Over the past several years, an increasingly common class of LSMOPs has exhibited sparsity within the large-scale decision variables. Such sparse LSMOPs widely exist in scientific research, such as neural network training [14] and feature selection [15]. In these problems, the majority of decision variables of the Pareto optimal solutions are exactly zero, whereas the remainder take specific non-zero values. Moreover, many practical engineering applications also contain solutions with sparsity. For instance, in agricultural production, farmers need to identify a few suitable days within a year for crop watering [16]. These days correspond to the non-zero variables, whereas the others correspond to the zero variables. Note that current MOEAs designed for LSMOPs usually suffer a performance decrease on sparse LSMOPs [17, 18], primarily because they neglect the sparse structure of solutions. When optimizing sparse LSMOPs, they often start with a uniformly sampled population and must evolve it from a dense high-dimensional space to a sparse one, which poses a significant challenge to the evolutionary operators [19]. Moreover, commonly used evolutionary operators, such as simulated binary crossover and polynomial mutation, have a relatively low probability of generating exactly zero variables and thus cannot approximate sparse solutions well [19, 20].

Considering the sparse structure, researchers naturally integrate novel operators into MOEAs to address sparse LSMOPs. For example, Kropp et al. [19] proposed an optimization subroutine called sparse population sampling (SPS). SPS produces a sparse population rather than a uniformly sampled one during initialization and can be applied to any MOEA for sparse LSMOPs. Inspired by SPS, [21] further suggested a varied striped sparse population sampling operator (VSSPS), whose purpose is to generate at least some individuals in each non-zero dimension of the decision variables. S-ECSO [22] introduced a strongly convex sparse operator to fine-tune the obtained solutions, thus strengthening the sparsity of the evolving population. More recently, ST-CCPSO [23] employed a sparse truncation operator that computes the cumulative gradient of each decision variable and determines whether a variable should be set to zero by comparing the current gradient with the last accumulated gradient of the solution.

Fig. 1 Illustration of the two-layer encoding for sparse LSMOPs

Unlike the above approaches, another popular class employs a two-layer encoding (see Fig. 1) for coping with sparse LSMOPs. As the figure shows, the high-level layer is represented by a set of real vectors, which record the values of the non-zero decision variables during the evolution, while the low-level layer is represented by a set of binary vectors, which record the positions of the zero decision variables. An individual, i.e., its vector of decision variables, is thus jointly encoded by a real-valued vector and a binary-valued vector, so the zero and non-zero variables can be optimized separately through the low-level and high-level layers. Compared to the original one-layer encoding, the two-layer encoding has two advantages. Firstly, it involves two types of decision variables, which makes it adaptable to various sparse LSMOPs: we can activate only the low-level layer when optimizing binary sparse LSMOPs and employ both layers when facing continuous sparse LSMOPs. Secondly, optimizing the zero variables separately helps introduce sparsity detection and maintenance when optimizing continuous sparse LSMOPs; the quickly generated zero variables can effectively enhance the sparsity of the population and improve the convergence speed of MOEAs. Owing to its versatility in optimizing different types of variables and its efficiency in optimizing zero variables, the two-layer encoding has become a popular topic in this field.

In SparseEA [17], Tian et al. pioneered the use of the two-layer encoding for addressing sparse LSMOPs. They also elaborately designed a novel population initialization and a specialized evolutionary operator to exploit the encoding. Inspired by SparseEA, STEA [24] also used the two-layer encoding and decomposed the large-scale problem into several low-dimensional subproblems by using a sparse rank-1 approximation, aiming to overcome the shortcomings of traditional variable grouping and analysis methods. In PM-MOEA [25], a two-layer encoding-based evolutionary pattern mining was proposed to detect non-zero variables. During the mining, the non-zero variables were divided into maximum and minimum candidates and continuously optimized toward the optimal values. In MOEA/PSL [12], a restricted Boltzmann machine (RBM) and a denoising autoencoder (DAE) were employed to reduce the binary and real search spaces, which are represented by the binary vector and real vector of the two-layer encoding, respectively. Then, the evolutionary operators were performed in the reduced space, and the results were mapped back to the original space. In MOEA-ADR [26], Geng et al. further grouped the elements of the binary vectors and used a refined RBM to reduce the binary search space. In SGECF [27], a sparsity-guided elitism co-evolutionary framework was proposed to enhance the learning capability of dominated solutions. In each iteration, the best sparsity, calculated from the binary vectors of the non-dominated solutions, was used to guide the co-evolution between the non-dominated and dominated solutions. In RSMOEA [28], a dynamic guided operator was proposed to further improve SparseEA. It incorporated the information of the binary vectors of the non-dominated solutions as well as the fitness score of each individual to guide the evolution of the population towards the sparse optimal solutions.

Despite its broad usage, the two-layer encoding encounters difficulties in balancing the optimization of zero and non-zero variables [29]. This is primarily because the value of an encoded individual is more likely to be controlled by the low-level layer, which is mainly responsible for optimizing the zero variables. When a variable in the binary vector is zero, the value of the corresponding position in the individual is exactly zero regardless of the value of the corresponding variable in the real vector. This phenomenon weakens the control of the high-level layer, which optimizes the non-zero variables. Such an imbalance makes it hard for the population to converge to the sparse Pareto optimal solutions within limited function evaluations. Therefore, how to balance the control of the high-level and low-level layers remains an open question.

To tackle this issue, SparseEA2 [29] employed a cascading strategy, where the real variables are optimized only if the binary variables at the corresponding positions are flipped. Similar to SparseEA2, DSGEA [30] adopted a dynamic grouping to identify the mutation positions for both real and binary vectors and then optimized the real vectors according to the mutation results of the binary vectors. Recently, DM-MOEA [31] also used randomized grouping to drive the mutation of the real vectors based on the mutation results of the binary vectors. Under such a cascading strategy, the control of the low-level layer is relaxed because its influence on the individual is partly reduced, which improves the optimization of non-zero variables to some extent. However, the strategy makes the optimization of non-zero variables highly dependent on the optimization of zero variables, as the evolution of the real vector is driven by the values of the binary vector.

More recently, several studies have focused on developing a matching strategy instead of the cascading strategy [20, 32]. In TS-SparseEA [32], the algorithm first evaluates the preference of each binary vector for the real vectors of all offspring individuals. Each binary vector is then matched with its most preferred real vector, and a set of new individuals is recombined accordingly. The preference is defined as follows: a zero-valued variable of the binary vector prefers a real variable with a much smaller value at the corresponding position, and the preference of a binary vector for a real vector is obtained by accumulating the preferences of all its binary variables over the corresponding real variables. In this way, a variable with a significant value in the real vector has a higher probability of surviving rather than being combined with a zero-valued variable in the binary vector. Similar to the cascading strategy, the matching strategy relaxes the control of the low-level layer, since it aims to find an appropriate combination of the binary and real vectors. Moreover, it alleviates the dependence of the optimization of non-zero variables on the zero variables, as it allows the real and binary vectors to evolve independently. Therefore, compared to the cascading strategy, the matching strategy can be recognized as a more suitable way to balance the control of the high-level and low-level layers.

To summarize, researchers have reached a consensus that striking a balance between the optimization of zero and non-zero variables is of great importance, and visible progress has been made in improving the two-layer encoding. However, the above-mentioned literature still puts the low-level layer in first place. For example, the real vector cannot undergo mutation ahead of the binary vector in the cascading strategy, and the real vector cannot select the binary vector according to its own preference in the matching strategy. Therefore, there is still room for balancing the control of the two layers over the encoded individuals within the two-layer encoding. This paper argues that it is necessary to exploit the information of the high-level layer and enhance its control. To reach this goal, we propose to improve the two-layer encoding by building a two-way association between the two layers. The primary contributions are summarized as follows.

1. A mutual preference calculation method (MPC) is first proposed. Instead of traditional matching strategies emphasizing the binary vector only, we additionally consider the preference of the real vector and put forward a new distance for calculating their mutual preference. By adopting MPC, MOEAs can comprehensively exploit the information of both the binary and real vectors included in the two-layer encoding.

2. A two-way matching strategy (TWM) based on MPC is then proposed. To build a two-way association between the two encoding layers, TWM allows a one-to-one matching of binary and real vectors when both are in the top ranks of each other's preferences. Essentially, the two-way association balances the influence of the two layers on the encoded individual by relaxing the control of the low-level layer and enhancing the control of the high-level layer, thus reaching a balance between the optimization of zero and non-zero variables.

3. To verify the effectiveness of MPC and TWM, we propose a new MOEA equipped with the two modules and conduct extensive experiments on 32 sparse LSMOP benchmark problems. The results show that the proposed MOEA exhibits promising performance compared to recent MOEAs and that the two modules have the potential to overcome the drawbacks of the two-layer encoding.

The remaining contents are organized as follows. Section “Related work” introduces the related work and emphasizes the two-layer encoding. In Sect. “Methodology”, the motivation of this paper and the procedures of MPC and TWM are detailed. Then, the experimental results are presented and discussed in Sect. “Experimental studies”. Finally, the conclusion and future work are given in Sect. “Conclusion and discussion”.

Related work

Two-layer encoding for sparse LSMOPs

By considering the structure of sparse LSMOPs, Tian et al. proposed a hybrid representation called the two-layer encoding in [17]. Since then, it has become a commonly used encoding for sparse LSMOPs [26, 30, 33]. In the two-layer encoding, a solution \(\varvec{x}_i\) in a population of size N (\(i = 1, 2, \ldots, N\)) is jointly represented by a binary vector and a real vector rather than a single vector, as formulated in Eq. (1).

Algorithm 1 PC(P)

$$\begin{aligned} \varvec{x}_i &= (x_{i,1}, x_{i,2}, \ldots , x_{i,D})^{\textrm{T}} \\ &= (dec_{i,1}\times mask_{i,1},\ dec_{i,2}\times mask_{i,2},\ \ldots ,\ dec_{i,D}\times mask_{i,D})^{\textrm{T}} \\ &= dec_{i} \odot mask_{i} \end{aligned}$$
(1)

where \(dec_{i}\) is a real vector and \(mask_{i}\) is a binary vector. The variable \(dec_{i,j}\) gives the exact value of \(x_{i,j}\), and \(mask_{i,j}\) controls whether \(dec_{i,j}\) is expressed (\(j = 1,2,\ldots ,D\)). That is, if \(mask_{i,j} = 1\), then \(x_{i,j} = dec_{i,j}\); otherwise \(x_{i,j} = 0\). The symbol '\(\odot \)' denotes the element-wise multiplication of \(dec_{i}\) and \(mask_{i}\). In this way, a decision vector \(\varvec{x}_i\) with a large number of zero variables can be easily generated. A sparse population can then be created as follows:

$$P = P.Dec \odot P.Mask$$
(2)

where P.Dec represents the high-level layer, consisting of all the real vectors in the population P; it is in charge of optimizing the non-zero variables of sparse LSMOPs. In contrast, P.Mask denotes the low-level layer, which consists of all the binary vectors; it mainly undertakes the detection and optimization of the zero variables of sparse LSMOPs.
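To make the encoding concrete, the following minimal sketch decodes a population from the two layers according to Eqs. (1) and (2). It is written in Python with NumPy purely for illustration (the experiments in this paper use MATLAB/PlatEMO), and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 10                                     # population size, number of variables

Dec = rng.random((N, D))                         # high-level layer: real vectors
Mask = (rng.random((N, D)) < 0.1).astype(float)  # low-level layer: ~10% ones (sparse)

P = Dec * Mask                                   # Eq. (2): P = P.Dec ⊙ P.Mask
print(P[0])                                      # x_1 is zero wherever mask_{1,j} = 0
```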

Matching strategy for two-layer encoding

To improve the two-layer encoding, our previous work [32] introduced a simple matching strategy consisting of two modules: a preference calculation (PC) method and a one-way matching (OWM) strategy. The following paragraphs briefly describe these two modules.

(1) Preference calculation: This module evaluates the preference of each binary vector for all the real vectors and then generates the corresponding preference list. Algorithm 1 details the pseudo-code of the method, in which we can observe two key components: the distance calculation and the preference list representation.

Algorithm 2 OWM(P, \(\Psi _{Mask}\))

On the one hand, the distance calculation provides a tool for evaluating the preference of binary vectors. In detail, the preference is described as follows: a variable in mask with exactly zero value prefers the variable in dec that has a much smaller value at the corresponding dimension. A variable of dec with a significant value thus obtains a higher probability of surviving rather than being combined with a zero-valued variable in mask. For example, consider two binary vectors \(mask_{1}= (0,1)^{\textrm{T}}\) and \(mask_{2}= (1,1)^{\textrm{T}}\) and two real vectors \(dec_{1}= (1,0.6)^{\textrm{T}}\) and \(dec_{2}= (0.1,0.6)^{\textrm{T}}\). In this case, \(mask_{1}\) prefers \(dec_{2}\) to \(dec_{1}\), because \(mask_{1}\) has a zero value in the first dimension and \(dec_{2}\) has a smaller value than \(dec_{1}\) at the same position. When Algorithm 1 combines \(dec_{1}\) with \(mask_{2}\) to constitute \(\varvec{x}\) based on Eq. (1), the significant value in the first dimension of \(dec_{1}\) is maintained in \(\varvec{x}\). Conversely, if \(dec_{1}\) is combined with \(mask_{1}\), the value of the first dimension in the combined solution is set to zero, and we say that the first dimension of \(dec_{1}\) is controlled by \(mask_{1}\). To this end, the preference of a mask for a member dec is calculated through the following overall distance

$$D(mask_{i},dec_{j}) = \frac{mask_{i}\cdot dec^{*}_{j}}{\Vert mask_{i}\Vert \,\Vert dec^{*}_{j}\Vert } = \frac{\sum _{d=1}^{D}mask_{i,d}\times dec^{*}_{j,d}}{\sqrt{\sum _{d=1}^{D}\left( mask_{i,d}\right) ^{2}}\times \sqrt{\sum _{d=1}^{D}\left( dec^{*}_{j,d}\right) ^{2}}}$$
(3)

where D denotes the number of decision variables, and \(dec^{*}_{j}\) is obtained by normalizing \(dec_{j}\) according to the lower and upper bounds of each variable.
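Viewed this way, Eq. (3) is simply the cosine similarity between \(mask_i\) and the normalized \(dec^{*}_j\), and arg-sorting each row of the resulting \(N \times N\) matrix (larger value = more preferred) yields \(\Psi _{Mask}\). The sketch below, with illustrative names of our own rather than the code of [32], reproduces the worked example from the paragraph above:

```python
import numpy as np

def pc(Mask, Dec, lower=0.0, upper=1.0):
    """Preference calculation (Eq. (3)): row i ranks all real vectors for mask_i."""
    Dec_star = (Dec - lower) / (upper - lower)            # normalize by variable bounds
    num = Mask @ Dec_star.T                               # pairwise dot products
    den = (np.linalg.norm(Mask, axis=1, keepdims=True)
           * np.linalg.norm(Dec_star, axis=1))            # ||mask_i|| * ||dec*_j||
    sim = num / np.maximum(den, 1e-12)                    # D(mask_i, dec_j) for all i, j
    return np.argsort(-sim, axis=1)                       # Psi_Mask: best dec first

Mask = np.array([[0.0, 1.0]])                             # mask_1 = (0, 1)^T
Dec = np.array([[1.0, 0.6],                               # dec_1 = (1, 0.6)^T
                [0.1, 0.6]])                              # dec_2 = (0.1, 0.6)^T
print(pc(Mask, Dec))        # [[1 0]]: mask_1 prefers dec_2 to dec_1, as in the text
```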

On the other hand, the preference list records the preference ranking of binary vectors over real vectors. For example, suppose that P is a population of size \(N=3\); then the preference list of P.Mask can be represented by

$$\Psi _{Mask} = \left\{ \begin{aligned} &\Psi _{mask_1}: (2,\; 3,\; 1)^{\textrm{T}} \\ &\Psi _{mask_2}: (1,\; 3,\; 2)^{\textrm{T}} \\ &\Psi _{mask_3}: (1,\; 2,\; 3)^{\textrm{T}} \end{aligned} \right.$$
(4)

where \(\Psi _{mask_1}\) denotes the preference list of \(mask_{1}\) in P.Mask. The vector \( \Psi _{mask_1}=(2,3, 1)^{\textrm{T}}\) means that \(mask_{1}\) prefers \(dec_{2}\) over \(dec_{3}\) and \(dec_{1}\).

Fig. 2 Example of the marriage matching problem, where both one-way and two-way associations are involved

(2) One-way matching: According to the preference list \(\Psi _{Mask}\), OWM aims to assign an unpaired real vector to every binary vector. In Algorithm 2, each binary vector \(mask_{r}\) is matched with its most preferred available real vector \(dec_{s}\), and a set of new solutions Q is then recombined based on the two-layer encoding. It is worth noting that the cooperation of the two modules helps relax the control of the low-level layer, since they seek an appropriate combination of the binary and real vectors within the two-layer encoding.
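A hedged sketch of this behavior (abstracted from Algorithm 2; the function name and random-order policy reflect our reading of it) is given below: binary vectors enter in random order, and each takes the highest-ranked real vector on its list that is still unmatched.

```python
import numpy as np

def owm(Psi_Mask, rng=None):
    """One-way matching: return a dict mapping mask index -> matched dec index."""
    rng = rng or np.random.default_rng(0)
    pairs, taken = {}, set()
    for r in rng.permutation(len(Psi_Mask)):    # masks participate in random order
        for s in Psi_Mask[r]:                   # scan mask_r's list, best dec first
            if s not in taken:                  # take the first still-available dec
                pairs[r] = s
                taken.add(s)
                break
    return pairs

# With the 0-based lists [[1,2,0],[0,2,1],[0,1,2]], one possible outcome is
# {0: 1, 1: 0, 2: 2}; a different random order can yield a different matching.
print(owm(np.array([[1, 2, 0], [0, 2, 1], [0, 1, 2]])))
```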

Methodology

Motivation

It is essential to strike a balance between the optimization of zero and non-zero variables, as the over-control exerted by the low-level layer of the two-layer encoding may hinder the evolution of the population. As highlighted in [32], using an association to improve the two-layer encoding can enhance the population's quality. However, this paper notices that the association in TS-SparseEA follows a one-way manner, which may raise new issues in two aspects.

Firstly, the one-way association puts the low-level layer in first place, which does not favor balancing the control of the two layers of the two-layer encoding. In reality, many similar scenarios follow a two-way association that considers the preferences of both sides, e.g., the marriage matching problem [34]. Figure 2 shows an example of such a problem, where each man has a preference list for the women and vice versa. If we consider the men's preferences only, a possible matching result is shown in the figure: Alan is Jessie's least favorite man, although Jessie occupies the first rank on his preference list. Likewise, Bob and Alice stand at the last rank of Gina's and Eric's preference lists, respectively. Such a result makes the matching infeasible, as they tend to end their current marriages. The result differs if we pursue a two-way association by considering mutual preference. Since Eric prefers Gina the most and vice versa, the matching {Eric, Gina} is stable regardless of the matching results of the others or the matching order used. Besides, Alan prefers Alice the most, while Alan is second on Alice's preference list; Bob is second on Jessie's preference list and vice versa. Both are acceptable results. We also note that the one-way association may occasionally produce the same result as the two-way one due to the random ordering; however, the latter can more reliably approximate a near-optimal matching. Therefore, the two-way association appears superior to the one-way one in the marriage matching problem. The same conclusion may apply to the problem in this paper: the one-way association may lead the binary and real vectors to combine with objects that are not very suitable for them. The mismatch will weaken the balance between the two layers in the two-layer encoding and thus impair the optimization of zero and non-zero variables.

Secondly, assigning a relatively good candidate to each binary vector is challenging, because the number of available real vectors decreases as the matching progresses. Similar observations have been made in the marriage matching problem. For example, consider the preference list \(\Psi _{Mask}\) shown below.

$$\Psi _{Mask} = \left\{ \begin{aligned} &\Psi _{mask_1}: (3,\; 4,\; 1,\; 2)^{\textrm{T}} \\ &\Psi _{mask_2}: (4,\; 3,\; 2,\; 1)^{\textrm{T}} \\ &\Psi _{mask_3}: (1,\; 2,\; 3,\; 4)^{\textrm{T}} \\ &\Psi _{mask_4}: (1,\; 4,\; 3,\; 2)^{\textrm{T}} \end{aligned} \right.$$
(5)

From Eq. (5), suppose that \(mask_1\) and \(mask_2\) have already been matched with \(dec_3\) and \(dec_4\), respectively. Then, \(mask_3\) or \(mask_4\) participates in the matching through a random selection in the next step. Note that both binary vectors prefer \(dec_1\) to \(dec_2\). If Algorithm 2 selects \(mask_3\) for matching first, \(mask_4\) will be matched with its least favorite object \(dec_2\). Such a phenomenon is difficult to avoid but can be mitigated by considering a two-way association, as in the marriage matching example. Besides, if we impose proper constraints on the preference list, such as shortening its length in Eq. (5), the probability of such unfavorable matchings can be reduced.

Based on the above analysis, we observe that there is still room for improving PC and OWM. Incorporating a two-way association enables more effective utilization of the information from the binary and real vectors and facilitates a better balance. Consequently, we propose and develop MPC and TWM, which are detailed in the next subsections.

Algorithm 3 MPC(P)

Fig. 3 Examples of the calculation of \(D(dec_{i},mask_{j})\) and \(D(mask_{i},dec_{j})\)

Mutual preference calculation

As illustrated above, it is important to consider a two-way association between the low-level and high-level layers of the two-layer encoding. Therefore, instead of traditional matching strategies emphasizing the binary vector only, we additionally consider the preference of the real vector in this paper. By using MPC, we establish a two-way association between the binary and real vectors and can thus employ their mutual preference in the following steps.

Algorithm 3 presents the pseudo-code of MPC. Its main feature is that both \(\Psi _{Dec}\) and \(\Psi _{Mask}\) are involved: the procedure begins with computing \(\Psi _{Dec}\) (lines 1 to 9) and ends with reusing Algorithm 1 to obtain \(\Psi _{Mask}\) (line 10). Moreover, we design a new distance for defining the mutual preference and simultaneously generating \(\Psi _{Dec}\) and \(\Psi _{Mask}\). Specifically, the distance \(D(dec_{i},mask_{j})\) is calculated as shown below.

$$D(dec_{i},mask_{j}) = \frac{1}{1+\sqrt{\sum _{d=1}^{D_1}(dec_{i, nz_d}-mask_{j, nz_d})^2}} + \frac{1}{1+\sqrt{\sum _{d=1}^{D_2}(dec_{i, z_d}-mask_{j, z_d})^2}}$$
(6)

where \(nz_d\) and \(z_d\) represent the positions of the d-th non-zero variable and zero variable in the rounded \(dec_{i}\), respectively, and \(D_1\) and \(D_2\) represent the total numbers of non-zero and zero variables in the rounded \(dec_{i}\). Note that \(D_1\) and \(D_2\) satisfy \(D_1+D_2 \le D\), where D is the dimension of \(dec_{i}\). Similarly, the distance \(D(mask_{i},dec_{j})\) is calculated as shown below.

$$D(mask_{i},dec_{j}) = \frac{1}{1+\sqrt{\sum _{d=1}^{D_3}(mask_{i, nz_d}-dec_{j, nz_d})^2}} + \frac{1}{1+\sqrt{\sum _{d=1}^{D_4}(mask_{i, z_d}-dec_{j, z_d})^2}}$$
(7)

where \(nz_d\) and \(z_d\) represent the positions of the d-th non-zero variable and zero variable in \(mask_{i}\), respectively, and \(D_3\) and \(D_4\) represent the total numbers of non-zero and zero variables in \(mask_{i}\). \(D_3\) and \(D_4\) also satisfy \(D_3+D_4 \le D\), where D is the dimension of \(mask_{i}\).

Figure 3 further explains how to calculate \(D(dec_{i},mask_{j})\) and \(D(mask_{i},dec_{j})\). For \(D(dec_{1}, mask_{1})\) and \(D(dec_{1}, mask_{2})\), we first detect the non-zero and zero positions in the rounded \(dec_1\) according to whether the value at each position equals 1. Secondly, we pick the same non-zero and zero positions in \(mask_1\) and \(mask_2\) and separate the non-zero and zero variables into groups. Finally, we compute the distances \(D(dec_{1}, mask_{1})\) and \(D(dec_{1}, mask_{2})\) according to Eq. (6). For \(D(mask_{1}, dec_{3})\) and \(D(mask_{1}, dec_{4})\), a similar procedure is adopted with Eq. (7); the only difference lies in the detection of non-zero and zero positions. The calculation of \(D(dec_{i}, mask_{j})\) adopts a threshold method: a variable in \(dec_{i}\) with a value greater than 0.6 is considered a non-zero position, and one with a value less than 0.4 is classified as a zero position.
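The following sketch shows the whole of MPC under our reading of Eqs. (6) and (7): positions of \(dec_i\) are grouped by the 0.6/0.4 thresholds (values in between stay unclassified, hence \(D_1+D_2 \le D\)), positions of \(mask_i\) are read from its bits, and arg-sorting the pairwise distances (larger value = stronger preference, by analogy with Eq. (3)) yields \(\Psi _{Mask}\) and \(\Psi _{Dec}\). All names are illustrative assumptions:

```python
import numpy as np

def _inv_dist(a, b, idx):
    """One term of Eqs. (6)-(7): 1 / (1 + Euclidean distance on selected positions)."""
    return 1.0 / (1.0 + np.sqrt(np.sum((a[idx] - b[idx]) ** 2)))

def d_dec_mask(dec, mask):                  # Eq. (6)
    nz, z = dec > 0.6, dec < 0.4            # threshold-based "rounding" of dec
    return _inv_dist(dec, mask, nz) + _inv_dist(dec, mask, z)

def d_mask_dec(mask, dec):                  # Eq. (7)
    nz, z = mask == 1, mask == 0            # positions read directly from mask
    return _inv_dist(mask, dec, nz) + _inv_dist(mask, dec, z)

def mpc(Mask, Dec):
    """Mutual preference calculation: return (Psi_Mask, Psi_Dec) as index matrices."""
    N = len(Mask)
    D_md = np.array([[d_mask_dec(Mask[i], Dec[j]) for j in range(N)] for i in range(N)])
    D_dm = np.array([[d_dec_mask(Dec[i], Mask[j]) for j in range(N)] for i in range(N)])
    return np.argsort(-D_md, axis=1), np.argsort(-D_dm, axis=1)
```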

Algorithm 4 TWM(P, \(\Psi _{Mask}\), \(\Psi _{Dec}\))

Two-way matching

Although the matching strategy presented in [32] has addressed the problem to a certain extent, there is still room for improving the two-layer encoding. From the above, we know that MPC builds the preference lists of real vectors over binary vectors, which provides more information about the low-level layer during the evolution and has the potential to improve the matching and establish a two-way association. Based on MPC, an intuitive idea is to design a module that enables the binary and real vectors to match each other's most preferred individuals according to their mutual preference. Therefore, we propose TWM, whose entire procedure is detailed in Algorithm 4. Two critical explanations are given below.

(1) Two-stage framework: The first stage of TWM performs a one-to-one matching of binary and real vectors when both are in the top ranks of each other's preference lists (lines 1 to 30 in Algorithm 4). In each loop, a binary vector \(mask_{r}\) is randomly selected first (line 8), and the \(dec_{s}\) ranked first in the preference list \(\Psi _{mask_{r}}\) is considered (lines 10 and 11). Then, TWM conducts the one-to-one matching between \(dec_{s}\) and \(mask_{r}\) if one of the following conditions is satisfied (a code sketch combining both conditions appears after explanation (2)):

(i) \(mask_{r}\) is in \(\Psi _{dec_{s}}\), and \(dec_{s}\) has not been matched (lines 12 to 16).

(ii) \(mask_{r}\) is in \(\Psi _{dec_{s}}\), and \(dec_{s}\) prefers \(mask_{r}\) over its current matching object \(mask_{p}\) according to \(\Psi _{dec_{s}}\) (lines 17 to 23).

Note that some binary and real vectors remain after the first stage, since it cannot provide them with suitable matching objects based on mutual preference. Instead of discarding these vectors, the second stage of TWM (line 31) reuses them by performing the matching based on Algorithm 2. This procedure aims to preserve valuable solutions for the evolution as much as possible.

(2) Incomplete preference list: Instead of using the entire preference list, TWM adopts an incomplete one. In detail, only the first ml indexes in \(\Psi _{mask_i}\) and the first dl indexes in \(\Psi _{dec_i}\) are maintained, where \(0< ml \le D\) and \(0 < dl \le D\) (lines 3 to 6).

The incomplete preference list is used because the number of available vectors decreases as the matching progresses, as mentioned in the motivation of this paper. Furthermore, we have observed that the binary vectors controlled by the low-level layer gradually become consistent with each other during the evolutionary process. This may suggest that they are approaching the global optima, but it may also mean that many binary vectors are trapped in local optima. In other words, the real vectors controlled by the high-level layer have a greater chance of combining with one of these similar binary vectors, driving the population further toward the local optima. Therefore, we use the two parameters ml and dl to constrain the length of the preference lists. The advantage is that the vectors appearing in each other's preference lists are prioritized, which improves the search capabilities of both sides during the evolution. The first stage of TWM thus aims to assign a relatively good candidate to each binary or real vector, whereas the second stage focuses on the remaining vectors.
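Putting conditions (i) and (ii) and the truncated lists together, the first stage can be sketched as a deferred-acceptance loop (our reading of Algorithm 4; the re-queueing of a displaced mask and all names are assumptions). Masks and decs left unmatched here would be handed to the OWM sketch above as the second stage:

```python
import numpy as np
from collections import deque

def twm_stage1(Psi_Mask, Psi_Dec, ml, dl, rng=None):
    """First stage of TWM: mutual one-to-one matching over truncated preference lists."""
    rng = rng or np.random.default_rng(0)
    N = len(Psi_Mask)
    free = deque(rng.permutation(N))          # masks still seeking a partner
    nxt = [0] * N                             # next rank each mask will try
    match_of_dec = {}                         # dec index -> currently matched mask
    while free:
        r = free.popleft()
        while nxt[r] < ml:                    # only the first ml entries are used
            s = Psi_Mask[r][nxt[r]]
            nxt[r] += 1
            pref_s = list(Psi_Dec[s][:dl])    # dec_s's truncated list of masks
            if r not in pref_s:               # mutual preference is required
                continue
            p = match_of_dec.get(s)
            if p is None:                     # condition (i): dec_s is unmatched
                match_of_dec[s] = r
                break
            if pref_s.index(r) < pref_s.index(p):  # condition (ii): dec_s trades up
                match_of_dec[s] = r
                free.append(p)                # the displaced mask tries again later
                break
        # a mask that exhausts its ml entries stays unmatched for stage two
    return {r: s for s, r in match_of_dec.items()}  # mask index -> dec index
```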

Figure 4 depicts the process of MPC and TWM with an example, showing that a two-way association between the high-level and low-level layers can be built. Compared to PC, MPC enhances the control of the high-level layer. Besides, TWM further improves the control of the high-level layer while relaxing the control of the low-level layer as OWM does. Therefore, the combined application of the two modules can balance the influence of the two layers on the encoded individual and thus reach a balance between the optimization of zero and non-zero variables.

Fig. 4 An example illustrating MPC and TWM

Algorithm 5 TS2-SparseEA

Integrating modules into MOEAs

To verify the effectiveness of MPC and TWM, we integrate the two modules into TS-SparseEA [32], the first algorithm that considers the matching strategy. The new algorithm is called TS2-SparseEA, and its pseudo-code is outlined in Algorithm 5.

To begin with, TS2-SparseEA randomly initializes a population of size N with the help of the two-layer encoding (line 1). Then, a binary weight optimization framework is performed in the first stage (line 2) to find a better sparse initial population in a reduced binary search space; more details of the framework can be found in [32]. In the second stage, N solutions are selected through tournament selection in each generation (line 4). Then, the initial offspring solutions are generated using crossover and mutation operators (line 5). To solve the issues highlighted in the introduction, MPC and TWM rectify the offspring solutions represented by the two-layer encoding (lines 6 and 7). Finally, the environmental selection identifies N elite solutions based on the non-dominated front number and crowding distance (line 8).
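Reusing the mpc and twm_stage1 sketches from the previous section, one generation of this second stage can be condensed as below. The selection and variation operators are replaced by deliberately naive stand-ins, so this is an illustrative skeleton rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, ml, dl = 8, 20, 3, 3
Dec = rng.random((N, D))                                 # current high-level layer
Mask = (rng.random((N, D)) < 0.1).astype(float)          # current low-level layer

# Lines 4-5 (stand-ins): parent selection and variation on both layers.
O_dec = np.clip(Dec + 0.1 * rng.standard_normal((N, D)), 0.0, 1.0)
O_mask = np.where(rng.random((N, D)) < 1.0 / D, 1.0 - Mask, Mask)

# Lines 6-7: mutual preference lists, then two-way matching recombines the layers.
Psi_Mask, Psi_Dec = mpc(O_mask, O_dec)
pairs = twm_stage1(Psi_Mask, Psi_Dec, ml, dl, rng)       # stage two (OWM) omitted
Q = np.array([O_dec[pairs[r]] * O_mask[r] for r in pairs])   # offspring via Eq. (1)
# Line 8 would then apply environmental selection (fronts + crowding) to P ∪ Q.
```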

Table 1 IGD values obtained by five compared algorithms on SMOP1-SMOP8 with 100, 500, 1000, and 3000 variables

Since the proposed TS2-SparseEA is an improved version of TS-SparseEA, the time complexity of most of its procedures remains unchanged, so our analysis focuses solely on the complexity of MPC and TWM. For MPC, since the calculations of \(\Psi _{Mask}\) and \(\Psi _{Dec}\) each require \(N^2\) distance computations (N is the population size), the time complexity of MPC is \(O(N^2)\). As for TWM, the time complexity depends on the number of matches in each stage. Suppose that the number of successful matches in the first stage is T and the length of the retained indexes in \(\Psi _{Mask}\) is ml; then the time complexities of the two stages are \(O(ml\,N)\) and \(O\big(\frac{(1+(N-T))(N-T)}{2}\big)\), respectively. Therefore, the time complexity of TWM is \(O(N^2)\).
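As a quick sanity check of this bound, note that a preference list has at most N entries, so effectively \(ml \le N\) and \(0 \le T \le N\):

$$O(ml\,N) + O\!\left( \frac{\big(1+(N-T)\big)(N-T)}{2}\right) \subseteq O(N^{2}) + O\!\left( \frac{N(N+1)}{2}\right) = O(N^{2})$$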

Experimental studies

In this section, we conduct extensive experiments to validate the effectiveness of the proposed TS2-SparseEA. Initially, we compare the algorithm equipped with MPC and TWM to some tailored MOEAs (both two-layer encoding-based and one-layer encoding-based algorithms) for sparse LSMOPs on benchmark problems. Subsequently, we perform an ablation study to evaluate the validity of each module. Finally, we discuss the parameters involved in MPC and TWM. All the experiments are conducted on a PC with an Intel(R) Xeon(R) E5-2690 v4 @ 2.60 GHz CPU, 64 GB RAM, Windows 11, and MATLAB R2022b.

Experimental settings

This subsection briefly introduces the compared algorithms and benchmark problems. Unless specified otherwise, we keep all parameters at their default values in PlatEMO [35], the MATLAB platform on which the algorithms are implemented.

(1) Compared algorithms: Four two-layer encoding MOEAs tailored for sparse LSMOPs are selected for comparison, namely SparseEA [17], MOEA/PSL [12], PM-MOEA [25], and TS-SparseEA [32]. For the sake of fairness, we also compare the proposed algorithm with some recent algorithms based on one-layer encoding, namely S-ECSO [22], SPS [19], S-NSGA-II [21], and LERD [36].

For the parameter settings, we keep the default parameters of TS-SparseEA with \(r_{eval} = 0.1\) and \(ngroup = 50\). Besides, the lower sparsity bound \(b_l\) and the upper sparsity bound \(b_u\) in SPS are set to 0.5 and 0.1, consistent with the settings in [19]. The other algorithms are parameterless.

(2) Test problems: The sparse multiobjective test suite SMOP is selected for the experiments. The problems in SMOP encompass properties such as multi-modality, deception, and low intrinsic dimensionality, which pose difficulties for MOEAs. More details about these benchmark problems can be found in the supplementary materials of [17].

In the experiments, we keep the sparsity of the Pareto-optimal solutions at 0.1, consistent with the default value in [17], indicating that the number of non-zero variables is ten percent of the total number of decision variables. The number of decision variables for each problem is set to 100, 500, 1000, and 3000, and the number of evaluations is accordingly set to 10,000, 50,000, 100,000, and 300,000, respectively. The number of objective functions for each problem is set to 2, and each compared algorithm is run 30 times on each test problem to make the experimental results reliable.

Fig. 5 IGD variation curves obtained by SparseEA, MOEA/PSL, PM-MOEA, TS-SparseEA, and TS2-SparseEA on SMOP1 to SMOP8 with 1000 decision variables

Fig. 6 Parallel coordinate plots of the decision variables of solutions obtained by five compared algorithms on SMOP1 with 1000 decision variables

Fig. 7 Comparison of the proportion of non-dominated solutions of TS2-SparseEA and TS-SparseEA in the union of the final populations of the two algorithms

Comparison on two-layer encoding-based MOEAs

Table 1 presents the quantitative results of the inverted generational distance (IGD) metric [37] on SMOP1-SMOP8 for the five compared algorithms. For each cell in the table, the value outside the brackets represents the average IGD value, while the value inside the brackets indicates the variance of the IGD. In the experiments, we use the Wilcoxon rank-sum test with a significance level of 0.05 to compare the algorithms. The symbol '+' means that the algorithm performs significantly better than TS2-SparseEA, '−' denotes that the algorithm performs significantly worse than TS2-SparseEA, and '≈' implies no significant difference between the two algorithms. The best result is highlighted in bold on a gray background.

In general, TS2-SparseEA performs best on 17 of the 32 test problems, while TS-SparseEA outperforms the other MOEAs on 4 of the 32. In particular, on SMOP1 and SMOP7, the proposed algorithm achieves the best performance with all of 100, 500, 1000, and 3000 decision variables. These results show that the matching strategy can improve the two-layer encoding, and the proposed MPC and TWM can further improve the matching strategy; the two modules effectively alleviate the over-control of the low-level layer and promote the co-optimization of non-zero and zero variables. We also note that PM-MOEA performs best on SMOP4 to SMOP6 compared to the other four algorithms. The reason is that, in SMOP4 to SMOP6, optimizing the non-zero variables is about as important as optimizing the zero variables, and PM-MOEA can continuously optimize variables of indistinguishable importance using evolutionary pattern mining.

Table 2 IGD values obtained by one-layer encoding-based algorithms on SMOP1-SMOP8 with 100, 500, 1000, and 3000 variables
Fig. 8 Final populations in objective space obtained by S-ECSO, SPS, S-NSGA-II, LERD, and TS2-SparseEA on SMOP1, SMOP3, SMOP7, and SMOP8 with 1000 and 3000 variables

For a more detailed analysis, Fig. 5 depicts the IGD variation curves of the five compared algorithms on SMOP1 to SMOP8 with 1000 variables. We notice that TS-SparseEA and TS2-SparseEA significantly outperform the other algorithms on almost all problems in the early stages of evolution, because both use OWM to weaken the control of the low-level layer to some extent. The difference is that TS2-SparseEA further considers the two-way association, making the optimization of non-zero and zero variables more collaborative; therefore, TS2-SparseEA has the opportunity to surpass TS-SparseEA (see SMOP1, SMOP2, and SMOP7 in Fig. 5). Figure 6 further shows the parallel coordinate plots of the decision variables of the solutions obtained by the five compared algorithms on SMOP1 with 1000 variables. All the variables located to the right of the red vertical line are zero in the Pareto-optimal solutions. It can easily be observed that the decision variables of the solutions obtained by SparseEA and MOEA/PSL are mostly far from zero. Although PM-MOEA achieves an acceptable performance on the zero variables, it optimizes to zero a few decision variables that should have non-zero values. As for TS2-SparseEA, most variables of the obtained solutions equal zero, while the non-zero variables are also optimized to their exact values. Moreover, TS2-SparseEA outperforms TS-SparseEA in the optimization of non-zero variables, since more of the variables to the left of the red vertical line are far from zero.

To further demonstrate the superiority of TS2-SparseEA over TS-SparseEA, Fig. 7 shows the proportions of non-dominated solutions of TS2-SparseEA and TS-SparseEA in the union of the final populations of the two algorithms. In the figure, each blue bar represents the number of non-dominated solutions of TS-SparseEA in the union, the orange bar represents the number of non-dominated solutions of TS2-SparseEA, and the yellow bar indicates the total number of non-dominated solutions in the union. It is noticeable that TS2-SparseEA tends to contribute more non-dominated solutions to the union than TS-SparseEA, particularly on the problems where both algorithms perform well in Table 1. This phenomenon indirectly indicates that TS2-SparseEA exhibits better convergence and diversity.

Comparison on one-layer encoding-based MOEAs

The one-layer encoding is a classic scheme that has also been extensively studied. We therefore compare the proposed algorithm with several one-layer encoding-based algorithms.

Table 2 presents the quantitative results of the IGD metric on SMOP1-SMOP8 for the five compared algorithms. The symbols and other information in the table are consistent with those described in “Comparison on two-layer encoding-based MOEAs”. According to the Wilcoxon rank-sum test, TS2-SparseEA significantly outperforms S-ECSO, SPS, S-NSGA-II, and LERD on 28, 32, 20, and 32 test instances, respectively. Notably, none of the compared algorithms performs best on all of SMOP3, SMOP4, and SMOP5. For example, S-ECSO outperforms the other four algorithms, and even all the two-layer encoding-based algorithms, on SMOP4, because it employs a population sparsity-driven strongly convex sparse operator that effectively addresses low intrinsic dimensionality, as in SMOP4. Figure 8 further plots the final population distributions of all the compared algorithms on SMOP1, SMOP3, SMOP7, and SMOP8 with 1000 and 3000 variables. As can be seen from the figure, S-NSGA-II achieves good performance, as it can not only efficiently generate the initialized populations but also continuously maintain the sparsity of the population. In summary, the proposed TS2-SparseEA also achieves an overall good performance on most of the problems, which demonstrates its effectiveness.

Ablation study

Since the two modules are used in conjunction, neither prevents the other from functioning. Therefore, in this subsection, we simultaneously embed both modules into other algorithms (SparseEA, MOEA/PSL, and PM-MOEA) to verify their versatility. The resulting algorithms, denoted SparseEA-MT, MOEA/PSL-MT, and PM-MOEA-MT, are equipped with the MPC and TWM modules; we compare both the IGD and hypervolume (HV) [38] metrics between each improved algorithm and its original counterpart.

Table 3 IGD and HV values of three improved algorithms and three original algorithms

Table 3 presents the results of the ablation study. For each cell in the table, the value outside the brackets represents the column metric of the improved algorithm, while the value inside denotes that of the original algorithm. The symbols '+', '−', and '≈' mean that the algorithms equipped with the two modules perform significantly better than, worse than, and approximately equal to the original algorithms, respectively. The results show that the three algorithms equipped with MPC and TWM significantly outperform the original algorithms in terms of both the IGD and HV metrics. They also indicate that the proposed MPC and TWM are broadly applicable under the two-layer encoding, as they can further mitigate the control problem of the low-level layer and promote the co-optimization of non-zero and zero variables.

Table 4 Investigation of parameter effect with Taguchi method on SMOP1 with 1000 decision variables
Table 5 Comparison between TS2-SparseEA without tuned parameters and TS2-SparseEA with tuned parameters on SMOP1-SMOP8 with 100, 500, 1000, and 3000 decision variables

Parameter discussion

As described in Algorithm 4, there are two parameters in the proposed modules, namely ml and dl, whose values determine the lengths of the preference lists of Mask and Dec, respectively. Changes in ml and dl will either increase or decrease the number of successfully matched vectors in the first stage, and a suitable combination of parameters helps the population converge. In this part, we employ the Taguchi design-of-experiments method to explore the effects of these two parameters on the performance of the algorithm.

In detail, we consider three levels for each parameter, i.e., \(ml \in \{1, 5, 10\}\) and \(dl \in \{1, 5, 10\}\), and examine the degree of influence of each parameter at the three levels using the Taguchi method. In this way, nine orthogonal combinations of levels are generated. Then, we run TS2-SparseEA 10 times with each combination and collect each non-dominated solution set \(N\!D\!S_{i}\). When all the combinations have been tested, the union non-dominated solution set \(N\!D\!S\) is obtained. Finally, the response value (RV) of each combination is the percentage of the solutions from \(N\!D\!S_{1}\) to \(N\!D\!S_{9}\) in \(N\!D\!S\). The larger the RV value, the better the performance of the combination. Based on the RV values, we can compute the average RV for each level of each parameter.
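A minimal sketch of this RV computation is given below, assuming minimized objective vectors; the nondominated() helper is our own and not part of any referenced toolbox:

```python
import numpy as np

def nondominated(F):
    """Rows of F not Pareto-dominated by any other row (minimization)."""
    keep = [i for i, f in enumerate(F)
            if not any(np.all(g <= f) and np.any(g < f)
                       for j, g in enumerate(F) if j != i)]
    return F[keep]

def response_values(nds_sets):
    """nds_sets: the nine sets NDS_1..NDS_9 as arrays of objective vectors.
    Returns the percentage of the union set NDS contributed by each combination."""
    NDS = nondominated(np.vstack(nds_sets))          # union non-dominated set
    rvs = []
    for nds in nds_sets:
        hits = sum(any(np.array_equal(s, u) for u in NDS) for s in nds)
        rvs.append(100.0 * hits / len(NDS))          # RV as a percentage
    return rvs
```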

Table 4 presents the results on SMOP1 with 1000 decision variables for investigating the parameter effects with the Taguchi method. It is easily observed that ml is more influential than dl owing to its larger Delta value, which reflects the gap in average RV among the different parameter levels. We can also observe that level 1 of ml achieves the highest RV value, and level 2 of dl performs best. Accordingly, the optimal parameter combination is \(ml = 1\) and \(dl = 5\) for SMOP1 with 1000 decision variables.

Table 5 further compares TS2-SparseEA without tuned parameters and the algorithm tuned by the Taguchi method. All of the initial parameters, tuned parameters, and corresponding IGD values on SMOP1-SMOP8 with 100, 500, 1000, and 3000 decision variables are exhibited. From the table, TS2-SparseEA does not obtain a notable improvement on most of the problems from the Taguchi tuning, which illustrates that TS2-SparseEA might not be very sensitive to these two parameters.

Conclusion and discussion

Sparse LSMOPs have become a prevalent class of problems in real-life scenarios, and their complexity may intensify further as the dimension increases. Recent studies leverage prior information about sparsity and employ a tailored two-layer encoding to tackle sparse LSMOPs, but there is still room for deeper consideration. This paper proposes two new modules for improving the two-layer encoding, namely MPC and TWM. They help build a two-way association, which balances the influence of the two layers on the encoded individual by relaxing the control of the low-level layer and enhancing the control of the high-level layer. In this way, the balance between the optimization of zero and non-zero variables can be facilitated. Moreover, MPC and TWM can be seamlessly applied to any two-layer encoding-based MOEA. Extensive experiments have demonstrated their capability and superiority in solving sparse LSMOPs.

While the proposed modules exhibit significant performance, a few limitations remain. For example, we have observed that the diversity-related decision variables may sometimes mislead MPC and TWM. One possible explanation is that these variables typically have fluctuating values distributed across different intervals during the evolution; although they are non-zero variables, they can easily be misclassified as zero variables when their values are relatively small. Therefore, our future work will focus on pre-detecting these diversity-related decision variables when applying MPC and TWM to the two-layer encoding.