Comparing reference point based interactive multiobjective optimization methods without a human decision maker

Interactive multiobjective optimization methods have proven promising in solving optimization problems with conflicting objectives since they iteratively incorporate preference information of a decision maker in the search for the most preferred solution. Finding an appropriate interactive method for various needs involves analyzing the strengths and weaknesses of candidate methods. However, extensive analysis with human decision makers may be too costly, and for that reason we propose an artificial decision maker to compare a class of popular interactive multiobjective optimization methods, namely reference point based methods. Without involving any human decision makers, the artificial decision maker works automatically to interact with the different methods to be compared and to evaluate their final results. It distinguishes between a learning phase and a decision phase: it learns about the problem based on the information acquired in order to identify a region of interest, and then refines solutions in that region to find a final solution. We adopt different types of utility functions to evaluate solutions, present corresponding performance indicators, and propose two examples of artificial decision makers. A series of experiments on benchmark test problems and a water resources planning problem is conducted to demonstrate how the proposed artificial decision makers can be used to compare reference point based methods.


Introduction
Many real-world optimization problems involve conflicting objectives which are to be optimized simultaneously. These problems are known as multiobjective optimization problems. The ultimate aim of multiobjective optimization is to support a decision maker (DM) to find his/her most preferred solution. Up to now, various multiobjective optimization methods have been proposed by both the Multiple Criteria Decision Making (MCDM) and the Evolutionary Multiobjective Optimization (EMO) communities [7,20,30,56]. They can be divided into no-preference methods, a priori methods, a posteriori methods, and interactive methods according to the role of the decision maker in the solution process [21,29]. Among them, interactive methods have been widely developed due to several advantages [9,29,30,33,45]. For example, the DM can learn progressively from the solution process and adjust his/her preferences. In this case, only solutions that the DM is interested in will be obtained. Thus, the computational complexity can be reduced in comparison with approximating the whole set of Pareto optimal solutions.
In the solution process with an interactive method, we can often identify two phases: a learning phase and a decision phase [33]. The learning phase is important for a DM to learn about the possibilities and limitations of the problem. In the decision phase, the DM can focus on the region identified in the learning phase to find better solutions. Different interactive methods require different types of preference information from the DM, such as reference points, desirability of trade-offs, or the classification of objective functions [33]. Among them, reference point based methods are very popular for several reasons. For example, it is convenient and intuitive for the DM to specify a reference point consisting of desirable objective function values, since the method presents objective function values to the DM and, thus, no cognitive mapping is necessary [30]. Besides, reference points can be specified without considering the trade-offs among objectives, so the DM's cognitive burden is relatively low [24].
The problem of how to compare interactive methods arises naturally with the emergence of diverse interactive methods. A posteriori methods like multiobjective evolutionary algorithms can be compared with some performance metrics (see, e.g., [25,54]). However, comparing interactive methods is not very straightforward. Only limited studies (see, e.g., [3, 4, 8, 10-12, 26, 28, 35, 37, 39, 44, 58]) have focused on this topic and few of them are valid for reference point based interactive methods. In this paper, we concentrate on the comparison of reference point based methods.
The DM is a key element in applying and also in comparing interactive methods. Using human DMs to compare interactive methods involves some difficulties. Firstly, it may be rather expensive because the human DMs involved in the experiments should have appropriate domain expertise. Furthermore, the order in which the interactive methods are used will influence the comparison results, because a human DM learns, and what he/she has learnt when interacting with one method will affect his/her decisions with the subsequent methods. To eliminate such effects, a sufficiently large number of human DMs is required so that different groups of DMs can use the methods in different orders. Some research involving human DMs in experiments has been conducted, where the DMs were asked to express their impressions of the interactive methods, such as ease of use and the degree of satisfaction with the final solution [3,8,11,12]. However, these experiments were limited in, e.g., the range of problems studied and the number of human DMs involved, see [6,55].
Considering the above difficulties, a good alternative is to develop artificial DMs (ADMs) which can replace human DMs for comparing interactive methods quantitatively whenever human DMs are not available. In such comparisons, there are no real preferences to follow. Instead, ADMs interact with each interactive method and measure the results of the method. As opposed to using human DMs, experimenting with ADMs is cheaper and less time consuming. Besides, a large number of experiments can be conducted without considering human fatigue [6], and quantitative results can be obtained. Most existing ADMs take the form of utility functions (also referred to as value functions). For example, Malakooti and Ravindran [28] used linear utility functions as ADMs to compare multiobjective linear programming methods. Mote et al. [35] assumed a nonlinear utility function. Reeves and Gonzalez [39] adopted linear and multilinear utility functions. Four types of utility functions, including quadratic, square-root, exponential, and L4-norm, were considered in [44]. López-Ibáñez and Knowles [26] utilized linear scalarizing functions to assess an interactive EMO method.
Nevertheless, utility functions as ADMs cannot compare reference point based methods because they cannot provide the reference point information required by these methods. Utility functions are only suitable for comparing non ad hoc interactive methods [45], such as trade-off based interactive methods, since utility functions can provide the type of preference information required by these methods. To the best of our knowledge, only a few ADMs have been developed for comparing reference point based methods [1,2,4,37,38]. The ADMs developed in [4,37,38] all consist of a steady part, which includes a pre-defined aspiration point or a value function, and a current context, which includes the knowledge about the problem gained during the solution process. In [37], reference points are generated by a decision tree-based approach. The ADM proposed in [4] evolves the reference point through particle swarm optimization. Reference points are generated in [38] by updating the potential region according to the predefined value function. In these ADMs, the search is steered towards the steady part, which may hinder the exploration of the whole Pareto front. The ADMs proposed in [1,2] distinguish between the learning phase and the decision phase. Different ways of generating reference points are developed for the two phases. However, these two ADMs are only suitable for comparing reference point based interactive EMO methods.
In this paper, we propose a new ADM for comparing reference point based methods quantitatively. Similar to the ADMs developed in [1,2], our ADM also has a learning phase and a decision phase. It should be pointed out that this is the only similarity between our ADM and those two ADMs. Our ADM is designed with a modular structure containing three modules: learning, evaluation, and decision. The learning module and the decision module are used to generate reference points in the two phases, respectively. Specifically, a structured learning approach is designed in the learning module to generate reference points systematically based on Pareto optimal solutions derived so far to explore different regions of the search space, which facilitates identifying a region of interest (ROI). Besides, a polyhedral cone-based method is proposed to update reference points in the decision module so that the solutions in the ROI can be adjusted according to the current preferred solution. The evaluation module has two roles. One is evaluating solutions found so far and identifying a preferred solution at each iteration of the decision phase. The other is to measure the final results of interactive methods at the last iteration.
Compared to the ADMs developed in [4,37,38], our ADM does not need to converge towards a fixed part from the very beginning. On the contrary, it behaves differently in the two phases, reflecting the DM's different needs. In comparison with the two ADMs proposed in [1,2], our ADM generates reference points in quite different ways. Those two ADMs assume that the solution processes of all reference point based EMO methods being compared are conducted simultaneously. The reference point is updated based on the solutions generated by all methods. If a new method is to be compared, the whole comparison must start from the beginning, and all methods have to be run again. Our ADM treats all methods to be compared individually and does not have this limitation. Moreover, our ADM is applicable with any reference point based methods, not only EMO methods. It should be emphasized that our ADM does not enhance reference point based methods; it interacts with the different methods and measures their final results for the sake of comparing them automatically.
By adopting utility functions with and without noise to evaluate solutions and providing two associated performance indicators to evaluate the final results in the evaluation module, two examples of the proposed ADM are presented in this paper. They are used to show how the proposed ADM can be utilized to compare reference point based methods and capture the differences among those methods. The main contributions are summarized as follows:
• A new modular ADM is proposed for comparing reference point based interactive multiobjective optimization methods automatically and quantitatively. It is not limited to a specific type of method; only the preference information must be a reference point.
• The proposed ADM takes into account the different needs that typically arise in the learning and the decision phases, and two different mechanisms of generating reference points are developed accordingly.
• The proposed ADM's modular structure offers a flexible way of creating different types of ADMs which mimic different interaction processes.
• Two performance indicators are proposed to be used with the new ADM.
The rest of the paper is organized as follows. In Sect. 2, some concepts and notations are given. Section 3 is devoted to presenting the proposed ADM in terms of its actions in each phase and proposing two performance indicators to be used with the ADM. Two examples of the ADM are provided, and how to utilize one of them with a reference point based method is also illustrated. Sections 4 and 5 demonstrate how the ADM can be used to compare different reference point based methods on benchmark test problems and on a real-world problem, respectively. Finally, conclusions are drawn and some future work is discussed briefly in Sect. 6.

Concepts and notations
Generally, a multiobjective optimization problem can be defined as follows:

minimize {f_1(x), f_2(x), ..., f_k(x)} subject to x ∈ S,   (1)

where k (k ≥ 2) objective functions f_i : S → R for i ∈ {1, 2, ..., k} are to be minimized simultaneously. The decision vector x = (x_1, x_2, ..., x_n) belongs to the feasible region S ⊂ R^n. For each x, the objective vector z = f(x) belongs to the objective space R^k. The image Z = f(S) of the feasible region is called the feasible objective set. Given two feasible decision vectors x^1 and x^2, x^1 is said to dominate x^2 if x^1 is not worse than x^2 in all objectives and strictly better in at least one objective. A feasible decision vector x is said to be Pareto optimal if and only if there is no feasible decision vector which dominates it. The corresponding objective vector is called a Pareto optimal objective vector. We denote the set of all Pareto optimal decision vectors by E. The image f(E) of E is called the Pareto front. Note that we use the term Pareto optimal solution to refer to a Pareto optimal objective vector in this paper. The most preferred solution (MPS) is the Pareto optimal solution that the DM is most satisfied with [33].
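As a small illustration of the dominance definition above, the comparison can be written in a few lines of Python (the function name is ours, not from the paper):

```python
import numpy as np

def dominates(z_a, z_b):
    """True if objective vector z_a dominates z_b (all objectives minimized):
    z_a is not worse in every component and strictly better in at least one."""
    z_a, z_b = np.asarray(z_a, float), np.asarray(z_b, float)
    return bool(np.all(z_a <= z_b) and np.any(z_a < z_b))
```

For instance, [1, 2] dominates [2, 2], while two identical vectors do not dominate each other.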
As stated in the introduction, we concentrate on comparing reference point based methods, rather than proposing or enhancing such methods. In the following, we give a brief introduction to reference point based methods. A reference point g = (g_1, g_2, ..., g_k) is composed of the DM's aspiration levels for all objectives, reflecting desirable objective values. It is said to be achievable if its aspiration levels can be achieved or improved simultaneously by a feasible solution; otherwise, it is said to be unachievable [41]. When providing reference points, it is often useful for the DM to know the ranges of the objective values on the Pareto front. We denote by z* = (z*_1, z*_2, ..., z*_k) with z*_i = min_{x∈S} f_i(x) for i = 1, 2, ..., k the ideal objective vector, which gives the minimum value of each objective in the feasible region. The nadir objective vector z^nad = (z^nad_1, z^nad_2, ..., z^nad_k) is composed of the maximum value of each objective on the Pareto front, i.e., z^nad_i = max_{x∈E} f_i(x) for i = 1, 2, ..., k. It is usually difficult to obtain z^nad, and one may have to settle for approximations obtained, for example, from a payoff table [29] or by other special means [15,47]. An estimate of z^nad can also be determined by using information provided by the DM, if available.
At each iteration, a reference point based method uses the reference point(s) provided by the DM to generate one or more Pareto optimal solutions. If the DM is not satisfied with any solution, he/she is expected to specify a new reference point for the method to generate new solutions. Figure 1 illustrates how an example of a reference point based method works in two iterations. This method produces k+1 Pareto optimal solutions based on one reference point. The solution process will continue until the DM finds a satisfactory solution.
Up to now, many reference point based methods have been proposed. Some methods, such as the reference point method [51], the reference direction (RD) method [23], the light beam search (LBS) [22], and the satisficing trade-off method [36], rely on solving transformed single-objective subproblems to generate Pareto optimal solutions. This means that at each iteration, they formulate a scalarizing function, like an achievement scalarizing function (ASF), based on the DM's reference point and minimize it by using an appropriate single-objective optimizer. Various forms of ASFs have been developed (see, e.g., [27,31,42,51]), and a common ASF is the following augmented ASF [52]:

max_{i=1,...,k} { w_i (f_i(x) − g_i) } + ρ sum_{i=1}^{k} w_i (f_i(x) − g_i),   (2)

where w = [w_1, ..., w_k] is a weighting vector and ρ is a sufficiently small positive number. One can prove that minimizing it generates Pareto optimal solutions for both achievable and unachievable reference points, and that it can find any Pareto optimal solution with trade-offs between ρ and 1/ρ [29]. Some reference point based methods solve (1) directly by using a multiobjective evolutionary algorithm at each iteration. They use the DM's reference point to guide the evolution of the population towards the DM's preferred region on the Pareto front. If the DM does not find any satisfactory solution, he/she can supply a new reference point, and a new run of the multiobjective evolutionary algorithm is carried out to find new solutions. Even though evolutionary algorithms cannot usually guarantee Pareto optimality, for simplicity we refer to their solutions as Pareto optimal ones here. Examples of this type of method include R-NSGA-II [17], RD-NSGA-II [13], LBS-NSGA-II [14], g-NSGA-II [34], r-NSGA-II [43], the preference based evolutionary algorithm (PBEA) [49], the interactive weighting achievement scalarizing function genetic algorithm (interactive WASF-GA) [40], and others; for surveys, see, e.g., [5,53].
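As a sketch, an augmented ASF of the common max-plus-weighted-sum type can be evaluated as below; this is the widely used Wierzbicki-style form and may differ in detail from the exact formulation in [52]:

```python
import numpy as np

def augmented_asf(z, g, w, rho=1e-6):
    """Augmented achievement scalarizing function (a common form):
    max_i w_i (z_i - g_i) + rho * sum_i w_i (z_i - g_i),
    where z is an objective vector, g the reference point, w a weighting vector."""
    d = np.asarray(w, float) * (np.asarray(z, float) - np.asarray(g, float))
    return float(np.max(d) + rho * np.sum(d))
```

Minimizing this function over the feasible objective set with a single-objective optimizer yields one Pareto optimal solution per reference point.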
In the next section, the ADM we build for comparing reference point based methods is introduced.

Proposed ADM
To compare reference point based methods, the ADM needs to participate in the solution process. The interaction between the ADM and a method to be compared is shown in Fig. 2. The ADM takes the solutions provided by the method as input and produces a new reference point as output. The structure of the proposed ADM is given in Fig. 3. We assume that the ADM first goes through a learning phase and then starts a decision phase. The learning and decision modules are responsible for generating reference points in the learning phase and the decision phase, respectively. The evaluation module works in the decision phase. We denote the iteration number in the whole solution process by t, and the iteration number in the decision phase by t_d, which counts from 0 when this phase starts. In what follows, the ADM's actions in each phase are introduced.

ADM's actions in the learning phase
Owing to the modular structure of the proposed ADM, different types of learning modules can be developed to simulate different ways of specifying reference points. As stated in the introduction, we focus on a structured learning approach to facilitate a systematic learning of feasible solutions. At each iteration, the solutions which have been generated by the considered interactive method are used to identify a relatively large unexplored region of the objective space. Then, a reference point is determined accordingly with the desire of obtaining new solutions in that region. Here, an unexplored region refers to a region inside which no solutions have been found by the considered method yet. The details are as follows.
Firstly, we determine a series of unexplored regions by finding neighboring solutions. We call an extreme point an objective vector which has the minimum value of one of the objective functions on the Pareto front [48]. Considering all the extreme points and all the non-dominated solutions generated by the interactive method and passed to the ADM, we give a general definition of neighboring solutions for any number of objectives: for any two solutions z^a and z^b, denote z^ab = [z^ab_1, ..., z^ab_k] with z^ab_i = min(z^a_i, z^b_i), i = 1, 2, ..., k. Then, z^a and z^b are defined as neighbors if no other solution is dominated by z^ab. Under this definition, the region dominated by z^ab is an unexplored region inside which the interactive method has not generated any solutions yet. For problems with more than three objectives, it is also easy to find all neighboring solutions because we only need to determine the dominance relations of solutions. By finding all pairs of neighboring solutions, a series of unexplored regions of the objective space can be obtained.
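The neighbor definition can be checked directly from dominance relations; a minimal sketch in Python (function names are ours):

```python
import numpy as np

def dominates(z_a, z_b):
    """z_a dominates z_b: not worse in every objective, strictly better in one."""
    z_a, z_b = np.asarray(z_a, float), np.asarray(z_b, float)
    return bool(np.all(z_a <= z_b) and np.any(z_a < z_b))

def neighbor_pairs(Z):
    """Return index pairs (a, b) of neighboring solutions: no third solution
    is dominated by z^ab, the componentwise minimum of z^a and z^b."""
    Z = np.asarray(Z, float)
    pairs = []
    for a in range(len(Z)):
        for b in range(a + 1, len(Z)):
            z_ab = np.minimum(Z[a], Z[b])
            if not any(dominates(z_ab, Z[c])
                       for c in range(len(Z)) if c not in (a, b)):
                pairs.append((a, b))
    return pairs
```

For four non-dominated bi-objective points sorted along the front, only consecutive points come out as neighbors, matching the intuition of gaps between adjacent solutions.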
Secondly, as the learning phase aims to learn about the possibilities of the problem, our ADM is assumed to regard the "largest" unexplored region as promising and to want to explore it for finding new solutions. That is, among multiple unexplored regions, the "largest" one is identified as the next region to be explored. Considering that different unexplored regions may overlap when they spread upwards, while what we really want to search is the region near the Pareto front, we use the distance between each pair of neighboring solutions to measure the size of each unexplored region. An unexplored region is regarded as the "largest" if the corresponding pair of neighboring solutions has the largest distance among all pairs' distances. Different forms of distance can be used, and the normalized Euclidean distance is used in this paper:

d(z^a, z^b) = sqrt( sum_{i=1}^{k} ( (z^a_i − z^b_i) / (z^nad_i − z^{**}_i) )^2 ),   (3)

where z^{**} = [z^{**}_1, z^{**}_2, ..., z^{**}_k] is a utopian objective vector which is slightly better than z*. The components of z^{**} are given by z^{**}_i = z*_i − ε_i for all i = 1, 2, ..., k, where ε_i is a small positive number [45]. The utopian point is used instead of z* in case the denominator in (3) is zero or very small.
(Figure: Neighboring solutions in a three-objective case; the gray region is the unexplored region determined by A and D.)
Note that when the Pareto front is disconnected or an interactive method responds poorly to the reference point, it may happen that none of the newly generated solutions is located within the desired largest unexplored region. In this case, the same region may keep being selected as the largest in the following iterations. To avoid this situation, the same region is allowed to be chosen only once.
Finally, suppose the two neighboring solutions corresponding to the largest unexplored region are z^{a*} and z^{b*}. Then the point g^t with g^t_i = min(z^{a*}_i, z^{b*}_i) for i = 1, ..., k is taken as the new reference point. With this point, the ADM aims to explore the largest unexplored region and find solutions in it.
To sum up, the steps of the structured learning approach for generating a new reference point are as follows:
1. Find all pairs of neighboring solutions among all extreme points and all the solutions generated by the interactive method and passed to the ADM.
2. Select the pair with the largest normalized Euclidean distance (3), denoted by z^{a*} and z^{b*}.
3. Take the point with components min(z^{a*}_i, z^{b*}_i), i = 1, ..., k, as the new reference point.
(Fig. 6 Impact of the initial reference point on the second one: a the initial reference point is close to the Pareto front; b the initial reference point is far away from the Pareto front.)
When the initial reference point g^1 is close to the Pareto front, the solutions obtained by an interactive method are likely to be close to each other, as shown in Fig. 6a, where B, C, and D are solutions obtained based on g^1. Then, g^2 will be relatively far away from the Pareto front, and a large unknown region will get explored. When g^1 is far away from the Pareto front, the solutions may be far away from each other, as exhibited in Fig. 6b. The structured learning approach can lead the method to fill in those large and unexplored regions determined by these solutions step by step. All in all, by exploring the largest unexplored region at each iteration, the proposed ADM can learn about the Pareto front and identify its ROI, which is the region around its preferred solution among all the solutions passed to it in the learning phase.
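The steps above can be sketched as follows; the scaling by z^nad − z^{**} follows the discussion of (3), while the bookkeeping of already chosen regions by index pairs is our simplification:

```python
import numpy as np

def normalized_distance(z_a, z_b, z_nad, z_uto):
    """Normalized Euclidean distance between two solutions; each objective is
    scaled by the range z_nad - z_uto (utopian point, slightly better than ideal)."""
    scale = np.asarray(z_nad, float) - np.asarray(z_uto, float)
    diff = np.asarray(z_a, float) - np.asarray(z_b, float)
    return float(np.linalg.norm(diff / scale))

def learning_reference_point(pairs, Z, z_nad, z_uto, used):
    """Among neighbor pairs not chosen before, pick the pair with the largest
    normalized distance and return the componentwise minimum of the pair as the
    new reference point, marking that region as used."""
    Z = np.asarray(Z, float)
    best, best_d = None, -1.0
    for a, b in pairs:
        if (a, b) in used:
            continue  # each unexplored region may be chosen only once
        d = normalized_distance(Z[a], Z[b], z_nad, z_uto)
        if d > best_d:
            best, best_d = (a, b), d
    used.add(best)
    return np.minimum(Z[best[0]], Z[best[1]])
```

Calling this repeatedly with a shared `used` set walks through the unexplored regions in decreasing order of size.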

ADM's actions in the decision phase
As mentioned, the decision phase contains the evaluation module and the decision module. In what follows, the tasks of the two modules are introduced.

Task of the evaluation module
At each iteration of the decision phase, the evaluation module evaluates all the solutions already generated and passed to the ADM and identifies the ADM's currently preferred solution. At the last iteration, it is also responsible for measuring the quality of the ADM's preferred solution. In our implementations, we adopt utility functions and corresponding performance indicators to achieve quantitative evaluation and measurement. Different kinds of utility functions can be used, so that we can experiment with different ADMs to see, for example, how reference point based methods perform under different styles of providing preferences and how their outcomes differ. In this paper, we consider the following two utility functions as examples:
1. Deterministic utility function. We denote a utility function by U(z) ∈ R. The preferences of the ADM with this utility function are always stable in the decision phase. To distinguish it from the random utility function defined below, we call U a deterministic utility function.
2. Random utility function. By introducing noise δ into U, we get a utility function of the form Ũ(z) = U(z) + δ, which we call a random utility function. The preferences of the ADM adopting this utility function are unstable. The noise δ follows a normal distribution N(0, σ_{t_d}) with mean 0 and standard deviation σ_{t_d}. There are different choices for setting up σ_{t_d}, such as keeping it constant or making it decrease with t_d. A decreasing σ_{t_d} implies that the preferences of the corresponding ADM become more and more stable as the decision phase proceeds.
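The two variants can be mimicked as follows; we use a simple linear disutility (as in the illustrative example later in the paper), and the additive Gaussian noise model is our reading of the text:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def deterministic_disutility(z, w):
    """Stable preferences: a simple linear disutility w^T z (an example form)."""
    return float(np.dot(w, z))

def random_disutility(z, w, sigma_td):
    """Unstable preferences: the deterministic value plus Gaussian noise
    N(0, sigma_td); letting sigma_td decrease with t_d stabilizes preferences."""
    return deterministic_disutility(z, w) + rng.normal(0.0, sigma_td)
```

With sigma_td reduced to zero, the random disutility coincides with the deterministic one.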
Incorporating each of the above two utility functions, two examples of the proposed ADM can be obtained, and these two ADMs will behave differently in the decision phase. Each ADM can be used to compare reference point based methods separately. In our experiments, we use both of them to compare not only the performance of different reference point based methods, but also their capability under different styles of providing preferences. In fact, other types of utility functions can also be chosen.
Since maximizing a utility function U is equivalent to minimizing the disutility function U− = −U, the following deterministic disutility function U− and random disutility function Ũ− will be used in our experiments to form two different ADMs: where w = [w_1, ..., w_k] is a weighting vector, and σ_{t_d} is assumed to decrease gradually and to reach zero at the final iteration, which means that the randomness of Ũ− vanishes in the end and Ũ− coincides with U−. By using a disutility function, the solutions passed to the ADM can be compared (and ordered), and the solution with the minimum disutility value at each iteration, denoted by z^{best,t}, can be recognized. In particular, the final output of the interactive method, namely the solution most preferred by the ADM in the end, is called the final solution and is denoted by z^final. Note that where z^final is chosen from depends on the interactive methods to be compared. In the literature, many methods like the reference point method [51] show only the newly obtained solutions to a DM at each iteration; then the final solution is chosen from the solutions obtained at the final iteration. A few interactive methods like the LBS [22] allow a DM to save a set of desirable solutions and select a satisfactory solution from this set. Naturally, one can keep an archive to store all obtained solutions in real implementations. In our experiments in Sect. 4, the final solution is chosen from the solutions obtained at the final iteration.
Denote the solution which minimizes U− by z^MPS. It represents the real MPS under both U− and Ũ−. The corresponding disutility value, denoted by U−*, gives the minimum value of U−. The maximum value of U− on the Pareto front is denoted by U−max. An example of a performance indicator is the normalized difference between the disutility value of z^final and U−*:

( U−(z^final) − U−* ) / ( U−max − U−* ).   (6)

The smaller this value is, the better the final solution is. As z^final is selected according to the disutility function, it may not be the solution closest to z^MPS in the sense of the Euclidean distance. Therefore, we use the following normalized Euclidean distance between z^final and z^MPS as the second performance indicator:

sqrt( sum_{i=1}^{k} ( (z^final_i − z^MPS_i) / (z^nad_i − z^{**}_i) )^2 ).   (7)
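A sketch of the two indicators; the normalization of the disutility gap by U−max − U−* and the objective-wise scaling by z^nad − z^{**} are our assumptions based on the surrounding text:

```python
import numpy as np

def disutility_indicator(u_final, u_star, u_max):
    """Normalized disutility gap of the final solution; smaller is better,
    0 means the final solution attains the minimum disutility U-*."""
    return (u_final - u_star) / (u_max - u_star)

def distance_indicator(z_final, z_mps, z_nad, z_uto):
    """Normalized Euclidean distance from the final solution to the real MPS."""
    scale = np.asarray(z_nad, float) - np.asarray(z_uto, float)
    diff = np.asarray(z_final, float) - np.asarray(z_mps, float)
    return float(np.linalg.norm(diff / scale))
```

Both indicators are to be minimized, so they can be averaged over repeated runs to rank the compared methods.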

Remark 1
The reason for the evaluation module to be involved only in the decision phase is to enable learning about the problem in the learning phase. Alternatively, one can assume that it is adopted already in the learning phase and guides the generation of new reference points. Note that if an ADM uses the evaluation module in the learning phase in the same way as in the decision phase, it actually reduces to an ADM which has no learning phase but only a decision phase.

Task of the decision module
According to the current preferred solution z^{best,t−1}, the decision module generates a new reference point g^t with the purpose of exploiting the neighborhood of z^{best,t−1} to find better solutions. The details are as follows.
We denote the set of all the generated solutions and all extreme points by {z^1, z^2, ..., z^{m_t}}, where m_t is the size of this set. The decision module determines a polyhedral cone with z^{best,t−1} inside it or lying on its boundary, and takes the vertex of the cone, denoted by z^c, as the new reference point g^t. The vertex z^c is taken as the new reference point because we assume that the ADM finds the polyhedral cone associated with the current preferred solution promising and wants to find better solutions in it. If an interactive method responds well to the guidance of the new reference point g^t and generates at least one solution in the current polyhedral cone, this cone tends to be narrowed down. For example, in Fig. 8, F is a newly obtained solution at the t-th iteration. If C is still the most preferred solution found so far, then the new polyhedral cone will be the region dominated by G. If F is preferred to C, and suppose it is z^{best,t}, then the new polyhedral cone will be the region dominated by H. In both cases, the polyhedral cone is narrowed down to facilitate a more focused search.
We have now introduced the actions of the proposed ADM in each phase. It is clear that the ADM's actions in the two phases differ considerably. Firstly, the ADM explores different regions in the learning phase to gain more knowledge about the problem, while in the decision phase, the ADM concentrates on the region surrounding a promising solution to refine solutions. Secondly, in the learning phase, only the information on the locations of generated solutions is used to generate reference points. In the decision phase, solutions are evaluated and the most preferred one is identified so as to determine new reference points.
By adopting two different types of utility functions, two examples of the ADM can be acquired. We denote them by ADM1 and ADM2. The deterministic utility function reflects the stability of ADM1's preferences, while the random one indicates that ADM2's preferences are unstable but are getting clearer as more solutions are obtained.

Remark 2
Though only one learning phase and one decision phase are considered in our ADM, the two phases can actually be carried out repeatedly and an ADM's actions in a later learning or decision phase can be changed. For example, we can first utilize an ADM which generates reference points randomly in the learning phase and uses a random utility function in the decision phase. Next, we conduct the two phases again with ADM1. This implies that the preferences of the composite ADM become stable in the second decision phase.

Remark 3
It is also possible to switch interactive methods in different phases to judge their performance in a single phase. For instance, one can use interactive methods A and B to find solutions in the learning phase and use method C in the decision phase. In this way, the performance of A and B in the learning phase can be compared. Through identifying the strengths and weaknesses of interactive methods in each phase, new interactive methods can be designed by combining the advantages of existing methods which behave well in different phases.

ADM's steps
Recall that the iteration numbers in the whole solution process and in the decision phase are t and t_d, respectively. Denote the largest allowable numbers of iterations in the learning phase and the decision phase by T_l and T_d, respectively. To evaluate the final solutions obtained by different interactive methods under the same number of iterations, the overall numbers of iterations of the ADM in each phase are assumed to be fixed. In other words, the ADM switches from the learning phase to the decision phase when t > T_l, and the decision phase is terminated when t_d > T_d. In what follows, the steps of how the ADM interacts with a reference point based method and evaluates the final solution are given. The corresponding flowchart is presented in Fig. 9.
Step 1 (Initialization of the learning phase) Set t = 1, t_d = 0. Generate an initial reference point g^1 randomly, or let the user of the ADM specify it, within the hyperbox determined by z* and the approximated z^nad. Let the interactive method generate Pareto optimal solutions based on g^1.
Step 2 (Learning phase) Let t = t + 1. The ADM provides a new reference point g^t according to the structured learning approach in Sect. 3.1. Then, the interactive method generates corresponding Pareto optimal solutions. Repeat this step until t = T_l; then, go to Step 3.
Step 3 (Decision phase) Set t = t + 1, t_d = t_d + 1. The ADM generates a new reference point g^t according to Sect. 3.2. Obtain new Pareto optimal solutions by using the interactive method. Repeat this procedure until t_d = T_d. Then, use the evaluation module to determine a final solution and calculate its performance indicator value, after which the whole process terminates.
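The three steps above can be sketched as a simple driver loop. The `method.solve`, `learning_rp`, `decision_rp`, and `evaluate` interfaces below are illustrative assumptions, not the paper's implementation:

```python
import random

def run_adm(method, T_l, T_d, z_ideal, z_nadir, learning_rp, decision_rp, evaluate):
    # Step 1: initialization of the learning phase
    t, t_d = 1, 0
    g = [random.uniform(lo, hi) for lo, hi in zip(z_ideal, z_nadir)]
    archive = list(method.solve(g))          # Pareto optimal solutions for g^1
    # Step 2: learning phase, iterations 2, ..., T_l
    while t < T_l:
        t += 1
        g = learning_rp(archive)             # structured learning approach (Sect. 3.1)
        archive.extend(method.solve(g))
    # Step 3: decision phase, T_d further iterations
    while t_d < T_d:
        t += 1
        t_d += 1
        g = decision_rp(archive)             # decision-phase rule (Sect. 3.2)
        archive.extend(method.solve(g))
    # Evaluation module: choose and score the final solution
    final = min(archive, key=evaluate)       # evaluate plays the role of U^-
    return final, evaluate(final)
```

Here `archive` collects every solution shown to the ADM, since the reference point rules of Sects. 3.1 and 3.2 are defined over all solutions obtained so far.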

An example of how the ADM works
Now we illustrate how the proposed ADM works with a reference point based method on the bi-objective problem ZDT1 [57]. ADM1 is used as an example and a simple linear disutility function U^-(f_1, f_2) = f_1 + f_2 is used to evaluate solutions. Figure 10 shows the contour lines (dotted lines) of the disutility function and the MPS (z_MPS = [0.25, 0.5]). The curve represents the Pareto front of ZDT1. The interactive method we use is the reference point method [51], which uses the DM's reference point and k perturbed reference points to generate k + 1 Pareto optimal solutions by minimizing k + 1 ASFs at each iteration. The augmented ASF shown by (2) is adopted and is minimized by using the differential evolution algorithm DE/rand/1/bin [46]. Three and two iterations are implemented in the learning phase and the decision phase, respectively.
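A common form of the augmented ASF referred to as (2) is sketched below; the weight vector w and the augmentation coefficient rho are assumptions here, since their exact values are not restated in this section:

```python
def augmented_asf(f, g, w, rho=1e-4):
    """Augmented achievement scalarizing function: a weighted max-term over
    f_i - g_i plus a small augmentation term (w positive, rho > 0 small)."""
    terms = [wi * (fi - gi) for fi, gi, wi in zip(f, g, w)]
    return max(terms) + rho * sum(terms)
```

Minimizing this function over the feasible objective vectors f, for a fixed reference point g, projects g onto the Pareto front.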
All the reference points provided by ADM1 and the Pareto optimal solutions generated by the reference point method [51] in the learning phase are shown in Fig. 11. We set the initial reference point as g^1 = [0.5, 0.1]. Three Pareto optimal solutions z^{1,1}, z^{1,2}, and z^{1,3} are generated based on g^1. The new reference point g^2 is generated based on z^{1,1} and the extreme point z = [0, 1], which are neighbors and have the largest normalized Euclidean distance among all pairs of neighbors. Then, another three Pareto optimal solutions z^{2,1}, z^{2,2}, and z^{2,3} are obtained. Similarly, the third reference point g^3 is determined and solutions z^{3,1}, z^{3,2}, and z^{3,3} are generated.
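The learning-phase rule can be sketched as follows for the bi-objective case: among all pairs of neighboring solutions (sorted by the first objective), pick the pair with the largest normalized Euclidean distance and place the next reference point between them. Taking the midpoint of the pair is an assumption made for illustration:

```python
import math

def next_learning_rp(solutions, z_ideal, z_nadir):
    """Return a new reference point in the largest gap between neighbors."""
    pts = sorted(solutions)                   # neighbors along a bi-objective front
    rng = [hi - lo for lo, hi in zip(z_ideal, z_nadir)]
    def ndist(a, b):                          # normalized Euclidean distance
        return math.sqrt(sum(((ai - bi) / r) ** 2 for ai, bi, r in zip(a, b, rng)))
    a, b = max(zip(pts, pts[1:]), key=lambda pair: ndist(*pair))
    return [(ai + bi) / 2 for ai, bi in zip(a, b)]
```

With more than two objectives, the neighbor relation would need a different definition, which is beyond this sketch.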
At the fourth iteration, the decision phase begins. According to the disutility function, z^{2,2} is the solution with the minimum disutility value found so far, i.e., z^{best,3} = z^{2,2}, as shown in Fig. 12a. For the first objective, z^{2,1}_1 is the largest objective value among all the first-objective values which are smaller than z^{best,3}_1. For the second objective, z^{2,3}_2 is the largest objective value among all the second-objective values less than z^{best,3}_2. Thus, g^4 = [z^{2,1}_1, z^{2,3}_2] is the new reference point. Another three Pareto optimal solutions z^{4,1}, z^{4,2}, and z^{4,3} corresponding to g^4 are generated, and z^{4,2} now has the minimum disutility value, i.e., z^{best,4} = z^{4,2}. Figure 12b presents the new reference point g^5 = [z^{2,2}_1, z^{4,3}_2] and three new solutions z^{5,1}, z^{5,2}, and z^{5,3}. The solution z^{5,1} overlaps z_MPS and it has the minimum disutility value. Its two performance indicator values are both zero according to (6) and (7), which means that the reference point method behaves well on ZDT1 under the evaluation of ADM1.
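The decision-phase update illustrated above can be sketched compactly: for each objective, the new aspiration level is the largest value, over all solutions found so far, that is still smaller than the corresponding component of the best solution. The fallback when no smaller value exists is an assumption, since the example never needs it:

```python
def next_decision_rp(archive, z_best):
    """Build the next reference point from the archive and the best solution."""
    g = []
    for i, zb in enumerate(z_best):
        smaller = [z[i] for z in archive if z[i] < zb]
        g.append(max(smaller) if smaller else zb)  # fallback to z_best_i (assumption)
    return g
```

In the example, applying this rule to the archive after iteration 3 yields g^4 = [z^{2,1}_1, z^{2,3}_2].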

Numerical experiments on benchmark test problems
In this section, we use ADM1 and ADM2 as examples to demonstrate how the proposed ADM can be utilized to compare reference point based methods on benchmark test problems. Four popular methods, i.e., the reference point method [51], R-NSGA-II [17], g-NSGA-II [34], and r-NSGA-II [43], are used as examples of reference point based methods. As introduced in Sect. 3.3.2, the reference point method generates k + 1 Pareto optimal solutions by minimizing k + 1 ASFs at each iteration. The other three methods use the DM's reference point to modify NSGA-II [16], but in different ways. R-NSGA-II modifies the crowding distance mechanism of NSGA-II to prefer solutions close to the reference point. g-NSGA-II replaces the Pareto dominance in NSGA-II with the g-dominance relation to emphasize solutions which satisfy all aspiration levels or achieve none of them. r-NSGA-II substitutes the r-dominance relation for the Pareto dominance in NSGA-II so that solutions closer to the reference point are preferred. In the literature, these three methods are usually used as a priori methods; in this paper, we use them interactively as follows. At each iteration, the modified NSGA-II is run for a certain number of generations to find Pareto optimal solutions corresponding to the ADM's current reference point. The runs of the modified NSGA-II at different iterations are independent, that is, the modified NSGA-II is restarted from a new, randomly generated initial population at each iteration. In our experiments, the codes of the latter two methods come from the MATLAB platform PlatEMO [50].

Parameter settings of interactive methods
For the reference point method, we adopt the same implementation as used in Sect. 3.3. The parameter values used in DE/rand/1/bin are: (1) population size NP = 5n; (2) the scaling factor F = 0.5; (3) the crossover probability CR = 0.5; and (4) the maximum number of generations G_max = 400. Note that DE/rand/1/bin is called k + 1 times at each iteration of the reference point method. For the other three methods, the population size is 100 and 200 for three-objective and five-objective problems, respectively. The clearing parameter in R-NSGA-II is set as 0.01 [17] and the non-r-dominance threshold δ in r-NSGA-II is set as 0.3 [43]. Since these three methods generate many solutions at each iteration, we select k + 1 solutions for the ADM by clustering. This way, the four methods present the same number of solutions to the ADM at each iteration. To get comparable results with the four methods, their total numbers of function evaluations are kept the same at each iteration.
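Since R-NSGA-II, g-NSGA-II, and r-NSGA-II return a whole population, k + 1 representatives must be selected by clustering. The paper does not name the clustering algorithm, so the plain k-means sketch below is an assumption:

```python
import numpy as np

def select_representatives(population, k, n_iter=20, seed=0):
    """Cluster objective vectors into k groups and return, for each group,
    the member closest to its centroid (plain k-means; the choice of
    algorithm is an assumption)."""
    pop = np.asarray(population, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pop[rng.choice(len(pop), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((pop[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pop[labels == j].mean(axis=0)
    labels = ((pop[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    reps = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            d = ((pop[members] - centers[j]) ** 2).sum(-1)
            reps.append(population[int(members[d.argmin()])])
    return reps
```

Returning actual population members (rather than centroids) guarantees that the ADM only ever sees solutions the method really produced.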

Parameter settings of ADMs
It has been reported in the literature that the median number of iterations performed with interactive methods is often between three and eight [19]. Too many iterations place a heavy burden on the DM. On the other hand, a reference point based method is likely to produce better results when more iterations are adopted, since more solutions can be obtained. To allow the four methods to perform well within an acceptable number of iterations, we set the overall number of iterations T to eight, with T_l = 5 and T_d = 3. In fact, one can set T, T_l, and T_d as one wishes. The disutility functions in (4) and (5) are used, and Table 1 gives the weighting vectors used in them. For each test problem, the difference between z^nad and z* in each dimension is large enough, so we use z* directly instead of z** (i.e., ε_i = 0 for all i ∈ {1, . . . , k}) in (4), (5), and (7). For ADM2, the standard deviations of the noise of U^- in the decision phase are set as σ_1 = 0.2 × (U^-_max − U^-_*), σ_2 = σ_1/2, and σ_3 = σ_2/2. After the solutions generated at the final iteration are given to the ADM, σ_{t_d} is reduced to zero, which means that the noise in ADM2's disutility function eventually disappears. The MPS on each test problem and the corresponding optimal disutility value are listed in Table 1.
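The noise schedule of ADM2 halves the standard deviation at each decision-phase iteration and drops it to zero after the final iteration. A compact sketch follows; modeling the perturbation as Gaussian is an assumption consistent with the stated standard deviations:

```python
import random

def noisy_disutility(U, u_max, u_star, t_d, T_d=3):
    """ADM2's noisy disutility at decision-phase iteration t_d (1-based).
    u_max and u_star are the maximum and optimal disutility values;
    sigma_1 = 0.2*(u_max - u_star), halved each iteration, zero after T_d."""
    sigma = 0.0 if t_d > T_d else 0.2 * (u_max - u_star) / (2 ** (t_d - 1))
    return U + random.gauss(0.0, sigma)
```

With T_d = 3 this reproduces σ_1 = 0.2 × (U^-_max − U^-_*), σ_2 = σ_1/2, and σ_3 = σ_2/2, and the noise-free evaluation used for the final solution.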

Initial reference point
We use four different initial reference points for each test problem, as listed in Table 2. They are denoted by irp1, irp2, irp3, and irp4 where irp1 and irp2 are unachievable while irp3 and irp4 are achievable. Besides, irp1 and irp3 are close to the Pareto front while irp2 and irp4 are far away from the Pareto front. On each test problem, the two ADMs will use the same four initial reference points when applying each interactive method. The approach of generating various initial reference points is given in the supplementary material.
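The paper's procedure for generating the initial reference points is in its supplementary material; below is only a hedged sketch of one plausible construction, in which a Pareto optimal point is shifted toward the ideal point (unachievable) or the nadir point (achievable), with a small or large step to place it close to or far from the front. The step sizes are illustrative assumptions:

```python
def make_initial_rp(z_pareto, z_ideal, z_nadir, achievable, far):
    """Shift a Pareto optimal point z_pareto toward z_ideal (unachievable)
    or z_nadir (achievable); the step sizes are illustrative only."""
    step = 0.6 if far else 0.1
    target = z_nadir if achievable else z_ideal
    return [z + step * (t - z) for z, t in zip(z_pareto, target)]
```

For example, irp1 would correspond to `achievable=False, far=False` and irp4 to `achievable=True, far=True` under this sketch.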

Comparison of the four methods by using the proposed ADM
In this subsection, the proposed ADM1 and ADM2 are utilized to compare the four reference point based methods. Since randomness is involved in the solution process, the solutions generated by each method with the same reference point may vary in different runs. Hence, for every ADM, each method is run 21 times independently on each test problem under each initial reference point. This means that each method is applied for a total of 2 × 4 × 10 × 21 = 1680 times. In each run, the values of the two indicators difference and distance are calculated by using (6) and (7), respectively.
• Comparison according to the mean values. The mean values and the standard deviations of each method over 21 independent runs are listed in Tables 3 and 4. In these two tables, the first three columns show the test problem, the number of objectives, and the initial reference point, respectively. The fourth to seventh columns present the results of the four methods in terms of difference. Note that the reference point method is abbreviated as RPM. The values in parentheses are the standard deviations. Similarly, the eighth to eleventh columns give the results in terms of distance. To facilitate comparison, we rank the four methods on each instance with 1, 2, 3, and 4 according to their mean values; the method with rank 1 has the smallest mean value. All the ranks are listed in the fourth to eleventh columns of each table, and the average rank of each method over 40 instances is shown in the last row of each table.
According to Tables 3 and 4, no method is absolutely superior to the other methods on all instances. The reference point method ranks the first on more than half of the instances under each indicator, no matter which ADM is utilized. Most of its mean difference values are less than 10% and the corresponding standard deviations are less than 5%, which means that it can generally find solutions close to the real MPS and it has a relatively good stability.

Table 3 Results of the four methods when using ADM1 to compare them

Table 4 Results of the four methods when using ADM2 to compare them

R-NSGA-II usually ranks the last on three-objective DTLZ2 and DTLZ4, and ranks the second or the third on the other test problems. Similarly, r-NSGA-II ranks the second or the third in most cases. According to the average ranks, R-NSGA-II and r-NSGA-II occupy the second and third places among the four methods. When ADM2 is utilized, R-NSGA-II is slightly better than r-NSGA-II in terms of each indicator. When ADM1 is used, R-NSGA-II has a better difference value while r-NSGA-II is better under the distance indicator. This shows that a better disutility value and a closer distance are not always consistent; it is thus meaningful to use different indicators to measure the methods' performance.
Although it ranks first or second on three-objective DTLZ7, g-NSGA-II ranks the last on most of the other instances, which makes it the worst of the four methods. Moreover, its mean difference values on DTLZ1 and DTLZ3 are very large. When checking the solutions obtained by g-NSGA-II, we found that its final population at each iteration is usually far away from the Pareto front, especially on DTLZ1 and DTLZ3. This shows that g-NSGA-II has a relatively weak capability to converge to the Pareto front. In fact, g-NSGA-II has the drawback that it may prefer a solution a to another solution which dominates a in the Pareto sense, which can hinder its convergence.
From the differences between the mean values on three- and five-objective problems, it can be observed that the reference point method shows no significant differences, while the other three methods generally obtain better mean values on three-objective problems. This reflects the degradation of the three methods' performance as the number of objectives increases.
• Comparison according to the boxplots. For an intuitive comparison of the four methods, the boxplots of the difference/distance values over 21 independent runs of each method are drawn. Here we only present the boxplots of the difference values when ADM1 is used, as shown in Fig. 13. All the boxplots can be found in the supplementary material, and they give results similar to those in Fig. 13. In Fig. 13, M1, M2, M3, and M4 represent the reference point method, R-NSGA-II, g-NSGA-II, and r-NSGA-II, respectively. Since the difference values of g-NSGA-II are much larger than those of the other methods on several problems, in order to make the differences among the other methods visible, the maximum difference value shown in the boxplots is restricted to 100%; values larger than 100% are not shown.
From Fig. 13, we can see that the difference values of the reference point method are usually smaller, and the lengths of its boxes are usually shorter, than those of the other methods on most test problems. This means that the reference point method generally finds better solutions and is more stable than the others. g-NSGA-II performs rather badly on three- and five-objective DTLZ1 and DTLZ3. However, it has relatively smaller boxes, closer to zero than those of the other three methods, on three-objective DTLZ7. R-NSGA-II and r-NSGA-II usually rank in the middle among the four methods. These observations are similar to what we derived from the mean values and standard deviations of the four methods.
• Comparison according to statistical tests. To compare the four methods pairwise, the Wilcoxon rank sum test at a significance level of 0.05 is conducted on each pair of methods. We use three symbols +, ≈, and − to represent that method i performs statistically better than, equal to, and worse than method j, respectively. The numbers of these symbols over 40 instances with respect to each ADM and each indicator are given in Tables 5, 6, 7 and 8.
In Tables 5, 6, 7 and 8, both indicators give rather similar results. The reference point method performs statistically better than or equal to the other three methods in most cases, no matter which ADM is used. On the contrary, g-NSGA-II is inferior to the other methods in more than half of the instances. These observations are consistent with what we found from the mean values in Tables 3 and 4. R-NSGA-II is a little better than r-NSGA-II when ADM1 is used, and the two methods perform similarly under ADM2. This is slightly different from their comparison according to the average ranks in Tables 3 and 4.

Fig. 13 Boxplots of difference values (%) over 21 independent runs for each method when using ADM1 to compare methods: a boxplots on 3-objective test problems; b boxplots on 5-objective test problems
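The pairwise comparison above can be reproduced with any statistics package (e.g., `scipy.stats.ranksums`); a self-contained sketch using the normal approximation of the rank sum statistic, with midranks for ties, is given below. The symbol `~` stands in for ≈, and treating smaller indicator values as better follows the definitions of the difference and distance indicators:

```python
from statistics import NormalDist

def ranksum_symbol(x, y, alpha=0.05):
    """Wilcoxon rank sum test: '+' if x is statistically better (smaller),
    '-' if worse, '~' if indistinguishable (normal approximation)."""
    pooled = sorted((v, src) for src in (0, 1) for v in (x, y)[src])
    vals = [v for v, _ in pooled]
    ranks, i = {}, 0
    while i < len(vals):                      # assign midranks to ties
        j = i
        while j < len(vals) and vals[j] == vals[i]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        i = j
    R1 = sum(ranks[k] for k, (_, src) in enumerate(pooled) if src == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs((R1 - mu) / sd))) if sd > 0 else 1.0
    if p >= alpha:
        return '~'
    return '+' if R1 < mu else '-'
```

With 21 runs per method, the normal approximation is reasonable; an exact test would change only borderline cases.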

Influence of the initial reference point
According to Tables 3 and 4, the mean difference/distance values of some methods differ across initial reference points on some problems. To see whether the initial reference point influences the results of each method, we conducted the Wilcoxon rank sum test at a significance level of 0.05 to compare the results of each method under the following pairs of initial reference points: (1) irp1 and irp2, (2) irp1 and irp3, (3) irp2 and irp4, and (4) irp3 and irp4. Tables 9, 10, 11 and 12 present the numbers of +/≈/− over the ten test problems for each pair of initial reference points; each table corresponds to one method. According to Table 9, the reference point method performs better on more than half of the instances when using irp2 than when using irp1. Under the other pairs of initial reference points, the reference point method behaves inversely on different problems. Generally, irp3 leads to better results on more problems than irp1 does, while it is slightly inferior to irp4 in terms of distance; irp2 is slightly better than irp4. Hence, the reference point method usually gets better results when using irp2 as compared with the other initial reference points, which implies that this method is more likely to obtain a final solution closer to the MPS when the ADM's initial reference point is unachievable and far away from the Pareto front on the ten test problems. Table 10 shows that R-NSGA-II usually has opposite performance on different problems under irp1, irp2, and irp3. In most cases, R-NSGA-II performs better under irp4 than irp3, which means that between the two achievable initial reference points, R-NSGA-II tends to behave better when the initial reference point far away from the Pareto front is used.
From Tables 11 and 12, g-NSGA-II and r-NSGA-II are insensitive to the location of the initial reference point: on more than half of the problems, the results under different initial reference points are statistically indistinguishable.

Numerical experiments on a water resources planning problem
This section illustrates how to use the proposed ADM to compare reference point based methods on a real-world problem, namely, a water resources planning problem [32], in which x_1 and x_2 are the total man-hours devoted to building a dam and the mean radius of the lake impounded (in miles), respectively. The three objectives f_1, f_2, and f_3 represent the cost of construction, the water loss, and the total storage capacity of the reservoir, respectively. We compare four reference point based methods, i.e., the reference point method [51], R-NSGA-II [17], g-NSGA-II [34], and r-NSGA-II [43]. We consider an ADM whose disutility function is defined in terms of z* and z^nad, the ideal point and the nadir point of the water resources planning problem, respectively. The MPS of this ADM is z_MPS = [50.92, 25, −50.34], and the minimum disutility value is 0.5. Similar to Sect. 3.3, DE/rand/1/bin is adopted as a solver to minimize the augmented ASF formed by the reference point method. The parameter values of DE/rand/1/bin are set as: (1) population size NP = 20; (2) the scaling factor F = 0.5; (3) the crossover probability CR = 0.5; and (4) the maximum number of generations G_max = 200. For the other three methods, we set NP = 40 and G_max = 400. The overall numbers of function evaluations of the four methods are the same. With regard to the ADM, the numbers of iterations in the learning phase and the decision phase are both set to 3. Since the reference point method obtains four solutions at each iteration, the final population of each of the other three methods is clustered into four groups and four representative solutions are passed to the ADM at each iteration for a fair comparison.
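DE/rand/1/bin with the parameters above can be sketched as follows; the bound handling by clipping and the greedy replacement on ties are common conventions and assumptions here, not details taken from the paper:

```python
import random

def de_rand_1_bin(obj, bounds, NP=20, F=0.5, CR=0.5, G_max=200, seed=0):
    """Minimize obj over box constraints with DE/rand/1/bin."""
    rng = random.Random(seed)
    n = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [obj(x) for x in pop]
    for _ in range(G_max):
        for i in range(NP):
            a, b, c = rng.sample([j for j in range(NP) if j != i], 3)
            jrand = rng.randrange(n)          # ensure at least one mutated gene
            trial = [
                min(max(pop[a][j] + F * (pop[b][j] - pop[c][j]),
                        bounds[j][0]), bounds[j][1])
                if (rng.random() < CR or j == jrand) else pop[i][j]
                for j in range(n)
            ]
            ft = obj(trial)
            if ft <= fit[i]:                  # greedy selection
                pop[i], fit[i] = trial, ft
    best = min(range(NP), key=lambda i: fit[i])
    return pop[best], fit[best]
```

In the experiments, `obj` would be the augmented ASF composed with the problem's objective functions for the ADM's current reference point.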
We use an unachievable initial reference point p1 = [30, 15, −80]. Each method was run 20 times, and the difference and distance indicator values were calculated in each run. Table 13 shows the mean values and the standard deviations (in parentheses) of the indicators for the four methods over the 20 independent runs. According to the mean difference values in Table 13, the order of the four methods from best to worst was g-NSGA-II, RPM, R-NSGA-II, and r-NSGA-II. In terms of the mean distance values, g-NSGA-II and RPM still ranked first and second, respectively, while r-NSGA-II performed better than R-NSGA-II. It should be noted that the standard deviations of the two indicator values for the reference point method are both zero. This means that the reference point method found the same final solution in all 20 runs and is rather stable when solving this water resources planning problem.
In each of the 20 independent runs, the 24 solutions obtained by each reference point based method in the solution process were recorded. Figure 14 shows the MPS and all 24 solutions obtained by each method in the run in which the method attained its best distance indicator value among the 20 runs. It can be seen that for each method, as expected, the solutions generated in the learning phase have a wide spread in the objective space, while the solutions obtained in the decision phase are densely distributed in a small region. This implies that the ADM realized a wide exploration in the learning phase and a focused exploitation in the decision phase when working with each reference point based method. Now let us focus on the ability of each method to converge to the MPS. We can see in Fig. 14 that the solutions obtained by the reference point method, g-NSGA-II, and r-NSGA-II in the decision phase lie around the MPS, while the solutions generated by R-NSGA-II deviate from the MPS as a whole. This helps explain why R-NSGA-II ranked last in terms of the distance indicator in Table 13. To show the closeness of solutions to the MPS more clearly, Fig. 15 displays only those of the 24 solutions that lie in a neighborhood of the MPS. Generally, g-NSGA-II found more solutions close to the MPS than the other methods, while R-NSGA-II seldom obtained solutions near the MPS.

Conclusions
In this paper, we have proposed an ADM to compare reference point based interactive methods quantitatively. Owing to its modular structure, the ADM achieves automatic interaction with the methods to be compared and automatic evaluation of the final results; no human DMs are involved in the process. Through a series of experiments on multiple benchmark test problems and a water resources planning problem, we have demonstrated how the proposed ADM can be used to compare reference point based methods, and the performance differences among the methods have been captured by analyzing the experimental results.
In addition to the ADM proposed in this paper, the modular structure we present can also be utilized to design other types of ADMs. Our future research will focus on building more ADMs with different abilities for a more comprehensive comparison of reference point based methods. For example, we intend to build ADMs which are able to generate multiple reference points at the same time.