Multiobjective dynamic programming in bipolar multistage method

The multicriteria Bipolar method can be extended and used to control multicriteria, multistage decision processes. In this extension, at each stage of the given multistage process two sets of reference points are determined, constituting a reference system for the evaluation of stage alternatives. Multistage alternatives, which are compositions of stage alternatives, are assigned to one of six predefined hierarchical classes and then ranked. The aim of this paper is to show the possibility of finding the best multistage alternative, using Bellman's optimality principle and optimality equations. Of particular importance is a theorem on the non-dominance of the best multistage alternative, proven here. The proposed methodology makes it possible to avoid reviewing every multistage alternative, which is important in large-scale problems. The method is illustrated by a numerical example and a brief description of the sustainable regional development problem, which can be solved by means of the proposed procedure.


Introduction
The reference point methodology is one of the most important concepts in multiple criteria decision making and has been extensively studied in the past. Elements of this methodology can be found in compromise programming (Zeleny 1974), goal programming (Romero 1991) and reference point approaches (Wierzbicki 1977). In many studies, the ideal and nadir elements are sets rather than single points, hence the concept of bipolar reference sets is proposed here.
Based on the concept of bipolar reference sets, Konarzewska-Gubała (1987, 1989) proposed a single-stage method called Bipolar. The author observed that the motivation to achieve success and the motivation to avoid failure are not equivalent; therefore, the final evaluation of the decision alternatives is based on their position with respect to two segments of the reference system: ideal and nadir.

Almost simultaneously with the introduction of the classic Bipolar method, the question arose: is it possible to apply this approach to the analysis of multistage, multicriteria decision processes (Trzaskalik 1987). That paper formulated the problem of searching for the best solution using an extended bipolar approach. It also attempted to solve this problem using Bellman's Principle of Optimality and optimality equations. That attempt, however, was not entirely successful and research in that direction was discontinued. The present paper is an attempt to reformulate this problem and to solve it using multiobjective dynamic programming.
The Bipolar Multistage Method, its assumptions and concepts, have been described in detail in Trzaskalik (2020). The concept of a multistage alternative and the definition of the best such alternative are introduced there.
The question arises: in order to find the best multistage alternative, is it necessary to examine all such alternatives? The present paper aims to show that it is possible to find the best alternative using Bellman's vector Principle of Optimality and optimality equations. Of particular importance here is the theorem on the non-dominance of the best multistage alternative, proven in the paper.
The paper consists of six sections. After the Introduction (Sect. 1), the characteristic features of the classical Bipolar method, as compared with MCDA and other bipolar approaches, are presented in Sect. 2. Detailed assumptions and notation used in the multistage procedure, together with its description, are presented in Sect. 3. The classification and ranking of multistage alternatives are also shown there. The basic theorem proven in Sect. 4 states that the best alternative, which is non-dominated, belongs to the lowest-indexed class. Applying that theorem and the vector version of Bellman's Principle of Optimality, it is shown how multiobjective dynamic programming can be used to find the best multistage alternative. Examples constitute Sect. 6. In Sect. 6.1 a numerical illustration of the proposed procedure can be found. In Sect. 6.2 a brief description of the regional sustainable development problem, which can be solved by means of the proposed procedure, is given. Discussion and conclusions end the paper.

Bipolar method as compared with MCDA and other bipolar approaches
Discrete multiple criteria decision aid (or analysis) problems, referred to as MCDA problems, can be formulated as follows: given a finite set of alternatives which are evaluated using certain criteria, the decision maker intends to achieve one of the following goals (Figueira et al. 2005):
- to select the alternative which best corresponds to his/her preferences (the problem of selection);
- to order the alternatives from the best to the worst (the problem of ordering);
- to assign each alternative to one of the predefined classes (the problem of multi-criteria classification).
Such problems are usually solved in two stages. In the first stage the decision alternatives are compared in various ways. In the second stage, a synthesis of the results is performed.
Many methods of solving such problems are known. Among the oldest multi-criteria methods are additive methods, such as the SAW method (Churchman and Ackoff 1954) and its modification, the F-SAW method (Tzeng and Huang 2011). The SMART (Edwards 1971) and SMARTER (Edwards and Barron 1994) methods should also be mentioned. In these methods the final evaluation is interpreted as the global utility of the given alternative. Analytical hierarchization and related methods are very often used. The most popular seems to be the AHP method (Saaty 1980) and its extensions, such as F-AHP (Mikhailov and Tzvetinov 2004), REMBRANDT (Lootsma 1992), ANP (Saaty 1996), its fuzzy extension F-ANP (Tzeng and Huang 2011), and MACBETH (Bana e Costa and Vansnick 1993). Verbal Decision Analysis (VDA) is used to analyze unstructured problems with mostly qualitative parameters. The best known VDA methods are ZAPROS (Larichev and Moskovich 1995) and its modification, ZAPROS III (Larichev 2001). Methods from the ELECTRE family, namely ELECTRE I, ELECTRE Iv, ELECTRE Is, ELECTRE III, ELECTRE TRI and ELECTRE IV (Roy 1985; Roy and Bouyssou 1993), are based on outranking relations. A characteristic feature of methods from the PROMETHEE group is their use of preference flow. The most important methods from this family are PROMETHEE I and PROMETHEE II (Brans 1982) and EXPROM (Diakoulaki and Koumoutsos 1991). A very popular approach uses reference points. One such method is TOPSIS (Hwang and Yoon 1981), with its fuzzy version F-TOPSIS (Jahanshahloo et al. 2006). The VIKOR method (Opricovic 1998) is also worth mentioning. There also exist methods which are compositions of other methods. For instance, combining the three methods DEMATEL (Gabus and Fontela 1973), ANP and VIKOR (Tzeng and Huang 2011) makes it possible to consider decision problems with interdependent criteria and alternatives.
One of the MCDA methods considered in the present paper is Bipolar, introduced by Konarzewska-Gubała (1987, 1989). Its characteristic feature is that decision alternatives are not compared directly with each other, but by means of two sets which constitute a reference system: the set of objects of desired characteristics, called "good" objects, and the set of objects of undesired characteristics, called "bad" objects. The classic Bipolar method consists of three phases. In the first phase, decision alternatives are compared with the reference objects. In the second phase, the position of each alternative is established with respect to the reference sets. In the third phase, the alternatives are assigned to predefined classes and a linear ordering is defined in each class, which makes it possible to rank the alternatives. Phase I uses the notions of concordance and veto thresholds, introduced in the ELECTRE methodology (Roy 1985), while phase II uses that of algorithms of confrontation (Merighi 1980).
Similarly to the Bipolar method, reference objects are used also in other methods, such as the bi-reference method introduced by Michałowski and Szapiro (1992), and expanded in Wojewnik and Szapiro (2010). Skulimowski (1996) introduced a decision support system based on reference sets.
The bipolar approach can also be found in other papers on decision making methods. Bisdorff et al. (2008) present the RUBIS method for tackling the problem of choice in the MCDA context and focus on pairwise comparisons of the alternatives, which leads to the concept of a bipolar-valued outranking digraph. Franco et al. (2013) introduce the bipolar Preference-Aversion (P-A) model, which makes it possible to analyze the possible gains and losses using an independent aggregation methodology. Aslam et al. (2014) combine the concept of a bipolar fuzzy set and a soft set. Ezhilmaran and Sankar (2015) define bipolar intuitionistic fuzzy sets, relations and graphs, as well as isomorphism of these graphs, and discuss some of their properties. Bouzarour-Amokrane et al. (2015) address collaborative group decision making problems, considering consensus processes to achieve a common legitimate solution. Their resolution model is based on individual bipolar assessments. Al-Qudah and Hassan (2017) extend the two concepts of bipolar fuzzy sets and soft expert sets to bipolar fuzzy soft expert sets. Liu et al. (2018) aim to develop a dynamic linguistic multi-criteria decision making model using bipolar linguistic scales in which both alternatives and criteria may vary across time. Akram et al. (2018) present a novel framework for handling bipolar fuzzy soft information by combining bipolar fuzzy soft sets with graphs. Jana and Pal (2018) combine a bipolar intuitionistic fuzzy set and a soft set. Jana et al. (2019) develop a model for multiple attribute decision-making problems based on bipolar fuzzy Dombi aggregation operators in a bipolar fuzzy environment. Jana et al. (2020) use the Dombi t-norm and t-conorm in the bipolar fuzzy environment to aggregate various preferences of the decision makers. Akram et al. (2020) present two novel hybrid multi-attribute decision-making models, the m-polar fuzzy bipolar soft set model and the rough m-polar fuzzy bipolar soft set model, and develop efficient algorithms to solve multi-attribute decision-making problems applying these notions.

Bipolar multistage method: assumptions and notation
We consider multistage decision processes with a finite, fixed number of feasible states in each stage, a finite number of feasible decisions for each state, and a finite, fixed number of stages. The pair consisting of the initial state of the process at the given stage and one of the feasible decisions will be called a stage alternative. At each stage, two finite stage reference sets are given: the stage set of "good" objects and the stage set of "bad" objects. The stage alternatives are compared with the elements of the reference sets with respect to the criteria which are important for the decision maker at the given stage.
As in the first two phases of the original Bipolar method, we calculate the indicators describing the positions of the stage alternatives with respect to the stage reference objects, as well as the positions of these alternatives with respect to the reference sets. In the third phase we construct multistage indicators which make it possible to compare the multistage alternatives and to assign them to the six predefined hierarchical classes. A detailed characterization of these classes can be found in Trzaskalik (2020). Within each class we define a linear ordering of the multistage alternatives, which makes it possible to rank them.
In small-scale problems, constructing a ranking of all the multistage alternatives is feasible and can be useful. In large-scale problems, only the best multistage alternative may be of interest, and here multiobjective dynamic programming can be helpful. It turns out that the best multistage alternative is non-dominated. Therefore, in practice, we search for the set of non-dominated multistage alternatives and select among its elements those that belong to the lowest-indexed (that is, the best) class. From among those alternatives we select the best multistage alternative.
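The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `select_best`, `class_of` and `score` are hypothetical, the set of non-dominated multistage alternatives is taken as already computed, and the within-class indicator is assumed to be oriented so that larger values are better.

```python
from typing import Callable, Hashable, Iterable, List

Alternative = Hashable


def select_best(
    nondominated: Iterable[Alternative],
    class_of: Callable[[Alternative], int],   # hypothetical: class index, 1 is the best class
    score: Callable[[Alternative], float],    # hypothetical: within-class indicator, larger is better
) -> Alternative:
    """Pick the best alternative among the non-dominated ones."""
    candidates: List[Alternative] = list(nondominated)
    if not candidates:
        raise ValueError("the set of non-dominated alternatives is empty")
    # Keep only the alternatives from the lowest-indexed class that is represented.
    best_class = min(class_of(a) for a in candidates)
    in_best_class = [a for a in candidates if class_of(a) == best_class]
    # Within that class, take the alternative ranked first by the indicator.
    return max(in_best_class, key=score)
```

Passing the class assignment and the within-class indicator as callables keeps the sketch independent of how these quantities are actually defined.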
In what follows we will use the following notation and certain further detailed assumptions.
Let us denote:
- $T$: the number of stages of the process,
- $t$: the index of the process stage considered ($t = 1,\dots,T$),
- $y_t$: a feasible state of the process at the beginning of stage $t$,
- $Y_t$: the set of feasible states at the beginning of stage $t$,
- $x_t$: a feasible decision for state $y_t$,
- $X_t(y_t)$: the set of feasible decisions at the beginning of stage $t$ for state $y_t$,
- $\Omega_t$: the transition function for stage $t$; we have $y_{t+1} = x_t$,
- $a_t$: a feasible realization for stage $t$, $a_t = (y_t, x_t) = (y_t, y_{t+1})$, called a stage alternative,
- $A_t$: the set of feasible realizations of the process for stage $t$,
- $a$: a feasible realization of the whole process, $a = (a_1,\dots,a_T) = (y_1, x_1,\dots,y_T, x_T) = (y_1, y_2,\dots,y_{T+1})$, called a multistage alternative,
- $A$: the set of all feasible multistage alternatives of the process,
- $K$: the number of all the criteria considered ($k = 1,\dots,K$),
- $c_t^k$: the $k$th criterion in stage $t$.
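To make the notation concrete, the following Python sketch enumerates all multistage alternatives as state paths $(y_1, y_2,\dots,y_{T+1})$, using the fact that each decision is the next state. The function name `multistage_alternatives` and the data layout `feasible[t][y]` (with 0-based stage index) are assumptions made for this illustration only.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

State = Hashable
# feasible[t][y] lists the decisions available in state y at stage t (0-based t).
Feasible = List[Dict[State, List[State]]]


def multistage_alternatives(
    feasible: Feasible, initial_states: Iterable[State]
) -> Iterator[Tuple[State, ...]]:
    """Yield every feasible path (y_1, ..., y_{T+1}) of the process."""

    def extend(path: List[State], t: int) -> Iterator[Tuple[State, ...]]:
        if t == len(feasible):
            yield tuple(path)
            return
        for x in feasible[t].get(path[-1], []):
            # The decision taken at stage t becomes the state at stage t + 1.
            yield from extend(path + [x], t + 1)

    for y1 in initial_states:
        yield from extend([y1], 0)
```

With two states and two decisions per state, a three-stage process already has 16 multistage alternatives, and the number grows exponentially with the number of stages, which is why full enumeration becomes impractical for large problems.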
It is assumed that at each process stage $t$ the set of stage criteria is given and that each criterion is measured on a scale $K^k$ which is cardinal, ordinal or binary. The criteria are defined so that higher values are preferred over lower values. It is possible, of course, to transform the remaining types of criteria to the form used here.
For each stage and each criterion, the decision maker determines the weight $w_t^k$ of its relative importance. The weights indicate which criteria are important at the given stage and how important they are. The inequality $w_t^k > 0$ means that criterion $k$ is important at stage $t$; if $w_t^k = 0$, the criterion is not important at stage $t$. We also assume that if the $k$th criterion is more important for the decision maker than the $l$th criterion, he/she determines the weights so that $w_t^k > w_t^l$. For equally important criteria, we have $w_t^k = w_t^l$. Furthermore, we assume that the weights are non-negative and that at each stage they sum to one.
The decision maker determines, for $t = 1,\dots,T$, the stage reference systems $R_t$, which consist of the set $G_t$ of "good" objects and the set $B_t$ of "bad" objects. We assume that each of these sets is finite and that the two sets are disjoint. It is also assumed that for all $t = 1,\dots,T$, $k = 1,\dots,K$ and $r_t \in R_t$, the values of the criteria for the reference objects are known. We also assume a condition on the reference sets which guarantees that no stage alternative is at the same time overgood and underbad [see Trzaskalik and Sitarz (2012)], that is, that it is not evaluated better than $G_t$ and worse than $B_t$ at the same time.

Phase I: Comparison of stage alternatives with stage reference objects
For the pair $(a_t, r_t)$, where $a_t \in A_t$, $r_t \in R_t$, we calculate the following values:

Case 1
The stage outranking indicators for the stage alternative $a_t$ are defined as follows:

Case 2
The stage outranking indicators for the stage alternative $a_t$ are defined as follows:

Case 3
The stage outranking indicators for the stage alternative $a_t$ are then defined as follows:
- if $r_t$ is a good object, then
- if $r_t$ is a bad object, then
By means of these outranking indicators two stage relationships, stage preference $L_t$ and stage indifference $I_t$, are defined as follows:
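As a rough illustration of the comparison of a stage alternative with a reference object, the Python sketch below assumes the concordance-type sums known from the classic Bipolar method: the total weight of the criteria on which the alternative is better than, equal to, or worse than the reference object. This is an assumption borrowed from the classic method, not a statement of the exact formulas of the multistage variant, and the function name `comparison_sums` is hypothetical.

```python
from typing import Sequence, Tuple


def comparison_sums(
    f_a: Sequence[float],       # criteria values of the stage alternative
    f_r: Sequence[float],       # criteria values of the reference object
    weights: Sequence[float],   # stage weights, assumed non-negative and summing to one
) -> Tuple[float, float, float]:
    """Return the weight sums (better, equal, worse) for alternative vs. reference object."""
    if not (len(f_a) == len(f_r) == len(weights)):
        raise ValueError("all three sequences must have one entry per criterion")
    triples = list(zip(weights, f_a, f_r))
    # Higher criterion values are preferred, as assumed in the text.
    c_plus = sum(w for w, x, y in triples if x > y)
    c_equal = sum(w for w, x, y in triples if x == y)
    c_minus = sum(w for w, x, y in triples if x < y)
    return c_plus, c_equal, c_minus
```

Under this assumption the three sums always add up to the total weight, so the relation between the first and the last of them indicates on which side of the reference object the alternative lies.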

Phase II: Position of a stage alternative with respect to the bipolar stage reference system
For a given $a_t \in A_t$, the auxiliary sets of indices are defined as follows:
To determine the position of a stage alternative $a_t$ with respect to the set $G_t$ we consider:

Case S1
We calculate the value of the stage success achievement degree as follows:

Case S2
We calculate the value of the stage success achievement degree as follows:
For a given $a_t \in A_t$, the auxiliary sets of indices are defined as follows:
To determine the position of a stage alternative $a_t$ with respect to the set $B_t$ we consider:

Case F1
We calculate the value of the stage failure avoidance degree as follows:

Case F2
We calculate the value of the stage failure avoidance degree as follows:

Relationships in the set of multistage alternatives
Using the stage success achievement degree, for each multistage alternative $a$ we define the multistage success achievement degree values:
Using the stage failure avoidance degree, for each multistage alternative $a$ we define the multistage failure avoidance degree values:
We define these four indicators as arithmetic means of the stage indicators. They take values from the interval [0, 1].
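The aggregation by arithmetic mean can be illustrated with a short Python sketch. The function name `multistage_indicator` is hypothetical, and the sketch assumes that each stage indicator already lies in [0, 1], so that the mean lies in the same interval.

```python
from typing import Iterable


def multistage_indicator(stage_values: Iterable[float]) -> float:
    """Arithmetic mean of the stage indicators along one multistage alternative."""
    values = list(stage_values)
    if not values:
        raise ValueError("at least one stage value is required")
    # Assumption of this sketch: every stage indicator lies in [0, 1].
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("stage indicators are assumed to lie in [0, 1]")
    return sum(values) / len(values)
```

For example, stage values 0.5, 1.0 and 0.0 over three stages give a multistage value of 0.5.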
Using the values of these indicators, the multistage alternatives are assigned to the predefined classes.
From the construction of these classes it follows that if $k < l$, each multistage alternative from class $A_k$ should be preferred over any multistage alternative from class $A_l$ (Trzaskalik 2020).
Within the classes the alternatives are ordered by means of a numerical indicator, as follows:

Application of multiobjective dynamic programming
For each stage of our decision process we define the stage vector function $\Phi_t$ with four components:
where $a_t \in A_t$.
To keep the notation uniform, we assume that
For the entire process we define the vector function $\Phi$ with four components:
where for $k = 1,\dots,4$,
where $a = (a_1,\dots,a_T)$.

From this notation it follows that
The multistage alternative a′ dominates the multistage alternative a if no component of Φ(a′) is smaller than the corresponding component of Φ(a) and at least one component is greater. The multistage alternative a* is non-dominated if no multistage alternative in A dominates it. We denote the set of all non-dominated alternatives by A*. The problem of determining the set A* is a vector maximization problem.
To find the best multistage alternative a**, the following theorem will be useful:
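A minimal Python sketch of the dominance test and of filtering the non-dominated vectors is given below. It assumes the usual componentwise definition for vector maximization (no component smaller, at least one greater); the function names `dominates` and `nondominated` are hypothetical.

```python
from typing import Iterable, List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is at least as good as v in every component and better in at least one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def nondominated(vectors: Iterable[Sequence[float]]) -> List[Sequence[float]]:
    """Return the vectors that are not dominated by any other vector in the collection."""
    items = list(vectors)
    return [v for v in items if not any(dominates(u, v) for u in items)]
```

The filter compares every pair of vectors, which is quadratic in the number of alternatives; it is meant to show the definition, not to be an efficient implementation.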

Theorem 1 The best multistage alternative is non-dominated.
Proof (by contradiction) Let a** be the best multistage alternative. Assume that it is dominated. Let a′ be any multistage alternative which dominates a**.
First we will show that a′ must belong to the same class as a**. From the definition of the best alternative it follows that no other alternative (including a′) can belong to a class indexed lower than the class to which a** belongs.
If, in turn, a′ belonged to a class indexed higher than the class of a**, then from the construction of our classes it would follow that at least one of the following strict inequalities holds: which contradicts the assumption that a′ dominates a**. Hence, both realizations must belong to the same class.
Since we assumed that a** is dominated, we have: Adding these inequalities, we obtain: From the definition of a non-dominated multistage alternative we know that there exists v ∈ {1, 2, 3, 4} for which Therefore we obtain the inequality From the last inequality and from the definition of the best multistage alternative it follows that a** is not the best multistage alternative, which leads to a contradiction and ends the proof.
Theorem 1 shows that we should search for the best multistage alternative among non-dominated alternatives. Therefore we can formulate the following strategy:
1. Find the set A* of non-dominated multistage alternatives.
2. Assign the non-dominated multistage alternatives to the corresponding classes.
3. Find the lowest-indexed class which contains at least one non-dominated multistage alternative. Denote by $A_m^*$ the set of non-dominated multistage alternatives from this class.
4. For each non-dominated alternative a* from $A_m^*$ find the value of the within-class ordering indicator.
5. Find the best multistage alternative a** as the element of $A_m^*$ ranked first according to this value.
To find the set A* we use multiobjective dynamic programming. Let α and β be two finite sets of vectors. We define the sum α + β as the set of all sums of an element of α and an element of β, and the operation "max" on α as the set of non-dominated elements of α. Using the vector version of Bellman's optimality principle (Li and Haimes 1989), we can write down the optimality equations. For the consecutive states and stages (starting with the last stage) we find the sets of vectors from which non-dominated partial realizations of the process can be constructed, from the given state until the end of the process. For each t = T, T − 1,…,1, we also find the optimal stage strategy, that is, the sets $X_t^*(y_t)$ of non-dominated decisions. We obtain:
Stage T.
For the consecutive states $y_T \in Y_T$ we have:
and
Stage t (t = T − 1,…,1)
For the consecutive states $y_t \in Y_t$ we have:
We find the set A* of non-dominated multistage alternatives using the formula:
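The backward recursion can be sketched in Python as follows. This is a minimal illustration, not the exact optimality equations: it assumes that the process value is the componentwise sum of the stage vectors, that each decision is the next state, and that "max" keeps the non-dominated vectors. The names `solve`, `vmax`, `feasible` and `phi`, as well as the 0-based stage index, are assumptions of this sketch.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Vector = Tuple[float, ...]
Labelled = Tuple[Vector, Tuple[State, ...]]  # (value vector, path of states)
Feasible = List[Dict[State, List[State]]]    # feasible[t][y] -> decisions in state y


def dominates(u: Vector, v: Vector) -> bool:
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def vmax(items: List[Labelled]) -> List[Labelled]:
    """The 'max' operation: keep entries whose vectors are non-dominated."""
    return [p for p in items if not any(dominates(q[0], p[0]) for q in items)]


def solve(
    feasible: Feasible,
    phi: Callable[[int, State, State], Vector],  # stage vector for (stage, state, decision)
    dim: int = 4,
) -> List[Labelled]:
    """Return non-dominated (vector, path) pairs over all initial states."""
    zero: Vector = (0.0,) * dim
    tails: Dict[State, List[Labelled]] = {}
    last = len(feasible) - 1
    for t in range(last, -1, -1):  # start with the last stage
        current: Dict[State, List[Labelled]] = {}
        for y, decisions in feasible[t].items():
            candidates: List[Labelled] = []
            for x in decisions:
                step = phi(t, y, x)
                # At the last stage there is nothing left to add after the decision.
                continuations = [(zero, (x,))] if t == last else tails.get(x, [])
                for vec, path in continuations:
                    total = tuple(s + v for s, v in zip(step, vec))
                    candidates.append((total, (y,) + path))
            current[y] = vmax(candidates)  # non-dominated partial realizations from y
        tails = current
    merged = [item for items in tails.values() for item in items]
    return vmax(merged)
```

Paths with identical value vectors are all kept, since none of them dominates the others. Only the non-dominated partial realizations are carried from stage to stage, so dominated continuations are never extended.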

Numerical illustration of the procedure
We consider a three-stage decision process. The sets of feasible states and decisions are as follows: This means that we have four stage alternatives at each stage of the process (t = 1, 2, 3): At each stage we have two reference sets: The matrix of stage criteria weights is given in Table 1.
The results of the comparisons of the stage alternatives a t ∈ A t with the elements of the reference sets are given in Table 2. Table 3 shows the complete set of stage indicators resulting from the comparisons of the consecutive multistage alternatives with the reference sets.
Our multistage process, together with the values of the functions $\Phi_t$, is presented in Fig. 1. We find the best multistage alternative using Bellman's optimality Eqs. (83)-(86). In particular, $x_2^*(0) = 0$.
Table 3 Values of the stage indicators
From formulas (87) and (88) we obtain:
It turns out that the set of vectors consists of one element only, hence the optimal process realization corresponding to this vector is at the same time the optimal multistage alternative. It starts at state $y_1 = 0$ and is of the form:
where
According to formula (40), alternative a** belongs to class A.

Sustainable regional development
Spatial planning at regional level is carried out by creating a spatial plan for the development of the region. It defines, among other things, the development of the settlement network, communication links as well as cross-border links, protected areas, the location of public investment of above-local importance, areas at risk of flooding, closed areas, areas of fossil deposits.
The principles of sustainable development must be implemented in real terms when creating a plan for the region. Omitting these principles may lead to an irrational and uncontrolled use of the resources and potential of space and to policies based only on progressive economic development without socio-environmental assumptions.
Sustainable regional development aims to compensate for intraregional disparities while preserving the identity of individual sites. Development objectives should apply to all aspects of life: economic, social and environmental.
In the planning of sustainable development strategies, measures are used to show the scale and structure of development processes. They can be divided into three groups corresponding to the individual types of governance: economic, social and environmental. Each of these groups contains multiple measures.
Economic governance indicators include, for example:
- gross domestic product,
- value of capital expenditure,
- number of operators,
- size of the population of non-working age in relation to the working-age population,
- number of people employed per thousand inhabitants,
- expenditure on active forms of combating unemployment,
- length of the electricity grid,
- number of motor vehicles,
- length of railway lines.
Social governance indicators include, for example:
- population density,
- rate of natural increase,
- unemployment rate,
- number of homeless people,
- number of people receiving social assistance,
- percentage of people living below the social minimum,
- percentage of people living below the subsistence level,
- social services expenditure,
- health care expenditure,
- culture and art expenditure,
- education expenditure,
- share of people with tertiary education in relation to the total number of inhabitants,
- average living space per person.
Environmental governance indicators include, for example:
- investment in water protection,
- withdrawal of surface water and groundwater,
- pollutant emissions,
- size of renewable energy production,
- share of selected waste in the total amount of municipal waste,
- quality of acoustic climate,
- area of recreation parks and forests,
- number of endangered species (fauna and flora),
- amount of investment in the protection of biodiversity and landscape.
The issue of sustainability planning can be presented as a multi-stage decision-making process. This requires the identification of multidimensional state vectors and admissible decisions, as well as of stage reference sets forming stage reference systems. These sets should be created using the different predicted values of the measures indicated above. The use of the dynamic bipolar procedure will make it possible to integrate all dimensions of sustainable regional development: ecological, economic, social and spatial.

Conclusions
The determination of the best multistage alternative by means of optimality equations was relatively simple in our example, both because of the small scale of the problem and because the sets of non-dominated solutions for the consecutive states consisted of single elements only. If the sets had been larger, the situation would have been more complicated.
As mentioned previously, the first attempt to extend the classic Bipolar method to multistage decision processes was made in Trzaskalik (1987). Note the similarities and differences between the two approaches, the previous one and the present one. The ways of comparing the stage alternatives (that is, the first two phases of the method) are similar in both approaches and are based on the classic Bipolar method, although in the present paper we have simplified the assumptions. Namely, we have set the equivalence threshold to 0.5 and decided not to use the veto coefficients. These simplified assumptions made it possible to uniquely predefine the disjoint classes to which the multistage alternatives are assigned. In the previous paper, the possibility of the occurrence of multistage alternatives which are overgood and underbad at the same time was not eliminated, which made the classification less clear. An essential novelty in the present paper is the numerical indicator describing the multistage alternatives and the form of this indicator, described by formula (48). It makes it possible to order the multistage alternatives within each class. Of essential practical importance is also the theorem proven here, stating that the best multistage alternative is non-dominated in the four-dimensional space formed by the multistage indicators, described by formulas (36)-(39). This theorem makes it possible to determine precisely the best multistage alternative among the non-dominated multistage alternatives found by means of multiobjective dynamic programming.
I hope that the present paper will start a new stage of research on multicriteria decision processes using the Electre methodology. The next task is to design a new variant of the method, with the veto thresholds and the stage equivalence thresholds (both of which are now omitted), and in which the concordance coefficient will be an arbitrary number from the closed interval [0.5, 1.0]. Another problem to be solved is that of the possible occurrence of stage alternatives which are at the same time overgood and underbad. In the general case it may happen that some stage alternatives will not be comparable with the stage reference sets. In such situations some multistage alternatives will also be non-comparable. Another direction of research is to design software for numerical simulations.
Considering the possible fields of applications of the dynamic bipolar procedure, it seems that it can be applied, among others, to design a long-term regional development strategy using the sustainable development approach, described briefly in Sect. 6.2. The problem needs to be worked out in detail.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.