1 Introduction

Cities are nowadays considered the most important source of environmental pollution: as acknowledged by [54], although urban spaces cover only 2% of the earth’s surface, they consume 60-80% of all goods. Consequently, over the years many objectives have been defined to reduce the pollution produced by cities and to make them develop in a way that respects the environment. This is also why, on September 25, 2015, the 2030 Agenda specified, among others, Sustainable Development Goal (SDG) 11 regarding cities [1]. The evaluation of the level of sustainability of cities has therefore become a crucial issue.

Even if there is no univocal definition of the term sustainability, three main types of sustainability are commonly considered: environmental, economic, and social [39, 45]. Because the level of sustainability is measured by means of several indicators, studying the sustainability of cities at both comprehensive and partial levels can be considered a typical Multiple Criteria Decision Aiding (MCDA; [29]) problem [37].

In this paper, we promote the use of a method recently proposed in the literature, the hierarchical-SMAA-PROMETHEE method [3], to evaluate the sustainability of cities; the method can analogously be applied to countries or regions. To this end, we applied it to study the sustainability of 20 European cities in the 2012-2015 period, taking into account 9 indicators grouped into three macro-criteria: environmental, economic, and social. The method combines three MCDA methodologies, namely, the Multiple Criteria Hierarchy Process (MCHP) [16], the PROMETHEE II method [7], and the Stochastic Multicriteria Acceptability Analysis (SMAA) [34], taking advantage of their main strengths. This methodological combination is innovative, and this study aims to show its added value. The benefits resulting from the application of the hierarchical-SMAA-PROMETHEE method in the considered context are the following:

  • The use of the PROMETHEE methods and, in particular, of PROMETHEE II, makes it possible to aggregate the evaluations on multiple criteria while avoiding compensation between them. Indeed, following [38], in measuring sustainability “compensability should be avoided”. Moreover, PROMETHEE II provides a complete ranking of the alternatives, which defines the position attained by each city.

  • The use of the MCHP makes it possible to obtain ranking information not only at the comprehensive level but also at the environmental, economic, and social levels. By integrating the MCHP with the PROMETHEE II method, one can rank-order the cities not only comprehensively but also with respect to each individual macro-criterion, thus learning their weak and strong points.

  • The use of SMAA makes it possible to provide robust recommendations concerning the considered sustainability evaluation. Indeed, instead of choosing a single vector of weights corresponding to the nine indicators, SMAA allows the considered cities to be ranked at both comprehensive and partial levels using a large set of possible weight vectors. The output information provided by SMAA is presented in statistical terms, specifying the probability with which a city gets a certain rank position or the frequency with which a city is preferred to another one. This statistical information can then be used to obtain a robust ranking of the considered cities at both comprehensive and partial levels.

The paper is organized as follows: in the next Section, we briefly review the literature on the sustainability evaluation of cities, countries, and regions; in Section 3, we recall the methodological background including, in particular, the three methods composing the hierarchical-SMAA-PROMETHEE method; in Section 4, we apply the proposed method to evaluate the sustainability of 20 European cities at both comprehensive and partial levels, showing the potential of the method; finally, in Section 5, we draw conclusions and indicate some future research directions.

2 Literature review

The number of papers presenting applications of MCDA methods to assess sustainability in different fields is quite large (see, for example, [15, 22, 24, 55, 62]). In the following, without any ambition of being exhaustive, we review a few of these papers regarding the sustainability evaluation of cities, countries, and regions. As will become evident later, they differ in the type of sustainability studied, the chosen indicators, and the method used to aggregate the alternatives’ performances [57].

Munda and Saisana [39] compared 25 regions in the Mediterranean area (17 Spanish, 4 Italian, and 4 Greek) based on 29 indicators representing economic, social, and environmental aspects. Compensatory and non-compensatory aggregations have been used to underline that the obtained results depend on the chosen method; Data Envelopment Analysis (DEA; [12]) has also been used to study the efficiency of the same regions.

Phillis et al. [44] evaluated the sustainability of 106 cities and megacities all over the world by a fuzzy model called SAFE (Sustainable Assessment by Fuzzy Evaluation). The sustainability of these cities has been studied based on 46 indicators belonging to two macro-inputs, that is, ecology and well-being. A sensitivity analysis has been performed to highlight which indicators have the greatest influence on the degree of sustainability of the considered cities. The same model has been used to compare the sustainability of 128 countries considering 75 indicators in [43].

Using an intuitionistic fuzzy approach, [23] assessed the environmental sustainability of 27 U.S. and Canadian metropoles. 16 sustainability indicators were taken into account in the paper while the analysis was performed based on experts’ judgments used to assign weights to these indicators.

Chen and Zhang [13] studied the sustainability of 14 Chinese cities in Liaoning province considering 21 indicators grouped in economic, social, and environmental macro-criteria in the 2013-2017 period. Interaction between indicators, distinguished in static interaction and dynamic trend similarity, has been taken into account. After normalizing the evaluations of the cities by the min-max normalization, the IOWA (Induced Ordered Weighted Averaging; [63]) operator has been used to aggregate them; the IOWA operator has also been applied in [65] to analyze the sustainable development of 13 Chinese cities in the 2011-2016 period. To this aim, 18 criteria concerning economic, social, and environmental aspects have been taken into account. Weights assigned to the different indicators were used to express their interdependence.

Deng et al. [21] assessed the sustainability of four large-sized Chinese cities (Beijing, Chongqing, Shanghai, and Tianjin) in three different years (2005, 2010, and 2015). The evaluations of the cities on 18 indicators divided into four macro-areas have been aggregated by the arithmetic mean after the minmax normalization. The same preference model has been used by [5] to evaluate the sustainability of 92 municipalities located in the Umbria region (Italy), using 18 indicators (9 environmental and 9 socio-economic). After putting all evaluations in the [0,1] interval by a standardization method, trade-off weights necessary to apply in the weighted sum have been obtained using the SWING method [61].

Yi et al. [64] evaluated the sustainability of 14 cities in the Liaoning province of China in the 2011-2016 period. A weighted sum has also been used in this case to aggregate the evaluations of the considered cities on 21 indicators using equal weighting. By stochastic simulations, the authors also forecasted the sustainability of the same cities in the following years.

The sustainability of 4 metropolitan areas (Bari, Bitonto, Mola, and Molfetta) in the south of Italy has been studied by [10]. The AHP method [51] has been applied to the data set composed of 35 indicators belonging to seven different dimensions; AHP has been used in [20] and [46] as well. On the one hand, [20] evaluated the sustainability performance of the 28 European countries from environmental and energetic perspectives considering 9 indicators; on the other hand, [46] proposed an approach aiming to assess the sustainability and livability of cities based on cognitive mapping. The analysis took into consideration 6 macro-criteria.

Zhang et al. [66] evaluated the sustainability of 13 Chinese cities in the Jiangsu province with respect to 30 indicators by using the Choquet integral [14]. The evaluations of the considered cities have been normalized by the min-max normalization, and then aggregated considering weights assigned to the indicators and all possible coalitions of criteria, objectively based on the data at hand and, therefore, without any judgment of the Decision Maker (DM). The Choquet integral has also been used in conjunction with cognitive mapping by [8] and [11]. In the first paper, the “greenness” of 20 Portuguese cities has been evaluated, while, in the second, the “smartness” of the same 20 Portuguese cities with respect to 6 macro-criteria has been analyzed.

Evaluation of the urban sustainable development of 16 Chinese cities in Anhui province has been performed based on 39 indicators divided into economic, social, and environmental in [56]. The evaluations of the cities on the considered indicators have been computed by an integrated framework composed of the TOPSIS method [31] and the grey relational analysis, while weights of criteria have been obtained by the entropy method [67].

Paolotti et al. [41] assessed the sustainability level of the 20 Italian regions and the 17 Spanish autonomous communities on the basis of 18 indicators belonging to economic, social, and environmental macro-criteria. The analysis has been performed based on the preferences of 8 experts (4 from Italy and 4 from Spain), while the results have been obtained by the GeoUmbriaSUIT [6] which integrates GIS and MCDA [35]. In particular, TOPSIS was used again as a preference model, while the weights necessary to apply the method have been obtained by SWING.

Antanasijevic et al. [2] measured the sustainability of 30 European countries in the 2004-2014 period. The analysis involved 38 indicators grouped into 8 subgroups. The authors applied PROMETHEE II as a preference model and provided a ranking of the considered countries for the period 2004-2014 and two shorter periods, being 2004-2009 and 2010-2014, respectively. Analogously, PROMETHEE II has been used to assess the sustainable energy transition readiness level of 14 countries belonging to all continents by [40]. 8 indicators have been considered in the study and AHP has been used to get the weights of indicators necessary to apply the PROMETHEE II method, which is admittedly not a correct combination [48].

Finally, [52] applied ELECTRE III [25] to evaluate the sustainability of some megacities, such as New York and Los Angeles, considering their evaluations on 12 indicators belonging to environmental, economic, social, and smart macro-criteria.

The above literature review shows that in the performed studies there was no MCDA method permitting at the same time to consider sustainability of the cities at the global level as well as at the level of macro-criteria, using a non-compensatory aggregation, taking into account preference information expressed by the DM in an indirect way, and providing robust recommendations following from not only one but a plurality of instances of the assumed preference model compatible with available preference information. Therefore, in this paper, we would like to fill this gap in the literature by applying the hierarchical SMAA-PROMETHEE method addressing all the mentioned issues simultaneously.

3 Methodological background

In this Section, we shall recall briefly the methods composing the hierarchical-SMAA-PROMETHEE method that will be applied in the considered case study, namely the PROMETHEE II (Section 3.1), the Multiple Criteria Hierarchy Process (Section 3.2), and the Stochastic Multicriteria Acceptability Analysis (Section 3.3).

3.1 Multiple criteria decision aiding and the PROMETHEE II method

Multiple Criteria Decision Aiding (MCDA; [29]) methods are designed to deal with ranking, choice, or sorting problems. In this paper, we are interested in the ranking since we aim to order several European cities from the best to the worst with respect to their sustainability at both comprehensive and partial levels, that is, economic, social, and environmental. In the following, by A we shall denote the set of alternatives {a, b, c,…} and by G the set of m criteria {g1,…,gm} on which the alternatives are evaluated. For the sake of simplicity and without loss of generality, we assume that each criterion gj is a real-valued function, that is, \(g_{j}:A\rightarrow \mathbb {R}\), and that it has an increasing direction of preference (the more gj(a), the better is a on gj).

Considering the performance matrix composed of the evaluations of the cities on the criteria at hand, the only objective information that can be obtained is the dominance relation D, such that, for all a, b ∈ A, aDb iff \(g_{j}(a)\geqslant g_{j}(b)\) for all gj ∈ G and there is at least one gj ∈ G such that gj(a) > gj(b). The objectivity of this relation is counterbalanced by its poverty, since in most cases neither a dominates b nor vice versa, which means that many alternatives are non-comparable by the dominance relation. To make the alternatives more comparable, an aggregation method has to be used, summarizing the information included in the performance matrix on the one hand, and the preference information provided by the DM on the other hand. Historically, three different aggregation methodologies have been considered: (i) value function methods belonging to Multiple Attribute Value Theory (MAVT; [33]), (ii) methods based on outranking relations, such as ELECTRE [25, 26] and PROMETHEE [4, 7], and (iii) methods based on induction of decision rules [27]. In this paper, we will aggregate the evaluations of the alternatives on the considered criteria by using the second approach and, in particular, the PROMETHEE II method. We recall this method below.
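To make the "poverty" of the dominance relation concrete, the following minimal Python sketch checks dominance on a small performance matrix. The three alternatives and their evaluations are hypothetical illustration data, not taken from the case study, and all criteria are assumed to have an increasing direction of preference.

```python
from typing import Dict, List, Tuple


def dominates(ga: List[float], gb: List[float]) -> bool:
    """True if a dominates b: at least as good everywhere, strictly better once."""
    at_least_as_good = all(x >= y for x, y in zip(ga, gb))
    strictly_better = any(x > y for x, y in zip(ga, gb))
    return at_least_as_good and strictly_better


# Hypothetical performance matrix: three alternatives, two criteria to maximize.
perf: Dict[str, List[float]] = {
    "a": [3.0, 5.0],
    "b": [2.0, 5.0],
    "c": [4.0, 1.0],
}

# All ordered pairs (x, y) such that x dominates y.
dominance_pairs: List[Tuple[str, str]] = [
    (x, y) for x in perf for y in perf
    if x != y and dominates(perf[x], perf[y])
]
print(dominance_pairs)
```

In this toy example only a dominates b; a and c, as well as b and c, remain non-comparable, which is the situation an aggregation method is meant to resolve.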

PROMETHEE II builds a complete order of the alternatives at hand based on their comparison through the net flow. The net flow of each alternative a is computed in the following steps:

  1.

    For each criterion gj and each pair of alternatives (a, b) ∈ A × A, the preference function Pj(a, b) is computed first. It is a non-decreasing function of the difference gj(a) − gj(b) expressing the degree of preference of a over b on gj. Six different types of function Pj(a, b) have been defined by [7]. The most frequently used is the V-shape function with indifference zone, defined as follows:

    $$ P_{j}(a,b)= \left\{ \begin{array}{lll} 0 & \text{if} & g_{j}(a)-g_{j}(b)\leqslant q_{j},\\ \frac{\left[g_{j}(a)-g_{j}(b)\right]-q_{j}}{p_{j}-q_{j}} & \text{if} & q_{j}<g_{j}(a)-g_{j}(b)<p_{j},\\ 1 & \text{if} & g_{j}(a)-g_{j}(b)\geqslant p_{j}. \end{array} \right. $$

    \(P_{j}(a,b)\in \left [0,1\right ]\) and, the greater Pj(a, b), the more gj is in favor of the preference of a over b. In the definition of Pj(a, b), qj and pj are, respectively, the indifference and the preference thresholds related to gj (see [50]). They are such that \(0\leqslant q_{j}<p_{j}\) and qj is the maximum difference between evaluation gj(a) and evaluation gj(b) compatible with their indifference on gj, while, pj is the minimum difference between gj(a) and gj(b) compatible with the preference of a over b on this criterion.

  2.

    For each (a, b) ∈ A × A, the preference function π(a, b) is computed:

    $$ \pi(a,b)= \displaystyle \sum\limits_{j=1}^{m}w_{j}\cdot P_{j}(a,b) $$

    where wj > 0 is a relative importance weight assigned to criterion \(g_{j}, j=1,{\dots } m\), such that \(\displaystyle {\sum }_{j=1}^{m}w_{j}=1\). Also \(\pi (a,b)\in \left [0,1\right ]\) and the greater π(a, b) the more a is preferred to b.

  3.

    For each a ∈ A, the positive flow ϕ+(a), the negative flow ϕ−(a) and the net flow ϕ(a) are computed:

    $$ \phi^{+}(a)=\frac{1}{|A|-1}\displaystyle \sum\limits_{b\in A\setminus\{a\}}\pi(a,b), \qquad \phi^{-}(a)=\frac{1}{|A|-1}\displaystyle \sum\limits_{b\in A\setminus\{a\}}\pi(b,a), \qquad \phi(a)=\phi^{+}(a)-\phi^{-}(a). $$

    On the one hand, ϕ+(a) measures how much, on average, a is preferred to all other alternatives; consequently, the greater ϕ+(a), the better a. On the other hand, ϕ−(a) measures how much, on average, the other alternatives are preferred to a; consequently, the greater ϕ−(a), the worse a. Finally, ϕ(a) provides a balance between the positive and the negative flows and it represents the relative quality of a in the set of alternatives A.

Based on the net flow of each alternative, PROMETHEE II builds a preference relation \(P^{II}\) and an indifference relation \(I^{II}\), such that:

  • \(aP^{II}b\) iff ϕ(a) > ϕ(b);

  • \(aI^{II}b\) iff ϕ(a) = ϕ(b).

The above two relations constitute a weak order (\(P^{II}\) and \(I^{II}\) are transitive and \(P^{II}\cup I^{II}\) is reflexive and complete; see, e.g., [47]) in the set of alternatives, so one gets a ranking recommendation from the best to the worst alternative with possible ex-aequo.
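Steps 1-3 and the resulting ranking can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `v_shape` and `net_flows`, the three alternatives, the weights, and the thresholds are hypothetical choices made for the example, and the V-shape function with indifference zone is assumed on every criterion.

```python
from typing import Dict, List


def v_shape(d: float, q: float, p: float) -> float:
    """V-shape preference function with indifference zone (0 <= q < p)."""
    if d <= q:
        return 0.0
    if d >= p:
        return 1.0
    return (d - q) / (p - q)


def net_flows(perf: Dict[str, List[float]], weights: List[float],
              q: List[float], p: List[float]) -> Dict[str, float]:
    """Net flow phi(a) = phi+(a) - phi-(a) for every alternative."""
    names = list(perf)
    n = len(names)

    def pi(a: str, b: str) -> float:
        # Weighted sum of the single-criterion preference degrees (step 2).
        return sum(w * v_shape(perf[a][j] - perf[b][j], q[j], p[j])
                   for j, w in enumerate(weights))

    flows = {}
    for a in names:
        plus = sum(pi(a, b) for b in names if b != a) / (n - 1)
        minus = sum(pi(b, a) for b in names if b != a) / (n - 1)
        flows[a] = plus - minus
    return flows


# Hypothetical data: three alternatives, two criteria to maximize.
perf = {"a": [10.0, 2.0], "b": [6.0, 4.0], "c": [8.0, 3.0]}
weights = [0.6, 0.4]   # must be positive and sum to 1
q = [1.0, 0.5]         # indifference thresholds
p = [3.0, 1.5]         # preference thresholds

phi = net_flows(perf, weights, q, p)
ranking = sorted(phi, key=phi.get, reverse=True)  # best to worst
print(phi, ranking)
```

With these numbers the net flows are 0.15 for a, 0 for c, and −0.15 for b, so the ranking is a, c, b. The net flows always sum to zero, because every π(a, b) enters once with a plus sign and once with a minus sign.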

3.2 The multiple criteria hierarchy process

In real-world applications of MCDA, the evaluation criteria are not always considered at the same level but they are structured hierarchically. This means that it is possible to distinguish a root criterion being the comprehensive objective of the problem, some macro-criteria descending from the root criterion hierarchically, until the elementary criteria being placed at the bottom of the hierarchy tree. The basic evaluations of the alternatives are made on the elementary criteria only, and they are aggregated to macro-criteria up the hierarchy tree, until the comprehensive criterion.

To deal with such a hierarchical structure of the family of criteria in MCDA, the Multiple Criteria Hierarchy Process (MCHP) has been proposed in [16]. In particular, its application to the PROMETHEE methods has been presented in [17]. The integration of the MCHP and the PROMETHEE II method makes it possible to define a preference and an indifference relation not only at the comprehensive level (that is, at the root criterion level) but also at partial levels that correspond to macro-criteria (that is, at particular nodes of the hierarchy tree). To describe the integration of PROMETHEE II with MCHP, we shall use the following notation: gt represents an elementary criterion, while the set of indices of elementary criteria is denoted by EL; gr represents a generic macro-criterion in the hierarchy, while E(gr) is a subset of EL composed of the indices of elementary criteria descending from gr. In particular, g0 represents the root criterion. For the sake of simplicity and without loss of generality we assume that macro-criteria, as well as elementary criteria, can descend from only one macro-criterion at the level above (see [17] for more details on this point).

To adapt the PROMETHEE II method to the MCHP, it is enough to perform the following replacements in steps 1–3 presented in the previous Subsection:

1→1’:

The preference function Pj(a, b), which was defined for each criterion gj and for each ordered pair of alternatives (a, b) ∈ A × A, now has to be defined for each elementary criterion gt and for each ordered pair of alternatives (a, b); of course, if the V-shape function is used as preference function, the indifference threshold qt and the preference threshold pt have to be defined for each gt, t ∈ EL.

2→2’:

After a weight wt is assigned to each elementary criterion gt, t ∈ EL, so that wt > 0 for all t ∈ EL and \(\displaystyle {\sum }_{\mathbf {t}\in EL}w_{\mathbf {t}}=1\), for each macro-criterion gr, the partial preference function πr(a, b) is defined for each ordered pair of alternatives (a, b) ∈ A × A as follows:

$$ \pi_{\mathbf{r}}(a,b)=\sum\limits_{\mathbf{t}\in E\left( g_{\mathbf{r}}\right)}w_{\mathbf{t}}\cdot P_{\mathbf{t}}(a,b). $$

Of course, \(\pi _{\mathbf {r}}(a,b)\in \left [0,\displaystyle {\sum }_{\mathbf {t}\in E\left (g_{\mathbf {r}}\right )}w_{\mathbf {t}}\right ]\) and, the greater πr(a, b), the more a is preferred to b on gr.

3→3’:

For each alternative a ∈ A and for each macro-criterion gr, the partial positive, negative, and net flows are defined as follows:

$$ \phi_{\mathbf{r}}^{+}(a)=\frac{1}{|A|-1}\displaystyle\sum\limits_{b\in A\setminus\{a\}}\pi_{\mathbf{r}}(a,b), \qquad \phi_{\mathbf{r}}^{-}(a)=\frac{1}{|A|-1}\displaystyle\sum\limits_{b\in A\setminus\{a\}}\pi_{\mathbf{r}}(b,a), \qquad \phi_{\mathbf{r}}(a)=\phi_{\mathbf{r}}^{+}(a)-\phi_{\mathbf{r}}^{-}(a). $$

Based on the net flows, a marginal preference relation \(P_{\mathbf {r}}^{II}\) and a marginal indifference relation \(I_{\mathbf {r}}^{II}\) are defined for each macro-criterion gr as follows:

  • \(aP_{\mathbf {r}}^{II}b\Leftrightarrow \phi _{\mathbf {r}}(a)>\phi _{\mathbf {r}}(b)\);

  • \(aI_{\mathbf {r}}^{II}b\Leftrightarrow \phi _{\mathbf {r}}(a)=\phi _{\mathbf {r}}(b)\).
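The partial flows can be sketched by restricting the sums to the elementary criteria in E(gr). In the Python sketch below, the function `partial_net_flows`, the index tuples, and all numerical values are hypothetical illustration choices; elementary criteria are indexed by pairs (macro-criterion, position) and the V-shape function is assumed throughout.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, int]  # (macro-criterion, position within it)


def v_shape(d: float, q: float, p: float) -> float:
    """V-shape preference function with indifference zone (0 <= q < p)."""
    if d <= q:
        return 0.0
    if d >= p:
        return 1.0
    return (d - q) / (p - q)


def partial_net_flows(perf: Dict[str, Dict[Index, float]],
                      weights: Dict[Index, float],
                      q: Dict[Index, float], p: Dict[Index, float],
                      subset: List[Index]) -> Dict[str, float]:
    """Net flows phi_r(a) using only the elementary criteria in `subset` = E(g_r)."""
    names = list(perf)
    n = len(names)

    def pi_r(a: str, b: str) -> float:
        return sum(weights[t] * v_shape(perf[a][t] - perf[b][t], q[t], p[t])
                   for t in subset)

    return {a: sum(pi_r(a, b) - pi_r(b, a) for b in names if b != a) / (n - 1)
            for a in names}


# Hypothetical hierarchy: macro-criterion 1 has two elementary criteria,
# macro-criterion 2 has one.
EL = [(1, 1), (1, 2), (2, 1)]
perf = {
    "a": {(1, 1): 5.0, (1, 2): 1.0, (2, 1): 7.0},
    "b": {(1, 1): 3.0, (1, 2): 4.0, (2, 1): 7.0},
    "c": {(1, 1): 4.0, (1, 2): 2.0, (2, 1): 9.0},
}
weights = {(1, 1): 0.3, (1, 2): 0.3, (2, 1): 0.4}  # sum to 1 over EL
q = {t: 0.0 for t in EL}
p = {t: 2.0 for t in EL}

phi0 = partial_net_flows(perf, weights, q, p, EL)                  # root level
phi1 = partial_net_flows(perf, weights, q, p, [(1, 1), (1, 2)])    # macro 1
phi2 = partial_net_flows(perf, weights, q, p, [(2, 1)])            # macro 2
print(phi0, phi1, phi2)
```

Because the weights are not renormalized within a macro-criterion and each elementary criterion descends from exactly one macro-criterion, the comprehensive net flow of an alternative equals the sum of its partial net flows over the first-level macro-criteria, which the example data confirm.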

3.3 Stochastic multicriteria acceptability analysis

As described in the previous Subsection, the application of PROMETHEE II requires the DM to specify the weights wt of the elementary criteria, as well as the indifference thresholds qt and the preference thresholds pt. Assuming that the discriminating thresholds are technical, and as such, they are fixed, the final ranking recommendation at both comprehensive and partial levels depends on the choice of the weights wt. Indeed, different values of wt imply, in general, different values of the positive, negative, and net flows and, consequently, different relations between the considered alternatives. To avoid a single, and therefore to some extent arbitrary, choice of weights, [19] proposed the SMAA-PROMETHEE method, later extended to the hierarchical-SMAA-PROMETHEE method by [3]. The hierarchical-SMAA-PROMETHEE method integrates the Stochastic Multicriteria Acceptability Analysis (SMAA) in the hierarchical PROMETHEE II method (see [34] for the paper introducing SMAA and [42] for a recent survey of the SMAA applications). As a result, one gets a recommendation in statistical terms specifying the probability with which an alternative gets a particular position in the ranking or the probability with which an alternative is preferred to another one at the comprehensive and partial levels. Such a ranking recommendation is more robust than the ranking obtained for a single vector of weights.

Defining by \(W=\left \{\left (w_{1},\ldots ,w_{|EL|}\right )\in \mathbb {R}^{|EL|}_{+}:\displaystyle {\sum }_{\mathbf {t}\in EL}w_{\mathbf {t}}=1\right \}\) the whole space of vectors of weights and by \(W^{DM}\) the subset of W composed of the vectors of weights compatible with some preferences provided by the DM, the application of SMAA starts by sampling several weight vectors from \(W^{DM}\). The hierarchical-PROMETHEE II method is then applied for each sampled vector \(\left (w_{1},\ldots ,w_{|EL|}\right )\in W^{DM}\), yielding a preference and an indifference relation at both comprehensive and partial levels. Based on these computations, the following indices are then obtained:

  • The rank acceptability index \(b_{\mathbf {r}}^{k}(a)\): it is the probability with which an alternative a reaches the k-th position in the ranking corresponding to criterion gr,

  • The pairwise winning index pr(a, b): it is the probability with which alternative a is preferred to alternative b on gr. Of course, based on pr(a, b) and pr(b, a) the frequency of indifference between a and b on gr can also be computed as 1 − pr(a, b) − pr(b, a).
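Both indices can be estimated as simple frequencies over the sampled weight vectors. The Python sketch below assumes that the net flows on a given criterion gr have already been computed for each sample; the function name `smaa_indices` and the four net-flow samples are hypothetical. As a further assumption of this sketch, tied alternatives share the best available position (the position of a is one plus the number of alternatives with a strictly greater net flow).

```python
from typing import Dict, List, Tuple


def smaa_indices(flow_samples: List[Dict[str, float]]
                 ) -> Tuple[Dict[str, List[float]], Dict[Tuple[str, str], float]]:
    """Estimate rank acceptability and pairwise winning indices as frequencies."""
    names = list(flow_samples[0])
    n, s = len(names), len(flow_samples)
    rank_acc = {a: [0.0] * n for a in names}   # rank_acc[a][k-1] ~ b^k(a)
    win = {(a, b): 0.0 for a in names for b in names if a != b}
    for phi in flow_samples:
        for a in names:
            # Zero-based position: number of alternatives strictly better than a.
            k = sum(1 for b in names if phi[b] > phi[a])
            rank_acc[a][k] += 1.0 / s
            for b in names:
                if a != b and phi[a] > phi[b]:
                    win[(a, b)] += 1.0 / s
    return rank_acc, win


# Hypothetical net flows obtained for four sampled weight vectors.
samples = [
    {"a": 0.3, "b": -0.1, "c": -0.2},
    {"a": 0.1, "b": 0.2, "c": -0.3},
    {"a": 0.2, "b": 0.0, "c": -0.2},
    {"a": 0.1, "b": 0.1, "c": -0.2},   # a and b are indifferent here
]
rank_acc, win = smaa_indices(samples)
indifference_ab = 1.0 - win[("a", "b")] - win[("b", "a")]
print(rank_acc, win[("a", "b")], win[("b", "a")], indifference_ab)
```

In this toy example a is first in 75% of the samples, a is preferred to b in 50% of them, b to a in 25%, and the remaining 25% is the frequency of indifference between the two.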

3.4 The SMAA-PROMETHEE method in detail

In this Section, we shall present in detail the different steps of the SMAA-PROMETHEE method summarized in the flow chart in Fig. 1.

Fig. 1 Flow chart of the steps of the SMAA-PROMETHEE method

Step 0:

The set of criteria to be taken into account in the problem is structured hierarchically: a root criterion representing the problem itself is defined, as well as a few macro-criteria descending from it, continuing down to the elementary criteria on which the alternatives will be evaluated and that are located at the bottom of the hierarchy tree.

Step 1:

In this step, the DM is asked to provide some preference information:

Step 1.1:

With the help of the analyst, all technical parameters are fixed. These concern the selection of the shape of the preference function Pt(⋅,⋅) for each elementary criterion gt that, as mentioned in Section 3.1, can be of six different types, as well as the indifference threshold qt and the preference threshold pt for each gt if the V-shape function with indifference zone is assumed as preference function.

Step 1.2:

The DM provides some preference information by comparing criteria in terms of their importance (for example, “\(g_{\mathbf {r}_{1}}\) is more important than \(g_{\mathbf {r}_{2}}\)” or “\(g_{\mathbf {r}_{1}}\) and \(g_{\mathbf {r}_{2}}\) have the same importance”, etc.) or by comparing some alternatives pairwise in terms of preference (for example, “a is preferred to b” or “a and b are indifferent”, etc.). These pieces of preference information are translated into linear constraints on the weights of the elementary criteria, which reduce the space W of weight vectors to the space \(W^{DM}\) of compatible ones.

Step 2:

The analyst checks if the preference information provided by the DM is consistent, that is, if there exists at least one instance of the assumed preference model (PROMETHEE II in our case) compatible with the preferences provided by the DM. We shall call such an instance a compatible model in the following. If there exists at least one compatible model and, therefore, \(W^{DM}\neq \emptyset \), one can pass to Step 3; otherwise the DM, with the help of the analyst, has to identify the cause of the incompatibility and resolve it by revising the preference information [36].

Step 3:

A certain number of compatible models is sampled by means of, e.g., the Hit-And-Run method [53, 59] (to have an idea of the number of compatible models that need to be sampled to get a certain precision in the obtained results, see [58]).

Step 4:

The hierarchical-PROMETHEE II method is applied for each sampled vector of weights, thus obtaining a complete ranking of the alternatives at hand both at the comprehensive level and for each considered macro-criterion.

Step 5:

Based on the rankings obtained in the previous step, the SMAA methodology is applied to compute the indices presented in Section 3.3, that is, the rank acceptability index of each alternative on each rank position, and the pairwise winning index between two alternatives. Both indices can be computed at comprehensive and at partial levels. The rank acceptability indices of the alternatives can also be aggregated to obtain a final ranking of the alternatives at hand, as will be shown in the next Section.
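As an illustration of Step 3, the sketch below samples compatible weight vectors by plain rejection sampling: vectors are drawn uniformly from the simplex W and kept only if they satisfy the DM's constraints. This is a deliberately simpler stand-in for the Hit-And-Run method cited above, not an implementation of it, and it becomes inefficient when the compatible region is small. The function names and the example constraint (the first two criteria together outweigh the last two) are hypothetical.

```python
import random
from typing import Callable, List


def sample_simplex(rng: random.Random, m: int) -> List[float]:
    """Draw a weight vector uniformly from the simplex (normalized exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_compatible(is_compatible: Callable[[List[float]], bool],
                      m: int, n_samples: int, seed: int = 0,
                      max_tries: int = 1_000_000) -> List[List[float]]:
    """Rejection sampling: keep only the weight vectors satisfying the constraints."""
    rng = random.Random(seed)
    kept: List[List[float]] = []
    for _ in range(max_tries):
        w = sample_simplex(rng, m)
        if is_compatible(w):
            kept.append(w)
            if len(kept) == n_samples:
                return kept
    raise RuntimeError("too few compatible vectors found; the region may be empty or tiny")


def example_constraint(w: List[float]) -> bool:
    """Hypothetical preference: criteria 1-2 together outweigh criteria 3-4."""
    return w[0] + w[1] > w[2] + w[3]


samples = sample_compatible(example_constraint, m=4, n_samples=1000)
print(len(samples), samples[0])
```

Each retained vector would then be passed to the hierarchical PROMETHEE II computation of Step 4, and the resulting rankings summarized by the SMAA indices of Step 5.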

4 Case study

In this Section, we apply the hierarchical-SMAA-PROMETHEE method to assess the sustainability of 20 European cities in the period 2012-2015 at the comprehensive level as well as at the Environmental (ENV), Economic (ECO), and Social (SOC) levels. In total, 9 elementary criteria have been taken into account, three for each macro-criterion. In particular, Passenger Cars (PC), Amount of Waste Generated (AWG), and CO2 emissions (CO2) are elementary criteria descending from ENV; Employment Rate (ER), Unemployment Rate (UR), and GDP per capita (GDP) are elementary criteria descending from ECO; finally, Percentage of Population Owning a House (POH), Population Density (PD) and Criminal Rate (CR) are elementary criteria descending from SOC. The hierarchy of criteria is shown in Fig. 2, while the data sources and the preference direction of each criterion are shown in Table 1.

Fig. 2 Criteria considered in the case study and structured in a hierarchical way

Table 1 Evaluation criteria, data source and preference direction

Table 2 reports the evaluations of the 20 considered cities on the elementary criteria for the year 2015 only. The whole data set and the whole set of results can be downloaded from the following link: Supplementary Results (http://www.antoniocorrente.it/wwwsn/images/allegati_articoli/SMAA).

Table 2 Evaluations of the considered cities on elementary criteria in year 2015

The indifference and preference thresholds for each elementary criterion, equal for all considered years, are shown in Table 3.

Table 3 Indifference and preference thresholds for the elementary criteria

Using the notation introduced in Section 3.3 we have that EL = {(1,1),…,(3,3)} and, therefore, the whole set of weights is \(W=\left \{\left (w_{(\mathbf {1,1})},\ldots ,w_{(\mathbf {3,3})}\right )\in \mathbb {R}_{+}^{9}: \displaystyle {\sum }_{\mathbf {t}\in EL}w_{\mathbf {t}}=1\right \}\). For our analysis, let us assume that the Environmental macro-criterion is more important than the Social one which, in turn, is more important than the Economic macro-criterion. This preference information is translated into the following constraints on the weights: W1 > W3 and W3 > W2. Transforming the strict inequalities into weak ones by means of an auxiliary variable ε and observing that W1 = w(1,1) + w(1,2) + w(1,3), W2 = w(2,1) + w(2,2) + w(2,3) and W3 = w(3,1) + w(3,2) + w(3,3), the two constraints become

$$ \begin{array}{@{}l@{}} w_{(\mathbf{1,1})}+w_{(\mathbf{1,2})}+w_{(\mathbf{1,3})}\geqslant w_{(\mathbf{3,1})}+w_{(\mathbf{3,2})}+w_{(\mathbf{3,3})}+\varepsilon \quad \text{and}\\ w_{(\mathbf{3,1})}+w_{(\mathbf{3,2})}+w_{(\mathbf{3,3})}\geqslant w_{(\mathbf{2,1})}+w_{(\mathbf{2,2})}+w_{(\mathbf{2,3})}+\varepsilon. \end{array} $$

Moreover, to ensure that there is no dictator criterion, that is, no criterion having importance greater than the sum of the remaining ones, we impose the no-dictatorship condition [49] at all levels of the hierarchy, obtaining the following constraints:

  1.

    \(W_{\mathbf {1}}\leqslant W_{\mathbf {2}}+W_{\mathbf {3}} \Leftrightarrow w_{(\mathbf {1,1})}+w_{(\mathbf {1,2})}+w_{(\mathbf {1,3})}\leqslant w_{(\mathbf {2,1})}+w_{(\mathbf {2,2})}+w_{(\mathbf {2,3})}+w_{(\mathbf {3,1})}+w_{(\mathbf {3,2})}+w_{(\mathbf {3,3})}\)Footnote 1,

  2.

    \(w_{(\mathbf {1,1})}\leqslant w_{(\mathbf {1,2})}+w_{(\mathbf {1,3})}\); \(w_{(\mathbf {1,2})}\leqslant w_{(\mathbf {1,1})}+w_{(\mathbf {1,3})}\); \(w_{(\mathbf {1,3})}\leqslant w_{(\mathbf {1,1})}+w_{(\mathbf {1,2})}\),

  3.

    \(w_{(\mathbf {2,1})}\leqslant w_{(\mathbf {2,2})}+w_{(\mathbf {2,3})}\); \(w_{(\mathbf {2,2})}\leqslant w_{(\mathbf {2,1})}+w_{(\mathbf {2,3})}\); \(w_{(\mathbf {2,3})}\leqslant w_{(\mathbf {2,1})}+w_{(\mathbf {2,2})}\),

  4.

    \(w_{(\mathbf {3,1})}\leqslant w_{(\mathbf {3,2})}+w_{(\mathbf {3,3})}\); \(w_{(\mathbf {3,2})}\leqslant w_{(\mathbf {3,1})}+w_{(\mathbf {3,3})}\); \(w_{(\mathbf {3,3})}\leqslant w_{(\mathbf {3,1})}+w_{(\mathbf {3,2})}\).

To summarize, the space \(W^{DM}\) of weight vectors compatible with the mentioned preference information is defined by the following set of constraints:

$$ \left. \begin{array}{l} w_{(\mathbf{1,1})}+w_{(\mathbf{1,2})}+w_{(\mathbf{1,3})}\geqslant w_{(\mathbf{3,1})}+w_{(\mathbf{3,2})}+w_{(\mathbf{3,3})}+\varepsilon\\ w_{(\mathbf{3,1})}+w_{(\mathbf{3,2})}+w_{(\mathbf{3,3})}\geqslant w_{(\mathbf{2,1})}+w_{(\mathbf{2,2})}+w_{(\mathbf{2,3})}+\varepsilon\\ w_{(\mathbf{1,1})}\leqslant w_{(\mathbf{1,2})}+w_{(\mathbf{1,3})}, w_{(\mathbf{1,2})}\leqslant w_{(\mathbf{1,1})}+w_{(\mathbf{1,3})}, w_{(\mathbf{1,3})}\leqslant w_{(\mathbf{1,1})}+w_{(\mathbf{1,2})} \\ w_{(\mathbf{2,1})}\leqslant w_{(\mathbf{2,2})}+w_{(\mathbf{2,3})}, w_{(\mathbf{2,2})}\leqslant w_{(\mathbf{2,1})}+w_{(\mathbf{2,3})}, w_{(\mathbf{2,3})}\leqslant w_{(\mathbf{2,1})}+w_{(\mathbf{2,2})} \\ w_{(\mathbf{3,1})}\leqslant w_{(\mathbf{3,2})}+w_{(\mathbf{3,3})}, w_{(\mathbf{3,2})}\leqslant w_{(\mathbf{3,1})}+w_{(\mathbf{3,3})}, w_{(\mathbf{3,3})}\leqslant w_{(\mathbf{3,1})}+w_{(\mathbf{3,2})} \\ w_{\mathbf{t}}\geqslant\varepsilon, \forall\mathbf{t}\in EL \\ \displaystyle\sum\limits_{\mathbf{t}\in EL}w_{\mathbf{t}}=1. \end{array} \right\}E^{DM} $$

Solving the LP problem \(\varepsilon ^{*}=\max \limits \varepsilon \), subject to \(E^{DM}\), we find that \(E^{DM}\) is feasible and \(\varepsilon ^{*}=0.0833\), meaning that there is at least one weight vector compatible with the preferences we provided as a hypothetical DM. Using the HAR method, we sampled 100,000 compatible weight vectors and applied the hierarchical PROMETHEE II method to each of them, obtaining for each weight vector a total ranking of the considered cities, not only at the comprehensive level but also at the partial levels. As explained in Section 3.3, applying the SMAA methodology to the hierarchical PROMETHEE II method yields the rank acceptability indices of all cities at the comprehensive and partial levels. Tables 4, 5, 6 and 7 show the rank acceptability indices at the comprehensive level, as well as at the levels of the three considered macro-criteria, according to the provided preference information (see Footnote 2).
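The value 0.0833 can be cross-checked by hand, reading the constraints of \(E^{DM}\) exactly as displayed above. Every elementary weight is at least \(\varepsilon\), so \(W_{\mathbf{2}}\geqslant 3\varepsilon\); the two ordering constraints then give \(W_{\mathbf{3}}\geqslant 4\varepsilon\) and \(W_{\mathbf{1}}\geqslant 5\varepsilon\), and normalization yields \(1\geqslant 12\varepsilon\), that is, \(\varepsilon\leqslant 1/12\approx 0.0833\). The minimal Python sketch below checks that this bound is attained by one particular weight vector. The function name `satisfies_edm`, the nested-list layout, and the vector `witness` are illustrative assumptions of ours, not taken from the paper.

```python
from fractions import Fraction

def satisfies_edm(w, eps):
    """Check the constraints of E^DM for weights w[r][j] (3 macro-criteria x 3 criteria)."""
    W = [sum(group) for group in w]  # macro-criterion weights W_1, W_2, W_3
    # Ordering constraints: W_1 >= W_3 + eps and W_3 >= W_2 + eps.
    ordering = W[0] >= W[2] + eps and W[2] >= W[1] + eps
    # w_i <= sum of the other two weights in the group  <=>  2 * w_i <= group sum.
    triangle = all(2 * x <= sum(group) for group in w for x in group)
    lower = all(x >= eps for group in w for x in group)
    normalized = sum(W) == 1
    return ordering and triangle and lower and normalized

eps_star = Fraction(1, 12)
# Hypothetical witness vector: W_1 = 5/12, W_2 = 3/12, W_3 = 4/12.
witness = [
    [Fraction(5, 36)] * 3,
    [Fraction(1, 12)] * 3,
    [Fraction(1, 12), Fraction(1, 8), Fraction(1, 8)],
]
print(satisfies_edm(witness, eps_star), round(float(eps_star), 4))  # True 0.0833
```

Exact rational arithmetic is used so that the tight constraints are tested without rounding error; an LP solver would be the natural tool for constraint sets that cannot be bounded by inspection.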

Table 4 Rank Acceptability Indices of the cities at the comprehensive level: 2015 data
Table 5 Rank Acceptability Indices of the cities at the Environmental level: 2015 data
Table 6 Rank Acceptability Indices of the cities at the Economic level: 2015 data
Table 7 Rank Acceptability Indices of the cities at the Social level: 2015 data

Looking at the data in Tables 4-7 and 8a-b, the following detailed observations can be made:

  • Comprehensive level: Luxembourg, Oslo, Bern, Riga, and Prague are the only five cities that can take the first ranking position. In particular, Luxembourg takes the first place most frequently (\(b^{1}_{\mathbf {0}}(LU)=42.66\%\)), followed by Oslo (\(b^{1}_{\mathbf {0}}(OS)=25.746\%\)), Bern (\(b^{1}_{\mathbf {0}}(BL)=14.432\%\)), Riga (\(b^{1}_{\mathbf {0}}(RI)=12.774\%\)) and Prague (\(b^{1}_{\mathbf {0}}(PR)=4.388\%\)). Moreover, the sum of the rank acceptability indices of the same cities for the first three positions confirms that Luxembourg and Oslo are the two best cities: this sum is equal to 85.623% for Luxembourg and 83.585% for Oslo, meaning that, in most cases, both cities reach one of the first three positions in the ranking. For Prague, instead, this sum is equal to 35.003%, which means that it can reach the first three positions in the ranking, but not very frequently. To further compare these five cities, in Table 8 we provide the pairwise winning indices among them. Luxembourg is preferred to the other four considered cities with a probability of at least 59.599%, which is the probability with which it is preferred to Oslo. Analogously, Oslo is preferred to the other four cities with a probability of at least 40.401%, which is the probability with which it is preferred to Luxembourg. This again confirms that these two cities are the best.

    Table 8 Pairwise winning indices between a few cities at the comprehensive level as well as at the macro-criteria level

    Regarding, instead, the tail of the ranking, London and Athens are the two cities most frequently in the last positions, since the other three, that is, Rome, Berlin, and Madrid, can be ranked last only with very marginal probabilities. As was done for the first three positions in the ranking, we look at the sum of the rank acceptability indices corresponding to the last three positions. This sum is equal to 98.957% for London and 93.052% for Athens, meaning that the two cities are almost always in these positions independently of the weights assigned to the elementary criteria;

  • Environmental level: The information extracted from Table 5 is easy to interpret since there is only one city that can be ranked first, that is, Bern, and one city that is always ranked last, that is, London. Moreover, one can observe that the results are quite stable and do not change very much with the weights of the three elementary criteria descending from the environmental macro-criterion, since each city can take at most 5 positions and at least one of them is filled with a probability close to 50% and, in many cases, even higher;

  • Economic level: At the economic level only three cities can be ranked first, that is, Stockholm, Oslo, and Luxembourg, with probabilities of 77.242%, 20.616%, and 2.142%, respectively. Moreover, when Stockholm is not ranked first, it is in the second (\(b_{\mathbf {2}}^{2}(ST)=17.256\%\)) or in the third position (\(b_{\mathbf {2}}^{3}(ST)=5.502\%\)), showing, therefore, great stability. Considering the tail of the ranking, Athens is surely the worst among the twenty analyzed cities since it always fills the last position. The second-worst city w.r.t. economic aspects has to be chosen among Lisbon, Madrid, and Rome, since they are the only three cities that can be ranked in the last but one place, with probabilities of 62.056%, 36.873%, and 1.071%, respectively (see Table 6);

  • Social level: Warsaw, Luxembourg, and Oslo are the only three cities that can be ranked first considering the social macro-criterion. In particular, from Table 7, one can observe that their probabilities of being ranked first are 67.87%, 20.972%, and 11.158%, respectively. However, Luxembourg and Oslo reach their highest probability at a rank position different from the first. Indeed, Oslo's highest probability is 37.273% and it is obtained at the second place, while Luxembourg has its highest probability at the sixth place, meaning that this is the rank position occupied most frequently by this city. To further compare these three cities, we extract their pairwise winning indices, shown in Table 8b, from which one can observe that Warsaw is preferred to the other two cities with a probability of at least 79.028%, while Oslo and Luxembourg are preferred to each other with quite similar probabilities: Oslo is preferred to Luxembourg in 51.547% of the cases, while Luxembourg is preferred to Oslo in the remaining 48.453% of the cases.

    Considering the tail of the ranking, London and Paris are the only two cities that can be ranked last and they have the highest probability to be ranked last (\(b_{\mathbf {3}}^{20}(LO)=97.686\%\)) and last but one (\(b_{\mathbf {3}}^{19}(PA)=80.841\%\)), respectively.
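The rank acceptability and pairwise winning indices commented on above are frequencies computed over the rankings produced by the sampled weight vectors. The following minimal Python sketch illustrates this counting step on toy data; the function names, the city labels "A", "B", "C", and the four rankings are hypothetical and serve only as an illustration, not as the implementation used in the study.

```python
from collections import Counter

def rank_acceptability(rankings, city):
    """Percentage of sampled rankings in which `city` takes each position (1 = best)."""
    counts = Counter(r.index(city) + 1 for r in rankings)
    n = len(rankings)
    return {k: 100.0 * counts[k] / n for k in range(1, len(rankings[0]) + 1)}

def pairwise_winning(rankings, a, b):
    """Percentage of sampled rankings in which `a` is ranked ahead of `b`."""
    wins = sum(r.index(a) < r.index(b) for r in rankings)
    return 100.0 * wins / len(rankings)

# Toy input: one total ranking (best first) per sampled weight vector.
toy_rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
    ["A", "B", "C"],
]
print(rank_acceptability(toy_rankings, "A"))     # {1: 75.0, 2: 25.0, 3: 0.0}
print(pairwise_winning(toy_rankings, "A", "B"))  # 75.0
```

When every sampled ranking is a total order without ties, the two pairwise winning indices of a pair sum to 100%, which agrees with the values 59.599% and 40.401% reported for Luxembourg and Oslo.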

To summarize the results of the rank acceptability indices of the cities at the comprehensive and partial levels, following [32], we can calculate the expected rank \(ER_{\mathbf{r}}(a)\) of each city \(a\):

$$ ER_{\mathbf{r}}(a)=\displaystyle -\sum\limits_{k=1}^{|A|}k\cdot b_{\mathbf{r}}^{k}(a). $$

Based on \(ER_{\mathbf{r}}(a)\), the cities are then ranked from the best (the one having the greatest \(ER_{\mathbf{r}}(a)\)) to the worst (the one having the smallest \(ER_{\mathbf{r}}(a)\)).
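As a worked example of the formula, the short Python sketch below computes the expected rank of Stockholm at the economic level from the three non-zero rank acceptability indices reported above (77.242%, 17.256%, and 5.502%). The function name `expected_rank` and the choice of keeping the indices in percent, as they are reported in the text, are our assumptions.

```python
def expected_rank(acceptability):
    """ER(a) = -sum_k k * b^k(a), with b^k(a) in percent, keyed by rank position k."""
    return -sum(k * b for k, b in acceptability.items())

# Stockholm, economic macro-criterion; all other positions have zero acceptability.
stockholm_economic = {1: 77.242, 2: 17.256, 3: 5.502}
print(round(expected_rank(stockholm_economic), 3))  # -128.26
```

Dividing the absolute value by 100 gives an average position of about 1.28, and the minus sign ensures that a greater value corresponds to a better city.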

In Table 9, we show the positions obtained by each city in the ranking based on the expected rank at the comprehensive and partial levels. Looking at the data, one can appreciate the finer results provided by the MCHP at the macro-criteria levels. Indeed, while some cities, such as Prague or Oslo, take more or less the same position at the comprehensive level and on each macro-criterion, others take completely different rank positions in the four considered rankings. For example, London is 3rd in the economic ranking, while it is 19th in the comprehensive and environmental rankings and 20th in the social ranking. Analogously, while Bern is first on the environmental aspect, it is 4th at the comprehensive level, 5th on the economic aspect, and, finally, 13th on the social aspect. In this way, from a policy-making point of view, it is possible to consider the weak and strong points of each city, so that policymakers can develop strategies that preserve the strong points and, at the same time, improve the weak ones.

Table 9 Rank positions of each city at the comprehensive and partial levels according to their expected rank: 2015 data

Since the analysis has been performed separately for each year from 2012 to 2015, it is useful to look at the evolution over time of the positions obtained by each city in the rankings at the comprehensive and partial levels.

Table 10a-c reports the evolution of the expected ranks over the 2012-2015 period at the comprehensive level as well as for the economic and social macro-criteria. We did not include the same table for the environmental macro-criterion since the ranking is the same in all years apart from 2015, where Paris is not considered because of missing data on some elementary criteria. In parentheses, we report the number of rank positions a city has gained (↑) or lost (↓) with respect to the previous year. In more detail, one can observe the following:

  • Comprehensive level: In 2013 many cities changed their rank position with respect to the previous year. However, this is mainly because four of them were excluded from the 2012 analysis because of missing data on some elementary criteria. Considering, instead, the years 2013-2014, one can see that the ranking is almost the same since all cities maintain the same rank position or change by one or at most two ranks. The main change can be observed for Riga, which is 5th in 2013 and 2014 but moves to 3rd place in 2015;

  • Economic level: Unlike the comprehensive case, more changes can be observed w.r.t. the economic aspect. While Lisbon and Athens always fill the same position in the four considered years, all the other cities are subject to changes in their rank positions. In particular, on the one hand, a positive trend can be observed for London (which gains one position in 2013 and three positions in 2014), Berlin (which gains one position in 2013 and 2015), Copenhagen (which gains one position in 2013 and 2015), and Stockholm (which gains one position in 2013); on the other hand, a negative trend can be observed for Rome (which loses one position in 2013 and 2014), Paris (which loses one position in 2015), Vienna (which loses one position in 2013 and 2015), Madrid (which loses one position in 2013), Oslo (which loses one position in 2013), Warsaw (which loses one position in 2014), Brussels (which loses one position in 2014 and 2015), Helsinki (which loses two positions in 2013 and 2015) and, finally, Luxembourg (which loses one position in 2014 and two positions in 2015); the other cities have a fluctuating trend since they gain and lose positions during the years;

  • Social level: As already observed for the comprehensive level, in 2013 many cities changed their rank position with respect to the previous year because in 2012 four cities were not considered in the ranking because of missing data on some elementary criteria descending from the social macro-criterion; apart from London, Paris, Athens, and Amsterdam, which maintain the same position in the three years, all the other cities change their position over the years. In particular, on the one hand, a positive trend can be observed for Lisbon (it gains two positions in 2014 and three positions in 2015), Vienna (it gains one position in 2014), Madrid (it gains two positions in 2014), Riga (it gains one position in 2014 and two positions in 2015), Warsaw (it gains two positions in 2014), and Helsinki (it gains one position in 2014); on the other hand, a negative trend can be observed for Berlin (it loses one position in 2014 and 2015), Oslo (it loses one position in 2014), and Stockholm (it loses seven positions in 2014 and one position in 2015); the other cities have a fluctuating trend since they gain and lose positions during the years.

Table 10 Comparison of the expected ranks over the period 2012-2015 both at comprehensive and partial level

5 Conclusions

Sustainability and, consequently, sustainable development are broad concepts, universally acknowledged but not univocally defined. Several definitions of sustainable development have been provided over the years, starting from the first, dated 1713 and provided by Von Carlowitz [60]. The most used is the one contained in the Brundtland report, dated 1987, for which it is “...a development that meets the needs of the present without compromising the ability of the future generations to meet their own needs” [9]. As three main types of sustainability are commonly acknowledged [45], that is, Environmental, Social and Economic, measuring the sustainability of a project, a city, or a country is a problem that has to be addressed using Multiple Criteria Decision Aiding (MCDA) methods.

In this paper, we applied a recently developed MCDA method, namely, the hierarchical-SMAA-PROMETHEE method, to evaluate the sustainability of 20 European cities in the 2012-2015 period. Nine elementary criteria belonging to the environmental, social, and economic macro-criteria have been taken into account to evaluate the sustainability of the cities at the comprehensive level as well as at three partial levels. The preference model used to aggregate the multiple criteria evaluations is that of PROMETHEE II, which makes it possible to avoid the effect of compensation between criteria. The compensability of indices used to measure sustainability is in fact undesirable, as shown in [30] and [38]. The use of PROMETHEE II allows all considered cities to be ranked from the best to the worst, not only at the comprehensive level but also at the level of each macro-criterion. To provide a robust ranking recommendation, the Stochastic Multicriteria Acceptability Analysis (SMAA) [34] has been applied. The application of SMAA gives the frequency with which a city reaches a certain rank position at the comprehensive and partial levels, taking into account a large set of instances of the assumed preference model compatible with the preferences provided by the Decision Maker. In our study, these instances are defined by vectors of criteria weights drawn randomly from a feasible set. The results of SMAA were summarized using the expected rank score proposed by [32], providing a final ranking of the cities.

Based on the results of our case study, we believe that the proposed method can be a useful tool for policymakers for at least two reasons. Firstly, it makes it possible to identify the weak and strong points of each city by looking at its rank position at the partial levels. Indeed, apart from information about the rank position reached by a particular city at the comprehensive level, the hierarchical-SMAA-PROMETHEE method also gives information about its position in the rankings that take into account the Environmental, Social and Economic aspects separately. This makes it possible to identify weak points (the macro-criteria on which the city gets low rank positions) and strong points (the macro-criteria on which the city gets high rank positions). Secondly, comparing the rank positions obtained by a city in different years, one can observe the consequences of implemented policies and conclude which ones should be maintained (in case a city has improved or, at least, not deteriorated its rank position over the years) and which ones should be revised and improved (in case a city has seen its rank position go down during the years).

In our future research, we would like to apply the same framework to a greater number of cities as well as a wider period. Moreover, from the methodological point of view, it would be interesting to take into account some possible interactions between the different criteria [18] and, finally, to explain and justify the obtained results using the Dominance-based Rough Set Approach [27, 28].