1 Introduction

For several years, academics, policymakers, and international organisations have required systematic assessments to design sustainable development and implement proper environmental management that combines economic growth with social and natural capital protection. In this context, sustainable tourism development has emerged as the dominant paradigm to redress the cumulative negative impacts of tourism development (Bramwell et al., 2017; Pagliara et al., 2021; Ruhanen et al., 2015; Zhong et al., 2011).

Sustainable tourism is firmly positioned in the United Nations 2030 Agenda for Sustainable Development and potentially involves, directly or indirectly, all of its Sustainable Development Goals (SDGs). In particular, the Agenda includes, among others, the following targets: (1) implementing policies to promote sustainable tourism that creates jobs and encourages local culture and products (target 8.9); (2) developing tools to monitor sustainable development impacts for sustainable tourism (target 12.b); (3) increasing the economic benefits from the sustainable use of marine resources, including through the sustainable management of fisheries, aquaculture and tourism (target 14.7); (4) preserving biodiversity and ecosystems through sustainable tourism that helps reduce waste and consumption (goal 15). Therefore, practical advances in measuring tourism sustainability are needed to help policymakers verify existing tools or design coherent new policies for sustainable tourism development towards the SDGs. In this field, monitoring tourism development through a broad set of indicators can be strategically important for reconciling community needs with the sustainable management of natural heritage and cultural resources.

A strand of literature has recognised sustainability indicators as a solid methodology and has proposed different types of indicators (mainly descriptive and related to specific destinations or geographical areas). Scholars have also recognised the need for composite indicator methodologies to design proper tourism planning and management practices and actions (Arbolino et al., 2021; Asmelash & Kumar, 2019; Choi & Sirakaya, 2006). Butler's seminal contribution (1999) argued that sustainability is 'meaningless' without indicators. Likewise, international organisations have accepted composite indicators measuring tourism impacts and sustainable development as fundamental tools for monitoring strategic policies towards sustainable tourism (UNWTO, 2004) and for communicating with society and destination stakeholders. However, practical assessments in tourism research based on composite indicators are still scarce (Blancas et al., 2016; OECD, 2016; Torres-Delgado & Saarinen, 2014).

This paper helps fill this research gap by building and validating a composite indicator of sustainable tourism, the Sustainable Tourism Index (SusTour-Index), which treats the economic, environmental and social dimensions as the three main interrelated facets of tourism sustainability. This study adopts a multi-modelling approach to identify the most suitable methodology for summarising the elementary indicators according to the hierarchical structure of the SusTour-Index. Different combinations of weighting and aggregation methods are tested, resulting in alternative models of the SusTour-Index within the same theoretical framework (OECD, 2008; Saisana et al., 2011). The hierarchical structure of the SusTour-Index is validated for all 21 Italian regions at the NUTS-2 level, using a large set of elementary indicators from official statistical sources. Italy was chosen because it is one of the most significant destinations worldwide (5th for tourist arrivals according to UNWTO, 2018), in light of the specific strengths and weaknesses of its tourism sector and their overall implications for sustainability (Castellano et al., 2019). Moreover, Italian tourism policies are defined and managed at the regional level.

The paper makes theoretical and methodological contributions, opens room for future research on the practical assessment of economic, environmental and social impacts, and supports policymakers in planning and managing sustainable tourism development.

2 Sustainable Tourism Composite Indicators: A Review

2.1 The Role of International Institutions

Since the 1990s, international institutions have recognised sustainability indicators as relevant tools for efficient destination policy making and for planning and management processes, providing an integrated information system to assess the impact of tourism, meant as economic activity, on the environment and society (UNWTO, 1996, 2004, 2005). Over the years, authoritative international institutions—e.g. World Tourism Organization of the United Nations (UNWTO), European Commission (EC), European Environment Agency (EEA), and Organization for Economic Cooperation and Development (OECD)—have developed theoretical frameworks and methodologies for composite indicators of sustainable tourism, often based on sustainable development standards.

The Pressure-State-Response (PSR) model proposed by OECD (1994) and its extended version, the Driver-Pressure-State-Impact-Response (DPSIR) model of EEA (1999), represent the seminal analytical frameworks for analysing the interactions between human behaviour and the environment. Despite their stronger focus on environmental aspects, the PSR and DPSIR models are considered reference points by scholars in every research field when selecting relevant indicators, each of which they try to classify according to the corresponding element of the DPSIR model.

The DPSIR model makes it possible to analyse the impact of human activities on the environment and society's response to these changes. The DPSIR approach includes indicators able to describe the specific sector itself and the related economic and policy considerations, placed in the context of sustainable development. The interactions between tourism and the environment are identified by indicators developed for each dimension of the DPSIR model. The connection system provided by the DPSIR model requires first establishing the causal relationships between the elementary indicators and the dimensions to which they belong. This is not a simple task for tourism indicators: because of the heterogeneity of tourism (e.g. geographical dimension, type of tourism, seasonality), its impact cannot be defined a priori as a cause or an effect. Therefore, the DPSIR model is suitable for studies on sustainable tourism that investigate the impact of tourism on the environment, but applying it comprehensively is rather complicated.

UNWTO (2004) provided indicators for measuring sustainable tourism, identifying more than 150 sub-components and 768 elementary indicators developed by a wide range of experts. Moreover, the issues common to all destinations are captured by twelve core indicators, labelled 'basic indicators' and included in a list of 'baseline indicators'. They can be customised according to the territorial level of analysis and enriched with other indicators to focus on the peculiarities of each specific destination (e.g. coastal, maritime, mountain).

The European Commission promoted tourism sustainability relatively late, by developing the European Tourism Indicators System (ETIS), a toolkit for sustainable destination management that identifies four fundamental dimensions of tourism sustainability (European Commission, 2016): environmental impact, social and cultural impact, economic value, and destination management.

2.2 Sustainable Tourism Composite Indicators in the Literature

Considerable progress has emerged in academic research on sustainable tourism (Ruhanen et al., 2015), moving beyond the myopic view of the environmental perspective towards a holistic approach, which also includes economic and social dimensions and political/institutional aspects (Choi & Sirakaya, 2006; Ko, 2005). An intense debate has therefore focused on effectively supporting decision-makers in achieving the sustainable development goals through indicators capable of providing an integrated information system to assess the impact of tourism (UNWTO, 2004, 2005).

An indicator-based approach to measuring tourism sustainability is a growing research topic, but many composite indicators remain theoretical conceptualisations with marginal empirical implementation (Blancas et al., 2016; Lee & Hsieh, 2016; Ruhanen et al., 2015). Analyses mainly focus on single tourism destinations (e.g. region, island, park) and use different pillars and indicators (local approach), which compromises their comparability at the regional and national level (Choi & Sirakaya, 2006; McCool et al., 2001; Torres-Delgado & Saarinen, 2014). Currently, scholars call for a more organic system of indicators for assessing tourism development and its impact (Budeanu et al., 2016).

Table 1 summarises the main composite indicators of sustainable tourism proposed by the literature, explaining concepts (dimensions and innovativeness), elementary indicators (economic, environmental, social and others), methodology (approaches, weighting, aggregation), advantages and disadvantages.

Table 1 Sustainable tourism composite indicators in the literature

The composite indicators of sustainable tourism mainly focus on three typologies of elementary indicators: economic, environmental and social (Blancas et al., 2010a, 2010b; Bonett & Wright, 2015; Lozano-Oyola et al., 2019a, 2019b; Torres-Delgado & López Palomeque, 2018). Some scholars also considered the institutional (Asmelash & Kumar, 2019) and patrimonial (Pérez et al., 2013) dimensions, while others integrated elementary indicators of tourism and services into the framework of sustainable tourism (Castellani & Sala, 2010).

From a methodological perspective, several authors adopted the EEA's DPSIR model (Asmelash & Kumar, 2019; Castellani & Sala, 2010; Pulido Fernández & Sánchez Rivero, 2009; Torres-Delgado & López Palomeque, 2018). Composite indicators were also based on expert opinions or participatory processes (Asmelash & Kumar, 2019; Blancas et al., 2010a). Several methodologies have been applied, each with advantages and disadvantages; however, most of the proposed composite indicators suffer from a lack of consensus on the best methodology for their construction and validation (Torres-Delgado & López Palomeque, 2018).

Therefore, building a system of sustainable tourism indicators that is scientifically advanced, methodologically solid and useful for decision-making represents the current challenge for scholars (Blancas et al., 2016; Tanguay et al., 2013), with a variety of implications.

First, the multidimensionality of sustainability makes it hard to aggregate a considerable amount of information (Butler, 1999; Castellani & Sala, 2010; Miller, 2001) and to identify a universal list of elementary indicators. UNWTO (2004) identified more than 150 sub-components and 768 sustainability indicators, including twelve baseline indicators that can be customised according to the territorial level of analysis and destination characteristics (e.g. coastal, maritime, mountain). Furthermore, both academics and international institutions have proposed selection criteria for indicators to improve comparability and reliability at the different geographical levels required (European Commission, 2016; Tanguay et al., 2013).

Second, once the hierarchical structure has been identified, the choice of the appropriate methodology (i.e. for weighting and aggregating elementary indicators and pillars/sub-pillars) can be affected by subjective judgments, which are reflected in the outcome of the composite indicator (e.g. Mayer, 2008; Singh et al., 2009; Torres-Delgado & López Palomeque, 2018).

Although diverse research fields have applied multi-modelling as a methodology to gather and summarise the information contained in a set of elementary indicators (Castellano & Rocca, 2016, 2019; Saisana et al., 2011, 2020), it remains almost unexplored in the field of tourism sustainability (Mikulić et al., 2015).

3 Material and Methods

This study builds and validates a composite indicator of tourism sustainability (SusTour-Index), which summarises the multidimensionality of sustainable tourism in the economic, environmental and social dimensions. The research design considered both the guidelines of international institutions (EC, 2016; EEA, 1999, 2003; OECD, 1994; UNWTO, 2004, 2005) and the literature's milestones on composite indicators of sustainable tourism (Table 1).

The multi-modelling approach combines a comprehensive spectrum of methodological choices, generating a plurality of SusTour-Index models consistent with the underlying theoretical framework. While most studies adopted one specific methodology for building a composite indicator of sustainable tourism, we obtain as many SusTour-Index models as there are combinations of weighting and aggregation methods within each dimension of sustainability. This made it possible to determine how varying the methodological assumptions used to treat the indicators affects the outcomes (scores and rankings) of the SusTour-Index, guiding the choice of the most suitable methodology once the uncertainty analysis had been performed (Saisana et al., 2011).

The SusTour-Index was structured following the OECD (2008) guidelines, which outline an ideal sequence of steps for building rigorous composite indicators: (1) developing the theoretical framework, (2) selecting elementary indicators, (3) treating missing data, (4) multivariate analysis, (5) data normalisation, (6) weighting and aggregation, (7) uncertainty analysis. The SusTour-Index was designed with maximum transparency at each step, which allowed us to retrace the construction of the composite indicator step by step.

3.1 Developing the Theoretical Framework and Selecting Elementary Indicators

The first two steps of the OECD procedure defined the conceptual structure of the SusTour-Index and the elementary indicators that best capture the meaning of each dimension (sub-index) of tourism sustainability, according to the hierarchical architecture of pillars and sub-pillars (Fig. 1; Tables 2, 3, 4).

Fig. 1 The hierarchical structure of the SusTour-Index

Table 2 Economic dimension of the SusTour-Index
Table 3 Environmental dimension of the SusTour-Index
Table 4 Social dimension of the SusTour-Index

Each of the three dimensions of tourism sustainability is summarised by a homonymous sub-index. The sub-indexes are organised into pillars and, in two of the three cases (economic and environmental), also into sub-pillars. The economic sub-index captures all the economic outputs and externalities of tourism activities that contribute to regional development, employment growth, the sustainability of tourism enterprises, and international appeal. The environmental sub-index identifies tangible and intangible assets related to tourism that usually have an adversarial relationship with the environment. The social sub-index covers aspects of tourism (i.e. human rights such as health and security, gender equity, and political participation) that affect the sustainability of social communities. Each dimension can be considered individually (sub-index) or in combination with the rest of the system (composite indicator), allowing all dimensions to be evaluated simultaneously by linking the regional economy to tourism and to its impact on the natural and social environment.

We collected an initial list of 310 elementary indicators (104, 119 and 87 for the economic, environmental and social dimensions, respectively). The primary selection criteria (Tanguay et al., 2013) reduced this initial set to a more concise list of 120 indicators based on four principles: (1) relevance of the elementary indicators to the three dimensions of sustainable tourism; (2) their frequency of use in academic research (see Table 1); (3) coverage of the UNWTO's central issues of sustainable development in tourism; (4) their replicability over time. Subsequently, three secondary criteria (Rajaonson & Tanguay, 2012) reduced the list of 120 indicators to a set of 75 core indicators (34, 21 and 20, respectively): (1) data availability at the regional level, which is uneven across indicators; (2) consistency of indicators with regional policy; (3) relevance of indicators in capturing the territorial heterogeneity among Italian regions.

In addition to tourism-specific indicators, the SusTour-Index also includes a few more general indicators that are not specific to tourism but can affect the sustainable development of destinations and the quality of the tourism experience. Torres-Delgado and López Palomeque (2018) suggested retaining such indicators, although they are not directly related to tourism, because they capture relevant aspects of sustainability that are likely to affect, or be modified by, tourism activity (see Tanguay et al., 2013). Their relevance for appropriately characterising the three dimensions considered for tourism management justifies their inclusion in the system (Lozano-Oyola et al., 2019b).

The conceptual framework of the SusTour-Index was validated in the Italian regions. The candidate elementary indicators for all 21 regions were obtained from the main national (Istat, ISPRA, Bank of Italy) and international (Eurostat, UNESCO, Foundation for Environmental Education) institutional sources.

The hierarchical structure of the SusTour-Index required elementary indicators to be first synthesised into the pillars (I level) and then the pillars into their sub-index (II level). This two-level structure can amplify the effects of the subjective judgments involved at each step of the construction process (Becker et al., 2017; Mayer, 2008; Singh et al., 2009). For this reason, we did not choose any model of the SusTour-Index a priori. Instead, we used the multi-modelling approach to assess the robustness of the composite indicator by estimating 23 models of the SusTour-Index, each obtained from an appropriate combination of weighting and aggregation schemes consistent with the underlying theoretical framework.

3.2 Data Treatment, Multivariate Analysis and Normalisation

The third step concerns the strategies for handling missing data and outliers. In particular, outliers were detected by comparing the absolute values of the skewness and kurtosis of each elementary indicator with the critical thresholds of 2 and 3.5, respectively (OECD, 2008).
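As a rough illustration of this screening rule, the sketch below flags an indicator when its absolute skewness exceeds 2 and its kurtosis exceeds 3.5. It makes assumptions that the text does not spell out: kurtosis is taken as excess kurtosis (0 for a normal distribution), moments are computed without small-sample bias correction, and both thresholds must be exceeded, a convention common in composite-indicator practice. The function names are hypothetical.

```python
from statistics import fmean


def _central_moment(values, order):
    """Population central moment of the given order (no bias correction)."""
    m = fmean(values)
    return sum((v - m) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def has_outliers(values, skew_max=2.0, kurt_max=3.5):
    """Flag an indicator when |skewness| > skew_max and |kurtosis| > kurt_max.

    Assumption: both thresholds must be exceeded, and kurtosis means excess kurtosis.
    """
    return (abs(skewness(values)) > skew_max
            and abs(excess_kurtosis(values)) > kurt_max)


# Usage with made-up data: 20 regular values plus one extreme region.
print(has_outliers(list(range(1, 21)) + [200]))  # True
print(has_outliers(list(range(1, 22))))          # False
```

Flagged indicators would then go on to treatment (e.g. winsorisation or a transformation); the screen itself does not modify the data.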

The fourth step assesses whether the statistical structure of the data set is consistent with the theoretical framework. Cross-correlation analyses between elementary indicators within each pillar and between sub-pillars made it possible to check whether the nested structure of the SusTour-Index was well defined (Booysen, 2002; Saisana & Philippas, 2012). Moreover, Principal Component Analysis (PCA) verified the suitability of the data structure to the theoretical framework, and Cronbach's alpha assessed the internal consistency of the pillars. An acceptable alpha should range between 0.70 and 0.90, as higher values may suggest that some indicators in the construct are redundant (Tavakol & Dennick, 2011). However, some scholars have suggested that alpha values lower than 0.70 can be accepted (Bonett & Wright, 2015; Spiliotopoulou, 2009).
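For reference, Cronbach's alpha for a pillar with \(k\) indicators is \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{q} \sigma^{2}_{q}}{\sigma^{2}_{T}}\right)\), where \(\sigma^{2}_{q}\) is the variance of indicator \(q\) and \(\sigma^{2}_{T}\) is the variance of the per-region sum of the indicators. The hypothetical helpers below are a minimal sketch of this textbook formula, assuming polarity-aligned indicators observed on the same regions, and map the result onto the 0.70–0.90 band discussed above.

```python
from statistics import variance


def cronbach_alpha(indicators):
    """Cronbach's alpha for one pillar.

    indicators: list of k sequences, one per indicator, each holding the
    (polarity-aligned) values of the same n regions in the same order.
    """
    k = len(indicators)
    if k < 2:
        raise ValueError("alpha needs at least two indicators")
    sum_item_var = sum(variance(col) for col in indicators)
    totals = [sum(region) for region in zip(*indicators)]  # per-region sums
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def alpha_verdict(alpha, low=0.70, high=0.90):
    """Map alpha onto the 0.70-0.90 acceptability band."""
    if alpha < low:
        return "low consistency"
    if alpha > high:
        return "possible redundancy"
    return "acceptable"


# Usage with toy data: two indicators observed in four regions.
a = cronbach_alpha([[1, 2, 3, 4], [1, 3, 2, 4]])
print(round(a, 4), alpha_verdict(a))  # 0.8889 acceptable
```

Because the same variance estimator is used in numerator and denominator, the choice between sample and population variance does not change alpha.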

The fifth step makes indicators with different units of measurement and/or orders of magnitude dimensionless and, therefore, comparable with each other. Elementary indicators were standardised using adjusted z-scores, which apply the traditional standardisation formula to each indicator \(I_{q}\) and rescale the result so that the mean is 100 and the standard deviation is 10:

$$z_{qr} = \frac{x_{qr} - \overline{x}_{q}}{\sigma_{q}} \cdot 10 \cdot d + 100 \tag{1}$$

where \(x_{qr}\) is the value of \(I_{q}\) in region \(r\), and \(\overline{x}_{q}\) and \(\sigma_{q}\) are, respectively, the mean and the standard deviation of \(I_{q}\); \(d\) is the directional adjustment, which corrects indicators whose polarity is discordant with the direction of the latent pillar or sub-pillar they help measure: \(d = 1\) if higher values of the indicator denote better conditions ('the bigger, the better') and \(d = -1\) otherwise ('the bigger, the worse'). The adjusted z-score converts indicators to a common scale while preserving their relative distances. It shifts the measurement scale onto the positive axis, avoiding the problem of negative values in geometric aggregation, and is compatible with the aggregation methods used in the later steps, in particular the Mazziotta-Pareto aggregation function (Mazziotta et al., 2010).
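A minimal sketch of Eq. (1) follows. It assumes the population standard deviation for \(\sigma_{q}\) (the text does not say whether the sample or population version was used) and treats \(d\) as a per-indicator polarity flag; `adjusted_z` is a hypothetical name.

```python
from statistics import fmean, pstdev


def adjusted_z(values, d=1):
    """Eq. (1): z = (x - mean) / sd * 10 * d + 100.

    d = 1 for 'the bigger, the better' indicators, d = -1 otherwise.
    Assumption: sd is the population standard deviation.
    """
    if d not in (1, -1):
        raise ValueError("d must be 1 or -1")
    mean, sd = fmean(values), pstdev(values)
    return [(x - mean) / sd * 10 * d + 100 for x in values]


# Usage with made-up values (mean 5, population SD 2).
x = [2, 4, 4, 4, 5, 5, 7, 9]
print(adjusted_z(x))        # [85.0, 95.0, 95.0, 95.0, 100.0, 100.0, 110.0, 120.0]
print(adjusted_z(x, d=-1))  # polarity flipped: the lowest raw value now scores 115.0
```

The rescaled values have mean 100 and standard deviation 10 by construction, so they stay positive unless an observation lies more than ten standard deviations below the mean.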

3.3 Weighting and Aggregation

While weighting criteria assign weights to each indicator and/or pillar/sub-pillar, aggregation methods progressively summarise the elementary indicators according to the structure of the composite indicator. When used in a benchmarking framework, weighting and aggregation schemes require caution because of their effects on the composite indicator (Saisana et al., 2011). This is particularly true for the SusTour-Index, whose hierarchical structure requires a two-level aggregation procedure (indicators and pillars) with weights at each level.

The choice of weighting methods depended strongly on their compatibility with the aggregation methods (OECD, 2008). In particular, we tested:

(1) Equal weighting (EW). EW does not mean applying no weights: a weighting system is implicitly introduced by the informative value of each indicator. EW is a homogeneous weighting system, not an absence of weights (OECD, 2008).

(2) A data-driven weighting procedure based on principal component analysis (PCA), which obtains weights endogenously from the correlations in the data. The rationale of PCA is to group individual indicators that share a large amount of common variance (Mikulić et al., 2015). PCA summarises the original \(m\) standardised variables into a smaller number \(k\) (\(< m\)) of uncorrelated variables (principal components, PCs), which are the new indicators that explain most of the observed variance. If \(k > 1\), the weight of the \(j\)-th component is equal to its eigenvalue \(\lambda_{j}\) rescaled by the total explained variance \(\sum_{j \le k} \lambda_{j}\) (i.e. the normalised variability explained by the \(j\)-th component); if \(k = 1\), the weights are equal to the factor loadings, scaled to unity sum. The PCA weighting method thus accounts for the correlation among the indicators, weighting the components according to their relevance (Castellano & Rocca, 2019; Munda, 2016).
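The sketch below illustrates both cases in plain Python. For \(k = 1\), it builds the correlation matrix, extracts the first principal component by power iteration, and rescales the loadings to sum to one, as described above; for \(k > 1\), it normalises the retained eigenvalues. It assumes positively correlated, polarity-aligned indicators (so that the leading loadings share a sign), and all function names are hypothetical. Some applications rescale squared loadings instead; replacing `x` with `x * x` in the final weight computation would give that variant.

```python
from math import sqrt
from statistics import fmean, pstdev


def correlation_matrix(indicators):
    """Pearson correlations between indicators (equal-length sequences)."""
    std = []
    for col in indicators:
        m, s = fmean(col), pstdev(col)
        std.append([(x - m) / s for x in col])
    n = len(std[0])
    return [[sum(a * b for a, b in zip(u, v)) / n for v in std] for u in std]


def leading_component(corr, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    m = len(corr)
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the eigenvalue for a unit vector v.
    lam = sum(v[i] * corr[i][j] * v[j] for i in range(m) for j in range(m))
    return lam, v


def pca_weights_one_component(indicators):
    """k = 1: first-PC loadings rescaled to unit sum, plus explained variance share."""
    lam, v = leading_component(correlation_matrix(indicators))
    loadings = [sqrt(lam) * x for x in v]
    total = sum(loadings)
    return [x / total for x in loadings], lam / len(indicators)


def component_weights(eigenvalues):
    """k > 1: weight of component j = lambda_j / sum of the retained lambdas."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


# Usage with three hypothetical indicators observed in five regions.
weights, share = pca_weights_one_component(
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4]]
)
print([round(w, 3) for w in weights], round(share, 3))
```

In this toy case the first two indicators are perfectly correlated and receive equal weights, while the third, less correlated one receives a smaller weight.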

As regards aggregation, we used: (1) linear aggregation; (2) geometric aggregation; (3) Mazziotta-Pareto (MP) method; (4) Wroclaw taxonomic approach; (5) Borda's rule. The first four allow the metric structure of the indicators to be preserved (cardinal approaches) since the differences between the regions can be expressed in terms of index scores. Borda's rule replaces the index scores with the rankings (ordinal approach), losing the magnitude of differences between regions.

Linear aggregation computes the arithmetic mean of the elementary indicators (or pillars) for each pillar (or each sub-index). However, it implies full compensability, i.e. poor performance in some indicators can be compensated by sufficiently high values of other indicators. Geometric aggregation allows only partial compensability, so that low results in one indicator cannot be fully offset by high results in other indicators.
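The difference between full and partial compensability can be seen with two hypothetical regions that have the same arithmetic mean but different balance across indicators. The sketch below assumes indicators already on the positive adjusted z-score scale (geometric aggregation requires strictly positive values) and weights that sum to one; the function names are illustrative.

```python
from math import prod


def linear_agg(scores, weights):
    """Weighted arithmetic mean: fully compensatory."""
    return sum(w * s for w, s in zip(weights, scores))


def geometric_agg(scores, weights):
    """Weighted geometric mean: only partially compensatory (scores must be > 0)."""
    if any(s <= 0 for s in scores):
        raise ValueError("geometric aggregation needs strictly positive scores")
    return prod(s ** w for w, s in zip(weights, scores))


equal = [0.5, 0.5]  # equal weights summing to one
balanced, unbalanced = [100, 100], [80, 120]  # hypothetical regions
print(linear_agg(balanced, equal), linear_agg(unbalanced, equal))  # 100.0 100.0
print(geometric_agg(balanced, equal) > geometric_agg(unbalanced, equal))  # True
```

Linear aggregation cannot tell the two regions apart, whereas the geometric mean scores the unbalanced region lower (about 97.98).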

The MP method is a non-linear aggregation approach based on the hypothesis that elementary indicators are not substitutable (i.e. they do not compensate each other). This involves the introduction of a 'penalty' for entities not showing balanced values of elementary indicators (Mazziotta & Pareto, 2016; Mazziotta et al., 2010). The MP method aggregates standardised indicators by using the arithmetic mean adjusted by a penalty coefficient, which captures the variability of each unit across the set of indicators (i.e. horizontal variability).
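The MP index is commonly written as \(MPI_{i} = M_{i} - S_{i} \cdot cv_{i}\), where \(M_{i}\) and \(S_{i}\) are the mean and standard deviation of unit \(i\)'s standardised indicators and \(cv_{i} = S_{i} / M_{i}\) (the negative-penalty version, for indicators where higher values are better). The sketch below assumes that form, taken from the general MP literature rather than from this section, and uses the population standard deviation; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def mazziotta_pareto(z_scores):
    """MPI = M - S * cv, with cv = S / M (negative penalty for unbalanced units).

    z_scores: one unit's indicators on the adjusted z-score scale of Eq. (1).
    Assumption: S is the population standard deviation across indicators.
    """
    m = fmean(z_scores)
    s = pstdev(z_scores)  # horizontal variability of this unit
    cv = s / m
    return m - s * cv


print(mazziotta_pareto([100, 100, 100]))  # 100.0: balanced, no penalty
print(mazziotta_pareto([90, 110]))        # 99.0: same mean, penalised
```

The penalty grows with the unit's horizontal variability, so two units with the same mean are separated by how balanced their indicators are.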

The Wroclaw taxonomic approach ranks units according to their Euclidean distance from an ideal unit, which takes the best observed value of each indicator:

$$D_{ij} = \sqrt{\sum_{q = 1}^{Q} \left( x_{iq} - x_{jq} \right)^{2}} \tag{2}$$

where \(x_{iq}\) is the value of unit i for the indicator q and \(x_{jq}\) is the value of the ideal unit for the same indicator q.
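The distance in Eq. (2) and the resulting ranking can be sketched as follows, assuming the indicators have already been normalised and polarity-aligned so that the largest value is the best; the ideal unit is then the vector of per-indicator maxima. Function and variable names are hypothetical, and the sketch stops at ranking by distance, as described here, without any further rescaling of the distances.

```python
from math import dist


def wroclaw_ranking(units):
    """Rank units by Euclidean distance from the ideal unit (Eq. 2).

    units: dict mapping unit name -> list of normalised, polarity-aligned
    indicator values (larger is better), all of the same length Q.
    """
    names = list(units)
    q = len(units[names[0]])
    ideal = [max(units[n][i] for n in names) for i in range(q)]  # best value per indicator
    distance = {n: dist(units[n], ideal) for n in names}
    return sorted(names, key=distance.get), distance  # closest to the ideal first


ranking, d = wroclaw_ranking({"A": [110, 105], "B": [100, 100], "C": [90, 95]})
print(ranking)  # ['A', 'B', 'C']
```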

Borda's method overcomes the problem of the 'plurality rule' (under which the winner is the unit most often ranked in first position) by declaring the unit with the highest total score as the winner (OECD, 2008). It sorts the \(n\) units into binary relations through pairwise comparisons and assigns each unit a score according to the position it occupies in the ranking for each pillar:

$$B_{i} = \sum_{k = 1}^{n} \left( n - k \right) \cdot S_{k} \tag{3}$$

where \(S_{k}\) is the number of times that unit \(i\) occupies the \(k\)-th position, \(\left( n - k \right)\) is the corresponding score, and the final rank is based on the total score \(B_{i}\) received by each unit.
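The sketch below computes Eq. (3) by awarding \(n - k\) points to a unit for each pillar in which it occupies position \(k\), which is equivalent to summing \((n - k) \cdot S_{k}\) over positions. It assumes that higher pillar scores are better and that there are no ties within a pillar (ties would need an explicit rule, e.g. averaging the points); the names and data are hypothetical.

```python
def borda_scores(pillar_scores):
    """Borda totals B_i (Eq. 3) from per-pillar scores.

    pillar_scores: dict pillar -> dict unit -> score (higher is better).
    Assumption: no ties within a pillar.
    """
    units = list(next(iter(pillar_scores.values())))
    n = len(units)
    totals = dict.fromkeys(units, 0)
    for scores in pillar_scores.values():
        ranked = sorted(units, key=scores.get, reverse=True)
        for k, unit in enumerate(ranked, start=1):  # k = position in this pillar
            totals[unit] += n - k  # (n - k) points for the k-th position
    return totals


totals = borda_scores({
    "P1": {"A": 3, "B": 2, "C": 1},
    "P2": {"A": 1, "B": 3, "C": 2},
    "P3": {"A": 3, "B": 1, "C": 2},
})
print(sorted(totals, key=totals.get, reverse=True), totals)
# ['A', 'B', 'C'] {'A': 4, 'B': 3, 'C': 2}
```

Only positions enter the totals, so the magnitude of the score differences within each pillar is lost, as noted for the ordinal approach.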

4 Results and Discussion

4.1 The Conceptual Structure of the SusTour-Index

The SusTour-Index is composed of 75 elementary indicators at the regional level, referring to the year 2017 and organised into pillars and sub-pillars within the economic (D1), environmental (D2) and social (D3) sub-indexes. The economic dimension consists of 34 elementary indicators, structured into five pillars (Table 2): Economic value (7 elementary indicators); Sustainability of tourism enterprises (5); International appeal of tourism, which in turn is organised into the two sub-pillars of International tourism demand (4) and International tourism satisfaction (4); Tourism demand (6); Cultural heritage organisations (8).

The environmental dimension comprises 21 elementary indicators (Table 3). It is structured into four pillars: Energy and water consumption (5 indicators); Sustainable energy management (4); Waste management (6); Environmental value, with the two sub-pillars of Quality of the environment (3) and Promotion of natural heritage (3).

The social dimension is composed of 20 elementary indicators (Table 4), structured into four pillars: Security (6 indicators); Health (3); Mobility (5); Gender balance (6).

4.2 Measuring the Dimensionality of the SusTour-Index

The institutional nature of the statistical sources ensured high data quality, with only 21 missing values (1.3% of all data), mostly concerning Aosta Valley and the autonomous provinces of Trento and Bolzano. The imputation strategy depended on the type of missing data: the regional value was split for count data, and the regional median value was used for score variables. One outlier was detected in each of eight indicators and two outliers in one indicator; these were winsorised (the suggested treatment when outliers account for less than 5% of the observations), except for one case, which was treated through the Box-Cox transformation.
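As an illustration of the winsorisation step, the hypothetical helper below caps the \(n\) most extreme values in one tail at the value of the nearest remaining observation, which is how winsorisation is commonly applied in composite-indicator practice. The number of values to cap and the tail are assumed to come from the skewness/kurtosis screening; the Box-Cox alternative is not shown.

```python
def winsorise(values, n_extreme=1, tail="upper"):
    """Cap the n_extreme most extreme values of one tail at the nearest remaining value.

    Returns a new list in the original order.
    """
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    if not 0 < n_extreme < len(values):
        raise ValueError("n_extreme must be between 1 and len(values) - 1")
    ordered = sorted(values, reverse=(tail == "upper"))
    cap = ordered[n_extreme]  # first value not treated as an outlier
    if tail == "upper":
        return [min(v, cap) for v in values]
    return [max(v, cap) for v in values]


print(winsorise([1, 2, 200, 3]))                # [1, 2, 3, 3]
print(winsorise([-50, 2, 3, 4], tail="lower"))  # [2, 2, 3, 4]
print(winsorise([1, 150, 2, 200], n_extreme=2)) # [1, 2, 2, 2]
```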

The set of elementary indicators that best converges with the theoretical framework was identified through an iterative procedure, avoiding negative correlations, low correlations (\(\rho < 0.33\)) and very high correlations (\(\rho > 0.92\)) (OECD, 2008). This rule of thumb required a case-by-case analysis because some indicators were kept, even when their correlations fell outside the range, if they were considered crucial to the conceptual structure. The detailed analysis of the correlation structure within and between pillars confirms that each indicator is more highly correlated with its own pillar than with any other, suggesting that the allocation of the elementary indicators to specific pillars, within each dimension of the SusTour-Index, is consistent from both a theoretical and a statistical perspective (Saisana et al., 2020). As shown in Table 5 (third column), all Cronbach's alpha values fall within the acceptability range, except for 'Environmental value'. PCA confirmed the suitability of the data's underlying structure to the theoretical framework and, therefore, the unidimensional latent structure of each pillar (Table 5, fourth and fifth columns). All eigenvalues are higher than 1, and one component captures more than 70% of the total variance within each pillar, except in a few cases ('Sustainability of tourism enterprises', 'International appeal', 'Environmental value') for which two components are required. Multivariate analysis was not performed on the 'Sustainable energy and water management' and 'Health' pillars because of their low number of elementary indicators (fewer than 5 indicators is also critical for computing Cronbach's alpha).
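The correlation rule of thumb can be sketched as a screen that lists indicator pairs whose Pearson correlation falls outside the [0.33, 0.92] band; negative correlations fall below the lower bound and are flagged too. As the text stresses, flagged pairs are candidates for case-by-case review rather than automatic removal. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def screen_pairs(indicators, low=0.33, high=0.92):
    """List indicator pairs whose correlation lies outside [low, high].

    Flags are prompts for case-by-case review, not automatic exclusions.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(indicators.items(), 2):
        r = pearson(xa, xb)
        if r < low or r > high:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


pillar = {  # hypothetical indicators observed in five regions
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [1, 3, 2, 5, 4],
    "x4": [5, 4, 3, 2, 1],
}
print(screen_pairs(pillar))
# [('x1', 'x2', 1.0), ('x1', 'x4', -1.0), ('x2', 'x4', -1.0), ('x3', 'x4', -0.8)]
```

Here x4 has the wrong polarity for this pillar; in the SusTour-Index such cases are handled by the directional adjustment \(d\) in Eq. (1) before screening.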

Table 5 Statistical dimensionality of the SusTour-index structure

4.3 Weighting and Aggregating: The Multi-modelling Approach

Once all the indicators had been made dimensionless, the two-level aggregation procedure summarised the elementary indicators into the pillars (I level) and the pillars into their sub-index (II level). By combining the weighting and aggregation methods described in Sect. 3.3 at both levels, 23 different models of the SusTour-Index were estimated for the Italian regions.

Table 6 details the weighting system and the aggregation procedure used for each model of the SusTour-Index at both levels, in compliance with the OECD (2008) compatibility criteria. Two sets of models of the SusTour-Index can be distinguished. The first set of 14 models (M1–M14) (Table 6, left panel) was estimated using EW at both levels of aggregation, combining three aggregation methods at the first level (linear, geometric, MP) with five aggregation methods at the second level (linear, geometric, MP, Wroclaw, Borda). On the other hand, introducing the PCA weighting method at one or both aggregation levels, combined with different aggregation techniques, allowed us to estimate another set of 9 models (M15–M23) (Table 6, right panel) of the same composite indicator. In particular, models M15 to M17 use EW at the first level and PCA weighting at the second; models M18 to M22 use PCA weighting at the first level and EW at the second. The last model, M23, uses PCA weighting at both levels of aggregation.

Table 6 Multi-modelling approach

4.4 Assessing Reliability and Validity

Once the 23 models had been estimated, sensitivity analysis allowed us to evaluate the robustness of the SusTour-Index. Sensitivity analysis is closely related to uncertainty analysis, which quantifies the impact of weighting and aggregation choices on the variation in the regions' rankings (scores), thereby helping to refine the structure of the SusTour-Index and improving the consistency of the results.

Descriptive statistics on rankings of Italian regions obtained from the 23 different models of the SusTour-Index represent a first way to evaluate the outcomes' stability. “Appendix 1” shows both synthesis measures and variability of the positions occupied by each region as the models (weighting and aggregation methods) change, allowing an evaluation of the criticalities in the construction of the SusTour-Index in terms of output instability. Tables 7, 8 and 9 also show the 95% confidence intervals of the median rank for each region, using bootstrap procedures (2000 samples) (Efron & Tibshirani, 1998) to measure the volatility of the rankings due to a change to the underlying methodology (Saisana et al., 2011).
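The text does not detail the bootstrap scheme, so the sketch below rests on an assumption: for one region, it resamples the 23 model-specific ranks with replacement, recomputes the median 2000 times, and takes the 2.5th and 97.5th percentiles of the resulting medians (a percentile bootstrap interval). The function name, the seed and the example ranks are hypothetical.

```python
import random
from statistics import median


def bootstrap_median_ci(ranks, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the median of one region's model ranks."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    medians = sorted(
        median(rng.choices(ranks, k=len(ranks))) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = medians[round(tail * n_boot)]
    upper = medians[round((1 - tail) * n_boot) - 1]
    return lower, upper


# Hypothetical ranks of one region across 23 models.
ranks = [5, 5, 6, 4, 5, 5, 6, 5, 4, 5, 5, 5, 6, 5, 5, 4, 9, 11, 3, 8, 12, 6, 5]
print(median(ranks), bootstrap_median_ci(ranks))
```

A region whose rank barely changes across models gets a degenerate interval (e.g. a constant rank of 7 yields (7, 7)), while widely scattered ranks widen it.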

Statistics show lower sensitivity for the first set of models, which use the EW system, than for the second set, estimated using the PCA weighting method. The analysis of the central tendency (mean; median) and variability (standard deviation, SD; coefficient of variation, CV; median absolute deviation, MAD) of the regional rankings estimated by the two sets of models of the SusTour-Index shows strong agreement in the rankings among the EW models (M1–M14) for each of the three sub-indexes. In particular, for the environmental and social sub-indexes, no region shows an SD of rankings higher than 3 when EW models are used, while all regions (except one for the social dimension) exceed this SD value with PCA-weighted models (Tables 8, 9). For the economic sub-index, an SD higher than 3 is found for more than 70% of regions with PCA-weighted models, against 15% with EW models (Table 7). Therefore, higher variability is detected for the second set of models (M15–M23), whose regional rankings are more sensitive to the chosen weighting and aggregation methods. The MAD also shows the lower ranking stability of models using the PCA weighting procedure. These results are confirmed by the estimated confidence intervals, which are consistently narrower for the first set of models (EW) in each dimension of tourism sustainability. As expected, lower volatility is observed for the regions at the top and bottom of the ranking (i.e. the best and worst performers are essentially the same regions across the different models of the SusTour-Index), while greater differences are found for the middle-ranking regions.
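The summary statistics used above can be computed for each region as in the following sketch. It assumes the sample standard deviation and defines the MAD as the unscaled median of absolute deviations from the median; the function name is hypothetical, and the flag simply mirrors the SD cut-off of 3 used in the discussion.

```python
from statistics import fmean, median, stdev


def rank_summary(ranks, sd_threshold=3):
    """Central tendency and variability of one region's ranks across models."""
    mean, med, sd = fmean(ranks), median(ranks), stdev(ranks)
    mad = median(abs(r - med) for r in ranks)  # unscaled median absolute deviation
    return {
        "mean": mean, "median": med, "sd": sd,
        "cv": sd / mean, "mad": mad,
        "unstable": sd > sd_threshold,  # SD > 3, the cut-off used in the text
    }


print(rank_summary([1, 2, 3, 4, 5]))
```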

Spearman's rank correlation measures the proximity of the rankings obtained from the different models (“Appendix 2”). The Spearman coefficients, computed separately for the economic, environmental and social sub-indexes (Tables 10, 11, 12), are consistently positive for the EW models (M1–M14). Moreover, the methods based on linear and geometric aggregation between the pillars (level II) are the most correlated with the remaining methods. With an average Spearman's rank coefficient higher than 0.90 within each sub-index, the rankings from the EW models are highly correlated with each other. In particular, the rankings from the models combining EW with linear (M1–M5) or MP (M11–M14) aggregation at the first level are very similar. Although a few differences emerge among middle-ranking regions, the best and worst performers are substantially the same regions across all these methods.
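For tie-free rankings, Spearman's coefficient reduces to a closed-form expression in the rank differences, sketched below in Python. The three rankings are hypothetical, and the labels only indicate whether each stands in for an EW-like or a PCA-like model. With tied ranks this shortcut no longer applies, and the coefficient must instead be computed as the Pearson correlation of the (average) ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same regions.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
    difference between the two ranks assigned to region i.
    """
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rankings of six regions (position i = region i).
ew_1 = [1, 2, 3, 4, 5, 6]
ew_2 = [1, 3, 2, 4, 5, 6]   # swaps two middle-ranking regions
pca_1 = [4, 1, 6, 2, 5, 3]  # a strongly reshuffled ranking

print(round(spearman_rho(ew_1, ew_2), 3))   # 0.943
print(round(spearman_rho(ew_2, pca_1), 3))  # -0.2
```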

By contrast, the rankings from the second set of models, which use data-driven weights based on PCA (M15–M23), are much less correlated with each other, with Spearman coefficients that are even negative (Tables 10, 11, 12), and less correlated with those of the first set. Regarding the economic sub-index, the most problematic models (M16, M18 and M19) all belong to the second set, using PCA weights at one of the two aggregation levels and EW with linear or geometric aggregation at the other. Similarly, for the environmental sub-index, problems arise for M17 and M23, whose rankings are negatively correlated with those of every other model. For the social sub-index, most models of the second set (M15, M17, M18, M19, M23) show negative correlations with the other models, especially the EW models. The rankings from M20–M22 are positively correlated with each other and, more moderately, with the EW models.

In a nutshell, models using PCA weights in at least one of the two hierarchical levels are less correlated with each other, regardless of the aggregation function adopted, which leads us to prefer the EW models. Therefore, the models of the SusTour-Index with equal weights at both aggregation levels appear to be more suitable than those weighting the indicators through the PCA procedure.
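The section does not state how the PCA weights were derived. Purely as an assumption for illustration, the sketch below uses one simple variant: each indicator's weight is proportional to the absolute value of its loading on the first principal component of the indicators' correlation matrix, found here by power iteration. On hypothetical data it shows a mechanism that can drive the instability discussed above: an indicator weakly correlated with the others receives almost no weight, whereas the EW scheme would give it one third. All function names are hypothetical.

```python
import math
import statistics


def correlation_matrix(columns):
    """Pearson correlations between indicator columns (no column may be constant)."""
    standardised = []
    for col in columns:
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        standardised.append([(x - mean) / sd for x in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardised]
            for zi in standardised]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration; assumes a unique largest eigenvalue (first principal axis)."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_weights(columns):
    """Weights proportional to the absolute first-component loadings, summing to 1."""
    v = leading_eigenvector(correlation_matrix(columns))
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]


# Hypothetical indicators for six regions: x1 and x2 move together, x3 does not.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 6, 8, 10, 12]
x3 = [1, -1, 0, 0, -1, 1]

print([round(w, 3) for w in pca_weights([x1, x2, x3])])  # [0.5, 0.5, 0.0]
print([round(1 / 3, 3)] * 3)                             # equal weights, for comparison
```

Because loadings are the eigenvector entries scaled by the square root of the eigenvalue, normalising either gives the same weights.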

4.5 Assessing Sustainable Tourism in the Italian Regions

One of the most relevant results of this research is that the proposed structure of the SusTour-Index makes the rankings from the different models robust: the rankings obtained from the set of EW models are quite similar to each other. Ceteris paribus, this allows the selection of the models of the SusTour-Index that are most correlated with all the others.
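The selection criterion can be sketched as follows: for each model, average the rank correlations between its ranking and the rankings of all the other models, then keep the model with the highest average. This Python sketch uses hypothetical rankings and labels; it computes the Pearson correlation of the rank vectors, which equals Spearman's coefficient when there are no ties.

```python
import math
import statistics


def rank_correlation(a, b):
    """Pearson correlation of two rank vectors (equals Spearman's rho without ties)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def most_central_model(rankings):
    """Return the model with the highest mean correlation with all other models."""
    mean_corr = {}
    for name, ranks in rankings.items():
        others = [rank_correlation(ranks, r) for other, r in rankings.items() if other != name]
        mean_corr[name] = statistics.fmean(others)
    best = max(mean_corr, key=mean_corr.get)
    return best, mean_corr


# Hypothetical rankings of six regions under four models.
rankings = {
    "EW-1": [1, 2, 3, 4, 5, 6],
    "EW-2": [1, 3, 2, 4, 5, 6],
    "EW-3": [2, 1, 3, 4, 6, 5],
    "PCA-1": [4, 1, 6, 2, 5, 3],
}
best, scores = most_central_model(rankings)
print(best)  # EW-3
```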

The M2 model, which performs linear aggregation at the first level and geometric aggregation at the second level, was used to assess and discuss the relative performance of the Italian regions in tourism sustainability. However, we reiterate that, given the high correlation between their rankings, any EW model of the SusTour-Index is as good as another. Given the hierarchical structure of the SusTour-Index, a third step was required to aggregate the three sub-indexes (economic, environmental and social) through the geometric mean into the overall SusTour-Index, providing insights into tourism sustainability as a whole.
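The three aggregation steps described for M2 can be sketched as follows. The sketch assumes that the first level combines the normalised indicators of each pillar, that the second level combines the pillars of a sub-index, and that all normalised values are strictly positive, as the geometric mean requires; the intermediate sub-pillar level and the normalisation method are omitted, and the pillar names and values are hypothetical.

```python
import math
import statistics


def sub_index(pillars):
    """Level I: equal-weight arithmetic mean of the indicators in each pillar.
    Level II: equal-weight geometric mean of the pillar scores."""
    pillar_scores = [statistics.fmean(indicators) for indicators in pillars.values()]
    return statistics.geometric_mean(pillar_scores)


def overall_index(dimensions):
    """Third step: geometric mean of the economic, environmental and social sub-indexes."""
    return statistics.geometric_mean([sub_index(p) for p in dimensions.values()])


# Hypothetical normalised (strictly positive) indicator values for one region.
region = {
    "economic": {"pillar_1": [100, 110], "pillar_2": [90, 100, 110]},
    "environmental": {"pillar_1": [120, 100], "pillar_2": [80]},
    "social": {"pillar_1": [100, 100]},
}
print(round(overall_index(region), 2))
```

The geometric mean at the upper levels limits compensation: a weak pillar or sub-index lowers the result more than it would under a linear mean.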

Considering the overall sustainable tourism development (M2 model), four clusters of regions can be identified (Fig. 2):

1. Very low scores (I quartile): Apulia, Basilicata, Molise, Calabria, and Sicily;
2. Low scores (II quartile): Abruzzo, Marche, Campania, Umbria, and Sardinia;
3. High scores (III quartile): Emilia-Romagna, Friuli-Venezia Giulia, Latium, Liguria, Lombardy, and Piedmont;
4. Very high scores (IV quartile): Bolzano, Trento, Aosta Valley, Veneto, and Tuscany.
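A quartile classification of this kind can be sketched in Python as below. The section does not state how the quartile thresholds were computed, so the sketch assumes the default ('exclusive') method of `statistics.quantiles` and assigns a score equal to a threshold to the upper group; the scores are hypothetical, and a different convention could move boundary regions between adjacent clusters.

```python
import bisect
import statistics


def quartile_clusters(scores):
    """Assign each region to a quartile cluster (1 = very low ... 4 = very high)."""
    cuts = statistics.quantiles(list(scores.values()), n=4)  # three cut points
    return {region: bisect.bisect_right(cuts, s) + 1 for region, s in scores.items()}


# Hypothetical overall scores for eight regions.
scores = {"R1": 95, "R2": 97, "R3": 99, "R4": 100, "R5": 102, "R6": 104, "R7": 106, "R8": 110}
print(quartile_clusters(scores))
# {'R1': 1, 'R2': 1, 'R3': 2, 'R4': 2, 'R5': 3, 'R6': 3, 'R7': 4, 'R8': 4}
```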

Fig. 2

Map of Italian regions by quartile of SusTour-Index score

Tourism sustainability declines from the highest levels in the northern and central regions to the lowest levels in the southern regions. The overall SusTour-Index, the three sub-indexes and the elementary indicators all reflect the historical territorial divide between northern and southern Italy.

Within each cluster of regions by quartile of the overall SusTour-Index score, the rankings for the three dimensions of tourism sustainability are relatively homogeneous (Fig. 3).

Fig. 3

Italian regions by quartile of the overall SusTour-Index scores

The regions in the first two clusters show low tourism sustainability performance in every dimension. Within the first quartile, Molise and Basilicata rank better in the environmental dimension; within the second quartile, Abruzzo and Campania are very close to the regions with a higher SusTour-Index. Although high levels of economic and social sustainability characterise the regions with a relatively high overall SusTour-Index (third quartile), these regions lack environmental sustainability (with the lowest level in Lombardy). The regions with the highest SusTour-Index (fourth quartile) perform very well in all dimensions, with Trento, Bolzano and Aosta Valley at the top of environmental and social sustainability.

The three sub-indexes and their elementary indicators help identify possible drivers of, and reasons for, the top regional positions in sustainable tourism development.

Economic sustainability reflects the Italian economic scenario, characterised by the historic north/centre-south territorial divide (Castellano et al., 2016). The best regions (Tuscany, Veneto, Latium, and Emilia-Romagna) show long-term economic development based on consolidated tourism activities and cultural heritage, which boost international tourism demand.

The sustainability of tourism enterprises (D1.2) plays a relevant role in Tuscany, Veneto, Latium, Emilia-Romagna and Lombardy, as indicated by specific elementary indicators: tourism companies (ind.08), accommodation establishments (ind.09) and ISO 14001 certification (ind.10). In Tuscany, Emilia-Romagna and Lombardy, EMAS registration (ind.11) generates additional value, while Ecolabel licences (ind.12) show high values for Tuscany and Emilia-Romagna.

All these regions show high tourist arrivals (ind.21) and tourist presences in accommodation establishments (ind.22). With the exception of Emilia-Romagna, they attract international tourism demand (D1.3.1), as several indicators suggest: tourist stays of foreigners (ind.13), the share of foreign tourism relative to domestic tourism (ind.14) (except for Lombardy), foreign tourism expenditure (ind.15), and foreign tourists revisiting a place (ind.16). Cultural heritage organisations (D1.5) are other key elements of the economic sustainability of the best Italian regions: heritage museums (ind.30), the degree of promotion of the cultural offer of state institutes (ind.32) (except for Emilia-Romagna), cultural events (ind.33) and expenditure for cultural events (ind.34). Museum visitors (ind.25) and cultural heritage visitors (ind.26) are relevant drivers for Tuscany and Latium.

The southern regions have not yet managed to exploit the full potential of tourism for the sustainable development of the local economy. Nevertheless, Campania, Sicily and Sardinia show a fair performance in the economic sustainability of tourism, with relatively high values on several indicators.

Environmental sustainability appears patchy, with a few best performers: Abruzzo, Aosta Valley, Bolzano, Campania, Trento, and Molise. Their relatively low environmental impact is driven by several key aspects that differ between northern and southern regions. Reducing energy and water consumption (D2.1: ind.35–ind.38), sustainable energy and water management (D2.2: ind.40–ind.43) and waste management (D2.3: mainly ind.48) are the main areas of investment of northern regions such as Aosta Valley, Bolzano and Trento. Molise adopts a similar approach to reducing energy and water consumption (D2.1: ind.35–ind.38).

The southern regions Campania and Molise reach their advantageous position in environmental sustainability by investing in the environmental value of the territory (D2.4). This pillar covers aspects related to the quality and promotion of the environment, such as Blue Flag beach certification (only Campania, ind.50), favourable conservation status of habitats (ind.51), quality of bathing coasts (ind.52), Sites of Community Importance (ind.53), Special Protection Areas (ind.54) and Rete Natura (ind.55). Aosta Valley, Bolzano and Trento also excel in Sites of Community Importance (ind.53) and Special Protection Areas (ind.54), and Aosta Valley also in Rete Natura (ind.55).

The social sustainability of tourism is concentrated in north-eastern Italy (Trento, Bolzano, Veneto, and Friuli-Venezia Giulia) and in Aosta Valley. These regions perform best on almost all elementary indicators, with high family health care expenditure (ind.64) and cycle path density (ind.69). Furthermore, Trento, Bolzano and Aosta Valley have invested in hospitals (ind.63), while Bolzano, Friuli-Venezia Giulia and Aosta Valley are expanding urban transport (ind.67). Gender balance (D3.4: ind.72–74) is a key element of social sustainability in Trento and Bolzano.

5 Conclusion

This paper proposed a multi-modelling approach for building and validating a new composite indicator of sustainable tourism, the SusTour-Index, offering theoretical and methodological contributions and opening avenues for future research. It also provides practical advances in measuring tourism sustainability, supporting policymakers, institutions and practitioners in planning and managing tourism development.

From a theoretical perspective, the SusTour-Index implements a comprehensive framework to summarise the multidimensionality of tourism sustainability with a broad set of elementary indicators, effectively structured in pillars and sub-pillars within each sustainability dimension (economic, environmental, social). The rationale of the SusTour-Index required relevant indicators that capture territorial heterogeneity across regions while avoiding indicators that could be misleading for policy purposes.

The multi-modelling approach validated the hierarchical structure of the SusTour-Index for all 21 Italian regions across 23 alternative models, making it possible to evaluate the impact of different methodological choices in dealing with the elementary indicators within each sustainability dimension. A notable strength of the SusTour-Index is that it overcomes the close dependence on specific local destinations. The use of objective indicators from institutional sources made it possible to capture the economic, environmental, and social diversity of sustainable tourism among the Italian regions. The conceptual framework of the SusTour-Index can support policymakers, as it also ensures comparability between regions and replicability over time.

From a methodological perspective, evaluating how the choices made in dealing with elementary indicators affect the outcomes (scores and rankings) of the SusTour-Index provided new insights into this issue. Using equal weights for sustainability indicators appeared to be a better strategy than data-driven weighting approaches, which can alter the importance of some indicators. The multi-modelling analysis also demonstrated that EW models are less sensitive to the aggregation function than models using PCA weights. Under the EW approach, therefore, the aggregation functions used to build the SusTour-Index can be considered interchangeable, which facilitates interpretation (Singh et al., 2009).

Some past research (see Mikulić et al., 2015) supported the use of equal weights for sustainability indicators in certain circumstances, but relied exclusively on an illustrative comparison of indicator weights from available studies. This research went further by verifying the low sensitivity of EW systems through simulations on the same set of elementary indicators. The proposed framework of the SusTour-Index made the rankings of the different EW models robust; that is, the rankings (and scores) obtained from the set of EW models are quite similar to each other. Given their interchangeability, it is therefore possible to choose the model of the SusTour-Index that is most correlated with all the others.

From a practical perspective, the SusTour-Index contributes to monitoring, planning, and managing the critical aspects of sustainable development, helping institutions and policymakers design effective and coherent policies towards the SDGs. The results for the Italian regions underline the value of the SusTour-Index, which can support practical assessments of tourism impacts and the design of evolutionary scenarios for sustainable destination development. The structure of the SusTour-Index can define specific dashboards for decision-makers and institutions to monitor the impact of economic tourism activities and their implications for the environment and society. An in-depth analysis can inform strategic interventions, investments and policies that help redress economic, social and environmental imbalances towards the SDGs. Comparing performance across all dimensions identifies best practices in pursuing tourism sustainability, possible actions to manage critical events and behaviours, and the key pillars of sustainable development in which destinations should invest.

This research proposes differentiating interventions and public investments by considering the overall level of sustainable tourism development (M2 model) in each Italian region together with the role (and value) of specific elementary indicators in the regional strategic plan. Moving beyond generalised interventions for regional sustainable development, the SusTour-Index can direct and redirect specific strategies to optimise the allocation of public funds. In light of the overall value of the sustainability indicators and the specific assessment of the elementary indicators, Italian institutions and policymakers could implement NGEU (NextGenerationEU) investments and PNRR (National Recovery and Resilience Plan) strategies and actions more efficiently. They can redirect funds and investments, taking into account the values of the indicators and the sustainable development goals to be achieved in each region.

Future research may consider the emerging role of technologies in addressing and managing sustainable tourism. Technology-driven innovation can enhance sustainability by addressing stakeholder (e.g. tourists, residents, workers, entrepreneurs) behaviours and managing tourism imbalances (e.g. overcrowding, gentrification). Smart technology tools can support sustainable decision-making, providing solutions to implement adaptive actions, such as redirecting and dispersing tourist flows away from iconic sites and attractions. For example, an app can offer users alternative routes and locations to visit in a city when the number of tourists exceeds the carrying capacity of a specific area. Furthermore, digital technologies (e.g. social media campaigns) can create preferences and shape sustainable behaviours in destinations.

By advancing tourism planning and management, the SusTour-Index supports effective governance (institutions, destination management organisations, policymakers) in achieving sustainable tourism development by monitoring the impacts and imbalances of tourism development; defining stakeholder rules, policy evaluations and monitoring practices; creating contingency plans for peak periods, emergencies and crises; defining managerial tools to support destination decisions; and introducing a valuable information system for destination stakeholders, including institutions, policymakers, tourism enterprises, the community and other key players.

Interpreting the SusTour-Index as a tool for policy learning and for policy change in societal behaviours can open new avenues for advances in participatory sustainable development, which has so far received marginal attention (Rasoolimanesh et al., 2020). Stakeholders' engagement in sustainable development goals requires calibrating political-institutional strategies and actions with an inclusive and shared vision of sustainable development (Omrani et al., 2019; Pasquinelli & Trunfio, 2020).

While providing food for future research, the present work is not without limitations. Since the analysis is conducted at the regional level, the weighting approaches adopted do not consider the performance levels of the sustainability indicators for each destination separately; rather, they use indicator performance levels across multiple destinations to obtain the weights. Therefore, further research is needed to establish guidelines for weighting and aggregating tourism sustainability indicators at different territorial levels (e.g. country, municipality, destination) while preserving their comparability and replicability. Moreover, this paper calls for empirical validation in other countries, which can enhance the value of the conceptual framework of the SusTour-Index by enabling shared theoretical advances and practical tools for policymakers.

Last but not least, in the post-COVID-19 period, the conceptual framework of the SusTour-Index poses new challenges for scholars, who are called on to address the shift from over-tourism to under-tourism. An efficient transition and crisis management will require a solid set of indicators to assess and monitor the impact of tourism on regions and destinations. Future research could extend the conceptual framework of the SusTour-Index with an additional dimension measuring health and sanitary security protocols, opening new interdisciplinary research streams.