1 Introduction

In recent decades, the rapid growth in available information has been accompanied by a parallel growth in the construction of synthetic indicators. The deep complexity of the real world makes it difficult to measure and evaluate the relevant aspects of most phenomena, such as well-being, human development, environmental sustainability, and industrial competitiveness, and hard to capture them from a single perspective. The scientific community has echoed this interest; accordingly, many scholars have focused their efforts on the development and improvement of a methodology known as the Composite Indicator (CI). Constructing a CI (or index) is a popular approach to achieving a simplified numerical representation of a complex phenomenon. A CI is defined as a mathematical combination of individual indicators representing different dimensions into a single index, based on an underlying model of the multidimensional concept that is being measured (Saisana et al. 2005). Greco et al. (2019) provide a wide-ranging collection of publications on techniques and applications in this field of research. Although CIs have gained much attention, they remain the subject of controversy. Their construction involves a series of advantages and disadvantages, some of which are mentioned below. In particular, as pointed out in Smith (2002), arguments for developing CIs include, among others, the possibility of presenting complex or multidimensional issues in one aggregate value, which is much easier to interpret than trying to find a trend in many separate indicators. They also make it possible to place performance at the centre of the policy arena. In fact, CIs are increasingly recognised by many national and international organizations as a useful tool for policy analysis, benchmarking, performance monitoring, public communication, and decision making in various fields, including the economy, the environment, technology development, and society (Badea et al. 2011; Costa 2015; Filippetti and Peyrache 2011). Arguments against CI construction, on the other hand, emphasise how CIs can send misleading or ineffective policy messages if they are poorly constructed or misinterpreted. Opponents specifically question the credibility of CIs, stressing the subjectivity that surrounds the various steps involved in their construction. According to the recommendations of the OECD handbook (OECD 2008), construction decisions range from the selection of a set of sub-indicators to the standardization method, the weighting scheme, and the choice of the aggregation function.

The subject of this paper is an essential aspect of index aggregation, namely the weighting of indicators, which has come under considerable scrutiny in recent scholarly work (Becker et al. 2017; Greco et al. 2019; Keogh et al. 2021). The methodological proposal put forward in this study is based on a data-driven weighting method known as Data Envelopment Analysis (DEA) (Charnes et al. 1978). When exact knowledge of the weights is not available, the DEA approach facilitates the aggregation of quantitative sub-indicators. Within this framework, we address the issue of unit performance change over time by proposing a Functional Weighted Malmquist Productivity Index (FWMPI) that allows for comparative trend analysis. The novel aspect of the proposal is that it achieves a ranking of units by supplementing the analysis with Functional Data Analysis (FDA) tools. There are several practical reasons for considering functional data. Ramsay and Silverman (2005) described the important characteristics of FDA and offered strong arguments in favour of the approach. In recent years, FDA methods have been used in a variety of fields, including medicine, economics, and meteorology, among many others. By contrast, the use of a functional approach for constructing CIs is relatively new (see, for example, Fortuna et al. (2022)). In any case, implementing a functional approach in the construction of CIs offers various benefits. Firstly, FDA allows for a more flexible and nuanced representation of the data by treating them as continuous functions, which can capture important features missed by other methods. For example, FDA enables the observation of indicator behaviour over time, providing insights into its evolution. Secondly, the functional approach is particularly useful when data are not sampled at equally spaced time points, as it can handle irregular intervals. This is important when constructing CIs that are updated at different intervals depending on data availability.
Thirdly, FDA allows for the introduction of new analytical tools that can complement the original data with valuable information. It is possible to incorporate external covariates or predictors into the functional model, which can help to explain the variation in the CI. This can lead to more accurate and robust CIs, as well as a better understanding of the relationships between different variables. These advantages are essential for guiding policy decisions and measuring progress towards development goals.

The rest of the paper is structured as follows. Section 2 describes the methodological context, with a focus on the proposed methodology. Section 3 presents the results of the simulated data analysis, while in Sect. 4 we summarise the application of FWMPI to well-being data. Some concluding remarks are given in Sect. 5.

2 Methodological background

In this section, we provide a detailed overview of the methodological approaches employed in our study. Specifically, we begin by examining the use of DEA and Benefit of the Doubt (BoD) weighting to construct a CI. Next, we delve into the use of an entropy-based Malmquist Productivity Index to measure changes in the productivity of the units of analysis over time. In the third subsection, we introduce the FWMPI as a new method for assessing productivity changes over time.

2.1 Data envelopment analysis and “Benefit of the Doubt”-weighting

When constructing CIs, there are several approaches to aggregating individual indicators. Dimensionality reduction methods, such as Principal Component Analysis (PCA) or Factor Analysis (FA), are commonly used to reduce the number of indicators to a smaller set of uncorrelated factors. These multivariate statistical techniques capture the largest share of variation in the dataset, replacing the original variables with the smallest possible number of factors that reflect the underlying structure of the data. However, these methods do not explicitly take into account the relative importance of each indicator in the final index and may result in a loss of information. An alternative approach is DEA, a non-parametric mathematical programming method that assesses the relative efficiency of homogeneous Decision Making Units (DMUs) in combining inputs to produce outputs. The first DEA model was proposed by Charnes et al. (1978) in management science to measure the production efficiencies of DMUs in combining inputs to produce outputs of goods or services. Since this pioneering work, various methods and models for ranking the originally efficient units have been put forward; Cook and Seiford (2009) provide a classification of DEA models.

DEA models are based on the fundamental concept of evaluating the relative efficiency of homogeneous DMUs, such as companies, banks, countries, universities, and so on. The definition of DMUs has been deliberately kept broad to enable the use of DEA in a variety of applications. DEA computes an efficiency score ranging from 0 to 1: it assigns a score less than 1 to “inefficient” units, while DMUs with a score equal to 1 are deemed “efficient”. In general, among a group of DMUs, a unit with higher outputs and lower inputs has a better chance of achieving a high efficiency rank.

Due to its numerous advantages, the DEA non-parametric technique has recently been adopted as an appropriate method for CI construction. The application of DEA to the field of CIs has been dubbed the Benefit of the Doubt (BoD) approach, first proposed by Melyn and Moesen (1991) and since used by Cherchye et al. (2007, 2008), Mahlberg and Obersteiner (2001), the OECD (OECD 2008), Sahoo et al. (2017), Staessens et al. (2019), and Färe et al. (2019). The main advantage of using BoD in the construction of CIs is its flexibility in retrieving the weights from the data themselves, allowing for the aggregation of quantitative sub-indicators when exact knowledge of the weights is not available. When estimating the weights, the BoD model assumes that a unit's good relative performance in a particular dimension indicates that this unit considers that dimension to be relatively important; similarly, poor performance indicates that a unit places less importance on that dimension. As a result, BoD generates a set of endogenous weights that are most advantageous for every unit: each DMU is placed in its most advantageous position. Thus, if a unit underperforms in comparison to others, this cannot be attributed to an unfair weighting scheme, since any other set of weights would have worsened the evaluated unit's ranking position. The BoD method can be seen as a tool to combine performance sub-indicators without making explicit reference to the input(s).

In order to clearly convey the underlying idea, the BoD formulation can be presented in a step-wise fashion. Let us denote by \(CI_{j}\) the composite index for unit j, by \(y_{ji}\) the (possibly normalised) value of unit j on indicator i (\(i=1, \ldots ,m \)), and by \(w_{ji}\) the weight assigned by unit j to indicator i. In Step 1, the composite index score of unit j is calculated as the ratio of the weighted sum of its sub-indicators to the weighted sum of the benchmark sub-indicators \(y^{B}_i\):

$$\begin{aligned} CI_j=\frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\sum _{i=1}^{m} w_{ji} y_{i}^{B}} . \end{aligned}$$
(1)

Step 2 involves identifying the benchmark performance, which is determined endogenously:

$$\begin{aligned} CI_j=\frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\max \limits _{y_{z,i} \in \{ studied \hspace{0.3em} units\}} \sum _{i=1}^{m} w_{ji} y_{zi}} . \end{aligned}$$
(2)

Formally speaking, the presence of the max operator and its associated argument in the denominator of Eq. 2 indicates that the benchmark observation is derived from an optimization problem.

Step 3 specifies appropriate weights. This stage embodies the BoD concept, because the weights are endogenously selected so that they can be inferred from the relative strengths and weaknesses of each unit. Taking into account a normalisation constraint, emphasising that the most favourable weights are applied subject to all observations, and ensuring that the weights are non-negative, we have:

$$\begin{aligned} \begin{aligned}&CI_j=\max _{w_{ji}} \quad \frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\max \limits _{y_{zi} \in \{ studied \hspace{0.3em} units\}} \sum _{i=1}^{m} w_{ji} y_{zi}}\\&\text {s.t.} \quad \sum _{i=1}^{m} w_{ji} y_{zi}\le 1, \quad \forall z,\\&w_{ji}\ge 0, \quad \forall i.\\ \end{aligned} \end{aligned}$$
(3)

Formally, the BoD model in Eq. 3 is equivalent to the Charnes, Cooper, and Rhodes input-oriented constant-returns-to-scale (CRS) model (Charnes et al. 1978), with all indicators treated as outputs and a “dummy input” equal to one for all units. From a theoretical point of view, the point of departure for consistently deriving an aggregate CI using the BoD model is Koopmans’ theorem (Koopmans 1951) and its revenue corollaries, as shown in Färe and Grosskopf (2004).
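The optimisation in Eq. 3 can be linearised in the standard way (fix the denominator to 1 and maximise the numerator), so each unit's score is obtained from a small linear program. The following Python sketch illustrates the idea; it assumes `scipy` is available, and the function name `bod_scores` and the solver choice are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def bod_scores(Y):
    """Y: (n_units, m_indicators) array of (possibly normalised) sub-indicators.
    Returns the BoD composite score of each unit (1 = efficient)."""
    n, m = Y.shape
    scores = np.empty(n)
    for j in range(n):
        # maximise sum_i w_i * y_ji  <=>  minimise -Y[j] . w
        res = linprog(c=-Y[j],
                      A_ub=Y, b_ub=np.ones(n),   # sum_i w_i y_zi <= 1 for all z
                      bounds=[(0, None)] * m,    # w_i >= 0
                      method="highs")
        scores[j] = -res.fun
    return scores
```

For each unit j, the LP maximises its weighted indicator sum subject to the constraint that no unit in the sample, evaluated with j's weights, scores above 1; a score of 1 marks an efficient (benchmark) unit.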

2.2 Entropy-based Malmquist productivity index

The DEA literature offers the possibility of determining whether DMUs have improved, regressed, or remained unchanged in terms of their performance over time. The first contribution to measuring the productivity change of a DMU over time is the Malmquist Productivity Index (MPI). The original Malmquist index is a quantity index, introduced by Malmquist (1953) for analysing the consumption of inputs in a consumer theory context. Later, Färe et al. (1994) constructed an MPI directly from input and output data using DEA, combining Farrell’s efficiency measurement (Farrell 1957) with Caves’ productivity measurement (Caves et al. 1982) and decomposing the index into two components that capture productivity change due to technical efficiency and change due to technology. The DEA-based MPI relies on first constructing an efficiency frontier over the whole sample by means of DEA and then computing the distance of individual observations from the frontier.

Therefore, the MPI is constructed using distance functions, which can have either an input or an output orientation. More in detail, an input distance function describes the production technology in terms of the maximal proportional contraction of the input vector, given an output vector (Coelli and Prasada Rao 2005); conversely, the output distance function reflects the maximal proportional expansion of the output vector, given an input vector.

MPI can be formalised using either an output-oriented or an input-oriented method.

The output-oriented MPI is the focus of this research. Suppose we have a set of n DMUs, each with s unitary inputs denoted by a vector \({\textbf{x}}_{j}\) and m outputs denoted by a vector \({\textbf{y}}_{j}\), for \(j=1,\dots ,n\), observed over the periods t and \(t+1\). According to Färe et al. (1994), the output-oriented DEA-MPI for a given \(DMU_0\) between times t and \(t+1\) can be expressed mathematically as:

$$\begin{aligned} MPI_{0}=\left[ \frac{D_0^t(x_0^{t+1},y_0^{t+1})}{D_0^t(x_0^t,y_0^t)}\frac{D_0^{t+1}(x_0^{t+1},y_0^{t+1})}{D_0^{t+1}(x_0^{t},y_0^{t})}\right] ^{\frac{1}{2}} \end{aligned}$$
(4)

where \((x_0^{t+1},y_0^{t+1})\) and \((x_0^t,y_0^t)\) represent the input and output vectors of periods \(t+1\) and t, respectively, and \(D_0^t(x_0^t,y_0^t)\) denotes the distance of the period-t observation from the period-t technology. The index in Eq. 4 is the geometric mean of two output-based Malmquist indices. A magnitude of MPI greater than 1 indicates progress (positive growth) from period t to period \(t+1\), whereas magnitudes equal to 1 and less than 1 indicate the status quo and productivity decay, respectively. The overall tendency in DMU productivity changes over time is traditionally obtained by taking the average of sequential productivity indices, implicitly assuming that all sectional indices affect the productivity level equally.
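Each distance in Eq. 4 can be obtained from an output-oriented CRS DEA program: \(D_0(x_0,y_0)=1/\phi ^{*}\), where \(\phi ^{*}\) is the maximal proportional expansion of the evaluated DMU's outputs that remains feasible within the reference technology. The following Python sketch (assuming `scipy`; the function name is ours) computes one such distance; Eq. 4 then combines four of these evaluations.

```python
import numpy as np
from scipy.optimize import linprog

def output_distance(X, Y, x0, y0):
    """Output distance function under CRS, via the output-oriented CCR LP.
    X: (n, s) inputs and Y: (n, m) outputs of the reference-period DMUs;
    (x0, y0): the observation being evaluated. Returns D = 1/phi*."""
    n, s = X.shape
    m = Y.shape[1]
    # decision variables: [phi, lambda_1, ..., lambda_n]; maximise phi
    c = np.concatenate(([-1.0], np.zeros(n)))
    # constraints: X^T lambda <= x0  and  phi*y0 - Y^T lambda <= 0
    A_ub = np.vstack([np.hstack([np.zeros((s, 1)), X.T]),
                      np.hstack([y0.reshape(-1, 1), -Y.T])])
    b_ub = np.concatenate([x0, np.zeros(m)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return 1.0 / (-res.fun)
```

A frontier DMU attains \(\phi ^{*}=1\) (distance 1), while dominated DMUs have distance below 1.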

In his work, Fallahnejad (2017) suggested using Shannon’s entropy to derive more objective weights for aggregating MPIs, thereby eliminating the equal-weighting assumption.

Notably, Shannon’s entropy is one of the most important methods for determining the relative weights of indicators in multi-criteria decision-making contexts (Peykani et al. 2022). Rooted in information theory, the entropy weight method uses the magnitude of the entropy value to measure the information content of an indicator in the observed data. The lower the entropy value, the greater the degree of differentiation and the more information that can be derived; accordingly, a higher weight should be given to that indicator in the overall evaluation.

Fallahnejad’s approach can be summarised in the following basic steps:

In Step 1, the MPI matrix (Table 1) is created, collecting the MPI measures of the n DMUs over the time periods.

Table 1 MPI matrix

In Step 2, the MPI matrix is normalised by dividing each value by the sum of its column. Thus, the normalised value of the j-th DMU's MPI in the t-th period, denoted by \(p_{jt}\), is calculated as follows:

$$\begin{aligned} p_{jt}=\frac{MPI_{jt}}{\sum _{z=1}^{n}MPI_{zt}}, \quad \forall j,t. \end{aligned}$$
(5)

This normalisation eliminates anomalies due to different measurement units and scales.

In Step 3, the entropy \(h_{t}\) for all normalised MPIs is calculated as:

$$\begin{aligned} h_{t}=-h_{0}\sum _{j=1}^{n}p_{jt}\ln p_{jt}, \end{aligned}$$
(6)

where \(h_{0}=\frac{1}{\ln (n)}\). For convenience of calculation, \(p_{jt}\ln p_{jt}=0\) is set when \(p_{jt}=0\).

Step 4 involves the computation of the degree of diversification, defined as:

$$\begin{aligned} d_{t}=1-h_{t}, \quad t=1,\dots k. \end{aligned}$$
(7)

As previously stated, the degree of diversification indicates the amount of useful information provided by the relevant MPI measures to the overall aggregated index. It follows that if the DMU productivity values are close, the weight of a given year can be considered weak in the aggregating process.

In Step 5 the degree of importance of MPI at time t is obtained by setting

$$\begin{aligned} w_{t}=\frac{d_{t}}{\sum _{s=1}^{k}d_{s}}, \quad t=1,\dots k, \end{aligned}$$
(8)

where \(\sum _{t=1}^{k} w_{t}=1 \).

In Step 6, the weighted MPI is calculated as:

$$\begin{aligned} WMPI_{j}={\sum _{t=1}^{k}w_{t}MPI_{jt}}, \quad j=1,\dots n. \end{aligned}$$
(9)

It is apparent that a time period in which the scores are nearly identical across all units would have little impact on the ranking and should, therefore, be deemed relatively unimportant in the aggregation.
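Steps 1–6 above can be sketched compactly. The following Python function (the name and vectorised layout are ours) takes the MPI matrix of Table 1 and returns the entropy-based period weights together with the weighted MPI of Eq. 9:

```python
import numpy as np

def entropy_weighted_mpi(MPI):
    """MPI: (n_dmus, k_periods) matrix of Malmquist indices.
    Returns (w, wmpi): the period weights and the weighted MPI per DMU."""
    n, k = MPI.shape
    p = MPI / MPI.sum(axis=0)                        # Step 2: column-normalise
    h0 = 1.0 / np.log(n)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)  # convention: 0 ln 0 = 0
    h = -h0 * plogp.sum(axis=0)                      # Step 3: entropy per period
    d = 1.0 - h                                      # Step 4: diversification
    w = d / d.sum()                                  # Step 5: weights sum to 1
    wmpi = MPI @ w                                   # Step 6: weighted MPI
    return w, wmpi
```

Note that a period whose MPIs are identical across DMUs attains maximal entropy, hence \(d_t=0\) and zero weight, which is exactly the behaviour discussed above.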

2.3 The proposed approach: FWMPI

Although the traditional literature has demonstrated that the results of entropy-weighted methods are reliable and effective, some studies, primarily rooted in engineering practice, have raised concerns about their rationality in decision making. For example, Zhu et al. (2020) point out flaws in the entropy-weighted method that can distort decision-making outcomes. First, when there are too many zeros among the measured values, the standardised results of the entropy-weighted method are prone to distortion, so the index with the lowest actual degree of differentiation may be given too much weight. Secondly, in multi-index decision making involving categorisation, it is the degree of classification that accurately reflects the information content of an index; entropy-weighted methods, however, only consider the degree of numerical differentiation of the index and ignore rank discrimination.

In this regard, we hold the view that entropy-weighted methods provide only a limited viewpoint in the context of the multi-index decision-making problem under consideration.

As shown in Sect. 2.2, in the construction of the weighted MPI, Shannon’s entropy can be regarded as a diversity measure of the DMUs. However, different indices could be used to capture such diversity, and different orderings of homogeneous DMUs could be obtained depending on the diversity measure employed.

To address this limitation, we propose a set of diversity indices based on a single continuous parameter, which graphically depicts a diversity profile. Specifically, we refer to the diversity index of degree \(\beta \), that is \(\Delta _\beta \), proposed by Patil and Taillie (1979, 1982) to quantify the diversity of an ecological population composed of N units partitioned into s species (\(i=1,2,\ldots ,s\)) and formally expressed as:

$$\begin{aligned} \Delta _{\beta }=\sum _{i=1}^{s}\frac{(1-p_{i}^{\beta })}{\beta }p_{i}, \quad \beta \ge -1. \end{aligned}$$
(10)

In Eq. 10 the index captures the multidimensional aspect of diversity and can be considered a function of \(\beta \) with parameter p, where \(p_i= \frac{N_i}{\sum _{i=1}^s N_i}\) and \(N_i\) represents the number of units belonging to the \(i\)-th species.

From a mathematical point of view, the index makes sense for any real number \(\beta \), i.e. \(-\infty<\beta <+\infty \); however, to guarantee that \(\Delta _\beta \) has certain desirable properties, it is necessary to impose the restriction \(\beta \ge -1\) (Patil and Taillie 1979, 1982). In addition, as stressed by the same authors, it may not be useful to calculate and plot \(\Delta _\beta \) for \(\beta \ge 1\), as the profiles tend to converge rapidly beyond this point. Therefore, we examine \(\Delta _\beta \) as a function of \(\beta \) within the range \([-1, 1]\). By plotting \(\Delta _\beta \) versus \(\beta \) we graphically obtain a diversity profile. The profiles of the \(\Delta _\beta \) family are decreasing and convex curves. It is easy to verify that some of the most frequently used diversity indices are special cases of the one-parameter family \(\Delta _\beta \): for each value of \(\beta \) we obtain a diversity measure. Specifically, Shannon’s entropy is the limiting case obtained as \(\beta \rightarrow 0\), whereas the Richness and Simpson indices are obtained by setting \(\beta =-1\) and \(\beta =1\), respectively.
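As a quick illustration of the special cases just mentioned, the profile in Eq. 10 can be evaluated on a grid of \(\beta \) values, replacing the \(\beta \rightarrow 0\) case by its Shannon limit \(-\sum _i p_i \ln p_i\). The Python sketch below (function name ours) assumes strictly positive abundances:

```python
import numpy as np

def diversity_profile(p, betas):
    """Patil-Taillie diversity Delta_beta of abundance vector p on a beta grid.
    The beta -> 0 case is handled by its Shannon-entropy limit."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                                  # relative abundances
    out = np.empty(len(betas))
    for i, b in enumerate(betas):
        if abs(b) < 1e-12:
            out[i] = -(p * np.log(p)).sum()          # Shannon limit
        else:
            out[i] = ((1.0 - p**b) / b * p).sum()
    return out
```

For two equally abundant species, \(\beta =-1\) gives the Richness value \(s-1=1\), \(\beta \rightarrow 0\) gives \(\ln 2\), and \(\beta =1\) gives the Simpson value \(1-\sum _i p_i^2=0.5\).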

Following Di Battista et al. (2017), the one-parameter family \(\Delta _\beta \) of diversity indices can be viewed as a function on a fixed domain rather than a sequence of observations. More in detail, those authors propose an alternative way to analyse the diversity profile through Functional Data Analysis (FDA). As pointed out in Ramsay and Silverman (2005), the underlying idea of FDA is to assume the existence of some functions giving rise to the observed data. Analytic tools that are essential components of FDA, such as the analysis of the profile’s derivatives, its radius of curvature, and the length of the curve, can be exploited to further inspect the diversity profiles and rank the different units (Di Battista et al. 2017).

The framework described above is used to create a dynamic index, namely the FWMPI, that addresses the shortcomings of entropy-weighted methods. The FWMPI can be obtained by rewriting some steps of Fallahnejad’s original procedure. To be more specific, we propose measuring the degree of differentiation using the one-parameter family \(\Delta _\beta \) of diversity indices rather than the entropy value \(h_t\), with the ultimate goal of accounting for a whole family of diversity measures instead of the single one represented by Shannon’s index. Accordingly, Steps 3–6 of the procedure are reformulated as follows. The degree of diversification of period t, now a function of \(\beta \), is calculated as:

$$\begin{aligned} d_{t}(\beta )=1-\Delta _{\beta }, \quad \forall \beta , \end{aligned}$$
(11)

while the degree of importance of MPI at time t is obtained by setting

$$\begin{aligned} w_{t}(\beta )=\frac{d_{t}(\beta )}{\sum _{s=1}^{k}d_{s}(\beta )},\quad \forall \beta . \end{aligned}$$
(12)

Finally, the functional weighted MPI is calculated according to Eq. 13:

$$\begin{aligned} FWMPI_{j}(\beta )=\sum _{t=1}^{k} w_{t}(\beta ) MPI_{jt}, \quad \forall \beta . \end{aligned}$$
(13)
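Putting Eqs. 11–13 together, the FWMPI evaluates, for each \(\beta \) on a chosen grid, the diversification degree of every period and the resulting weighted aggregate. A Python sketch follows (the function name is ours; in the illustration we keep \(\beta \) strictly inside \((-1,1)\), where the diversification degrees of our example remain positive):

```python
import numpy as np

def fwmpi(MPI, betas):
    """Functional weighted MPI: for each beta, weight the time periods by the
    diversification degree d_t(beta) = 1 - Delta_beta of the period-t shares
    (Eqs. 11-12), then aggregate the MPIs (Eq. 13).
    Returns (W, F): weights (len(betas), k) and curves (len(betas), n)."""
    n, k = MPI.shape
    P = MPI / MPI.sum(axis=0)                        # shares p_jt per period
    D = np.empty((len(betas), k))                    # Delta_beta per period
    for bi, b in enumerate(betas):
        if abs(b) < 1e-12:
            D[bi] = -(P * np.log(P)).sum(axis=0)     # Shannon limit at beta=0
        else:
            D[bi] = ((1.0 - P**b) / b * P).sum(axis=0)
    d = 1.0 - D                                      # Eq. 11, rows indexed by beta
    W = d / d.sum(axis=1, keepdims=True)             # Eq. 12: weights over periods
    F = W @ MPI.T                                    # Eq. 13: FWMPI_j(beta) curves
    return W, F
```

Each row of F is the FWMPI of all DMUs at one \(\beta \); read column-wise, it gives the sample curve \(FWMPI_j(\beta )\) of DMU j.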

It is critical to note that the functional tools can now be used to rank DMUs. In this regard, we pursue two alternatives.

First, the comparative analysis of the FWMPI curves is facilitated by considering the area under the curve (Di Battista et al. 2017) which provides a ranking mirroring both the level and the evolutionary dynamics of functions.

Given a set of n FWMPI functions, \(FWMPI_1 \, (\beta ), \, FWMPI_2 \,(\beta )\, \dots FWMPI_n \, (\beta )\), the DMUs can be sorted in descending order according to the area under the curve, defined as:

$$\begin{aligned} A_{i}=\int _{B}FWMPI_i(\beta )\, d\beta , \quad i=1,\dots n. \end{aligned}$$
(14)

Furthermore, by utilising the FDA approach, we can compute a functional depth rank. When dealing with a group of functions, the notion of depth for functional data enables us to establish the centrality of a function and generates an ordering of the sample curves from the centre outwards. Functional depth thus ranks functional observations from most unusual to most typical. The underlying concept is to determine “how long” a curve stays in the middle of the group. Research on data depth has gained significant attention over the years, as evidenced by Cuevas et al. (2007), who offer a comprehensive review of data depth for high-dimensional and functional data.

In this paper, we employ the functional integrated depth introduced by Fraiman and Muniz (2001), which integrates a univariate depth along the \(\beta \) axis. Let \(F_{n,\beta }\) be the empirical distribution of the sample \(x_1(\beta ),\dots , x_n(\beta )\) at a given \(\beta \); the functional integrated depth is defined as follows:

$$\begin{aligned} I_{i}=\int _{B} D_{i}(\beta )\, d\beta , \quad i=1,\dots n, \end{aligned}$$
(15)

where \(D_{i}(\beta )=1-|0.5-F_{n,\beta }(x_{i}(\beta ))|\). According to the values \(I_{i}\) it is possible to rank the sample curves in descending order, from the most central to the most outlying.
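Both ranking devices operate on the sampled FWMPI curves. The sketch below (function names ours; the integrals of Eqs. 14–15 are approximated by the trapezoidal rule on the \(\beta \) grid) computes the area under each curve and the Fraiman–Muniz integrated depth:

```python
import numpy as np

def _trapz(Y, x):
    """Trapezoidal integration of each row of Y over the grid x."""
    return 0.5 * ((Y[:, 1:] + Y[:, :-1]) * np.diff(x)).sum(axis=1)

def auc_rank(curves, betas):
    """Eq. 14: area under each FWMPI curve; larger area -> better rank.
    curves: (n, len(betas)) array of sampled curves."""
    areas = _trapz(curves, betas)
    return areas, np.argsort(-areas)

def fm_depth(curves, betas):
    """Eq. 15 (Fraiman-Muniz): integrate D_i(beta) = 1 - |0.5 - F_{n,beta}|."""
    # F[i, k]: proportion of curves lying at or below curve i at beta_k
    F = (curves[None, :, :] <= curves[:, None, :]).mean(axis=1)
    return _trapz(1.0 - np.abs(0.5 - F), betas)
```

Sorting the depths \(I_i\) in descending order then orders the curves from the most central to the most outlying, as stated above.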

3 Simulation study

It is widely acknowledged that for CIs to be effective, they must be developed using robust methods that ensure both benchmarking and stability over time. In this section, we provide evidence to justify the credibility of our proposal by demonstrating how various orderings of homogeneous DMUs can arise due to the use of different diversity measures.

We set up a simulation experiment in the R environment (R Core Team 2023) to gather information on the robustness of the ranking when a dynamic weighted MPI is constructed using the Shannon, Simpson, and Richness indices. To track shifts in ranking, we used the average absolute shift in ranking (ARS) as a general measure of divergence, defined as:

$$\begin{aligned} ARS=\frac{1}{n}\sum _{i=1}^{n}|Rank_i (CIs_1)-Rank_i (CIs_2)|. \end{aligned}$$
(16)

Here, n represents the number of DMUs, while \(Rank_i(CIs_1)\) and \(Rank_i(CIs_2)\) refer to the positions of the \(i\)-th DMU as determined by \(CIs_1\), obtained through one diversity index, and \(CIs_2\), obtained through another. The greater the divergence measure, the greater the sensitivity of the CI to changes in the way indicators are weighted and aggregated; conversely, the smaller it is, the greater the robustness.

In addition, we have included some specific divergence measures in our study. This decision is based on the observation that metrics indicating a particular outcome on average can sometimes be contradicted by an analysis of the shifts in ranking of individual DMUs. Therefore, our simulation study also incorporates the maximum absolute shift in ranking (SR max), the proportion of absolute shifts in ranking that exceed five positions (SSR \(>5\)), and the proportion that exceed ten positions (SSR \(>10\)).
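These divergence measures are straightforward to compute from two rank vectors; a Python sketch (the dictionary keys mirror the labels used in the tables) is:

```python
import numpy as np

def rank_divergence(rank1, rank2):
    """ARS of Eq. 16 plus the auxiliary shift-in-ranking metrics."""
    shift = np.abs(np.asarray(rank1) - np.asarray(rank2))
    return {"ARS": shift.mean(),        # average absolute shift in ranking
            "SRmax": shift.max(),       # maximum absolute shift
            "SSR>5": (shift > 5).mean(),    # share of shifts above 5 positions
            "SSR>10": (shift > 10).mean()}  # share of shifts above 10 positions
```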

In order to assess the ability of the dynamic CIs constructed using diversity indices to handle potential challenges, we created various scenarios. Specifically, we aimed to determine the degree to which the ranking of units is affected by the presence of outliers, by collinearity and skewness of the indicators, and by variations in sample size. We generated a dataset in which four single indicators (\(I_{1}, I_{2}, I_{3},I_{4}\)) are analysed over three different time periods.

The initial scenario was created to examine how extreme values affect the ranking of CIs constructed using indices based on different diversity measures. In this scenario, we generated indicators from a Uniform distribution on the range [5, 15], without any collinearity between them. Next, we introduced extreme values into the simulated dataset for each indicator, drawing from a Uniform distribution in accordance with the following scheme:

\(\tilde{I_1c}\) \(\sim \) U\((k_{1} max(I_{1});\, k_{2} max(I_{1}))\),

\(\tilde{I_2c}\) \(\sim \) U\((k_{1} max(I_{2});\, k_{2} max(I_{2}))\),

\(\tilde{I_3c}\) \(\sim \) U\((l_{1} min(I_{3});\, l_{2} min(I_{3}))\),

\(\tilde{I_4c}\) \(\sim \) U\((l_{1} min(I_{4});\, l_{2} min(I_{4}))\).

It is worth noting that the parameters \(k_{1}\) and \(k_{2}\) contribute to the outlying nature of extreme values on the right tail of the distribution, while \(l_{1}\) and \(l_{2}\) are responsible for the left tail. For this study, we have set \(k_{1}=2\), \(k_{2}=3\), \(l_{1}=0.2\), and \(l_{2}=0.3\). Additionally, we have increased the outlyingness of extreme values over time by raising \(k_{1}\) and \(k_{2}\) to 4 and 6 for the second time period, and to 6 and 10 for the third time period, respectively. To randomly select which observations would be contaminated, we have defined a contamination level of \(\epsilon \). Specifically, we have analyzed three different situations: no contamination (\(\epsilon =0\)), and contamination levels of \(2.5\%\) (\(\epsilon =0.025\)) and \(5\%\) (\(\epsilon =0.05\)), respectively.
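The contamination mechanism above can be sketched as follows: a fraction \(\epsilon \) of an indicator's observations is replaced with draws from a Uniform distribution anchored at multiples of the indicator's maximum (right tail) or minimum (left tail). This Python sketch (function name and seed are ours) illustrates one indicator:

```python
import numpy as np

def contaminate(I, eps, k1, k2, upper=True, rng=None):
    """Replace a fraction eps of indicator values with extreme draws:
    U(k1*max(I), k2*max(I)) on the right tail (upper=True), or
    U(k1*min(I), k2*min(I)) on the left tail (upper=False)."""
    if rng is None:
        rng = np.random.default_rng(0)
    I = np.asarray(I, dtype=float).copy()
    # randomly select which observations are contaminated
    idx = rng.choice(len(I), size=int(round(eps * len(I))), replace=False)
    anchor = I.max() if upper else I.min()
    I[idx] = rng.uniform(k1 * anchor, k2 * anchor, size=len(idx))
    return I
```

With \(k_1=2\) and \(k_2=3\), for instance, every contaminated value exceeds the original maximum, so the replaced observations are genuine right-tail outliers.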

In the second scenario, the interest was in generating correlated normal distributed indicators. To denote the strength of the linear relationship between two different indicators, we varied the collinearity between them, setting four values, namely 0, 0.25, 0.5 and 0.85.

The third scenario involved examining the changes in ranking position when dealing with skewed indicators. To achieve this, we generated values from a Weibull distribution, varying the shape and scale parameters.

Additionally, we compared the outcomes across different sample sizes for each data-generating mechanism, producing data with sample sizes of n = 30, 60, 100, 150, and 200.

Tables 2, 3, and 4 display the simulation outcomes for the scenario involving extreme values generated by the Uniform distribution as described earlier. These results are presented in Panel A.

Table 2 PANEL A-Simulation results under level of contamination \(\epsilon \)=0
Table 3 PANEL A-Simulation results under level of contamination \(\epsilon \)=0.025
Table 4 PANEL A-Simulation results under level of contamination \(\epsilon \) = 0.05

According to the average shift in ranking measures, there are differences in the rankings observed for both comparisons: Shannon vs. Simpson and Shannon vs. Richness. The largest values in the metrics utilized are observed when the sample size is increased and when we introduce a contamination level of 5\(\%\) of extreme values into the simulated data.

Upon analyzing the sensitivity of the outcomes under the correlation scenario (displayed in Panel B of Tables 5, 6, 7, and 8), we observe that the ranking divergence is primarily identified through specific SR metrics, which indicate a notable degree of volatility as the sample size and the correlation between indicators increase.

Table 5 PANEL B-Simulation results under level of correlation=0
Table 6 PANEL B-Simulation results under level of correlation=0.25
Table 7 PANEL B-Simulation results under level of correlation=0.50
Table 8 PANEL B-Simulation results under level of correlation=0.85

In addition, we examine the level of stability of the rankings in cases where we are dealing with skewed indicators generated from a Weibull distribution. The outcomes, which are presented in Tables 9, 10, and 11, indicate that there is considerable variability in the ranking positions of each DMU, particularly when the indicators are drawn from a Weibull distribution with a shape parameter of 4 and a scale parameter of 3. This variability has a significant impact on the divergence measures, particularly when the sample size is increased to n=200.

Table 9 PANEL C-Simulation results from a WEIBULL with shape=2 and scale=1
Table 10 PANEL C-Simulation results from a WEIBULL with shape = 3 and scale = 2
Table 11 PANEL C-Simulation results from a WEIBULL with shape = 4 and scale = 3

Besides, in order to assess the effectiveness of the proposed methodology, we conduct a comparison between the performance of FWMPI and other methods that rely on a single diversity measure. This comparison is carried out across the various scenarios described earlier.

In particular, for the initial scenario, we compare the metrics used on rankings generated from uncontaminated artificial data with the metrics obtained when we introduced outliers into the data (Panel A: Tables 12, 13).

In the second scenario, for each index, we contrast the rankings resulting from uncorrelated data with those obtained when different levels of correlation are taken into consideration (Panel B: Tables 14, 15, 16).

Finally, we replicate the comparison of metrics for artificial data exhibiting different levels of indicator skewness (Panel C: Tables 17, 18).

In the initial scenario, the results suggest that, in the presence of outliers, the FWMPI methodology generally outperforms the alternative methods. The outcomes of the second scenario show that the performance of the novel approach decreases as the sample size and the level of correlation increase.

The findings from the last scenario imply that the proposed methodology demonstrates superior performance when dealing with smaller sample sizes and minimal skewness among the indicators.

Table 12 PANEL A (\(\epsilon = 0\) vs. \(\epsilon \) = 0.025)
Table 13 PANEL A (\(\epsilon = 0\) vs. \(\epsilon \) = 0.05)
Table 14 PANEL B (correlation = 0 vs. correlation = 0.25)
Table 15 PANEL B (correlation = 0 vs. correlation = 0.50)
Table 16 PANEL B (correlation = 0 vs. correlation = 0.85)
Table 17 PANEL C (WEIBULL with shape = 2 and scale = 1 vs. WEIBULL with shape = 3 and scale = 2)
Table 18 PANEL C (WEIBULL with shape = 2 and scale = 1 vs. WEIBULL with shape = 4 and scale = 3)
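The contamination comparison of Panel A can be sketched in miniature: inject an \(\epsilon \)-fraction of outlying units into clean data and measure how much the induced ranking moves. Kendall's tau is used here as a stand-in for the paper's comparison metrics, and all numeric choices (means, contamination level, sample size) are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)

n, p, eps = 100, 4, 0.05              # eps: contamination fraction
clean = rng.normal(5.0, 1.0, size=(n, p))

# Replace a random eps-fraction of units with outlying observations
dirty = clean.copy()
k = int(round(eps * n))
rows = rng.choice(n, size=k, replace=False)
dirty[rows] = rng.normal(15.0, 1.0, size=(k, p))

# Equal-weight composite rankings before and after contamination
r_clean = (-clean.mean(axis=1)).argsort().argsort()
r_dirty = (-dirty.mean(axis=1)).argsort().argsort()

tau, _ = kendalltau(r_clean, r_dirty)
print(f"Kendall tau, clean vs. contaminated ranking: {tau:.3f}")
```

A tau well below 1 flags ranking instability under contamination; a robust index should keep this value close to 1 at small \(\epsilon \).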

4 Empirical evidence: application to BES data

In this section, we present the findings of our FWMPI proposal on the well-being dataset. In recent years, several scholars have expressed concerns about the limitations of GDP as a measure of well-being and as a benchmark for evaluating and comparing the development of regions and countries (see, among others, Larraz and Pavia (2010); Costanza et al. (2009); Fleurbaey (2009)).

The concept of well-being and its measurement have been significant topics of discussion in European research debates. Following the EU “beyond GDP” initiative in 2009, many projects have been suggested and implemented to combine indicators, datasets, domains, and dimensions. The Stiglitz Commission, established in France, made the most notable contribution to this effort (Stiglitz et al. 2009).

In response to the need for measurements of individual and societal well-being that surpass conventional measures, such as GDP, Italy developed the BES framework (short for Benessere Equo e Sostenibile, which means equitable and sustainable well-being). According to Riccardini and De Rosa (2016), the BES is “a measurement tool for progress in Italy” and is based on the theoretical model published by the OECD (Hall et al. 2010). The BES conceptualises well-being as a phenomenon comprising two fundamental elements: equity within and between generations, and sustainability from an environmental, economic, and social perspective.

It is worth mentioning that the objective of the BES project is to create statistical indicators that are deemed important for a country’s progress, using a formative approach (Diamantopoulos et al. 2008). This type of approach assumes that the indicators define the underlying trait that represents the phenomenon, as opposed to the reflective approach, which assumes that the indicators reflect the phenomenon itself (Diamantopoulos and Winklhofer 2001; Diamantopoulos and Siguaw 2006; Maggino 2017). As per Diamantopoulos’ approach, the internal consistency of the formative indicators is of minimal relevance since two uncorrelated indicators can both be significant for the same construct.

The BES indicators were chosen through a participatory process in which all sectors of society, including academics, institutions, associations, and citizens, expressed their preferences. The elementary indicators are clustered in 12 domains, namely: Health, Education and training, Work and work-life balance, Economic prosperity, Social relationships, Politics and institutions, Security, Subjective well-being, Landscape and Cultural heritage, Environment, Innovation, Research and Creativity, Quality of services (see BES (2013)).

Subsequently, the Italian National Institute of Statistics (ISTAT) has expanded the BES project by introducing other initiatives that focus more on the local level. The “Benessere Equo e Sostenibile dei Territori (BESdT)” is one such initiative, which applies the BES framework to a provincial (NUTS 3) scale.

Measuring well-being at the local level is highly significant in formulating policies aimed at achieving fair and sustainable development while taking into account the distinctive characteristics of each region (Calcagnini and Perugini 2019; Cracolici et al. 2018; Scott and Bell 2013; Nissi and Sarra 2018; Sarra and Nissi 2020). In Italy, the BES and BESdT frameworks are the only institutional statistical resources that systematically work towards this goal.

To apply our functional dynamic index, we utilize the BESdT framework, which evaluates well-being through a range of variables. Due to a lack of data and missing values, our analysis is confined to 103 provincial capital cities and only seven of the twelve domains from the original Ur-Bes dataset. In particular, we focus on the following pillars: “Health”, “Education and Training”, “Work and Life Balance”, “Environment”, “Safety”, “Politics”, “Quality of Services” (see Fig. 1).

Fig. 1
figure 1

Graphical representation of BESdT framework

Our goal is to employ the FWMPI to assess and illustrate the advancement of Italian provincial capital cities in promoting equitable and sustainable well-being between 2004 and 2017.

Table 19 presents the basic indicators used in constructing the dynamic composite index, together with their summary statistics for the year 2017. Additionally, Fig. 2 depicts the heatmaps of the correlation matrices, categorised according to the BES pillars.

Table 19 Descriptive statistics of Indicators-year 2017
Fig. 2
figure 2

Heatmap of correlation matrices

By following our approach, it is possible to visualise the variation of the FWMPI as \(\beta \) ranges over [-1, 1]. This graphical representation facilitates a thorough comparison of the index values across provinces. As an illustration, Figs. 3 and 4 display the FWMPI values for two provinces as \(\beta \) changes. This example demonstrates how the graphical representation captures the subtleties of the index values and provides a framework for conducting similar analyses on other provinces.

Fig. 3
figure 3

FWMPI of Avellino city from 2004 to 2017 across the \(\beta \) domain

Fig. 4
figure 4

FWMPI of Piacenza city from 2004 to 2017 across the \(\beta \) domain

Specifically, we notice a contrasting pattern in the dynamic index within the \(\beta \) domain (ranging from -1 to +1) for the two provinces being studied. Through the use of FDA tools, we can reconstruct the FWMPI profile within the \(\beta \) domain and derive a single ranking of Italian provinces by calculating the area under the curve, as described in Sect. 2. This ranking is highly beneficial as it retains the valuable information contained within the entire domain of the diversity profile. In our context, it enables a precise and comprehensive assessment of the advancements made by Italian provincial capital cities in promoting equitable and sustainable well-being. Based on the area under the curve metric, defined in Eq. 14, the cities of Avellino, Livorno, Milano, Lucca, and Bergamo made the most significant strides in promoting equitable and sustainable well-being during the studied time frame. Conversely, the cities with the lowest area under the curve were Bari, Ancona, Enna, Varese, and Piacenza, which experienced a decline throughout the entire period. The complete ranking obtained by calculating the area under the curve is presented in Table 20.

Table 20 Ranking of Provinces according to the area under the curve
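The area-under-the-curve ranking can be sketched numerically. Eq. 14 is not reproduced in this excerpt, so the trapezoidal rule below stands in for the integral, and the province names and profile shapes are purely hypothetical placeholders for reconstructed FWMPI curves on a common \(\beta \) grid.

```python
import numpy as np

# Hypothetical FWMPI profiles evaluated on a common beta grid in [-1, 1];
# each entry stands in for one province's reconstructed FWMPI curve.
beta = np.linspace(-1.0, 1.0, 201)
profiles = {
    "ProvA": 0.3 + 0.1 * np.sin(np.pi * beta),   # mostly positive profile
    "ProvB": -0.1 + 0.05 * beta,                 # mostly negative profile
    "ProvC": 0.05 * np.cos(np.pi * beta),        # profile oscillating near 0
}

def area_under_curve(y, x):
    # Trapezoidal rule, standing in for the integral in Eq. 14
    return float(((y[1:] + y[:-1]) / 2.0 * np.diff(x)).sum())

auc = {name: area_under_curve(y, beta) for name, y in profiles.items()}

# Provinces ranked from largest to smallest area under the FWMPI curve
ranking = sorted(auc, key=auc.get, reverse=True)
print(ranking)
```

Because the whole curve over [-1, 1] is integrated, the resulting scalar preserves information from the entire diversity profile rather than from a single \(\beta \) value.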

The functional integrated depth, defined in Eq. 15, can be used to obtain an additional overall ranking. Table 21 displays the depth-based ranking of the Italian provincial capital cities according to the FWMPI.

Table 21 Ranking of Provinces according to the depth measure

Piacenza, Bolzano, Parma, Aosta, Viterbo, and Milano are the most central cities, with Pescara, Matera, Brindisi, Pistoia, and Asti ranking last.

Consequently, the last positions are occupied by the most outlying provincial capital cities. It should be noted that the depth-based ranking does not allow one to infer the direction of departure from the central observations.
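The depth-based ranking can be illustrated with a minimal sketch. Eq. 15 is not reproduced in this excerpt; a common choice for an integrated functional depth is the Fraiman–Muniz construction, used below on toy trajectories (the curve shapes and all parameters are hypothetical, not the BESdT data).

```python
import numpy as np

def integrated_depth(curves):
    """Fraiman-Muniz-style integrated depth for curves sampled on a
    common grid: average a univariate depth across grid points.
    `curves` has shape (n_curves, n_grid)."""
    n = len(curves)
    # Pointwise empirical cdf value of each curve at each grid point
    ranks = curves.argsort(axis=0).argsort(axis=0) + 1
    fn = ranks / n
    pointwise = 1.0 - np.abs(0.5 - fn)     # deepest when fn is near 0.5
    return pointwise.mean(axis=1)          # one depth value per curve

rng = np.random.default_rng(3)
grid = np.linspace(-1.0, 1.0, 101)
# Toy FWMPI-like trajectories: random level shifts of a common shape
curves = 0.2 * np.sin(np.pi * grid) + rng.normal(0.0, 0.3, size=(20, 1))

depth = integrated_depth(curves)
central = depth.argmax()       # deepest (most central) trajectory
outlying = depth.argmin()      # shallowest (most outlying) trajectory
print(central, outlying)
```

The depth score orders curves by centrality only: a curve far above the bundle and one far below it can receive the same low depth, which is the direction-blindness noted above.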

5 Concluding remarks

The existing literature offers many different methods for constructing CIs. Drawing on ideas discussed in the DEA framework, we addressed the issue of the change in performance of units over time and proposed a novel method for capturing the dynamics of composite indices.

We accomplished this goal by developing a FWMPI. Our proposal is built on an extension of Shannon’s entropy, which is widely used in multi-index decision making problems. To assess the degree of differentiation of DMUs, we used a family of diversity indices based on a single continuous variable. Unlike entropy-based methods, which only provide a partial view of the multi-index decision making problem, our new dynamic CI captures the multidimensional aspect of diversity and can graphically depict the composite index trend. As a result, we have the distinct advantage of supplementing the analysis with FDA tools.
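The paper's specific diversity family is defined in Sect. 2 and not reproduced in this excerpt. As a hedged illustration of how a one-parameter family of diversity indices generates a profile over \(\beta \in [-1, 1]\) rather than a single number, the sketch below uses the well-known Patil–Taillie family, which recovers richness minus one, Shannon entropy, and the Gini–Simpson index at \(\beta = -1\), \(\beta \rightarrow 0\), and \(\beta = 1\), respectively.

```python
import numpy as np

def diversity_profile(p, betas):
    """Patil-Taillie diversity of a probability vector p, evaluated on a
    grid of beta values; the beta -> 0 limit is Shannon's entropy."""
    p = np.asarray(p, dtype=float)
    out = np.empty(len(betas))
    for i, b in enumerate(betas):
        if abs(b) < 1e-12:
            out[i] = -(p * np.log(p)).sum()          # Shannon limit
        else:
            out[i] = (p * (1.0 - p ** b)).sum() / b
    return out

betas = np.linspace(-1.0, 1.0, 9)
p_even = np.array([0.25, 0.25, 0.25, 0.25])  # evenly spread shares
p_conc = np.array([0.85, 0.05, 0.05, 0.05])  # concentrated shares

# The even distribution is at least as diverse at every beta, which is
# exactly the kind of comparison the profile view makes visible at a glance
print(diversity_profile(p_even, betas))
print(diversity_profile(p_conc, betas))
```

Scanning the whole profile, instead of fixing a single entropy value, is what gives the functional treatment of the composite index its multidimensional view of diversity.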

To further support the utility of our novel approach, we documented some pitfalls of entropy-based weighting methods.

A simulation experiment has shown how different rankings of homogeneous DMUs can occur when different diversity measures are used. We ascertained the loss of robustness of CI rankings under various scenarios that take into account outlier contamination, correlation and skewness among indicators, and sample size variation. The applicability and versatility of our proposal are validated by applying the FWMPI to BESdT data. FDA tools such as the area under the curve and the depth measure give us useful information about the order and dynamics of the FWMPI trajectories. The depth measure describes the centrality of functional data, whereas the area under the FWMPI curves allows for a clear, unambiguous ranking of Italian provincial capital cities across the entire domain.

An intriguing direction for future research would be to apply the novel approach to other CIs to assess their evolutionary dynamics in various application areas.