A Functional approach for constructing dynamic Composite Indicators

Sarra, Annalina; Nissi, Eugenia; Evangelista, Adelia; Di Battista, Tonio

doi:10.1007/s10260-023-00728-8

A Functional approach for constructing dynamic Composite Indicators

Original Paper
Open access
Published: 07 November 2023

(2023)
Cite this article

Download PDF

You have full access to this open access article

Statistical Methods & Applications Aims and scope Submit manuscript

A Functional approach for constructing dynamic Composite Indicators

Download PDF

Annalina Sarra¹,
Eugenia Nissi²,
Adelia Evangelista¹ &
…
Tonio Di Battista ORCID: orcid.org/0000-0003-2139-7273¹

663 Accesses
Explore all metrics

Abstract

This paper contributes to the research on the development of comparable composite indicators by introducing a Functional Weighted Malmquist Productive Index that allows for comparative trend analysis. In analogy with entropy-based weighted methods, this novel dynamic indicator is derived by measuring the degree of diversification of the single method through a family of diversity indices. The paper has the merit of proposing a new dynamic composite indicator that supplements the analysis with Functional Data Analysis (FDA) tools that provide us with useful information about the order and dynamics of the composite index trajectories. The simulation study set up in this paper raises doubts about the robustness of the entropy-based weighted methods while the application of the new index to well-being dataset highlights its practical appeal.

Measuring Well-Being Over Time: The Adjusted Mazziotta–Pareto Index Versus Other Non-compensatory Indices

Article 08 February 2017

Building Statistical Indicators of Equitable and Sustainable Well-Being in a Functional Framework

Article 21 May 2019

E pluribus, quaedam. Gross Domestic Product out of a Dashboard of Indicators

Article Open access 30 March 2024

1 Introduction

In recent decades, with the rapid growth in available information, there has been a parallel growth of building synthetic indicators. The deep complexity of real world makes difficult to measure and evaluate the relevant aspects of the most phenomena, like well-being, human development, environmental sustainability, industrial competitiveness, and hard to capture them by using a single perspective. The scientific community has echoed this interest; accordingly, many scholars have focused their efforts on the development and improvement of a methodology known as Composite Indicator (CI). Constructing a CI (or index) is a popular approach to achieve a simplified numerical representation of a complex phenomenon. A CI is defined as a mathematical combination of individual indicators representing different dimensions into a single index, based on an underlying model of the multidimensional concept that is being measured (Saisana et al. 2005). Greco et al. (2019) make a huge effort in providing a wide collection of numerous publications on techniques and applications related to this field of research. Albeit CIs have gained much attention, they remain the subject of controversy. Their construction involves a series of advantages and disadvantages, some of them are mentioned below. In particular, as pointed out in Smith (2002), arguments for developing CIs include, among others, the possibility to present complex or multidimensional issues in one aggregate value, much easier to interpret than trying to find a trend in many separate indicators. Also, they enable to place performance at the centre of the policy arena. In fact, CIs are being increasingly recognised as a useful tool for policy analysis, benchmarking comparison, performance monitoring, public communication and decisions in various fields, including economy, environment, technology development and society, by many national and international organizations (Badea et al. 2011; Costa 2015; Filippetti and Peyrache 2011). Points against CIs construction, on the other hand, emphasise how they can send misleading or ineffective policy messages if they are poorly constructed or misinterpreted. Opponents specifically question the credibility of CIs, stressing the subjectivity that surrounds the various steps involved in their construction. According to the recommendations of the OECD document (OECD 2008), construction decisions range from the selection of a set of sub-indicators to the standardization method, weighting, and choice of the aggregation function.

The subject of this paper is an essential aspect of index aggregation, specifically the weighting of indicators, which has been subject to considerable scrutiny in recent scholarly works (Becker et al. 2017; Greco et al. 2019; Keogh et al. 2021). The methodological proposal put forward in this study is based on a data-oriented weighted method, known as Data Envelopment Analysis (DEA) (Charnes et al. 1978). When exact knowledge of the weights is not available, the DEA approach facilitates the aggregation of a number of quantitative sub-indicators. We address the issue of unit performance change over time within this framework by proposing a Functional Weighted Malmquist Productivity Index (FWMPI) that allows for comparative trend analysis. The novel aspect of the proposal is that it achieves units ranking by supplementing the analysis with Functional Data Analysis (FDA) tools. There are several practical reasons for considering functional data. Ramsay and Silverman (2005) described important characteristics of FDA and used a strong argument for that approach. In recent years, FDA methods have been used in a variety of fields, including medicine, economics, meteorology, and many others. In contrast, the use of a functional approach for constructing CIs is relatively new (see, for example, Fortuna et al. (2022)). In any case, the implementation of a functional approach in the construction of CIs offers various benefits. Firstly, FDA allows for a more flexible and nuanced representation of data by treating them as continuous functions, which can capture important features missed by other methods. For example, FDA enables the observation of indicator behavior over time, providing insights into its evolution. Secondly, the functional approach is particularly useful when data are not sampled at equally spaced time points, as it can handle irregular intervals. This is important in constructing CIs that are updated at different intervals depending on data availability. Thirdly, FDA allows for the introduction of new analytical tools that can complement the original data with valuable information. It is possible to incorporate external covariates or predictors into the functional model, which can help to explain the variation in the CI. This can lead to more accurate and robust CIs, as well as a better understanding of the relationships between different variables. These advantages are essential for guiding policy decisions and measuring progress towards development goals.

The rest of the paper is structured as follows. Section 2 describes the methodological context, with a focus on the proposed methodology. Section 3 presents the results of the simulated data analysis, while in Sect. 4 we summarise the application of FWMPI to well-being data. Some concluding remarks are given in Sect. 5.

2 Methodological background

In this section, we provide a detailed overview of the methodological approaches employed in our study. Specifically, we begin by examining the use of DEA and Benefit of the Doubt (BoD) weighting to construct a CI. Next, we delve into the use of an entropy-based Malmquist Productive Index to measure changes in productivity over time for the units of analysis. In the third subsection, we introduce the FWMPI as a new method for assessing productivity changes over time.

2.1 Data envelopment analysis and “Benefit of the Doubt”-weighting

When constructing CIs, there are several approaches to aggregating individual indicators. Dimensionality reduction methods, such as Principal Component Analysis (PCA) or Factor Analysis (FA), are commonly used to reduce the number of indicators to a smaller set of uncorrelated factors. These multivariate statistical techniques account for the highest variation in the dataset, replacing the original variables with the smallest possible number of factors that reflect the underlying structure of the data. However, these methods do not explicitly take into account the relative importance of each indicator in the final index and may result in a loss of information. An alternative approach is DEA, a non-parametric mathematical programming method that assesses the relative efficiency of homogeneous Decision Making Units (DMUs) in combining inputs to produce outputs. The first DEA model was proposed by Charnes et al. (1978) in management science to measure the production efficiencies of DMUs in combining inputs to produce outputs of goods or services. Since the mentioned pioneering work, various methods and models for ranking the originally efficient units have been put forward over the years. Cook and Seiford (2009) developed a classification of DEA models.

DEA models are based on the fundamental concept of evaluating the relative efficiency of homogeneous DMUs, such as companies, banks, countries, universities, and so on. The definition of DMUs has been deliberately kept broad to enable the use of DEA in a variety of applications. DEA is concerned with calculating an efficiency score ranging from 0 to 1. DEA assigns an efficiency score less than 1 to “inefficient" units while DMUs with a score equal to 1 are deemed “efficient". In general, among a group of DMUs, a unit with higher outputs but lower inputs has a better chance of achieving a high efficiency rank.

Due to its numerous advantages, the DEA non-parametric technique has recently been adopted as an appropriate method for CIs construction. The application of DEA to the field of CIs has been dubbed the Benefit of Doubt (BoD), which was first proposed by Melyn and Moesen (1991) and has since been used by Cherchye et al. (2007, 2008); Mahlberg and Obersteiner (2001), OECD (OECD 2008), Sahoo et al. (2017); Staessens et al. (2019), and Färe et al. (2019). The main advantage of using BoD in the construction of CIs is its flexibility in retrieving weights from the data itself, allowing for the aggregation of quantitative sub-indicators when exact weight knowledge is not available. The BoD model assumes that a unit good relative performance in a particular aspect or dimension of performance indicates that this unit considers that dimension to be relatively important when estimating these weights. Similarly, poor performance indicates that a unit places less importance on that dimension. As a result, BoD generates a set of endogenous weights that are most advantageous for every unit: each DMU has been placed in its most advantageous position. Thus, if a unit underperform in comparison to others, it cannot be attributed to the unfair weighting scheme since any other set of weights would have worsened the evaluated unit ranking position. The BoD method can be seen as a tool to combine performance sub-indicators without making explicit reference to the input(s).

In order to clearly convey the underlying idea, the BoD formulation can be presented in a step-wise fashion. Let us denote with $CI_{j}$ the composite index for unit j, $y_{ji}$ the value (possibly normalised) for the unit j on indicator i ($i=1, \ldots ,m $) and $w_{i}$ the weight assigned to indicator i. In Step 1, the composite index score of a unit j is calculated as the ratio of the weighted sum of its sub-indicators to the sum of the benchmark sub-indicators $y^{B}_i$:

$$\begin{aligned} CI_j=\frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\sum _{i=1}^{m} w_{ji} y_{i}^{B}} . \end{aligned}$$

(1)

Step 2 involves identifying the benchmark performance, which is determined endogenously:

$$\begin{aligned} CI_j=\frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\max \limits _{y_{z,i} \in \{ studied \hspace{0.3em} units\}} \sum _{i=1}^{m} w_{ji} y_{zi}} . \end{aligned}$$

(2)

Formally speaking, the presence of the max operator and its associated argument in the denominator of Eq. 2 indicates that the benchmark observation is derived from an optimization problem.

Step 3 specifies appropriate weights. Because the weights are endogenously selected in such a way that they can be inferred from looking at the relative strengths and weaknesses of each unit, this stage implies the BoD concept. Taking a normalisation constraint into account, emphasising that the most favorable weights are always applied to all observations, and ensuring that weights are not-negative, we have:

$$\begin{aligned} \begin{aligned}&CI_j=\max _{w_{ji}} \quad \frac{\sum _{i=1}^{m} w_{ji} y_{ji}}{\max \limits _{y_{z,i} \in \{ studied \hspace{0.3em} units\}} \sum _{i=1}^{m} w_{ji} y_{zi}}\\&\text {s.t.} \quad \sum _{i=1}^{m} w_{ji} y_{zi}\le 1\\&w_{ji}\ge 0.\\ \end{aligned} \end{aligned}$$

(3)

Formally, the BoD model in Eq. 3 is equivalent to the Charnes, Cooper, and Rhodes input-oriented constant-returns-to-scale (CRS) model (Charnes et al. 1978), with all indicators treated as outputs and a “dummy input" equal to one for all units. From a theoretically point of view, the departure to consistently derive aggregate CI using the BoD model is Koopmans’ theorem (Koopmans 1951) and revenue corollaries, as shown in Färe and Grosskopf (2004).

2.2 Entropy-based Malmquist productivity index

DEA literature offers the possibility to determine whether different DMUs are grown up, are regressed or remain unchanged in terms of their performance over time. The first contribution to measure the productivity change of a DMU over time is the Malmquist Productivity Index (MPI). The original so-called Malmquist index is a quantity index, introduced by Malmquist (1953) for analysing the consumption of inputs in a consumer theory context. Later, Färe et al. (1994) constructed a MPI index directly from input and output data using DEA. Färe et al. (1994) created a DEA-based MPI by combining Farrell’s efficiency measurement (Farrell 1957) with Caves’ productivity measurement (Caves et al. 1982), and decomposing it into two components to analyse productivity change due to technical efficiency and change due to technology. The DEA-based MPI relies on first constructing an efficiency frontier over the whole sample realised by DEA and then computing the distance of individual observations from the frontier.

Therefore, MPI is constructed by utilising distance functions that can serve as both input and output. More in detail, an input distance describes the production technology by observing the comparative reduction of the input vector, given an output vector (Coelli and Prasada Rao 2005); conversely, the output function distance mirrors a maximum proportional expansion of the output vector, given an input vector.

MPI can be formalised using either an output-oriented or an input-oriented method.

The output-oriented MPI is the focus of this research. Let us suppose to have a set of n DMUs, each with s unitary inputs denoted by a vector ${\textbf{x}}_{j}$ and m outputs denoted by a vector ${\textbf{y}}_{j}$, for $j=1,\dots n$, over the periods t and $t+1$. According to Färe et al. (1994), the DEA-MPI output for a given $DMU_0$ at time $t+1$ and t can be expressed mathematically as:

$$\begin{aligned} MPI_{0}=\left[ \frac{D_0^t(x_0^{t+1},y_0^{t+1})}{D_0^t(x_0^t,y_0^t)}\frac{D_0^{t+1}(x_0^{t+1},y_0^{t+1})}{D_0^{t+1}(x_0^{t},y_0^{t})}\right] ^{\frac{1}{2}} \end{aligned}$$

(4)

where $(x_0^{t+1},y_0^{t+1})$ and $(x_0^t,y_0^t)$ represent the input and output vector of the period $t+1$ and t respectively, and the notation $D_0^t(x_0^t,y_0^t)$ represents the distance from the period $t_0$ to period $t_1$ technology. The index in Eq. 4 is the geometric mean of two output-based Malmquist indices. A magnitude of MPI greater than 1 indicates progress (positive growth) from period $t_0$ to period $t_1$, whereas magnitudes equal to 1 and less than 1 indicate the status quo and productivity decay, respectively. The overall tendency in DMU productivity changes over time periods is traditionally obtained by taking the average of sequential productivity indices, implicitly assuming that all sectional indices are equally affecting the productivity level.

In his work, Fallahnejad (2017) suggested the use of Shannon’s entropy to derive more objective weights for aggregating MPIs, thereby eliminating the problem of equal weighting in the process.

Notably, the Shannon’s entropy technique is one of the most important methods for determining the relative weights of indicators in multi-criteria decision making contexts (Peykani et al. 2022). In information theory, the entropy weight method primarily uses the magnitude of the entropy value to measure the indicator weight contained in known data. The lower the entropy value, the greater the degree of differentiation and the more information that can be derived, and a higher weight should be given to the indicator on the target in the overall evaluation.

In the Fallahnejad’s approach the following basic steps can be enucleated:

Step 1, first the MPI matrix (Table 1) is created taking into account the n DMUs and MPI measures over times.

Table 1 MPI matrix

A Functional approach for constructing dynamic Composite Indicators

Abstract

Similar content being viewed by others

Measuring Well-Being Over Time: The Adjusted Mazziotta–Pareto Index Versus Other Non-compensatory Indices

Building Statistical Indicators of Equitable and Sustainable Well-Being in a Functional Framework

E pluribus, quaedam. Gross Domestic Product out of a Dashboard of Indicators

1 Introduction

2 Methodological background

2.1 Data envelopment analysis and “Benefit of the Doubt”-weighting

2.2 Entropy-based Malmquist productivity index

2.3 The proposed approach: FWMPI

3 Simulation study

4 Empirical evidence: application to BES data

5 Concluding remarks

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation