1 Introduction

Data envelopment analysis (DEA) is a methodology based on mathematical programming for the assessment of the technical efficiency of a set of decision making units (DMUs). It allows a data-driven construction of a piecewise linear frontier enveloping the data cloud of observations and, at the same time, the determination of the distance from each DMU to this frontier. DEA provides both an efficiency measure (score) for each assessed DMU and, for inefficient DMUs, information on the targets used in the efficiency assessment. The targets are the levels of operation of inputs and outputs that would make the corresponding inefficient DMU perform efficiently. Consequently, the targets can play an important role in practice, since they indicate how the inefficient DMUs may improve their performance.

Since its origins [1], DEA researchers have introduced many ways of implementing the idea of technical inefficiency as the distance from each evaluated unit to the estimated frontier. Each of these ways is related to a different way of projecting the DMU onto the frontier. Thus, we have radial measures [1, 2], directional distance functions [3], the hyperbolic measure [4], weighted additive measures [5, 6], the Russell measures [4], the enhanced Russell graph or slacks-based measure (SBM) [7, 8], and so on. However, from a benchmarking viewpoint, it is worth mentioning that traditional DEA models can yield targets determined by the “furthest” efficient projection of the assessed DMU; see [9]. In contrast, measures based on the notion of least distance determine closer targets, suggesting directions of improvement that may lead the inefficient units to the frontier with less effort. More recently, the calculation of the least distance to the frontier for determining the closest targets has been associated with the application of the principle of least action (PLA) in physics to the sphere of performance evaluation [10].

Additionally, DEA has recently been linked to the construction of composite indicators in social science through the notion of the Benefit-of-the-Doubt (BoD) model; see [11,12,13]. This is a DEA model without real inputs [14], where sub-indicators are all treated as outputs to generate an overall and objective aggregated indicator for each assessed DMU through the determination of its efficiency. In this model, a unique ‘virtual’ input equal to one for all units is included.

However, as far as we are aware, no previous contribution to the definition of composite indicators by BoD in DEA has applied the PLA when determining the final aggregated indicator for each unit, thereby neglecting the potential of the approach as a benchmarking tool for the evaluated DMUs. Previous research has used the variety of measures within DEA to create BoD models. The usage of DEA to aggregate individual indicators was initiated by [14], who proposed a DEA radial model without inputs to evaluate macroeconomic performance based on four indicators. [11,12,13] popularized the usage of radial DEA models in the estimation of composite indicators by BoD. Recent contributions in this research area focus on extensions of the radial BoD model toward the non-radial model, which takes into account the possible existence of slacks [15]; the multiple-layer model, which considers a hierarchical structure of indicators [16]; the directional model, which recognizes a preference structure among indicators [17, 18]; the directional model that allows dealing with undesirable output indicators [19]; the robust and non-compensatory composite indicator based on the directional model [20]; the composite indicator with imprecise data [21]; the robust BoD model that considers external factors directly [22]; the spatial directional robust BoD model [23]; and a translation invariant directional distance function, which allows the computation of indicators for firms with zero values for outputs [24].

DEA is a nonparametric and deterministic technique for estimating technical efficiency. The literature contains an alternative parametric approach, called stochastic frontier analysis (SFA), which is grounded on statistical bases. An interesting characteristic of SFA is that the “goodness of fit” can be measured by statistical tests, while DEA lacks such a tool. In this sense, as other authors have already pointed out [5, 7, 25, 26], the goodness of an efficiency measure in DEA is checked by establishing a set of interesting properties that the measure should satisfy. The above-cited authors have highlighted the following properties (adapted for output-oriented approaches): the measure should be greater than or equal to one, with one signaling Pareto efficiency; units invariance; translation invariance; and strong monotonicity. Additionally, when DEA is used for aggregating sub-indicators, other properties can also be of interest. In particular, assigning zero weights to one or more sub-indicators (outputs) would imply that these performance criteria are (nearly) ignored in the construction of the composite performance score; see, for example, [23]. Finally, the application of the PLA should be considered essential if the researcher needs to provide useful benchmarking information in the form of less demanding targets. Unfortunately, none of the existing measures in the literature satisfies all these properties at the same time.

In this paper, we fill this gap in the literature by introducing a BoD model in DEA for defining a composite indicator that satisfies all of the above-mentioned properties. We implement a version of the Russell output measure linked to the extension of full-dimensional efficient facets (FDEFs). In addition, and as a methodological contribution of our paper, we show, for the first time, how a Russell-type output measure based upon the PLA and the extension of FDEFs in DEA can be implemented in one step by a mixed integer linear program (MILP). Furthermore, for illustrative purposes, we apply the new approach to a dataset on the corporate social responsibility (CSR) activities of European food, beverage and tobacco manufacturing companies in 2017.

The paper unfolds as follows: In the following section, we briefly review the literature on DEA and the PLA. In Sect. 3, we introduce a new composite indicator. Section 4 includes an illustration of the new approach proposed in this paper, focused on CSR data, showing the empirical implications of the theoretical properties. Finally, Sect. 5 concludes.

2 The Principle of Least Action in DEA

Regarding a brief review of the literature linked to the PLA, Refs. [27, 28] were the precursors of this line of research in DEA. Frei and Harker [29] suggested resorting to the Euclidean distance to measure technical inefficiency. Cherchye and Van Puyenbroeck [30] defined a least-distance model in an input-oriented framework. González and Álvarez [31] maximized the input-oriented Russell efficiency measure, determining the closest efficient targets. Later, [32] established that the general aim in efficiency evaluation should be to globally minimize the input and output slacks, regardless of which DEA measure is finally applied. Aparicio et al. [9] identified the closest targets for a dataset of international airlines, applying a new version of the SBM. More recent approaches are [10, 33,34,35,36]. For more details, see [37].

DEA measures based on the determination of the least distance usually lack the important property of strong monotonicity; see [33, 38, 39]. Nevertheless, [34] proved that the output-oriented version of the Russell measure [4] is a well-defined efficiency measure, satisfying strong monotonicity on the strongly efficient frontier, if efficiency is evaluated with respect to an extended facet production possibility set based on FDEFs, instead of the standard DEA technology. Figure 1 illustrates the idea behind this method graphically. In this figure, two FDEFs appear, AB and BC, which are ‘extended’ by the dotted lines, generating a new production possibility set that contains the original one. In the figure, A, B and C are the extreme efficient points (corners) of the output production set. Additionally, [35] showed the reason why strong monotonicity fails. The use of FDEFs avoids the problems associated with the dimensionality of the strongly efficient frontier and other related problems [40,41,42,43].

Fig. 1 The standard DEA technology vs the extended DEA technology

Along the same lines as [34], we will define in the next section a composite indicator, based upon the BoD model, that implements the determination of the least distance while ensuring strong monotonicity and the satisfaction of a set of key properties for efficiency measurement. From a methodological point of view, we show how to implement the model of [34], for the first time, by resorting to mathematical programming in just one step. In contrast, [34] needed to identify all the FDEFs in a first stage and then determine the least distance from each DMU, which is a highly time-consuming procedure for large problems.

In what follows, we introduce some definitions that are needed to understand the approach, based upon the extension of the traditional DEA technology; see [34, 35]. Nevertheless, we first need to introduce some notation. Consider that we have observed n DMUs that use m inputs to produce s outputs. These are denoted by \( \left( {X_{j} ,Y_{j} } \right) \), \( j = 1, \ldots ,n \). It is assumed \( X_{j} = \left( {x_{1j} , \ldots ,x_{mj} } \right) \ge 0_{m} \), \( j = 1, \ldots ,n \), and \( Y_{j} = \left( {y_{1j} , \ldots ,y_{sj} } \right) \ge 0_{s} \), \( j = 1, \ldots ,n \). The relative efficiency of each DMU0 in the sample is assessed with reference to the so-called production possibility set \( T: = \left\{ {\left( {X,Y} \right)|X\;{\text{can}}\;{\text{produce}}\;Y} \right\} \), which can be empirically constructed from the n observations by assuming several postulates; see [2]. If, in particular, variable returns to scale is assumed, then \( T \) can be characterized as follows:

$$ T = \left\{ {\left( {X,Y} \right) \in R_{ + }^{m + s} \left| {\sum\limits_{j = 1}^{n} {\lambda_{j} X_{j} } \le X,\;\sum\limits_{j = 1}^{n} {\lambda_{j} Y_{j} } \ge Y,\sum\limits_{j = 1}^{n} {\lambda_{j} } = 1,\lambda_{j} \ge 0,\;j = 1, \ldots ,n} \right.} \right\} $$
(1)

Hereinafter, we assume that each DMU is interested in maximizing outputs, while using no more than the observed amount of any input. This type of approach is called output oriented in the literature. In order to implement this approach, it is useful to introduce the output production set. In this sense, for each input vector, \( X \), let \( P\left( X \right) \) be the set of feasible (producible) outputs. Formally, \( P\left( X \right): = \left\{ {Y:\left( {X,Y} \right) \in T} \right\} \). On the other hand, given \( X \), the efficient frontier of \( P\left( X \right) \), also called the weak efficient frontier, is defined as \( \partial \left( {P\left( X \right)} \right): = \left\{ {Y \in P\left( X \right):\hat{Y} > Y\, \Rightarrow \hat{Y} \notin P\left( X \right)} \right\} \); see [28]. Following [44], in order to measure technical efficiency in the Pareto sense, it is necessary to isolate a certain subset of \( \partial \left( {P\left( X \right)} \right) \). We are referring to the strong efficient frontier, as defined below:

$$ \partial^{s} \left( {P\left( X \right)} \right): = \left\{ {Y \in P\left( X \right):\hat{Y} \ge Y,\hat{Y} \ne Y\, \Rightarrow \hat{Y} \notin P\left( X \right)} \right\} $$
(2)

Throughout the paper, with the aim of measuring technical efficiency, we will compare the actual performance of each DMU0 with respect to the points belonging to the strong efficient frontier.

In this paper and under the BoD model, it is assumed that all DMUs use a unique input equal to one. Hence, hereinafter, we will use the notation \( P\left( 1 \right) \) and \( \partial^{s} \left( {P\left( 1 \right)} \right) \) for denoting the output production set and its corresponding strong efficient frontier, respectively, both sets being common for all the units. Note, additionally, that due to the definition of \( P\left( X \right) \) and \( T \), \( P\left( 1 \right) = \left\{ {Y \in R_{ + }^{s} :\sum\nolimits_{j = 1}^{n} {\lambda_{j} Y_{j} } \ge Y,\sum\nolimits_{j = 1}^{n} {\lambda_{j} } = 1,\lambda_{j} \ge 0,\;\forall j} \right\} \). Furthermore, the set of all the extreme efficient points of \( P\left( 1 \right) \) will be denoted hereafter as \( E \). In this way, the polyhedron \( P\left( 1 \right) \) can be then rewritten equivalently by substituting \( j = 1, \ldots ,n \) by \( j \in E \): \( P\left( 1 \right) = \left\{ {Y \in R_{ + }^{s} :\sum\nolimits_{j \in E} {\lambda_{j} Y_{j} } \ge Y,\sum\nolimits_{j \in E} {\lambda_{j} } = 1,\lambda_{j} \ge 0,\;\forall j} \right\} \).

To check Pareto efficiency, it is usual to resort to the weighted additive model; see [6]. In particular, an output vector \( Y^{\prime} \in R_{ + }^{s} \) is assessed by (3) or by (4), with (4) being the dual problem of (3).

$$ \begin{array}{*{20}l} {\text{Max}} \hfill & {\sum\limits_{r = 1}^{s} {w_{r}^{ + } s_{r}^{ + } } } \hfill & {} \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & { - \sum\limits_{j = 1}^{n} {\lambda_{j} y_{rj} } + s_{r}^{ + } \le - y^{\prime}_{r} ,} \hfill & {s_{r}^{ + } \ge 0,\quad r = 1, \ldots ,s} \hfill \\ {} \hfill & {\sum\limits_{j = 1}^{n} {\lambda_{j} } = 1,} \hfill & {\lambda_{j} \ge 0,\quad j = 1, \ldots ,n} \hfill \\ \end{array} $$
(3)
$$ \begin{array}{*{20}l} {\text{Min}} \hfill & { - \sum\limits_{r = 1}^{s} {u_{r} y^{\prime}_{r} } + \delta } \hfill & {} \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & { - \sum\limits_{r = 1}^{s} {u_{r} y_{rj} } + \delta \ge 0,} \hfill & {j = 1, \ldots ,n} \hfill \\ {} \hfill & {u_{r} \ge w_{r}^{ + } ,} \hfill & {r = 1, \ldots ,s} \hfill \\ \end{array} $$
(4)

If \( w_{r}^{ + } > 0 \) for all \( r = 1, \ldots ,s \), then it is not difficult to prove that \( Y^{\prime} \) is a Pareto-efficient point of \( P\left( 1 \right) \) if and only if the optimal value of (3), or equivalently of (4), equals zero.

Proposition 2.1

\( Y^{\prime} \in \partial^{s} \left( {P\left( 1 \right)} \right) \), if and only if the optimal values of models (3) and (4) are equal to zero.

Proof

See [6]. □
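As a computational aside, the Pareto-efficiency test of Proposition 2.1 amounts to solving the linear program (3). The following Python sketch does so with scipy's LP solver; the function name, the use of scipy and the default unit weights \( w_{r}^{ + } = 1 \) are our own illustrative choices and are not part of the original formulation.

```python
import numpy as np
from scipy.optimize import linprog

def is_pareto_efficient(Y, y_eval, w=None, tol=1e-6):
    """Pareto-efficiency check of Proposition 2.1: solve the weighted additive
    model (3) for the output vector y_eval, with Y the (n x s) matrix of observed
    outputs and w the strictly positive slack weights (unit weights by default)."""
    Y = np.asarray(Y, dtype=float)
    n, s = Y.shape
    w = np.ones(s) if w is None else np.asarray(w, dtype=float)
    # Variables: lambda_1,...,lambda_n, s_1^+,...,s_s^+ ; linprog minimizes, so negate.
    c = np.concatenate([np.zeros(n), -w])
    # Output constraints of (3):  s_r^+ - sum_j lambda_j y_rj <= -y_eval_r  for every r.
    A_ub = np.hstack([-Y.T, np.eye(s)])
    b_ub = -np.asarray(y_eval, dtype=float)
    # Convexity constraint: sum_j lambda_j = 1.
    A_eq = np.concatenate([np.ones(n), np.zeros(s)]).reshape(1, -1)
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    # Proposition 2.1: y_eval is Pareto-efficient iff the optimal value of (3) is zero.
    return res.success and -res.fun < tol
```

Applying the check to each observed output vector identifies the Pareto-efficient DMUs; note that extreme efficiency, which is required for membership in the set \( E \) introduced above, is a stronger condition.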

Now, before showing the version of the Russell output measure based on the PLA from [34], we turn to the definition of the traditional Russell output measure of technical efficiency [4, p. 149]: \( \varGamma \left( {Y_{0} } \right): = {\text{Max}}\left\{ {\frac{1}{s}\sum\nolimits_{r = 1}^{s} {\phi_{r} } :\left( {\phi_{1} y_{10} , \ldots ,\phi_{s} y_{s0} } \right) \in P\left( 1 \right),\;\left( {\phi_{1} , \ldots ,\phi_{s} } \right) \ge 1_{s} } \right\} \).

In the above model, \( \phi_{r} \) evaluates the relative proportional expansion rate of output \( r \), \( r = 1, \ldots ,s \), whereas the objective function averages these proportional rates of output expansion. Also, in the above formulation, the constraints \( \phi_{r} \ge 1 \), \( r = 1, \ldots ,s \) are the requirements for dominance. Additionally, by a change of variables, it is possible to prove that the traditional Russell output measure of technical efficiency is equivalent to the following formulation:

$$ \varGamma \left( {Y_{0} } \right) = 1 + {\text{Max}}\left\{ {\frac{1}{s}\sum\limits_{r = 1}^{s} {\frac{{s_{r}^{ + } }}{{y_{r0} }}} :\left( {Y_{0} + s^{ + } } \right) \in \partial^{s} \left( {P\left( 1 \right)} \right),\;s^{ + } = \left( {s_{1}^{ + } , \ldots ,s_{s}^{ + } } \right) \ge 0_{s} } \right\} , $$
(5)

where we use slacks and have replaced \( P\left( 1 \right) \) with \( \partial^{s} \left( {P\left( 1 \right)} \right) \). It is worth mentioning that the traditional Russell output measure determines the furthest targets from DMU0 on the strong efficient frontier of \( P\left( 1 \right) \), since the objective function maximizes the (weighted) sum of the slacks and, at the optimum, \( s_{r}^{ + *} = \sum\nolimits_{j = 1}^{n} {\lambda_{j}^{*} y_{rj} } - y_{r0} = y_{r0}^{*} - y_{r0} \) for all \( r = 1, \ldots ,s \).
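Since \( Y_{0} \) enters the constraints only through the fixed data \( y_{r0} \), the \( \phi \)-based formulation above is itself a linear program whenever \( y_{r0} > 0 \) for all \( r \). A minimal sketch with scipy, assuming strictly positive outputs for the evaluated unit (the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def russell_output(Y, y0):
    """Traditional Russell output measure Gamma(Y_0): maximize (1/s) * sum_r phi_r
    subject to (phi_1 y_10, ..., phi_s y_s0) in P(1) and phi_r >= 1.
    Y is the (n x s) output matrix; y0 must be strictly positive."""
    Y = np.asarray(Y, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    n, s = Y.shape
    # Variables: phi_1,...,phi_s, lambda_1,...,lambda_n ; linprog minimizes, so negate.
    c = np.concatenate([-np.ones(s) / s, np.zeros(n)])
    # phi_r * y_r0 - sum_j lambda_j y_rj <= 0  for every r  (membership in P(1)).
    A_ub = np.hstack([np.diag(y0), -Y.T])
    b_ub = np.zeros(s)
    # Convexity constraint: sum_j lambda_j = 1.
    A_eq = np.concatenate([np.zeros(s), np.ones(n)]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(1, None)] * s + [(0, None)] * n   # phi_r >= 1, lambda_j >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return -res.fun                              # Gamma(Y_0) >= 1
```

The returned value is \( \varGamma \left( {Y_{0} } \right) \), and the associated furthest targets can be read off as \( \phi_{r}^{*} y_{r0} \).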

Next, we show the formulation of the PLA version of the Russell output measure of [34]. In particular, “Max” is replaced by “Min” in (5) in order to determine the closest targets instead of the furthest ones.

$$ \varGamma_{{}}^{{\min} } \left( {Y_{0} } \right): = 1 + {\text{Min}}\left\{ {\frac{1}{s}\sum\limits_{r = 1}^{s} {\frac{{s_{r}^{ + } }}{{y_{r0} }}} :\left( {Y_{0} + s^{ + } } \right) \in \partial^{s} \left( {P\left( 1 \right)} \right),\;s^{ + } = \left( {s_{1}^{ + } , \ldots ,s_{s}^{ + } } \right) \ge 0_{s} } \right\} $$
(6)

Unfortunately, \( \varGamma_{{}}^{{\min} } \left( {Y_{0} } \right) \) lacks some properties. In particular, it satisfies neither translation invariance nor strong monotonicity. [34] studied the property of strong monotonicity and its relationship with (6) in detail, showing that strong monotonicity fails. For this reason, these authors proposed a solution based on identifying and extending FDEFs. In particular, they proved that the Russell output measure based on the PLA satisfies strong monotonicity when we deal with an extended facet production possibility set instead of the usual DEA technology.

In what follows, we show the definition of the extended facet production possibility set. To this end, it is necessary to introduce some notation and definitions; see [35, 42, 45] for more details. We assume that \( P\left( 1 \right) \subset R^{s} \) contains at least one interior point and, therefore, its dimension equals \( s \).

Definition 2.1

An ‘efficient’ face \( F \) of \( P\left( 1 \right) \) is a face of \( P\left( 1 \right) \) such that it has at least one supporting hyperplane \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{{}} y_{r} } = \psi \), which presents nonnegative coefficients, that is, \( \mu_{r} \ge 0 \), \( r = 1, \ldots ,s \).

Definition 2.2

A full-dimensional ‘efficient’ facet (FDEF) of \( P\left( 1 \right) \) is an efficient face \( F \) of the polyhedron \( \left\{ {Y \in R_{ + }^{s} :\sum\nolimits_{j = 1}^{n} {\lambda_{j} Y_{j} } \ge Y,\;\sum\nolimits_{j = 1}^{n} {\lambda_{j} } = 1,\;\lambda_{j} \ge 0,\;\forall j} \right\} \) such that a set of \( s \) affinely independent extreme efficient points belongs to \( F \), and its corresponding supporting hyperplane \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{{}} y_{r} } = \psi \) presents strictly positive coefficients, that is, \( \mu_{r} > 0 \), \( r = 1, \ldots ,s \).

The graphical idea behind the extension of FDEFs illustrated in Fig. 1 needs to be mathematically formalized before being used in our problem. Let us introduce some notation. Let \( J_{k} \) be the subset of extreme efficient points that span the kth FDEF, \( k = 1, \ldots ,K \). Finally, let \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r} } = \psi^{k} \) be the mathematical expression of the supporting hyperplane associated with \( J_{k} \). In this way, we can generate an empirical output production set, \( P_{{{\text{EEF}}\left( {J_{k} } \right)}} \left( 1 \right) \), as the intersection between the half-space generated by the supporting hyperplane \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r} } = \psi^{k} \) and the nonnegative orthant: \( P_{{{\text{EEF}}(J_{k} )}} \left( 1 \right) = \left\{ {y \in R_{ + }^{s} :{\mkern 1mu} \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r} } \le \psi^{k} } \right\} \). Using the last expression, the extended facet output production possibility set, denoted hereinafter as \( P_{\text{EXFA}} \left( 1 \right) \), is defined as \( P_{\text{EXFA}} \left( 1 \right): = \bigcap\limits_{k = 1}^{K} {P_{{{\text{EEF}}(J_{k} )}} \left( 1 \right)} \), which is equivalent to \( P_{\text{EXFA}} \left( 1 \right) = \left\{ {y \in R_{ + }^{s} :{\mkern 1mu} \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r} } \le \psi^{k} ,\;k = 1, \ldots ,K} \right\} \).

Regarding the strongly efficient frontier of \( P_{\text{EXFA}} \left( 1 \right) \), it is defined by analogy with \( \partial^{s} \left( {P\left( 1 \right)} \right) \) as \( \partial^{s} \left( {P_{\text{EXFA}} \left( 1 \right)} \right): = \left\{ {y \in P_{\text{EXFA}} \left( 1 \right):\hat{y} \ge y,\hat{y} \ne y\, \Rightarrow \hat{y} \notin P_{\text{EXFA}} \left( 1 \right)} \right\} \).
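In practice, the supporting hyperplanes of the FDEFs can be recovered from a convex hull computation, since Qhull returns the outward normal and offset of every facet. The following Python sketch, based on scipy's Qhull wrapper, keeps only the facets whose normal is strictly positive; it assumes the output data are in general position, and the function name and the origin-augmentation trick are our own illustrative choices rather than part of [34, 35]:

```python
import numpy as np
from scipy.spatial import ConvexHull   # scipy's wrapper around Qhull

def fdef_hyperplanes(Y, tol=1e-9):
    """Candidate supporting hyperplanes sum_r mu_r y_r = psi of the FDEFs of P(1).
    Sketch: take the convex hull of the observed outputs (augmented with the origin,
    so the hull is full-dimensional) and keep the facets whose outward normal is
    strictly positive. Y is the (n x s) output matrix."""
    Y = np.asarray(Y, dtype=float)
    pts = np.vstack([Y, np.zeros(Y.shape[1])])
    hull = ConvexHull(pts)
    facets = []
    for eq in hull.equations:          # each row (a_1,...,a_s,b) satisfies a.y + b <= 0 inside
        mu, b = eq[:-1], eq[-1]
        if np.all(mu > tol):           # strictly positive coefficients: efficient facet
            facets.append((mu, -b))    # rewrite as mu.y = psi with psi = -b
    return facets
```

Degenerate configurations (for example, an output that is zero for every DMU) may require the more careful filtering described in Sect. 3.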

We now turn to the definition of the Russell output measure based on both the least distance and the extended facet production possibility set. In this sense, by analogy with (6), we derive this measure directly, simply replacing the reference set \( \partial^{s} \left( {P\left( 1 \right)} \right) \) with \( \partial^{s} \left( {P_{\text{EXFA}} \left( 1 \right)} \right) \).

$$ \varGamma_{\text{EXFA}}^{{\min} } \left( {Y_{0} } \right): = 1 + {\text{Min}}\left\{ {\frac{1}{s}\sum\limits_{r = 1}^{s} {\frac{{s_{r}^{ + } }}{{y_{r0} }}} :\left( {Y_{0} + s^{ + } } \right) \in \partial^{s} \left( {P_{\text{EXFA}} \left( 1 \right)} \right),\;s^{ + } = \left( {s_{1}^{ + } , \ldots ,s_{s}^{ + } } \right) \ge 0_{s} } \right\} $$
(7)

3 A New Aggregated Indicator Based on DEA and the Principle of Least Action

Next, we define the new composite indicator (NCI).

$$ {\text{NCI}}_{\text{EXFA}}^{{\min} } \left( {Y_{0} } \right): = 1 + {\text{Min}}\left\{ {\frac{1}{s}\sum\limits_{r = 1}^{s} {\frac{{s_{r}^{ + } }}{{y_{r}^{U} - y_{r0} }}} :\left( {Y_{0} + s^{ + } } \right) \in \partial^{s} \left( {P_{\text{EXFA}} \left( 1 \right)} \right),\;s^{ + } = \left( {s_{1}^{ + } , \ldots ,s_{s}^{ + } } \right) \ge 0_{s} } \right\} $$
(8)

In contrast to the original Russell output measure based on both the PLA and the extended facet production possibility set (7), in (8) the denominator of the objective function has been transformed in order to ensure that translation invariance holds. Specifically, \( y_{r}^{U} \) corresponds to the maximum value observed in the sample for output \( r \), for all \( r = 1, \ldots ,s \). Using \( \left( {y_{r}^{U} - y_{r0} } \right) \) as the denominator, instead of \( y_{r0} \), cancels any translation of the output data. Another possibility, which would also yield a translation invariant measure, would be to use the range of each output directly in the denominators. However, as [46] pointed out, this last type of measure has less discriminatory power with respect to the efficiency scores obtained for the set of DMUs. This is the reason why we prefer to resort to a unit-specific range as the denominator. Additionally, the NCI is well defined if we adopt the following convention. If, for DMU0, output \( r^{\prime} \) satisfies \( y_{{r^{\prime}0}} = y_{{r^{\prime}}}^{U} \), then there is no room for improvement according to the data and, by convention, we take \( \frac{{s_{{r^{\prime}}}^{ + *} }}{{y_{{r^{\prime}}}^{U} - y_{{r^{\prime}0}} }} = 0 \), where * denotes optimality. This idea was previously suggested in the literature by [46] for defining the bounded adjusted measure. Nevertheless, seeking simplicity, hereinafter we describe how to evaluate a unit such that \( y_{r0} \ne y_{r}^{U} \) for all \( r = 1, \ldots ,s \).

Moreover, and from a computational viewpoint, the NCI is not easy to calculate. This difficulty results from the complexity of determining the least distance to the frontier of a convex set (a polyhedron in our context) from an interior point of this set, as this problem is equivalent to minimizing a convex function on the complement of a convex set. This is known in optimization theory as the reverse convex best approximation problem [47].

Additionally, (8) is not an optimization program that can be directly implemented in usual optimizers. Hence, we want to re-express (8) in a standard way. To do so, we first prove a result stating that, once all the FDEFs of the dataset have been explicitly determined, the optimal value of (8) can be obtained by calculating \( K \) simple ratios.

Proposition 3.1

\( {\text{NCI}}_{\text{EXFA}}^{{\min} } \left( {Y_{0} } \right) = 1 + \mathop {{\min} }\limits_{1 \le k \le K} \left\{ {\frac{{\psi^{k} - \sum\limits_{r = 1}^{s} {\mu_{r}^{k} y_{r0} } }}{{{\max} \left\{ {s\mu_{1}^{k} \left( {y_{1}^{U} - y_{10} } \right), \ldots ,s\mu_{s}^{k} \left( {y_{s}^{U} - y_{s0} } \right)} \right\}}}} \right\} \).

Proof

It is an adaptation, with other weights, of the proof of Proposition 5 in [34]. □

Following Proposition 3.1, we would have to determine all the FDEFs before solving (8). In particular, if we resort, for example, to the Qhull software, then we have to carry out a multistage procedure as follows. First, using Qhull, we would obtain a set of hyperplanes. Second, we should check both the slopes and the offset of each of them in order to discard those that cannot define an efficient face. After applying this filter, we would get a group of candidates. Third, we should analyze the units that span each hyperplane in order to select those that actually define an FDEF. In the fourth stage, for each inefficient unit and each FDEF, a ratio, as in Proposition 3.1, should be computed. Finally, in the fifth stage, the minimum of these ratios would yield the desired optimal value of (8) for each assessed DMU.
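Before turning to the one-step MILP, the five stages above can be condensed into a short script that combines the fdef_hyperplanes sketch of Sect. 2 with the ratios of Proposition 3.1. It is intended only as a validation tool under the assumptions stated there; the function name is ours.

```python
import numpy as np

def nci_exfa(facets, Y, y0):
    """Composite indicator of Proposition 3.1 for one DMU y0. `facets` is a list of
    (mu, psi) pairs describing the FDEF hyperplanes sum_r mu_r y_r = psi, e.g. the
    output of the fdef_hyperplanes sketch in Sect. 2; Y is the (n x s) output matrix.
    Assumes y_r0 < y_r^U for every r (see the convention stated above)."""
    Y = np.asarray(Y, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    s = Y.shape[1]
    yU = Y.max(axis=0)                             # maximum observed value of each output
    ratios = []
    for mu, psi in facets:
        mu = np.asarray(mu, dtype=float)
        num = psi - mu @ y0                        # distance of y0 to the extended facet
        den = np.max(s * mu * (yU - y0))           # denominator of Proposition 3.1
        ratios.append(num / den)
    return 1.0 + min(ratios)
```

Note that the normalization of each facet normal is irrelevant here, since both the numerator and the denominator of the ratio are homogeneous of degree one in \( \mu^{k} \); for a DMU lying on some FDEF, one numerator vanishes and the function returns one, provided the convention for \( y_{r0} = y_{r}^{U} \) is not needed.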

However, in contrast to the multistage procedure described above, we now go on to show how to determine the optimal value of model (8), invoking Proposition 3.1, by solving a MILP for each inefficient unit, something that is implementable in any standard optimizer. To obtain that result, we resort to bi-level programming, which has previously been used in DEA in [48, 49].

Next, we show the model that makes it possible to determine the new composite indicator. To do so, we specify and explain each constraint separately.

Proposition 3.1 states that the NCI can be computed through the ratios \( \frac{{\psi^{k} - \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r0} } }}{{{\max} \left\{ {s\mu_{1}^{k} \left( {y_{1}^{U} - y_{10} } \right), \ldots ,s\mu_{s}^{k} \left( {y_{s}^{U} - y_{s0} } \right)} \right\}}} \), \( k = 1, \ldots ,K \), once we have determined the supporting hyperplane \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{k} y_{r} } = \psi^{k} \) associated with each FDEF of \( P\left( 1 \right) \). Note that this fractional form can be linearized by imposing the normalization \( {\max} \left\{ {s\mu_{10} \left( {y_{1}^{U} - y_{10} } \right), \ldots ,s\mu_{s0} \left( {y_{s}^{U} - y_{s0} } \right)} \right\} = 1 \), which can also be modeled in a linear way through the following constraints:

$$ \begin{array}{*{20}l} {s\mu_{r0} \left( {y_{r}^{U} - y_{r0} } \right) \le 1,} \hfill & {s\mu_{r0} \left( {y_{r}^{U} - y_{r0} } \right) \ge \delta_{r0} ,} \hfill & {r = 1, \ldots ,s} \hfill \\ {\sum\limits_{r = 1}^{s} {\delta_{r0} } \ge 1,} \hfill & {\delta_{r0} \in \left\{ {0,1} \right\},} \hfill & {r = 1, \ldots ,s} \hfill \\ \end{array} $$
(9)

In this way, if we work with supporting hyperplanes \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{{}} y_{r} } = \psi \) that satisfy the above constraints, the NCI can be determined by minimizing \( \left( {\psi_{0} - \sum\nolimits_{r = 1}^{s} {\mu_{r0}^{{}} y_{r0} } } \right) \) over the set of all supporting hyperplanes of FDEFs of \( P\left( 1 \right) \). Additionally, for the new model to work properly, any feasible solution must be related to an FDEF of \( P\left( 1 \right) \) and, at the same time, any FDEF of \( P\left( 1 \right) \) must be associated with a feasible solution of the model. To achieve this, we need to incorporate the characteristics of an FDEF into the constraints of the model. First, the supporting hyperplane that we want to determine, linked to an FDEF, \( \sum\nolimits_{r = 1}^{s} {\mu_{r0}^{{}} y_{r} } = \psi_{0} \), should be a valid inequality of \( P\left( 1 \right) \); that is,

$$ \begin{array}{*{20}l} {\sum\limits_{r = 1}^{s} {\mu_{r0} y_{rj} } - \psi_{0} + d_{j0} = 0,} \hfill & {d_{j0} \ge 0,} \hfill & {j \in E} \hfill \\ {\mu_{r0} \ge 0,} \hfill & {r = 1, \ldots ,s} \hfill & {} \hfill \\ \end{array} $$
(10)

Additionally, this valid inequality makes it possible to define a face of \( P\left( 1 \right) \): \( F_{0} : = \left\{ {Y \in P\left( 1 \right):\sum\nolimits_{r = 1}^{s} {\mu_{r0} y_{r} } = \psi_{0} } \right\} \ne \emptyset \). In particular, (11) ensures that \( F_{0} \) is not an empty set and that there are \( s \) extreme efficient points belonging to \( F_{0} \): \( \tau_{j0} = 1 \) implies \( d_{j0} = 0 \) and, therefore, \( \sum\nolimits_{r = 1}^{s} {\mu_{r0} y_{rj} } = \psi_{0} \).

$$ \begin{array}{*{20}l} {d_{j0} \tau_{j0} = 0,} \hfill & {\tau_{j0} \in \left\{ {0,1} \right\},} \hfill & {j \in E} \hfill \\ {\sum\limits_{j \in E} {\tau_{j0} } = s} \hfill & {} \hfill & {} \hfill \\ \end{array} $$
(11)

Let \( E_{0} \) be the set of points such that \( \tau_{j0} = 1 \). \( F_{0} \) is an efficient face of \( P\left( 1 \right) \). To ensure that the dimension of this face is maximal, that is, \( s - 1 \), the \( s \) extreme efficient points must also be affinely independent. By definition, the unique solution of \( \sum\nolimits_{{j \in E_{0} }} {\alpha_{j} Y_{j} } = 0 \), \( \sum\nolimits_{{j \in E_{0} }} {\alpha_{j} } = 0 \) must be \( \alpha_{j} = 0 \) for all \( j \in E_{0} \). With \( M \) being a strictly positive large number, this is guaranteed by solving (12) and (13) for each \( k \in E \) and forcing the satisfaction of (14) at the optimum:

$$ \begin{array}{*{20}l} {{\min} } \hfill & {\alpha_{kk0}^{{\min} } } \hfill & {} \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j \in E} {\alpha_{jk0}^{{\min} } y_{rj} } = 0,} \hfill & {r = 1, \ldots ,s} \hfill \\ {} \hfill & {\sum\limits_{j \in E} {\alpha_{jk0}^{{\min} } } = 0,} \hfill & { - M\tau_{j0} \le \alpha_{jk0}^{{\min} } \le M\tau_{j0} ,\quad j \in E} \hfill \\ \end{array} $$
(12)
$$ \begin{array}{*{20}l} {{\max} } \hfill & {\alpha_{kk0}^{{\max} } } \hfill & {} \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j \in E} {\alpha_{jk0}^{{\max} } y_{rj} } = 0,} \hfill & {r = 1, \ldots ,s} \hfill \\ {} \hfill & {\sum\limits_{j \in E} {\alpha_{jk0}^{{\max} } } = 0,} \hfill & { - M\tau_{j0} \le \alpha_{jk0}^{{\max} } \le M\tau_{j0} ,\quad j \in E} \hfill \\ \end{array} $$
(13)
$$ \alpha_{jk0}^{{\min} } = \alpha_{jk0}^{{\max} } = 0,\quad j,k \in E $$
(14)
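Constraints (12)–(14) enforce affine independence inside the optimization model. Outside the model, the same condition can be verified ex post for any candidate set of \( s \) extreme points with a simple rank computation; a small numpy sketch (the function name is ours):

```python
import numpy as np

def affinely_independent(points, tol=1e-9):
    """True if the s output vectors given as rows of `points` are affinely
    independent, i.e., they span a face of maximal dimension s - 1."""
    points = np.asarray(points, dtype=float)
    diffs = points[1:] - points[0]                 # differences w.r.t. the first point
    return np.linalg.matrix_rank(diffs, tol=tol) == len(points) - 1
```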

Finally, since we are seeking supporting hyperplanes associated with FDEFs, we need to ensure that \( \mu_{r0}^{{}} > 0 \) for all \( r = 1, \ldots ,s \). This is obtained by checking the Pareto efficiency of the centroid of the \( s \) extreme efficient points in \( E_{0} \), that is, the point given by the following expression:

$$ \bar{y}_{r0} = \tfrac{1}{s}\sum\limits_{j \in E} {\tau_{j0} y_{rj} } ,\quad r = 1, \ldots ,s $$
(15)

To do that, we use program (16) and force the satisfaction of (17) at the optimum.

$$ \begin{array}{*{20}l} {{\max} } \hfill & {\sum\limits_{r = 1}^{s} {\frac{{t_{r0} }}{{R_{r} }}} } \hfill & {} \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j \in E} {\lambda_{j0} y_{rj} } = \bar{y}_{r0} + t_{r0} ,} \hfill & {r = 1, \ldots ,s} \hfill \\ {} \hfill & {\sum\limits_{j \in E} {\lambda_{j0} } = 1,} \hfill & {t_{r0} ,\lambda_{j0} \ge 0,\quad r = 1, \ldots ,s,\;j \in E} \hfill \\ \end{array} $$
(16)

where \( R_{r} \) is the output range, \( r = 1, \ldots ,s \), which is assumed to be strictly positive.

$$ \sum\limits_{r = 1}^{s} {t_{r0} } = 0. $$
(17)

By Proposition 2.1, forcing (17) at the optimum of (16) ensures that the centroid is a non-dominated (Pareto-efficient) point in \( P\left( 1 \right) \); consequently, by Lemma 3.2, \( \mu_{r0}^{{}} > 0 \) for all \( r = 1, \ldots ,s \). All this guarantees that \( \sum\nolimits_{r = 1}^{s} {\mu_{r0} y_{r} } = \psi_{0} \) is a supporting hyperplane associated with an FDEF. Combining all the previous constraints, the program to be solved is:

$$ \begin{array}{*{20}l} {{\min} } \hfill & {\psi_{0} - \sum\limits_{r = 1}^{s} {\mu_{r0} y_{r0} } } \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {(9) - (17)} \hfill \\ \end{array} $$
(18)

Now, we need to prove that the NCI coincides with one plus the optimal value of model (18). To do that, we first prove some lemmas, which we will use in the proof of the main result: Theorem 3.1.

Lemma 3.1

Let \( E^{\prime} \) be a set of s affinely independent extreme points. Then, the following linear system has a unique solution.

$$ \sum\limits_{r = 1}^{s} {\mu_{r} y_{rj} } - \psi = 0,\quad j \in E^{\prime},\quad \sum\limits_{r = 1}^{s} {\mu_{r} } = 1. $$
(19)

Proof

See “Appendix A.” □

Lemma 3.2

Let \( E^{\prime} \) be a set of s affinely independent extreme points on an efficient face \( F \) of \( P\left( 1 \right) \), and let \( \sum\nolimits_{r = 1}^{s} {\mu_{r}^{\prime } y_{r} } = \psi^{\prime } \) be a supporting hyperplane associated with \( F \), with \( \left\| {\mu^{\prime } } \right\|_{{\ell_{1} }} = \sum\nolimits_{r = 1}^{s} {\mu_{r}^{\prime } } = 1 \). Then, \( \mu^{\prime}_{r} > 0 \), \( r = 1, \ldots ,s \), if and only if \( \bar{Y} \), defined as \( \bar{y}_{r} = \tfrac{1}{s}\sum\nolimits_{{j \in E^{\prime } }} {y_{rj} } \), \( r = 1, \ldots ,s \), is a Pareto-efficient point in \( P\left( 1 \right) \).

Proof

See “Appendix A.” □

Now, we are ready to prove the main result of this paper: the equivalence between (8) and (18).

Theorem 3.1

One plus the optimal value of model (18) coincides with \( {\text{NCI}}_{\text{EXFA}}^{{\min} } \left( {Y_{0} } \right) \).

Proof

See “Appendix A.” □

From a computational point of view, model (18) needs to be reformulated in order to be implemented in a standard optimizer. First, the multiple lower-level decision problems in (12), (13) and (16) should be treated in such a way that model (18) is reformulated as a one-level problem. To do that, it is enough to apply the Karush–Kuhn–Tucker (KKT) optimality conditions to the lower-level problems [50]. Additionally, the constraint \( d_{j0} \tau_{j0} = 0,\,\,j \in E \), is not linear, but it can be handled as a special ordered set (SOS) [51]. An SOS constraint specifies that a pair of variables cannot take strictly positive values at the same time and is related to the use of special branching strategies. Traditionally, SOS was used with discrete and integer variables, but modern optimizers, like CPLEX, also support SOS with continuous variables. In this way, we are able to rewrite model (18) as a mathematical program that can be implemented in standard optimizers, by transforming the bi-level program (18) into a standard one-level problem using the KKT conditions and resorting to SOS conditions.
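As an illustrative alternative to SOS branching, and not as part of the original model, the complementarity condition \( d_{j0} \tau_{j0} = 0 \) can also be linearized with a sufficiently large constant \( M \) (for instance, any valid upper bound on the \( d_{j0} \)):

$$ d_{j0} \le M\left( {1 - \tau_{j0} } \right),\quad d_{j0} \ge 0,\quad \tau_{j0} \in \left\{ {0,1} \right\},\quad j \in E $$

With \( \tau_{j0} = 1 \), this constraint forces \( d_{j0} = 0 \), reproducing the effect of the SOS condition at the cost of having to choose \( M \).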

Now, it is interesting to show the properties that the new Russell-type measure of technical efficiency satisfies, properties that will be inherited by the NCI composite indicator. Regarding this issue, [26] were the first to propose a set of desirable properties that an ideal efficiency measure should meet. Later, [7] listed similar requirements and suggested a few others. In particular, the main properties for an output-oriented measure are: (P1) the measure should be greater than or equal to one, where one represents full efficiency (Pareto–Koopmans efficiency); (P2) units invariance; (P3) translation invariance; and, finally, (P4) strong monotonicity in outputs.

Theorem 3.2

Let \( Y_{0} \) be the output vector corresponding to DMU0. Then, the following is true for the NCI:

  • [P1] (i) \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \ge 1 \) and (ii) \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) = 1 \) if and only if \( Y_{0} \in \partial^{s} \left( {P_{\text{EXFA}} \left( 1 \right)} \right) \).

  • [P2] (iii) \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \) is units invariant.

  • [P3] (iv) \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \) is translation invariant.

  • [P4] (v) \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \) is strongly monotonic in outputs.

Proof

The proof is apparent from the definition of (8) and model (18). □

Additionally, in the context of composite indicators, two supplementary properties could be of interest. The first one has recently been pointed out by [23]: the weights (shadow prices) associated with each sub-indicator (output) should be strictly positive to ensure that no dimension is omitted in the computation of the final composite indicator [P5]. Our approach satisfies this property as a consequence of determining FDEFs. Secondly, the new measure based on the least distance provides targets that can be achieved with minimum effort [P6]. This fact is very important in practice, since demanding targets could demotivate firms from achieving efficient status by improving their outputs.

Furthermore, as we show in “Appendix B,” the new output-oriented measure \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \) is, among the most well-known measures in the DEA literature adapted to the BoD context, the only one that satisfies properties [P1]–[P6].

Finally, to obtain the optimal targets for each DMU0, we have to solve the linear system (20): the first equation considers all the feasible projection points lying on the optimal hyperplane determined by (18), while the second equation ensures that \( Y_{0}^{*} = Y_{0} + s_{0}^{*} \) is a point where \( {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right) \) is achieved:

$$ \begin{array}{*{20}l} {\sum\limits_{r = 1}^{s} {\mu_{r0}^{*} \left( {y_{r0} + s_{r0}^{{}} } \right)} - \psi_{0}^{*} = 0,} \hfill & {1 + \frac{1}{s}\sum\limits_{r = 1}^{s} {\frac{{s_{r0}^{{}} }}{{y_{r}^{U} - y_{r0} }}} = {\text{NCI}}_{\text{EXFA}}^{ {\min} } \left( {Y_{0} } \right),} \hfill \\ {s_{r0}^{{}} \ge 0,} \hfill & {r = 1, \ldots ,s} \hfill \\ \end{array} $$
(20)
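Once model (18) has been solved, system (20) reduces to a small linear feasibility problem in the slacks. A hedged sketch of its solution with scipy is given below, assuming \( y_{r0} < y_{r}^{U} \) for every output; the function and argument names are our own:

```python
import numpy as np
from scipy.optimize import linprog

def recover_targets(mu_star, psi_star, y0, yU, nci):
    """Solve system (20) for the output slacks and return the targets Y_0^* = Y_0 + s_0^*.
    mu_star and psi_star describe the optimal hyperplane returned by model (18), and nci
    is the optimal composite indicator; assumes y_r0 < y_rU for every output r."""
    mu_star, y0, yU = (np.asarray(v, dtype=float) for v in (mu_star, y0, yU))
    s = len(y0)
    A_eq = np.vstack([mu_star,                     # first equation of (20): on the hyperplane
                      1.0 / (s * (yU - y0))])      # second equation of (20): objective value
    b_eq = np.array([psi_star - mu_star @ y0, nci - 1.0])
    res = linprog(np.zeros(s), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")   # pure feasibility problem
    return y0 + res.x if res.success else None
```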

4 Empirical Illustration

The new composite indicator is applied to a dataset on the CSR activities of the main European companies in the food, beverage and tobacco manufacturing industry for the year 2017, obtained from Sustainalytics. The Sustainalytics dataset has been widely used in previous research; see, for example, [55,56,57]. The data are supplied in the form of scores on over 70 indicators, and the raw scores range from 0 to 100, where 0 is the lowest rating. The dataset also provides information on the weighted scores, which reflect the importance of an indicator in a specific industry. In this study, we apply the weighted scores of five indicators that are classified by Sustainalytics as being specifically dedicated to the industry under study: fair trade products (output 1), water management programs (output 2), sustainable agriculture programs (output 3), organic products (output 4) and genetically modified organism (GMO) policy (output 5); their values range from 0 to 4.14. The dataset comprises both owner firms and their subsidiaries, and to avoid repetition of scores, we restrict our analysis to owner firms. The resulting sample comprises 58 firms. With this sample size, we satisfy all popular rules of thumb regarding the number of DMUs versus the number of inputs and outputs in DEA; see, for example, [59,60,61,62,63]. We also pass the diagnostics of [64], which make it possible to determine whether dimensionality reduction is necessary. Overall, with this sample, we do not aim to estimate the theoretical population frontier, for which statistical arguments would need to be applied; instead, we analyze this specific set of companies with the purpose of benchmarking. The sample contains the largest food, beverage and tobacco manufacturing firms by market capitalization in Europe, such as Danone, Nestlé and the Coca-Cola Company, which account for approximately 30% of the total revenues of this sector in Europe.

Table 1 summarizes the results for 25 selected DMUs: the value of our new composite indicator and the DMUs’ actual values of the CSR outputs, together with the corresponding targets (in parentheses). The results for all DMUs can be obtained upon request. The analysis reveals seven DMUs (DMUs 23, 25, 38, 41, 44, 55 and 56) as Pareto-efficient and forming part of FDEFs. The remaining fifty-one DMUs are inefficient; that is, they could increase their CSR outputs in order to approach an “ideal point” in terms of CSR activities, defined by the maximum values of each CSR output. For example, the target values indicate that DMU5 should exclusively increase its output 2 (that is, investment in water management programs) by 8.9%. Meanwhile, DMU6 should increase its output 4 (organic products) by 22.3% and its output 5 (GMO policy) by 8.9%. The “ideal point” might not be feasible or observed among the current output values of the DMUs (as we work with FDEFs, which are not always observed), but it is still something that the firms in the sample should seek to reach. For example, the value of the new indicator equal to 1.15 for DMU1 can be interpreted as follows: DMU1 could, on average, increase all of its CSR outputs by approximately 15% of the amount left to reach this “ideal point.” Examining the results of the table in depth, we now draw our attention to DMUs 19, 27 and 28, which are the most inefficient in terms of producing CSR activities (composite indicators equal to 1.25); at the same time, the targets obtained suggest the greatest improvements in outputs. In particular, for these DMUs, the target values suggest that the firms should increase their score in output 2 (that is, CSR projects related to water management programs) from 0 to 4.45. This is a considerable improvement, which might not be possible for a DMU to achieve in the short run; hence, gradual improvement toward the target might be made year by year [66]. In fact, CSR in general is a long-term asset, and firms that have longer investor horizons in CSR create larger shareholder value [67]. In particular, the CSR activity of water management programs has a long-term character, dealing with water scarcity, water quality and climate change [68].

Table 1 PLA-based composite indicators for 25 selected DMUs; targets are reported in parentheses

Furthermore, we wish to make some points regarding the ‘empirical’ importance of the properties that the new model satisfies, compared to the models highlighted in “Appendix B.” Beginning with the property of Pareto efficiency, we ran a traditional radial BoD model [11,12,13,14], adding small exemplary constants to all output values so that the analysis could be undertaken with the radial model despite the presence of zero values for some outputs; two different constants, k = 1 and k = 4, were used to seek robustness in our analysis. We found, for k = 1, that 34% of the DMUs (35 units) are not Pareto-efficient and, for k = 4, that this is the case for 50% of the DMUs (29 units). In particular, for k = 1, DMU2 is radially inefficient (score = 1.39). Its corresponding output projection is (1.39, 2.64, 1.39, 1.39, 1.39), which is dominated in the sense of Pareto by a convex combination of DMU44 (0.995) and DMU56 (0.05). Therefore, in practice, if we use the radial model (equivalently, the DDF BoD model with \( g = Y_{0} \)), then the approach will provide targets that could still be improved. This result is in contrast to the new model, for which the targets of all DMUs satisfy the property of Pareto efficiency. Regarding units invariance, there is nothing to say, since all the measures satisfy this property. Furthermore, we analyzed the property of translation invariance, that is, invariance of the efficiency scores with respect to adding a strictly positive constant to any considered input and/or output variable. To show its importance, we ran an output-oriented SBM model, which coincides with the Russell model, with k = 1 and k = 4 added to the output variables; otherwise, these measures could not be correctly determined, since our database contains zeros (the same happens with the radial and DDF BoD models). None of these measures satisfies translation invariance. Let us show an example of the implications of this drawback in practice. For DMU2, for example, the score for the SBM indicator changed from 2.37, for k = 1, to 1.34, for k = 4, something that would not occur with the new model. This finding has important practical implications: when a model that is not translation invariant is used, it is not clear which efficiency values should be reported, since the selection of the constant k is completely arbitrary and the resulting score can be very different depending on the selected constant. Moreover, the importance of the property of strong monotonicity can be assessed by looking at DMUs 1 and 23, where DMU23 dominates DMU1, as all its outputs are larger than or equal to those of DMU1. Therefore, the performance of DMU23 is clearly better than that of DMU1, regardless of which measure is selected. However, for these DMUs, the radial BoD model (and also the DDF BoD model with \( g = Y_{0} \)) with, for example, k = 1 added presents scores equal to 1 for both DMUs, which can be understood as an inconsistency with respect to the observed output values; consequently, strong monotonicity is not satisfied. For the new model, DMU23’s score was 1 and DMU1’s score was 1.15; hence, strong monotonicity is obeyed. The new model therefore gives consistent results, which is important in practice: with the other measures, the practitioner would see that DMU23 is clearly better than DMU1, yet the measure would not tell the same story.
Regarding the importance of the property of strict positivity of the weights (shadow prices), using the radial BoD model (and also the DDF BoD model) with k = 1 added, all but one of the DMUs had at least one weight equal to 0, while, for k = 4, this was the case for all DMUs except four. This is in contrast to the new model, in which all weights are strictly positive, which ensures that no dimension (information) is omitted in the computation of the composite indicator. This demonstrates the advantage of our model with regard to weights compared to the traditional radial and DDF BoD models in practice: certain traditional measures can neglect some information (output dimensions) when the composite indicator is built. Finally, regarding the property of least distance, the BoD model was run based on the well-known range-adjusted measure (RAM) [25]. For DMU1, for example, the target obtained for output 4 was considerably larger than for the new model (1.79 compared to 0) and the target for output 5 was negligibly smaller (0.84 compared to 0.93), with no changes in the other dimensions. Therefore, the new model provided smaller targets overall than those based on the RAM, which signifies smaller improvements for DMUs to become efficient. This fact is important in practice, since demanding targets could demotivate firms from achieving efficient status by improving their outputs in the short run. “Appendix C” summarizes the results of the radial BoD model with k = 1 added to the outputs equal to zero, for the same 25 selected DMUs. Due to space limitations, we do not report the detailed results for all the other models, which confirm the empirical importance of the properties that our new model satisfies. These results can be obtained from the authors upon request.

5 Conclusions

In this paper, for the first time, a composite indicator based on the PLA in DEA has been introduced to the literature that, additionally, satisfies a set of interesting properties. Indeed, it satisfies more properties than previous DEA approaches. This development is important for several reasons. First, traditional DEA models yield targets that are related, in general, to the furthest projection of the assessed DMU onto the efficient frontier, which undermines the credibility of the benchmarking evaluation. In contrast, finding the closest targets allows determining directions of improvement associated with less effort than the traditional approach. Second, our approach satisfies several interesting properties from an economic and mathematical point of view. Our indicator is units invariant, translation invariant and strongly monotonic and, additionally, takes into account all sources of inefficiency, since it evaluates Pareto–Koopmans efficiency instead of the Debreu–Farrell notion of technical efficiency. Third, our methodology guarantees that the weights used for aggregating all the sub-indicators are always strictly positive, something that contrasts with traditional approaches based upon radial models.

In order to ensure the satisfaction of all the properties, we resorted to a solution based upon extending the FDEFs of the original DEA polyhedral technology and introduced, for calculating the composite indicator, a new bi-level linear program that avoids explicitly determining all the FDEFs and does its job in only one step. This feature contrasts with previously existing methods, which needed a multistage procedure.

To show the workability of the developed model and the empirical implications of the theoretical properties, the new model was applied to a recent dataset consisting of the CSR activities of the main food, beverage and tobacco manufacturing firms in Europe. The results of this study are important for efficiency researchers and practitioners, as a new model with important properties is put forward. They could also be of interest to CSR analysts regarding how firms could improve their CSR performance with minimal effort. Future research efforts could be dedicated to the analysis of the computational complexity of the developed estimator, given that it is an NP-hard problem.