1 Introduction

Introduced by Pietra (1915) as one of the first inequality measures, the Pietra index has more than a century of history, although a very similar measure had been proposed a few years earlier (see Bresciani Turroni 1907; Hasan and Malik 2019). In the literature, the Pietra index has been “rediscovered” many times under different names: it coincides with the index proposed by Ricci (1916), with the Hoover index (Hoover 1936), and with the Schutz coefficient (Schutz 1951), and some papers refer to it as the Robin Hood index (see, for example, Koolman and van Doorslaer 2004; Wilkinson and Symon 2000; Kennedy et al. 1996).

Historically, the popularity of this index declined as the Pigou–Dalton transfer principle gained acceptance. It is well known that the Pietra index satisfies only the weak version of that principle, since it is not sensitive to transfers between units whose values lie on the same side of the mean (see, for example, Castagnoli and Muliere 1990; Frosini 2012).

Nevertheless, the Pietra index continues to be used in many different fields as an indicator of heterogeneity, since it can be regarded as a measure of the distance between the situation at stake and the egalitarian one, where all the units have the same amount. The Pietra index appears in many papers in the fields of public health (Wilkinson and Symon 2000; Theodorakis et al. 2006; Mobaraki et al. 2013; Koolman and van Doorslaer 2004; Zafari and Ekin 2019; Mantzavinis et al. 2002; De Maio 2007; Johnston and Wilkinson 2001) and medicine (see among others Gravelle and Sutton 1998; Kennedy et al. 1996; Beck et al. 2013). It has also been used in some sociological analyses (Shi et al. 2003; Kennedy et al. 1998; Rogerson and Plane 2013; Shumway and Otterstrom 2001; Ray and Singer 1973; Alker and Russett 1964; Khanal 2011). Finally and unsurprisingly, the Pietra index has been considered in many economic studies (see among others Hasan and Malik 2019; Khosravi Tanak et al. 2015; Moothathua 1989; Davydov and Zitikis 2005; Sarabia and Jorda 2014; Koolman and van Doorslaer 2004; Sarabia 2008; Eliazar and Sokolov 2010; Hustopecky and Vlachy 1978; Habib 2012; Huang and Leung 2009; Frosini 2012).

Another likely reason for the broad recognition and longevity of the Pietra index is its very intuitive interpretation: it is the share of the total amount that must be redistributed from the units holding more than the mean to the others in order to reach the egalitarian situation. Moreover, the Pietra index has been shown to have an immediate and fundamental interpretation within renewal processes and continuous-time random walks, infinite-server queueing and shot-noise processes, and even in the field of financial derivatives (Eliazar and Sokolov 2010). In the more general context of infinite populations described by random variables, the Pietra index can be used as a heterogeneity index in more cases than the relative standard deviation (RSD), since it requires only a finite (positive) mean and can be computed even when the variance is not finite.

In the literature, many aspects of inequality (and heterogeneity) analysis have been investigated, mainly because of their important practical implications. Among these, decompositions play an important role. The two most general kinds of decomposition are by subpopulations (or subgroups) and by sources (or factors). The first is performed when the population is divided into, say, k exhaustive and disjoint subpopulations. The question to be answered is how the subpopulations contribute to the value of the index, the ideal answer being an evaluation of the contribution of each single subpopulation. Unfortunately, many decomposition procedures in the literature do not achieve this goal, because they stop at a coarser level of detail: usually, inspired by the classical decomposition of the variance, they identify a within component (related to the inequality inside the subpopulations) and a between component (depending on the inequality across the subpopulations). Moreover, many of them also require a third, residual part (sometimes called the transvariation component) that rescales the sum into the interval [0, 1], as for example in Dagum (1997).

The second type of decomposition arises when the investigated variable is the sum of other (say c) variables, called sources. In this framework, the research question is how to assess the contribution of each source. For two recent decompositions of the Pietra index and further references on the topic, see Habib (2012) and Frosini (2012).

Regarding the procedures for decomposing inequality indexes, a number of previous studies followed the approach proposed by Shorrocks (1982, 1984). Those two papers impose some restrictive assumptions on the decomposition procedure, forcing an analogy with the variance decomposition. The result is a very restricted class of decomposable inequality measures that excludes many widely used ones, such as the Gini coefficient and the Bonferroni index, to name but two. To overcome this issue, the methods proposed herein follow a different approach. The aim of this paper is to introduce two new procedures for decomposing the Pietra index, starting directly from its definition: the first is by sources, and the second is by subpopulations. As mentioned above, the main advantage of these two decompositions is that the first allows one to assess the contribution of each source, while the second allows one to evaluate the contribution of each subpopulation. For these reasons, the two procedures are more informative than many of those previously proposed for the Pietra index. By means of a different aggregation, the decomposition by subpopulations also leads easily to the classical one into within and between components. The proposed methods are complemented by two applications to Italian professional football (soccer) teams. The decomposition by sources is applied to a dataset drawn from the balance sheets of the teams in the top Italian league (Serie A), while the decomposition by subpopulations is illustrated by analyzing the market values of all the players of the teams, grouped according to the three professional leagues (Serie A, B, and C). The purpose of these applications is to describe (albeit non-exhaustively) and interpret some aspects of the most popular sport in Italy.

This paper is organized as follows. In Sect. 2, the Pietra index is defined and some of its features are presented. In Sect. 3, a new equivalent formula for the Pietra index is given, one that is useful for the remainder of the paper. In Sects. 4 and 5, the two proposed decomposition procedures are detailed. Section 6 is devoted to the two applications to Italian football teams with real datasets, and the paper concludes in Sect. 7 with some final remarks. Appendix A provides an example in which the two proposed decompositions for the Pietra index are computed. For the datasets used in the applications and the R code to replicate the analyses, see the provided supplementary material.

2 The Pietra index

Let Y be a non-negative statistical variable observed on a population of size N. Let \(y_{1}<y_2<\cdots <y_r\) denote the distinct values assumed by Y, with frequencies \(n_{1\cdot },n_{2\cdot },\ldots ,n_{r\cdot }\), so that \(\sum _{h=1}^{r}n_{h\cdot }=N\). Let T(Y) denote the sum of the values of Y, assumed to be positive, and let \(M(Y)=\frac{T(Y)}{N}=\sum _{h=1}^{r}y_{h}\cdot \frac{n_{h\cdot }}{N}\) be the arithmetic mean of Y in the whole population. Let

$$\begin{aligned} S_{M(Y)}=\sum _{h=1}^{r}|y_{h}-M(Y)|\cdot \frac{n_{h\cdot }}{N} \end{aligned}$$

be the mean absolute deviation (MAD) of Y from M(Y). The index proposed by Pietra (1915) for the variable Y is given by

$$\begin{aligned} {\mathcal {P}}_Y =\frac{S_{M(Y)}}{2M(Y)} =\frac{\sum _{h=1}^{r}|y_{h}-M(Y)|n_{h\cdot }}{2\sum _{h=1}^{r}y_{h}n_{h\cdot }}. \end{aligned}$$
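As a minimal illustration (a Python sketch, not the R code distributed with the paper), the definition can be evaluated directly from a frequency distribution \((y_h, n_{h\cdot})\). Exact rational arithmetic through the standard `fractions` module is an assumption made here only so that results can be compared exactly; the toy distribution is hypothetical, and the values are assumed sorted, non-negative, and with \(T(Y)>0\).

```python
from fractions import Fraction

def pietra(values, freqs):
    """Pietra index P_Y = S_M(Y) / (2 M(Y)) of a frequency distribution.

    values: distinct non-negative values y_1 < ... < y_r (with T(Y) > 0)
    freqs:  corresponding frequencies n_1., ..., n_r.
    """
    N = sum(freqs)
    T = sum(Fraction(y) * n for y, n in zip(values, freqs))       # T(Y)
    M = T / N                                                      # M(Y)
    S = sum(abs(y - M) * n for y, n in zip(values, freqs)) / N     # MAD S_M(Y)
    return S / (2 * M)

# Hypothetical toy distribution: values 1, 2, 3, 7 with frequencies 1, 2, 1, 1 (N = 5, M(Y) = 3)
print(pietra([1, 2, 3, 7], [1, 2, 1, 1]))  # 4/15
# Extreme case with N = 5: four zeros and one unit holding everything
print(pietra([0, 10], [4, 1]))             # 4/5
```

The second call reproduces the maximum \((N-1)/N\) for \(N=5\), and a single value with full frequency gives \(0\).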

It is worth remarking that, in Pietra (1915):

  (a)

    the Lorenz curve L(p), proposed by Lorenz (1905) and defined as the piecewise linear curve, starting from the origin and interpolating the r points \(L_h\)

    $$\begin{aligned} L_h=\left( \sum _{i=1}^{h}\frac{n_{i\cdot }}{N},\;\sum _{i=1}^{h}\frac{y_in_{i\cdot }}{T(Y)}\right) , \qquad h=1,\ldots ,r, \end{aligned}$$

    is formalized for the continuous case;

  (b)

    it is shown that

    $$\begin{aligned} {\mathcal {P}}= & {} \max _{p\in [0,1]} \left[ p-L(p)\right] \\= & {} \tilde{p}-L(\tilde{p}),\nonumber \end{aligned}$$
    (1)

    where \(\tilde{p}\) is the proportion of units whose values do not exceed the mean, that is, the quantile level corresponding to the mean of Y, so that \(y_{(\tilde{p})}=M(Y)\). Expression (1) is the maximum vertical distance between the Lorenz curve of the variable at stake and the Lorenz curve of the egalitarian situation, namely the diagonal \(L(p)=p\); therefore, the Pietra index can be considered as “the Lorenzian counterpart of the Kolmogorov–Smirnov statistic, which quantifies the distance between two probability laws as the \(L_{\infty }\) distance between their cumulative distribution functions” (Eliazar and Sokolov 2010).
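Expression (1) can also be checked numerically. The following minimal Python sketch (hypothetical toy data, exact fractions) builds the vertices of the Lorenz curve, with cumulative relative frequencies as abscissae, and takes the maximum of \(p-L(p)\) over them. Because \(p-L(p)\) is piecewise linear, checking the vertices is sufficient.

```python
from fractions import Fraction
from itertools import accumulate

def lorenz_points(values, freqs):
    """Vertices (p_h, L_h) of the Lorenz curve, preceded by the origin."""
    N = sum(freqs)
    amounts = [Fraction(y) * n for y, n in zip(values, freqs)]   # y_h * n_h.
    T = sum(amounts)                                             # T(Y)
    p = [Fraction(c, N) for c in accumulate(freqs)]              # cumulative relative frequencies
    L = [q / T for q in accumulate(amounts)]                     # cumulative shares of T(Y)
    return [(Fraction(0), Fraction(0))] + list(zip(p, L))

def max_lorenz_gap(values, freqs):
    # p - L(p) is piecewise linear, so its maximum over [0, 1] is reached at a vertex
    return max(p - L for p, L in lorenz_points(values, freqs))

points = lorenz_points([1, 2, 3, 7], [1, 2, 1, 1])   # hypothetical distribution
print(max_lorenz_gap([1, 2, 3, 7], [1, 2, 1, 1]))    # 4/15
```

The maximum \(4/15\) is reached at \(p=4/5\), the share of units with values not exceeding \(M(Y)=3\). It is also reached at \(p=3/5\), because one unit has a value exactly equal to the mean.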

An English translation of the paper by Pietra (1915) was provided by De Capitani (2013a, 2013b).

The Pietra index \({\mathcal {P}}\) has the following properties.

  1.

    The minimum of \({\mathcal {P}}\) occurs in the case of perfect equality, namely \(r=1, y_1=M(Y)\), and \(n_{1\cdot }=N\). In such a case:

    $$\begin{aligned} {\mathcal {P}}=0. \end{aligned}$$
  2.

    The maximum of \({\mathcal {P}}\) occurs if \(r=2\), \(y_1=0\), \(y_2=T(Y)\), \(n_{1\cdot }=N-1\), and \(n_{2\cdot }=1\), resulting in

    $$\begin{aligned} {\mathcal {P}}=\frac{N-1}{N}. \end{aligned}$$
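  3.

    \({\mathcal {P}}\) decreases under positive translations: if \(c>0\), \(W=Y+c\), and \({\mathcal {P}}_Y>0\), then, since the translation leaves the MAD unchanged and raises the mean to \(M(Y)+c\), it holds that \({\mathcal {P}}_W={\mathcal {P}}_Y\cdot \frac{M(Y)}{M(Y)+c}\), and therefore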

    $$\begin{aligned} {\mathcal {P}}_W<{\mathcal {P}}_Y. \end{aligned}$$
  4.

    \({\mathcal {P}}\) is invariant to positive scale transformations: if \(c>0\) and \(W=c\cdot Y\), then

    $$\begin{aligned} {\mathcal {P}}_W={\mathcal {P}}_Y. \end{aligned}$$
  5.

    As mentioned in Sect. 1, \({\mathcal {P}}\) satisfies the weak principle of transfers but not the strong principle of transfers. This means that it is sensitive to a transfer between two units only if one of the two values is below the mean and the other is above it. If the two values are both below (or both above) the mean, the Pietra index does not change. For more details on this point, see Castagnoli and Muliere (1990) and Frosini (2012).

  6.

    \({\mathcal {P}}\) satisfies the population replication principle, since both \(S_{M(Y)}\) and M(Y) are invariant to population replication.
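Properties 3, 4, and 6 are easy to verify on a toy example. The following minimal Python sketch works on a hypothetical list `y` of unit-level values and uses exact fractions. Property 3 is checked in the sharper form \({\mathcal P}_W={\mathcal P}_Y\,M(Y)/[M(Y)+c]\), which holds because a translation leaves the MAD unchanged and increases the mean by \(c\).

```python
from fractions import Fraction

def pietra_units(ys):
    """Pietra index computed from unit-level values (assumes a positive mean)."""
    N = len(ys)
    M = Fraction(sum(ys), N)
    return sum(abs(y - M) for y in ys) / (2 * M * N)

y = [1, 2, 2, 3, 7]          # hypothetical unit-level values
c = 3                        # hypothetical translation constant
M = Fraction(sum(y), len(y))
P = pietra_units(y)

# Property 3: a positive translation rescales P by M(Y) / (M(Y) + c) < 1
assert pietra_units([v + c for v in y]) == P * M / (M + c)
# Property 4: invariance to positive scale transformations
assert pietra_units([2 * v for v in y]) == P
# Property 6: invariance to population replication (each unit repeated 3 times)
assert pietra_units(y * 3) == P
print(P)                     # 4/15
```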

The interpretation of the Pietra index is simple and immediate: it is the share of the total amount T(Y) that must be redistributed from the units holding more than the mean M(Y) to the units holding no more than M(Y) in order to reach perfect equality, where all the units have the same amount (absence of inequality). In fact, since \(\sum _{h=1}^{r}[y_h-M(Y)]n_{h\cdot }=0\), the total shortfall below the mean equals the total excess above it. Each of them equals half of \(N\cdot S_{M(Y)}\), that is, \({\mathcal {P}}\cdot T(Y)\):

$$\begin{aligned} {\mathcal {P}} \cdot T(Y) = \sum _{\{y_h \le M(Y)\}}[M(Y)-y_h]n_{h\cdot }= \sum _{\{y_h > M(Y)\}}[y_h-M(Y)]n_{h\cdot } \end{aligned}$$
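A minimal Python sketch of this identity, on hypothetical unit-level values, computes the total shortfall of the units not above the mean and the total excess of the units above it. It then divides the amount to be moved by \(T(Y)\).

```python
from fractions import Fraction

def redistribution(ys):
    """Total shortfall below the mean and total excess above it (unit-level data)."""
    M = Fraction(sum(ys), len(ys))                   # M(Y)
    shortfall = sum(M - y for y in ys if y <= M)     # units with y_h <= M(Y)
    excess = sum(y - M for y in ys if y > M)         # units with y_h > M(Y)
    return shortfall, excess

ys = [1, 2, 2, 3, 7]                                 # hypothetical unit-level values
shortfall, excess = redistribution(ys)
share = shortfall / sum(ys)                          # amount to move divided by T(Y)
print(shortfall, excess, share)                      # 4 4 4/15
```

The two totals coincide, and the resulting share matches the value of \({\mathcal P}\) obtained from the MAD-based definition for the same data.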

3 An alternative useful expression for the Pietra index

At each \(y_{h}, \; h \in \{1,2,\ldots , r\}\), the whole population can be split into two non-overlapping groups:

  • a lower group corresponding to \(\left\{ Y\le y_{h}\right\}\), including the first \(P_{h\cdot }=\sum _{t=1}^{h}n_{t\cdot }\) units with total amount \(Q_{h\cdot }\left( Y\right) =\sum _{t=1}^{h}y_{t} n_{t\cdot }\);

  • an upper group corresponding to \(\left\{ Y>y_{h}\right\}\) that contains the remaining \(N-P_{h\cdot }\) units, with amount \(T\left( Y\right) -Q_{h\cdot }\left( Y\right) .\)

Let

$$\begin{aligned} \bar{h}=\max \{h: y_h\le M(Y)\}, \end{aligned}$$

and let

$$\begin{aligned} \bar{M}_{\bar{h}\cdot }\left( Y\right) =\frac{Q_{\bar{h}\cdot } \left( Y\right) }{P_{\bar{h}\cdot }}, \end{aligned}$$

be the arithmetic mean of the lower group corresponding to \(\left\{ Y\le y_{\bar{h}}\right\}\). Now, it follows that

$$\begin{aligned} {\mathcal {P}}= & {} \frac{1}{T(Y)}\cdot \sum _{h=1}^{\bar{h}} [M(Y)-y_h]\cdot n_{h\cdot }\nonumber \\= & {} \frac{M(Y)P_{\bar{h} \cdot }-Q_{\bar{h}\cdot }(Y)}{NM(Y)}\nonumber \\= & {} \frac{M(Y)P_{\bar{h}\cdot }-{\bar{M}}_{\bar{h}\cdot }(Y)P_{\bar{h}\cdot }}{NM(Y)}\nonumber \\= & {} \frac{M(Y)-{\bar{M}}_{\bar{h}\cdot }(Y)}{NM(Y)}\cdot P_{\bar{h} \cdot }\nonumber \\= & {} V_{\bar{h}}(Y)p_{\bar{h}\cdot }, \end{aligned}$$
(2)

where

$$\begin{aligned} V_{\bar{h}}(Y)=\frac{M(Y)-{\bar{M}}_{\bar{h}\cdot }(Y)}{M(Y)} \end{aligned}$$

is the relative variation of the lower mean \({\bar{M}}_{\bar{h}\cdot }(Y)\) with respect to the mean M(Y), and \(p_{\bar{h}\cdot }=\frac{P_{\bar{h}\cdot }}{N}\) is the cumulative relative frequency of the lower group \(\left\{ Y\le y_{\bar{h}}\right\}\). It is worth remarking that the quantity \(V_{\bar{h}}(Y)\) is the Bonferroni pointwise measure of inequality at \(\bar{h}\); for details, see Zenga (2013) and Zenga and Valli (2016). Formula (2) expresses the Pietra index as the product of two factors. The first, \(V_{\bar{h}}(Y)\), is the economic distance between the lower mean \({\bar{M}}_{\bar{h}\cdot }(Y)\) and the total mean M(Y). The second, \(p_{\bar{h}\cdot }\), is the relative weight of the units with amounts less than or equal to the total mean M(Y).
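Expression (2) can be evaluated with the following minimal Python sketch on a hypothetical frequency distribution whose values are sorted in increasing order. It locates \(\bar h\), computes the lower mean and \(p_{\bar h\cdot}\), and returns the two factors \(V_{\bar h}(Y)\) and \(p_{\bar h\cdot}\).

```python
from fractions import Fraction

def pietra_lower_group(values, freqs):
    """Return (V, p) with P = V * p, following expression (2).

    values must be the distinct values y_1 < ... < y_r in increasing order.
    """
    N = sum(freqs)
    T = sum(Fraction(y) * n for y, n in zip(values, freqs))
    M = T / N                                                # M(Y)
    hbar = max(h for h, y in enumerate(values) if y <= M)    # 0-based index of y_hbar
    P_hbar = sum(freqs[: hbar + 1])                          # units in {Y <= y_hbar}
    Q_hbar = sum(Fraction(y) * n for y, n in zip(values[: hbar + 1], freqs[: hbar + 1]))
    lower_mean = Q_hbar / P_hbar                             # lower mean of {Y <= y_hbar}
    V = (M - lower_mean) / M                                 # relative variation V_hbar(Y)
    p = Fraction(P_hbar, N)                                  # p_hbar = P_hbar / N
    return V, p

V, p = pietra_lower_group([1, 2, 3, 7], [1, 2, 1, 1])        # hypothetical distribution
print(V, p, V * p)                                           # 1/3 4/5 4/15
```

The product \(V_{\bar h}(Y)\,p_{\bar h\cdot}=4/15\) agrees with the value obtained from the definition via the MAD for the same distribution.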

Sections 4 and 5 show why expression (2) is particularly suitable for the proposed decompositions by sources and by subpopulations.

4 Decomposition by sources

Let the variable Y be the sum of c variables \(X_1,X_2, \dots ,X_c\) that represent the sources. Using the same notation as in the previous sections, let

$$\begin{aligned} Q_{\bar{h}\cdot }(X_j),\quad j=1,2,\ldots ,c \end{aligned}$$

be the sum of the values assumed by the source \(X_j\) on the \(P_{\bar{h}\cdot }\) units belonging to the lower group \(\{Y \le y_{\bar{h}}\}\), let

$$\begin{aligned} {\bar{M}}_{\bar{h} \cdot }(X_j) =\frac{Q_{\bar{h}\cdot }(X_j)}{P_{\bar{h}\cdot }}\quad j=1,2,\ldots ,c \end{aligned}$$

be the arithmetic mean of \(X_j\) in the lower group, and let \(M(X_j)=\frac{T(X_j)}{N}\) be the mean of \(X_j\) in the whole population. As \(Y=\sum _{j=1}^c X_j\), it follows that

$$\begin{aligned} M(Y)=\sum _{j=1}^c M(X_j)\; \text{ and } \;{\bar{M}}_{\bar{h} \cdot }(Y)=\sum _{j=1}^c {\bar{M}}_{\bar{h} \cdot }(X_j), \end{aligned}$$
(3)

therefore the Pietra index is given by

$$\begin{aligned} {\mathcal {P}}= & {} \frac{M(Y)-{\bar{M}}_{\bar{h} \cdot }(Y)}{M(Y)} \cdot \frac{P_{\bar{h}\cdot }}{N}\\= & {} \sum _{j=1}^c \frac{M(X_j)-{\bar{M}}_{\bar{h} \cdot }(X_j)}{M(Y)} \cdot p_{\bar{h}\cdot }\\= & {} \sum _{j=1}^c W_{\bar{h} \cdot }(X_j)p_{\bar{h}\cdot }, \end{aligned}$$

where \(W_{\bar{h} \cdot }(X_j)=\frac{M(X_j)-{\bar{M}}_{\bar{h} \cdot }(X_j)}{M(Y)}\). The term \(W_{\bar{h} \cdot }(X_j)p_{\bar{h}\cdot }\) is the contribution of the source \(X_j\) to \({\mathcal {P}}\). It equals the difference between the overall mean and the lower mean of \(X_j\), relative to M(Y), multiplied by the relative frequency \(p_{\bar{h}\cdot }=\frac{P_{\bar{h}\cdot }}{N}\) of the lower group \(\{Y \le y_{\bar{h}}\}\). The relative contribution of the source \(X_j\) to the value of the Pietra index is given by the ratio

$$\begin{aligned} \omega _{\bar{h}\cdot }(X_j)=\frac{W_{\bar{h} \cdot }(X_j)p_{\bar{h}\cdot }}{{\mathcal {P}}}=\frac{M(X_j)-{\bar{M}}_{\bar{h} \cdot }(X_j)}{M(Y)-{\bar{M}}_{\bar{h} \cdot }(Y)} \end{aligned}$$
(4)

with, clearly, \(\sum _{j=1}^c\omega _{\bar{h}\cdot }(X_j)=1\). By comparing these relative contributions with the shares

$$\begin{aligned} \gamma (X_j)=\frac{M(X_j)}{M(Y)}=\frac{T(X_j)}{T(Y)}\quad j=1,2,\ldots ,c \end{aligned}$$
(5)

it is possible to understand whether a given source \(X_j\) has an exacerbating or a mitigating impact on inequality (or heterogeneity) in the distribution of Y. More precisely, a positive difference \(\omega _{\bar{h}\cdot }(X_j)-\gamma (X_j)\) means that the source \(X_j\) plays an “increasing” role in terms of inequality (or heterogeneity). A negative difference means that \(X_j\) “decreases” the inequality (or heterogeneity).
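The decomposition by sources can be sketched in Python as follows. The sketch assumes integer-valued, unit-level source data (each source is a hypothetical list of N values). The lower group \(\{Y\le y_{\bar h}\}\) is taken as the set of units with \(Y\le M(Y)\), which is the same set because \(y_{\bar h}\) is the largest value not exceeding the mean. For each source, the sketch returns the absolute contribution \(W_{\bar h\cdot}(X_j)p_{\bar h\cdot}\), the relative contribution \(\omega_{\bar h\cdot}(X_j)\) of (4), and the share \(\gamma(X_j)\) of (5).

```python
from fractions import Fraction

def source_decomposition(sources):
    """Decompose the Pietra index of Y = X_1 + ... + X_c by sources (Sect. 4).

    sources: c lists of integer unit-level values, one list per source X_j.
    Returns the contributions W(X_j) * p, omega(X_j) of (4) and gamma(X_j) of (5).
    """
    N = len(sources[0])
    y = [sum(unit) for unit in zip(*sources)]        # Y = X_1 + ... + X_c, unit by unit
    M_y = Fraction(sum(y), N)
    # {Y <= y_hbar} coincides with {Y <= M(Y)}, since y_hbar is the largest value <= M(Y)
    lower = [i for i in range(N) if y[i] <= M_y]
    p = Fraction(len(lower), N)                      # p_hbar
    lower_mean_y = Fraction(sum(y[i] for i in lower), len(lower))
    contrib, omega, gamma = [], [], []
    for x in sources:
        M_x = Fraction(sum(x), N)
        lower_mean_x = Fraction(sum(x[i] for i in lower), len(lower))
        contrib.append((M_x - lower_mean_x) / M_y * p)
        # omega is undefined (division by zero) in the egalitarian case P = 0
        omega.append((M_x - lower_mean_x) / (M_y - lower_mean_y))
        gamma.append(M_x / M_y)
    return contrib, omega, gamma

X1 = [1, 1, 2, 2, 4]  # hypothetical source values
X2 = [0, 1, 0, 1, 3]
contrib, omega, gamma = source_decomposition([X1, X2])
for j, (w, o, g) in enumerate(zip(contrib, omega, gamma), start=1):
    print(f"X{j}: contribution {w}, omega {o}, gamma {g}, omega - gamma {o - g}")
```

In this toy case, \(Y=(1,2,2,3,7)\) and \({\mathcal P}=4/15\). Each source contributes \(2/15\), so \(\omega=1/2\) for both. Since \(\gamma(X_1)=2/3\) and \(\gamma(X_2)=1/3\), \(X_1\) mitigates and \(X_2\) exacerbates the heterogeneity of Y.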

In real situations, the sources can also take negative values. When a variable assumes negative values, any inequality index must be used with great care; see, for example, De Battisti et al. (2019) and Manero (2017) for the Gini index. In the proposed decomposition of the Pietra index, however, the sources may also assume negative values. In that case, care is required only in interpreting the quantities \(\omega _{\bar{h}\cdot }(X_j)\) and \(\gamma (X_j)\) defined in (4) and (5), respectively. Finally, it is worth remarking that the relative contribution of \(X_j\) to the Pietra index coincides with its relative contributions to the Gini, Bonferroni, and Zenga-2007 pointwise inequality measures at the cumulative relative frequency \(p=p_{\bar{h}\cdot }\). For more details on this point, see Zenga (2013), Zenga and Valli (2017), and Pasquazzi and Zenga (2018).

5 Decomposition by subpopulations

The procedure for decomposing the Pietra index by subpopulations is based on the bivariate distribution of the N units split into k disjoint subpopulations (with \(k\ge 2\)). Such a distribution is reported in Table 1, where \(n_{hl}\) denotes the frequency of the value \(y_h\) (with \(h=1,2,\ldots ,r\)) in subpopulation l (with \(l=1,2,\ldots ,k\)), and

$$\begin{aligned} n_{\cdot l}=\sum _{h=1}^r n_{hl} \end{aligned}$$

is the size of the subpopulation l. Obviously:

$$\begin{aligned} \sum _{h=1}^r \sum _{l=1}^k n_{hl}=\sum _{l=1}^k \sum _{h=1}^r n_{hl}=\sum _{h=1}^r n_{h\cdot }=\sum _{l=1}^k n_{\cdot l}=N. \end{aligned}$$
Table 1 Bivariate distribution of the variable Y, according to the k subpopulations

For the distribution \(\{(y_h,n_{hl}),\;h=1,2,\cdots ,r\}\) of subpopulation l, the analogs of the quantities \(P_{h\cdot }\) and \(Q_{h\cdot }\) are

$$\begin{aligned} P_{hl}=\sum _{t=1}^h n_{tl} \end{aligned}$$

which is the cumulative frequency of \(y_h\) in subpopulation l, and

$$\begin{aligned} Q_{hl}(Y)= \sum _{t=1}^h y_t n_{tl} \end{aligned}$$

which is the sum of the values of the lower group \(\{Y\le y_h\}\) in subpopulation l. Moreover,

$$\begin{aligned} M_l(Y)=\frac{Q_{rl}(Y)}{n_{\cdot l}}=\frac{T_l(Y)}{n_{\cdot l}}, \end{aligned}$$

where \(T_l(Y)\) is the sum of the \(n_{\cdot l}\) values of Y in subpopulation l and \(M_l(Y)\) is the corresponding arithmetic mean. The lower mean \({\bar{M}}_{hl}(Y)\) of the variable Y evaluated at h in subpopulation l can be defined as

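$$\begin{aligned} {\bar{M}}_{hl}(Y)= \left\{ \begin{array}{ll} \min \{y_t:\;n_{tl}>0\} &{} \text{ if } P_{hl}=0,\\ \frac{Q_{ h l}(Y)}{P_{h l}} &{} \text{ if } P_{hl}>0.\\ \end{array} \right. \end{aligned}$$

When \(P_{hl}=0\), that is, when subpopulation l has no units in the lower group \(\{Y\le y_h\}\), the lower mean is extended by continuity to the smallest value observed in subpopulation l, in analogy with the continuous case. Since such a subpopulation has zero weight in the lower group, this convention does not affect the decompositions below. Finally, the two ratios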

$$\begin{aligned} \frac{n_{\cdot l}}{N}=\frac{\sum _{h=1}^r n_{hl}}{N} \qquad \text{ and } \qquad p(l|h)=\frac{P_{h l}}{P_{h \cdot }} \end{aligned}$$

can be defined: they are the relative frequencies of subpopulation l in the whole population and in the lower group \(\{Y \le y_{h}\}\), respectively.

The overall mean M(Y) and the means \(M_g(Y)\) of the k subpopulations are related by

$$\begin{aligned} M(Y)=\sum _{g=1}^k M_g(Y)\cdot \frac{n_{\cdot g}}{N}, \end{aligned}$$
(6)

and, similarly, the lower mean \({\bar{M}}_{h \cdot }(Y)\) and the k lower means \({\bar{M}}_{hl}(Y)\) are related by

$$\begin{aligned} {\bar{M}}_{ h \cdot }(Y)=\sum _{l=1}^k {\bar{M}}_{ hl}(Y)\cdot p(l| h). \end{aligned}$$
(7)

By using (7) in expression (2) of the Pietra index, and recalling that \(\sum _{l=1}^k p(l| h)=1\) for any \(h \in \{1,2,\dots ,r\}\), it follows that

$$\begin{aligned} {\mathcal {P}}= & {} \frac{M(Y)-{\bar{M}}_{{\bar{h}} \cdot }(Y)}{M(Y)}\cdot p_{\bar{h}\cdot }\\= & {} \frac{\displaystyle \sum _{l=1}^k\left[ M(Y) p(l|\bar{h})-{\bar{M}}_{\bar{h} l}(Y)p(l|\bar{h})\right] }{M(Y)}\cdot p_{\bar{h}\cdot }\\= & {} \sum _{l=1}^k \frac{M(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)} \cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot }\\= & {} \sum _{l=1}^k V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }, \end{aligned}$$

where

$$\begin{aligned} V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }= \frac{M(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)} \cdot p(l|\bar{h})\cdot p_{\bar{h}\cdot } \end{aligned}$$
(8)

can be interpreted as the contribution of subpopulation l to the Pietra index, for \(l \in \{1,2,\ldots ,k\}\). This decomposition procedure therefore makes it possible to assess the contribution of each single subpopulation to the total value of \(\mathcal P\), given by (8). This result is important because, as already noted, many other decomposition methods proposed in the literature do not reach this goal.

It is also useful to evaluate the relative contribution of subpopulation l to the Pietra index, given by

$$\begin{aligned} \frac{V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }}{\mathcal P}=\frac{M(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)-{\bar{M}}_{\bar{h} \cdot }(Y)}\cdot p(l|\bar{h}). \end{aligned}$$

This relative contribution depends on two factors. The first is the ratio of two economic distances between the total mean of Y and a lower mean: that of subpopulation l in the numerator and that of the whole population in the denominator. The second is the relative frequency \(p(l|\bar{h})\) of subpopulation l in the lower group \(\{Y\le y_{\bar{h}}\}\).
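A minimal Python sketch of the contributions (8) follows. It assumes integer unit-level values `ys` and a parallel list `labels` giving each unit's subpopulation (both hypothetical). If a subpopulation has no units in the lower group, the continuity rule stated above is applied, and the resulting contribution is zero.

```python
from fractions import Fraction

def subpopulation_contributions(ys, labels):
    """Absolute contributions (8) of each subpopulation to the Pietra index (Sect. 5)."""
    N = len(ys)
    M = Fraction(sum(ys), N)                          # M(Y)
    lower = [i for i in range(N) if ys[i] <= M]       # lower group {Y <= y_hbar}
    p_hbar = Fraction(len(lower), N)
    contrib = {}
    for l in sorted(set(labels)):
        in_lower = [ys[i] for i in lower if labels[i] == l]
        p_l = Fraction(len(in_lower), len(lower))     # p(l | hbar)
        if in_lower:
            lower_mean = Fraction(sum(in_lower), len(in_lower))
        else:
            # P_{hbar l} = 0: smallest value of subpopulation l (continuity rule);
            # the contribution is zero anyway because p(l | hbar) = 0
            lower_mean = Fraction(min(y for y, g in zip(ys, labels) if g == l))
        contrib[l] = (M - lower_mean) / M * p_l * p_hbar
    return contrib

ys = [1, 2, 2, 3, 7]                  # hypothetical values of Y
labels = ["A", "A", "B", "B", "B"]    # hypothetical subpopulation of each unit
contrib = subpopulation_contributions(ys, labels)
total = sum(contrib.values())
relative = {l: c / total for l, c in contrib.items()}
print({l: str(c) for l, c in contrib.items()}, total)
```

Here subpopulation A contributes \(1/5\) and B contributes \(1/15\), for a total of \({\mathcal P}=4/15\). The relative contributions are therefore \(3/4\) and \(1/4\).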

Now, from (6) and the fact that

$$\begin{aligned} \sum _{g=1}^k \frac{n_{\cdot g}}{N}=1, \end{aligned}$$

it follows that

$$\begin{aligned} {\mathcal {P}}= & {} \sum _{l=1}^k V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot },\nonumber \\= & {} \sum _{l=1}^k \frac{M(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)} \cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot }\nonumber \\= & {} \sum _{l=1}^k \sum _{g=1}^k \left[ \frac{M_g(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\right] \cdot \frac{n_{\cdot g}}{N}\cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot }\nonumber \\= & {} \sum _{l=1}^k \sum _{g=1}^k V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot }, \end{aligned}$$
(9)

where

$$\begin{aligned} V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot }= \frac{M_g(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)} \cdot \frac{n_{\cdot g}}{N}\cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot } \end{aligned}$$
(10)

is the part of \(V_{\bar{h} l \cdot }(Y)\cdot p_{\bar{h}\cdot }\) arising from the comparison between the lower mean \({\bar{M}}_{\bar{h} l}(Y)\) of subpopulation l and the mean \(M_g(Y)\) of subpopulation g.

In other words, (9) splits the Pietra index into the entries of a \(k\times k\) matrix, according to the partition induced by the k subpopulations. By (8), the l-th row of this matrix sums to the contribution of subpopulation l.
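The \(k\times k\) matrix of the terms (10) can be built with the following minimal Python sketch, under the same assumptions as above (hypothetical integer unit-level values `ys` and a parallel list `labels`). Row l compares the lower mean of subpopulation l with the mean of each subpopulation g.

```python
from fractions import Fraction

def pietra_matrix(ys, labels):
    """k x k matrix of the terms (10); row l sums to the contribution (8) of subpopulation l."""
    N = len(ys)
    M = Fraction(sum(ys), N)
    groups = sorted(set(labels))
    members = {g: [y for y, lab in zip(ys, labels) if lab == g] for g in groups}
    means = {g: Fraction(sum(v), len(v)) for g, v in members.items()}    # M_g(Y)
    weights = {g: Fraction(len(v), N) for g, v in members.items()}      # n_.g / N
    lower = [i for i in range(N) if ys[i] <= M]                         # {Y <= y_hbar}
    p_hbar = Fraction(len(lower), N)
    matrix = []
    for l in groups:
        in_lower = [ys[i] for i in lower if labels[i] == l]
        p_l = Fraction(len(in_lower), len(lower))                       # p(l | hbar)
        lower_mean = (Fraction(sum(in_lower), len(in_lower)) if in_lower
                      else Fraction(min(members[l])))                   # continuity rule
        matrix.append([(means[g] - lower_mean) / M * weights[g] * p_l * p_hbar
                       for g in groups])
    return groups, matrix

groups, V = pietra_matrix([1, 2, 2, 3, 7], ["A", "A", "B", "B", "B"])  # hypothetical data
for l, row in zip(groups, V):
    print(l, [str(v) for v in row], "row sum:", sum(row))
```

For these data, the matrix is \(\begin{pmatrix}0 & 1/5\\ -4/75 & 3/25\end{pmatrix}\), with row sums \(1/5\) and \(1/15\). The entry (B, A) is negative because the lower mean of B (\(5/2\)) exceeds the mean of A (\(3/2\)). Entries of (10) can therefore be negative.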

The decomposition in (9) allows the contribution of each subpopulation to the Pietra index to be further split into two quantities. The first is based on the comparison of means of the same subpopulation (which can be considered the within part). The second is based on the comparison of means of different subpopulations (which can be considered the between part). Indeed,

$$\begin{aligned} V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }= V_{\bar{h} l l} (Y)\cdot p_{\bar{h}\cdot }+\sum _{\{g: g\ne l\}}^k V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot }, \end{aligned}$$

with the within part of the contribution (due to subpopulation l) to the Pietra index given by

$$\begin{aligned} V_{\bar{h} l l} (Y)\cdot p_{\bar{h}\cdot }= \left[ \frac{M_l(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\right] \cdot \frac{n_{\cdot l}}{N}\cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot } \end{aligned}$$

and the between part of the contribution (due to subpopulation l) to the Pietra index equal to

$$\begin{aligned} \sum _{\{g: g\ne l\}}^k V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot }=\sum _{\{g: g\ne l\}}^k \left[ \frac{M_g(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\right] \cdot \frac{n_{\cdot g}}{N}\cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot }. \end{aligned}$$

At this point, the meaning of the two ratios is clear:

$$\begin{aligned} \frac{V_{\bar{h} l l} (Y)\cdot p_{\bar{h}\cdot }}{V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }}\qquad \text{ and } \qquad \frac{\sum _{\{g: g\ne l\}}^k V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot }}{V_{\bar{h} l \cdot } (Y)\cdot p_{\bar{h}\cdot }}, \end{aligned}$$

which describe the weights of the within and between parts in the contribution of subpopulation l to the Pietra index.

5.1 “Classical” decomposition into within and between components

From the decomposition in (9), it is also possible to obtain the well-known decomposition into within and between components. This is done by splitting the value of the Pietra index into two parts: the former (\({\mathcal {P}}_W\)), based on comparisons of means of the same subpopulation, and the latter (\({\mathcal {P}}_B\)), based on comparisons of means of different subpopulations:

$$\begin{aligned} {\mathcal {P}}= & {} \sum _{l=1}^k \sum _{g=1}^k \left[ \frac{M_g(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\right] \cdot \frac{n_{\cdot g}}{N}\cdot p(l|\bar{h}) \cdot p_{\bar{h}\cdot }\\= & {} \sum _{l=1}^k \frac{M_l(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\cdot \frac{n_{\cdot l}}{N} \cdot p(l|\bar{h})\cdot p_{\bar{h}\cdot }\\&+\sum _{l=1}^k \sum _{\{g: g\ne l\}}^k \frac{M_g(Y)-{\bar{M}}_{\bar{h} l}(Y)}{M(Y)}\cdot \frac{n_{\cdot g}}{N} \cdot p(l|\bar{h})\cdot p_{\bar{h}\cdot }\\= & {} \sum _{l=1}^k V_{\bar{h} l l} (Y)\cdot p_{\bar{h}\cdot }+\sum _{l=1}^k \sum _{\{g: g\ne l\}}^k V_{\bar{h} l g} (Y)\cdot p_{\bar{h}\cdot } \\= & {} {\mathcal {P}}_W+{\mathcal {P}}_B. \end{aligned}$$
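The following minimal Python sketch covers both the per-subpopulation split of Sect. 5 and the totals \({\mathcal P}_W\) and \({\mathcal P}_B\). It uses hypothetical integer unit-level data. For each subpopulation l, it keeps the diagonal term of (10) as the within part and the sum of the off-diagonal terms of row l as the between part. The totals then sum these parts over l.

```python
from fractions import Fraction

def within_between(ys, labels):
    """Within/between split of the Pietra index (Sect. 5.1) from the terms (10).

    Returns P_W (terms with g == l), P_B (terms with g != l) and, for each
    subpopulation l, the pair (within part, between part) of its contribution.
    """
    N = len(ys)
    M = Fraction(sum(ys), N)
    groups = sorted(set(labels))
    members = {g: [y for y, lab in zip(ys, labels) if lab == g] for g in groups}
    lower = [i for i in range(N) if ys[i] <= M]                 # {Y <= y_hbar}
    p_hbar = Fraction(len(lower), N)
    parts = {}
    for l in groups:
        in_lower = [ys[i] for i in lower if labels[i] == l]
        p_l = Fraction(len(in_lower), len(lower))               # p(l | hbar)
        lower_mean = (Fraction(sum(in_lower), len(in_lower)) if in_lower
                      else Fraction(min(members[l])))           # continuity rule
        row = {g: (Fraction(sum(members[g]), len(members[g])) - lower_mean) / M
                  * Fraction(len(members[g]), N) * p_l * p_hbar
               for g in groups}                                  # terms (10) of row l
        parts[l] = (row[l], sum(v for g, v in row.items() if g != l))
    P_W = sum(w for w, _ in parts.values())
    P_B = sum(b for _, b in parts.values())
    return P_W, P_B, parts

P_W, P_B, parts = within_between([1, 2, 2, 3, 7], ["A", "A", "B", "B", "B"])  # hypothetical data
print(P_W, P_B, P_W + P_B)
```

Here \({\mathcal P}_W=3/25\) and \({\mathcal P}_B=11/75\), summing to \(4/15\). For subpopulation B, the between part is negative (\(-4/75\)). As a result, the two ratios describing the weights of its within and between parts are \(9/5\) and \(-4/5\), so they need not lie in [0, 1].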

5.2 Comparison with other decompositions by subpopulations

As mentioned in Sect. 1, the literature contains two quite recent decompositions of the Pietra index by subpopulations. The first one, proposed by Frosini (2012), splits the Pietra index \(\mathcal P\) into the sum of two terms, namely

$$\begin{aligned} {\mathcal {P}}=P_W^F+P_B^F. \end{aligned}$$

In this decomposition:

  • \(P_{W}^F\) is the within component, defined as

    $$\begin{aligned} P_{W}^F=\sum _{l=1}^k \frac{T_l(Y)}{T(Y)}\mathcal P_l, \end{aligned}$$
    (11)

    where \(\mathcal P_l\) is the Pietra index of Y in subpopulation l,

    $$\begin{aligned} \mathcal P_l=\frac{\sum _{h=1}^r |y_h-M_l(Y)|\cdot n_{hl}}{2M_l(Y)n_{\cdot l}}\qquad l=1,\dots ,k; \end{aligned}$$
  • \(P_B^F\) is the between component, and it is the sum of the two quantities \(P_{Bt}\) and \(P_{Bm}\):

    (i)

      \(P_{Bt}\) is the mixture effect and is equal to

      $$\begin{aligned} P_{Bt}= & {} \frac{1}{T(Y)}\left[ \sum _{M_l(Y)>M(Y)}\;\; \sum _{M(Y)<y_{h}\le M_l(Y)}[y_h-M_l(Y)]n_{hl} + \right. \\&\left. - \sum _{M_l(Y)<M(Y)}\;\; \sum _{M_l(Y)<y_{h}\le M(Y)}[y_h-M_l(Y)]n_{hl} \right] , \end{aligned}$$
    (ii)

      \(P_{Bm}\) is the mean effect, given by

      $$\begin{aligned} P_{Bm}= & {} \frac{1}{2T(Y)}\sum _{l=1}^k P_{Bm}^{l}\\= & {} \frac{1}{2T(Y)}\sum _{l=1}^k\left[ \sum _{h=\bar{h}+1}^rn_{hl}-\left( \sum _{h=1}^{\bar{h}} n_{hl}-K_l\right) \right] [M_l(Y)-M(Y)],\\ \end{aligned}$$

      where

      $$\begin{aligned} K_l=\left\{ \begin{array}{ll} n_{h_0l} &{} \text{ if } \exists \; h_0 \in \{1,\dots ,r\} \text{ such } \text{ that } y_{h_0}=M(Y) \\ 0 &{} \text{ otherwise. } \\ \end{array} \right. \end{aligned}$$

Moreover, in Frosini (2012), the contribution \(D_l\) of subpopulation l to the Pietra index \(\mathcal P\) can be computed as follows:

$$\begin{aligned} D_l=\left\{ \begin{array}{ll} \displaystyle \frac{T_l(Y)}{T(Y)}\mathcal P_l-\frac{1}{T(Y)} \displaystyle \sum _{M_l(Y)<y_{h}\le M(Y)}[y_h-M_l(Y)]n_{hl} -\frac{P_{Bm}^{l}}{2T(Y)} &{} \text{ if } M_l(Y)<M(Y)\\ \displaystyle \frac{T_l(Y)}{T(Y)}\mathcal P_l &{} \text{ if } M_l(Y)=M(Y)\\ \displaystyle \frac{T_l(Y)}{T(Y)}\mathcal P_l+\frac{1}{T(Y)} \displaystyle \sum _{M(Y)<y_{h}\le M_l(Y)}[y_h-M_l(Y)]n_{hl} +\frac{P_{Bm}^{l}}{2T(Y)} &{} \text{ if } M_l(Y)>M(Y).\\ \end{array} \right. \end{aligned}$$

The procedure proposed by Frosini (2012) can be useful, although the between component \(P_B^F\) must be interpreted with caution. It is the sum of two quantities related to different effects, so its meaning is neither intuitive nor immediate. The possibility of assessing the contribution of each subpopulation is certainly a valuable feature of this procedure.

The second of the aforementioned decompositions was proposed by Habib (2012). Starting from the definitions of

  • the overall variation

    $$\begin{aligned} d_{hl}=\left\{ \begin{array}{ll} [y_h-M(Y)]n_{hl} &{} \text{ if } y_{h}>M(Y) \\ 0 &{} \text{ otherwise; } \\ \end{array} \right. \end{aligned}$$
  • the within variation

    $$\begin{aligned} w_{hl}=\left\{ \begin{array}{ll} [y_h-M_l(Y)]n_{hl} &{} \text{ if } y_{h}>M_l(Y) \\ 0 &{} \text{ otherwise; } \\ \end{array} \right. \end{aligned}$$
  • the between variation

    $$\begin{aligned} z_{l}=\left\{ \begin{array}{ll} M_l(Y)-M(Y) &{} \text{ if } M_l(Y)>M(Y) \\ 0 &{} \text{ otherwise, } \\ \end{array} \right. \end{aligned}$$

the following decomposition of the Pietra index \(\mathcal P\) is obtained

$$\begin{aligned} \mathcal P={\tilde{P}}_W+ {\tilde{P}}_B+ {\tilde{P}}_E, \end{aligned}$$

where:

  • \({\tilde{P}}_W\) is the within component, defined as

    $$\begin{aligned} {\tilde{P}}_W=\displaystyle \sum _{l=1}^k \frac{n_{\cdot l}}{N}\cdot \frac{M_l(Y)}{M(Y)}\cdot \mathcal P_l; \end{aligned}$$
    (12)
  • \({\tilde{P}}_B\) is the between component

    $$\begin{aligned} {\tilde{P}}_B=\displaystyle \frac{\sum _{l=1}^k z_l}{\sum _{l=1}^k M_l(Y)}\cdot \displaystyle \frac{\sum _{l=1}^k M_l(Y)}{k\cdot M(Y)}\cdot \sum _{l=1}^k\frac{n_{\cdot l}}{N}\cdot \displaystyle \frac{z_l\cdot k}{\sum _{l=1}^k z_l}; \end{aligned}$$
  • \({\tilde{P}}_E\) is the (error) remaining term

    $$\begin{aligned} {\tilde{P}}_E=\displaystyle \frac{1}{NM(Y)}\displaystyle \sum _{l=1}^k\sum _{h=1}^r d_{hl}-\sum _{l=1}^k \frac{n_{\cdot l}}{N}\left[ \frac{\sum _{h=1}^r w_{hl}}{n_{\cdot l} M(Y)}+\frac{z_l}{M(Y)}\right] . \end{aligned}$$

In this decomposition, it is easy to see that the within component \({\tilde{P}}_W\) defined in (12) coincides with the corresponding \(P_W^F\) defined in (11), given that by definition it holds that

$$\begin{aligned} \frac{T_{l}(Y)}{T(Y)}=\frac{n_{\cdot l}}{N}\cdot \frac{M_{l}(Y)}{M(Y)}. \end{aligned}$$
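This identity can be checked with the following minimal Python sketch. It computes \(P_W^F\) of (11) with weights \(T_l(Y)/T(Y)\) and \({\tilde{P}}_W\) of (12) with weights \(\frac{n_{\cdot l}}{N}\cdot\frac{M_l(Y)}{M(Y)}\), separately, on hypothetical integer unit-level data. Every subpopulation is assumed to have a positive mean, so that each \(\mathcal P_l\) is defined.

```python
from fractions import Fraction

def pietra_units(ys):
    """Pietra index of a list of unit-level values (assumes a positive mean)."""
    N = len(ys)
    M = Fraction(sum(ys), N)
    return sum(abs(y - M) for y in ys) / (2 * M * N)

def within_components(ys, labels):
    """Return P_W^F of (11) and tilde P_W of (12), computed separately."""
    N, T = len(ys), sum(ys)
    M = Fraction(T, N)
    groups = [[y for y, lab in zip(ys, labels) if lab == g] for g in sorted(set(labels))]
    frosini = sum(Fraction(sum(v), T) * pietra_units(v) for v in groups)      # T_l / T weights
    habib = sum(Fraction(len(v), N) * Fraction(sum(v), len(v)) / M * pietra_units(v)
                for v in groups)                                             # (n_l / N)(M_l / M)
    return frosini, habib

frosini, habib = within_components([1, 2, 2, 3, 7], ["A", "A", "B", "B", "B"])  # hypothetical data
print(frosini, habib)
```

Both weightings give the same value (\(7/30\) for these data), as the identity above implies.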

As in the previous procedure, the interpretation of the between component \({\tilde{P}}_B\) is not immediate, and the presence of the quantity \({\tilde{P}}_E\), required to rescale the sum of \({\tilde{P}}_W\) and \({\tilde{P}}_B\) into the interval [0, 1], does not make this task easier. To simplify the decomposition, Habib (2012) proposes removing the third term \({\tilde{P}}_E\) by dividing it into two parts to be added to \({\tilde{P}}_W\) and \({\tilde{P}}_B\), arguing that “the separation of the error term to within-groups and between-groups errors could be based on a proportionate of each error to the total error” (Habib 2012). In other words, he proposes splitting \({\tilde{P}}_E\) into the sum of \({\tilde{P}}_{E_W}\) and \({\tilde{P}}_{E_B}\), to obtain the so-called perfect decomposition

$$\begin{aligned} \mathcal P= P_W^H+P_B^H, \end{aligned}$$

where

$$\begin{aligned} P_W^H={\tilde{P}}_W+{\tilde{P}}_{E_W}\qquad \text{ and } \qquad P_B^H={\tilde{P}}_B+{\tilde{P}}_{E_B}. \end{aligned}$$

However, splitting \({\tilde{P}}_E\) into \({\tilde{P}}_{E_W}\) and \({\tilde{P}}_{E_B}\) without an objective rule for how the split must be performed seems quite arbitrary, and it makes the interpretation of \(P_W^H\) and \(P_B^H\) less intuitive and more problematic.

6 Applications to two actual sport datasets

In this section, two applications of the proposed decomposition procedures are presented, both regarding professional football teams in Italy. The first one deals with their balance-sheet data, while the second one deals with the market values of all their players.

6.1 Decomposition by sources

Serie A is the most important football league in Italy, with \(N=20\) professional teams. As in other European countries, the correctness of the balance sheets of football teams has become increasingly important in recent years, leading to a growing body of studies on this topic; see, for example, PwC (2018) and KPMG (2019). The following analysis fits within this framework. For each Serie A team, the following five balance-sheet variables are considered:

  • \(Y=\) Total assets;

  • \(X_1=\) Cash;

  • \(X_2=\) Total accounts receivable;

  • \(X_3=\) Inventories, short-term investments and other current assets;

  • \(X_4=\) Net property (tangible and intangible assets), investments and advances, other assets.

All the variables are in millions of Euros and refer to fiscal year 2018. The data are from https://www.analisiaziendale.it and are reproduced in the supplementary material. Total assets (Y) is the variable under investigation, while the \(X_j\) (with \(j=1,2,3,4\)) are the four sources, given that \(Y=\sum _{j=1}^4X_j.\) Table 2 provides some descriptive statistics for all the variables, and Fig. 1 shows their boxplots.

Table 2 Descriptive statistics of the variables considered for the decomposition by sources
Fig. 1 Boxplots of the variables considered for the decomposition by sources

Clearly, Net property, investments and advances, other assets \((X_4)\) is the most relevant source: its mean and mean absolute deviation (MAD) are the highest and the closest to those of Total assets (Y).

The Pietra index of Y is 0.3584, denoting a medium level of heterogeneity. The four contributions obtained by the proposed decomposition by sources are reported in Table 3.

Table 3 The contributions of the sources to the Pietra index \({\mathcal {P}}\)

As expected, the largest contribution comes from the source Net property, investments and advances, other assets \((X_4)\), which accounts for most of \({\mathcal {P}}\) (73.88%). Total accounts receivable \((X_2)\) follows with 18.42%, then Cash \((X_1)\) with 6.36%. The source Inventories, short-term investments and other current assets \((X_3)\) comes last, with a negligible contribution of 1.34%.

Comparing the differences reported in the last column of Table 3 shows that the sources Cash and Net property, investments and advances, other assets exacerbate the heterogeneity of Total assets, while the other two mitigate it. As a final remark, the heterogeneity in the distribution of Y can be attributed mainly to the source Net property, investments and advances, other assets, because this variable includes intangible assets, to which a share of the values of the team players is allocated: as the following application shows, these values differ considerably even among teams in the same league.

6.2 Decomposition by subpopulations

This second application concerns not just Serie A teams but all Italian professional football teams in the three existing leagues, namely Serie A, B, and C, with \(n_{\cdot 1}=20\), \(n_{\cdot 2}=19\), and \(n_{\cdot 3}=59\) teams, respectively, for a total of \(N=98\) teams. Serie C is actually divided geographically into three groups, but in this application they are considered together. For each team, the investigated variable is \(Y=\) “Value (in millions of Euro) of the team players in November 2018”, that is, the sum of the market values of all the team players. The data are available online at https://www.transfermarkt.it. The three leagues are the three subpopulations, and the purpose of this analysis is to investigate how the heterogeneity of Y is split across the three leagues and to assess the heterogeneity levels between the leagues and within each one. Table 4 provides some statistical indicators of the variable Y for the three subpopulations and for the whole population. The calculations give \(\bar{h}=\max \{h\;:\;y_h\le M(Y)\}=80\), and therefore \(p_{\bar{h}\cdot }=\frac{80}{98}=0.8163\). The lower means \({\bar M}_{\bar{h}l}\) and the relative frequencies \(p(l|\bar{h})\) needed for the calculations are also given in Table 4. The complete dataset can be found in the supplementary material.

Table 4 Descriptive statistics of the variable Y for the three subpopulations and for the whole population

An overview of the data shows that the three subpopulations are almost non-overlapping: the values of all the teams in Serie A are higher than those of all the teams in the other two leagues, and only one team in Serie C has a value higher than that of the poorest team in Serie B. The three subpopulations are therefore very different, as shown by the orders of magnitude of all the location indexes and of the variability indicator, the mean absolute deviation (MAD). It is also worth noting that the values of Y for all the teams in Serie B and C are lower than the overall mean \(M(Y)=53.778\).

Direct computation shows that for the whole population of the 98 professional football teams, the mean absolute deviation is \(S_{M(Y)}=74.483\); therefore, the Pietra index is

$$\begin{aligned} \mathcal P=\frac{74.483}{2\cdot 53.778}=0.6925. \end{aligned}$$
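As a quick numerical check of this formula, the following Python sketch (function names and toy data are hypothetical) computes the Pietra index both as \(S_{M(Y)}/(2M(Y))\) and from the \(\bar h\) units not exceeding the mean, as \(p_{\bar h\cdot }\,(M(Y)-{\bar M}_{\bar h}(Y))/M(Y)\), where \({\bar M}_{\bar h}(Y)\) denotes here the mean of those \(\bar h\) units. The two expressions agree because deviations from the mean sum to zero, so positive and negative deviations have the same total size.

```python
from statistics import mean


def pietra_mad(y):
    """Pietra index as S_{M(Y)} / (2 M(Y)), with S the mean absolute deviation from the mean."""
    m = mean(y)
    return mean(abs(v - m) for v in y) / (2 * m)


def pietra_lower(y):
    """Same index from the hbar units with y_h <= M(Y): p_hbar * (M(Y) - lower mean) / M(Y)."""
    m = mean(y)
    lower = [v for v in y if v <= m]  # units h = 1, ..., hbar
    p_hbar = len(lower) / len(y)      # p_{hbar .}
    return p_hbar * (m - mean(lower)) / m


y = [1, 2, 3, 10, 14]  # hypothetical non-negative values with positive mean
print(round(pietra_mad(y), 4), round(pietra_lower(y), 4))  # 0.4 0.4
```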

The proposed procedure decomposes the Pietra index as

$$\begin{aligned} \mathcal P=\sum _{l=1}^3 \sum _{g=1}^3 V_{\bar{h}lg}(Y)\cdot p_{\bar{h}\cdot }. \end{aligned}$$

The values \(V_{\bar{h}lg}(Y)\cdot p_{\bar{h}\cdot }\) are the entries of the \(3\times 3\) decomposition-by-subpopulations matrix reported in Table 5. Interestingly, in this application the two values

$$\begin{aligned} V_{\bar{h} 22}(Y)\cdot p_{\bar{h} \cdot }=V_{\bar{h} 33}(Y)\cdot p_{\bar{h} \cdot }=0, \end{aligned}$$

since

$$\begin{aligned} V_{\bar{h} 22}(Y)=V_{\bar{h} 33}(Y)=0. \end{aligned}$$

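This follows from the definitions and the data in Table 4: since all the teams in Serie B and C have values not exceeding \(M(Y)\), their lower means coincide with their group means, and the numerators of \(V_{\bar{h} 22}(Y)\) and \(V_{\bar{h} 33}(Y)\) vanish:

$$\begin{aligned} M_2(Y)-{\bar{M}}_{\bar{h} 2}(Y)= & {} 17.556-17.556=0,\\ M_3(Y)-{\bar{M}}_{\bar{h} 3}(Y)= & {} 4.017-4.017=0. \end{aligned}$$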
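The whole matrix can also be computed directly from labelled data. The sketch below is a minimal Python illustration, not the authors' implementation: the function names and toy data are hypothetical, and the entry \((M_g(Y)-{\bar M}_{\bar h l}(Y))/M(Y)\cdot (n_{\cdot g}/N)\cdot p(l|\bar h)\cdot p_{\bar h\cdot }\) is assumed from the worked expressions for \(V_{\bar{h} 31}(Y)\cdot p_{\bar{h} \cdot }\) and the similar quantities given in this section. It checks that the entries sum to the Pietra index and that a group lying entirely below the mean gets a zero diagonal entry, as Serie B and C do here.

```python
from statistics import mean


def pietra(y):
    """Pietra index: mean absolute deviation from the mean over twice the mean."""
    m = mean(y)
    return mean(abs(v - m) for v in y) / (2 * m)


def subpopulation_matrix(y, labels):
    """Return {(l, g): V_{hbar l g}(Y) * p_{hbar .}} for grouped data (assumed entry formula)."""
    n, m = len(y), mean(y)
    groups = sorted(set(labels))
    # Units not exceeding the overall mean: h = 1, ..., hbar.
    lower = [(v, lab) for v, lab in zip(y, labels) if v <= m]
    p_hbar = len(lower) / n
    matrix = {}
    for l in groups:
        lower_l = [v for v, lab in lower if lab == l]
        p_l_given_hbar = len(lower_l) / len(lower)  # p(l | hbar)
        for g in groups:
            y_g = [v for v, lab in zip(y, labels) if lab == g]
            if not lower_l:
                # Group l has no unit below M(Y): p(l | hbar) = 0, so the entry is 0.
                matrix[(l, g)] = 0.0
                continue
            matrix[(l, g)] = ((mean(y_g) - mean(lower_l)) / m
                              * (len(y_g) / n) * p_l_given_hbar * p_hbar)
    return matrix


# Hypothetical toy data: "A" has one unit below the mean, "B" and "C" lie entirely below it.
y = [5, 12, 16, 3, 4, 1, 2, 2]
labels = ["A", "A", "A", "B", "B", "C", "C", "C"]
mat = subpopulation_matrix(y, labels)
within = sum(v for (l, g), v in mat.items() if l == g)
between = sum(v for (l, g), v in mat.items() if l != g)
print(round(within + between, 6), round(pietra(y), 6))  # 0.372222 0.372222
```

Summing the entries over g for a fixed l gives \((M(Y)-{\bar M}_{\bar h l}(Y))/M(Y)\cdot p(l|\bar h)\cdot p_{\bar h\cdot }\), because the group means weighted by \(n_{\cdot g}/N\) average to \(M(Y)\); summing over l then yields \(p_{\bar h\cdot }\) times the relative gap between \(M(Y)\) and the mean of all the \(\bar h\) units not exceeding it, which is why the total equals \(\mathcal P\).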

To interpret the obtained values, consider the quantity

$$\begin{aligned} V_{\bar{h} 31}(Y)\cdot p_{\bar{h} \cdot }=\frac{M_1(Y)-{\bar{M}}_{\bar{h} 3}(Y)}{M(Y)} \cdot \frac{n_{\cdot 1}}{N}\cdot p(3|\bar{h}) \cdot p_{\bar{h}\cdot }=0.5277, \end{aligned}$$

which shows that the average value of the teams in Serie A is much greater than the lower mean of the teams in Serie C, and therefore indicates a considerable “economic distance” between the two subpopulations Serie A and C.

A smaller distance is observed between Serie A and Serie B, since

$$\begin{aligned} V_{\bar{h} 21}(Y)\cdot p_{\bar{h} \cdot }=\frac{M_1(Y)-{\bar{M}}_{\bar{h} 2}(Y)}{M(Y)} \cdot \frac{n_{\cdot 1}}{N}\cdot p(2|\bar{h}) \cdot p_{\bar{h}\cdot }=0.1599, \end{aligned}$$

and a very small one between Serie B and Serie C, given that

$$\begin{aligned} V_{\bar{h} 32}(Y)\cdot p_{\bar{h} \cdot }=\frac{M_2(Y)-{\bar{M}}_{\bar{h} 3}(Y)}{M(Y)} \cdot \frac{n_{\cdot 2}}{N}\cdot p(3|\bar{h}) \cdot p_{\bar{h}\cdot }=0.0294. \end{aligned}$$

The value

$$\begin{aligned} V_{\bar{h} 12}(Y)\cdot p_{\bar{h} \cdot }=\frac{M_2(Y)-{\bar{M}}_{\bar{h} 1}(Y)}{M(Y)} \cdot \frac{n_{\cdot 2}}{N}\cdot p(1|\bar{h}) \cdot p_{\bar{h}\cdot }=-0.0017 \end{aligned}$$

and the other negative values (\(V_{\bar{h} 13}(Y)\cdot p_{\bar{h} \cdot }=-0.0084\); \(V_{\bar{h} 23}(Y)\cdot p_{\bar{h} \cdot }=-0.0294\)) show that the average value of the teams in Serie B is less than the lower mean of the teams in Serie A, and that the average value of the teams in Serie C is less than the lower means of both other leagues.
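The two entries that involve only Serie B and Serie C can be reproduced from the summary values quoted in the text. The Python sketch below assumes, as stated above, that all Serie B and Serie C teams lie below \(M(Y)\), so that their lower means equal their group means and \(p(2|\bar h)=19/80\), \(p(3|\bar h)=59/80\); the helper name `entry` is hypothetical. Written out, the two entries equal \(\pm (M_2(Y)-M_3(Y))\,n_{\cdot 2}\,n_{\cdot 3}/(M(Y)N^2)\), which explains why they cancel exactly.

```python
# Summary values quoted in this section (millions of Euro).
M, M2, M3 = 53.778, 17.556, 4.017
N, n2, n3, hbar = 98, 19, 59, 80

# All Serie B and Serie C teams lie below M(Y): lower means equal group means,
# and p(l | hbar) is the group size divided by hbar.
Mbar2, Mbar3 = M2, M3
p2, p3 = n2 / hbar, n3 / hbar
p_hbar = hbar / N


def entry(M_g, Mbar_l, n_g, p_l):
    """V_{hbar l g}(Y) * p_{hbar .} as written in the formulas above."""
    return (M_g - Mbar_l) / M * (n_g / N) * p_l * p_hbar


v32 = entry(M2, Mbar3, n2, p3)  # l = 3 (Serie C), g = 2 (Serie B)
v23 = entry(M3, Mbar2, n3, p2)  # l = 2 (Serie B), g = 3 (Serie C)
print(round(v32, 4), round(v23, 4))  # 0.0294 -0.0294
```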

The aggregated values in the last rows of Table 5 show that the subpopulation with the largest contribution to the Pietra index is Serie C (\(80.45\%\)), and the one with the smallest contribution is Serie A (\(0.71\%\)). A plausible reason also lies in the very high weight of Serie C among the teams not exceeding the mean (\(p(3|\bar{h})=73.75\%\)). Table 5 also provides the within and between parts of the contribution of each subpopulation. In this application, since \(V_{\bar{h} 22}(Y)\) and \(V_{\bar{h} 33}(Y)\) are zero, Serie B and C do not contribute to the within component of the Pietra index, and the heterogeneity due to them is carried entirely by the between component. Hence, the within part of Serie A coincides with the within component of \(\mathcal P\).

As a special case, the classical decomposition of the Pietra index into within and between components can also be obtained. The former is the sum of the entries on the main diagonal of Table 5, while the latter is the sum of all the remaining ones:

$$\begin{aligned} {\mathcal {P}}_W= 0.0150 \quad \text{ and } \quad {\mathcal {P}}_B = 0.6775. \end{aligned}$$

From these, it is interesting to evaluate the two ratios

$$\begin{aligned} \frac{{\mathcal {P}}_W}{{\mathcal {P}}}=\frac{0.0150}{0.6925}=0.022 \quad \text{ and } \quad \frac{{\mathcal {P}}_B}{{\mathcal {P}}}=\frac{0.6775}{0.6925}=0.978, \end{aligned}$$

which show that in this application the between component of the Pietra index is far more relevant than the within one: the between component represents 97.8% of the total, while the within one represents only 2.2%. In other words, the disparity across the three leagues accounts for much more than the disparity within them.
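These aggregates can be recomputed from the nine matrix entries quoted in this section; the (1, 1) entry is taken equal to \({\mathcal {P}}_W\), since the other two diagonal entries are zero. In this minimal Python sketch the entries are keyed by (l, g), with 1, 2, 3 standing for Serie A, B and C. Grouping them by the first index l reproduces the percentages reported for Serie A and Serie C and implies a share of about 18.84% for Serie B; this grouping is inferred from the numbers, not read from Table 5.

```python
# Entries V_{hbar l g}(Y) * p_{hbar .} quoted in this section, keyed by (l, g).
entries = {
    (1, 1): 0.0150, (1, 2): -0.0017, (1, 3): -0.0084,
    (2, 1): 0.1599, (2, 2): 0.0, (2, 3): -0.0294,
    (3, 1): 0.5277, (3, 2): 0.0294, (3, 3): 0.0,
}
P = sum(entries.values())
P_W = sum(v for (l, g), v in entries.items() if l == g)  # main diagonal
P_B = P - P_W                                             # all other entries

# Contribution of subpopulation l: sum over g of the (l, g) entries, relative to P.
shares = {l: sum(v for (k, g), v in entries.items() if k == l) / P for l in (1, 2, 3)}

print(round(P, 4), round(P_W / P, 3), round(P_B / P, 3))  # 0.6925 0.022 0.978
print({l: round(100 * s, 2) for l, s in shares.items()})   # {1: 0.71, 2: 18.84, 3: 80.45}
```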

Table 5 The \(3\times 3\) decomposition by subpopulations matrix, and the contributions of each subpopulation split in within and between parts

The other two decomposition procedures reviewed in the previous sections are now applied, and their results are compared with those already obtained. Applying the decomposition proposed by Frosini (2012) to the examined dataset gives the following values:

$$\begin{aligned} P_W^F= 0.3401 \quad \text{ and } \quad P_B^F = 0.3524, \end{aligned}$$

with

$$\begin{aligned} P_{Bt}= -0.2664 \quad \text{ and } \quad P_{Bm} = 0.6188. \end{aligned}$$

The computation of the two ratios

$$\begin{aligned} \frac{P_W^F}{{\mathcal {P}}}=\frac{0.3401}{0.6925}=0.491 \quad \text{ and } \quad \frac{P_B^F}{{\mathcal {P}}}=\frac{0.3524}{0.6925}=0.509 \end{aligned}$$

suggests that in this decomposition the two components (within and between) are quite balanced, since they represent 49.1% and 50.9% of the total, respectively. This result is obtained even though the subpopulations are almost non-overlapping and have very different means, as already remarked.

Table 6 summarizes the contributions of each subpopulation according to the procedure proposed by Frosini (2012).

Table 6 The contributions of the three subpopulations in the decomposition proposed by Frosini (2012)

Unlike in the previous decomposition, here the subpopulation with the lowest contribution to the Pietra index is Serie B (9.43%). This result is explained neither by the order of magnitude of the values of Y in that league nor by its relative weight in the whole population, which is very close to that of Serie A.

Computing the decomposition proposed by Habib (2012) gives the following values:

$$\begin{aligned} {\tilde{P}}_W= 0.3401, \quad {\tilde{P}}_B = 0.6876\quad \text{ and } \quad {\tilde{P}}_E = -0.3352, \end{aligned}$$

where the within component \({\tilde{P}}_W\) coincides with \(P_W^F\), as already remarked. The presence of the third term \({\tilde{P}}_E\) makes the interpretation of the results not very intuitive. Following the approach used in the application in Habib (2012), \({\tilde{P}}_E\) can then be divided equally between the two errors \({\tilde{P}}_{E_W}\) and \({\tilde{P}}_{E_B}\), each equal to \({\tilde{P}}_E/2=-0.1676\), giving

$$\begin{aligned} P_{W}^H= 0.1725 \quad \text{ and } \quad P_{B}^H = 0.5200. \end{aligned}$$

The relative weights of these two components are therefore

$$\begin{aligned} \frac{P^H_W}{{\mathcal {P}}}=\frac{0.1725}{0.6925}=0.249 \quad \text{ and } \quad \frac{P_B^H}{{\mathcal {P}}}=\frac{0.5200}{0.6925}=0.751. \end{aligned}$$

These values show that the between component is the most relevant, since it represents 75.1% of the total, while the within component represents the remaining 24.9%. It is worth recalling that the lack of a rule governing how to split \({\tilde{P}}_E\) into \({\tilde{P}}_{E_W}\) and \({\tilde{P}}_{E_B}\) is a non-negligible weakness of this procedure.
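The equal split and the resulting comparison of within shares can be sketched in a few lines of Python. The weight `w` (the share of \({\tilde{P}}_E\) assigned to the within part) and the function name `habib_split` are hypothetical, introduced only to make the arbitrariness explicit: any value of `w` leaves \(P_W^H+P_B^H={\mathcal {P}}\) unchanged, while \(w=0.5\) reproduces the values above. The within shares of the three procedures are computed from the figures reported in this section.

```python
# Figures reported in this section for the 98 teams.
P = 0.6925
P_W_tilde, P_B_tilde, P_E_tilde = 0.3401, 0.6876, -0.3352


def habib_split(p_w, p_b, p_e, w=0.5):
    """Return (P_W^H, P_B^H), assigning a share w of the error term to the within part.

    w is a hypothetical parameter; w = 0.5 is the equal split used in the application.
    """
    return p_w + w * p_e, p_b + (1 - w) * p_e


P_W_H, P_B_H = habib_split(P_W_tilde, P_B_tilde, P_E_tilde)
print(round(P_W_H, 4), round(P_B_H, 4))  # 0.1725 0.52

# Share of the within component under each procedure.
within_shares = {
    "proposed": 0.0150 / P,
    "Frosini (2012)": 0.3401 / P,
    "Habib (2012), equal split": P_W_H / P,
}
for name, share in within_shares.items():
    print(f"{name}: {share:.3f}")  # 0.022, 0.491 and 0.249, respectively
```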

7 Conclusions and final remarks

In this paper, two innovative procedures for decomposing the Pietra index are introduced, based on an alternative expression for this “evergreen” measure. The decomposition by sources yields the contribution of each source. The decomposition by subpopulations assesses how each subpopulation contributes to the value of the index, also separating the within and between parts of each contribution. With a different aggregation, the classical decomposition of the Pietra index into within and between components is also easily obtained. Because they decompose the index at a very fine level, these two procedures provide researchers with more information than many others available in the literature for the Pietra index. The two applications presented add interesting details about Italian professional football teams: the first shows how the heterogeneity of the Total assets of Serie A teams can be split among its sources, while the second highlights how much of the disparity in team values is due to each of the three leagues considered.