Further results on the regression-based approach to inequality decomposition with evidence from India

Empirical Economics

Abstract

The paper revisits regression-based inequality decomposition, derives further theoretical results on the factor shares and applies them in an empirical setting. Noting that the approach based on Shorrocks and Fields is not directly applicable to an important welfare-based inequality index, namely Atkinson’s inequality index, we generalise it to derive shares for this index. We also derive the asymptotic distribution of all share estimators for obtaining their standard errors necessary for drawing inference. Finally, we use our theoretical results to examine the major factors that contribute to income inequality in India. Our results show that education and household size are the two most dominant factors contributing to income inequality in both rural and urban areas, followed by employment status and regional differences.


Notes

  1. Hereafter capital letters refer to concepts relating to the whole population and small letters to the sample counterparts.

  2. Note that \(Y_k\) is the vector containing \(Y_{ik}\)s, constructed in the same way as \(Y\).

  3. The share of the constant term, \(s_1\), is zero and including it will lead to a singular variance–covariance matrix.

  4. The notation ‘I’ used here for the identity matrix should not be confused with the inequality measure introduced in the previous section.

  5. Note that these shares \(s_k\) do not satisfy the original assumptions of Shorrocks, only the transformed shares do.

  6. \(f(0) = 1\) for the above \(f(\cdot )\).

  7. There is a debate on methodological comparability across the different NSSO rounds on the choice of the reference period. However, we have tried to overcome this problem in this paper as we use the 30-day uniform reference period, which is comparable across all rounds and we do not take into consideration the 55th round (1999–2000) for the analysis.

  8. For our case with only one restriction, Leamer’s critical value is given by \((T-K)(T^{1/T}-1)\) (cf. Leamer (1978), p. 114), which tends to \(\ln T\) as \(T\) goes to \(\infty \).

  9. The procedure was also applied for \(\epsilon =0.5\); the results are consistent with the ones presented in the paper showing the same pattern across the studied period.

  10. For example, the total contribution of industry groups is made up of five terms (dummy variables): mining and manufacturing; electricity, gas and water; construction; low-skilled service sector; and high-skilled service sector. The same aggregation is done for social groups, employment status, education levels and regions.

  11. Note that if a coefficient is not significant then its contribution to income inequality will be equal to zero. So we set the coefficient equal to zero and do not compute the shares for the insignificant parameters.

  12. We do not report the partial correlations here but they are available upon request.

  13. To keep the paper short, we do not discuss the differences between Atkinson’s index and the Gini index, but the Gini shares are available from the authors upon request.

  14. This could unfortunately be due to the nature of available data. In the analysis, we compare SCs and STs with ‘Others’, the latter being an all encompassing category that includes everyone else. This is a large heterogeneous category that includes castes that are very low in the hierarchy, not necessarily very different from the SCs and STs in status and in economic conditions. It is possible that this classification actually underestimates the relative disadvantage of scheduled castes and scheduled tribes with respect to the ‘higher’ castes.

  15. A combination of the formulations by Shorrocks (1982) and Fields (2003) has been used here.

  16. The notation \(\overset{p}{\longrightarrow }\) denotes convergence in probability and \(\overset{d}{\longrightarrow }\) convergence in distribution. The results of Lemma 1 will not be proved here as they are available in many econometrics textbooks (e.g. see Greene (2007), page 1049).

  17. Here we use the standard OLS estimator but other estimators could be used in which case the computation of the asymptotic distribution will have to be modified accordingly. Note that the constant term disappears with the \(M\) transformation.

  18. See for example Greene (2007), page 67.

  19. \(l_k\) is a vector of dimension \(K-1\) with 0 everywhere and 1 on the \((k-1)^{th}\) position: \(l_k=\left[ 0 \cdots 0 1 0 \cdots 0\right] '\). Using this vector we have \(x_k=\tilde{x} \cdot l_k = \left[ x_2 \cdots x_K \right] \left( \begin{array}{c} 0 \\ \vdots \\ 1\\ \vdots \\ 0 \end{array} \right) . \)

  20. Let \(L=\left[ \begin{array}{c@{\quad }c@{\quad }c} l_2 &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} l_K\end{array} \right] \).

  21. We present the variance instead of MSE because the estimation of \(\beta \) is unbiased even with a small sample size; note that \(V(\hat{\beta })\) denotes the sample variance of \(\hat{\beta }\).
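The limit stated in Note 8 can be checked numerically. The following is a minimal sketch; the number of regressors `K = 20` is a hypothetical value chosen for illustration and is not taken from the paper.

```python
import math

def leamer_critical_value(T: int, K: int) -> float:
    """Leamer's critical value for one restriction: (T - K) * (T**(1/T) - 1)."""
    return (T - K) * (T ** (1.0 / T) - 1.0)

K = 20  # hypothetical number of regressors, for illustration only
for T in (100, 10_000, 1_000_000):
    # Print the critical value next to ln(T) to see the two converge.
    print(T, round(leamer_critical_value(T, K), 4), round(math.log(T), 4))
```

For small \(T\) the two quantities differ noticeably, so the approximation by \(\ln T\) is only appropriate for large samples.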

References

  • Almeida dos Reis JG, de Barros RE (1991) Wage inequality and the distribution of education. J Dev Econ 36(1):117–143

  • Barro RJ (2000) Inequality and growth in a panel of countries. J Econ Growth 5(1):5–32

  • Becker GS (1964) Human capital. National Bureau of Economic Research, New York

  • Bhaduri A (2008) Predatory growth: commentary. Econ Political Wkly 43(16):10–14

  • Chow G (1983) Econometrics. McGraw-Hill, New York

  • Cowell FA, Jenkins SP (1995) How much inequality can we explain?: a methodology and an application to the United States. Econ J 105(429):421–430

  • Das DK (2003) Manufacturing productivity under varying trade regimes: India in the 1980s and 1990s. Working Paper 107, Indian Council for Research on International Economic Relations, New Delhi

  • Deaton A (1997) Analysis of household surveys: a microeconometric approach to development policy. World Bank Publications, Washington DC

  • Deininger K, Squire L (1998) New ways of looking at old issues: inequality and growth. J Dev Econ 57(2):259–287

  • Dev MS, Ravi C (2007) Poverty and inequality: all India and states, 1983–2005. Econ Political Wkly 42(6):509–521

  • Fei JC, Ranis G, Kuo SWY (1978) Growth and the family distribution of income by factor components. Q J Econ 92:17–53

  • Fields GS (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States. Res Labour Econ 22:1–38

  • Greene WH (2007) Econometric analysis, 6th edn. Prentice Hall, New York

  • International Labour Organisation (ILO) (2008) World of Work Report: Inequalities in the age of Financial Globalisation. International Institute of Labour Studies, Geneva

  • Kohli A (2006) Politics of economic growth in India, 1980–2005, Part II: the 1990s and beyond. Econ Political Wkly 41(14):1361–1370

  • Lam D, Levison D (1991) Declining inequality in schooling in Brazil and its effects on inequality in earnings. J Dev Econ 37:199–225

  • Leamer E (1978) Ad Hoc inference with nonexperimental data. Wiley, New York

  • Londono JL (1996) Poverty, inequality and human capital development in Latin America 1950–2025. World Bank Latin American and Caribbean Studies, Washington DC

  • Mincer J (1958) Investment in human capital and personal income distribution. J Political Econ 66:281–302

  • Mincer J (1970) The distribution of labour incomes: a survey with the special reference to the human capital approach. J Econ Lit 8(1):1–26

  • Morduch J, Sicular T (2002) Rethinking inequality decomposition, with evidence from rural China. Econ J 112:93–106

  • National Sample Survey Organisation (NSSO) (1987) Employment and unemployment situation in India 1983. NSS 38 Round (January 1983–December 1983); No. 341. Department of Statistics, Government of India, New Delhi (Raw data from this survey was used)

  • National Sample Survey Organisation (NSSO) (1997) Employment and unemployment situation in India 1993–94. NSS 50 Round (July 1993–June 1994); No. 409. Department of Statistics, Government of India, New Delhi (Raw data from this survey was used)

  • National Sample Survey Organisation (NSSO) (2006a) Employment and unemployment situation in India 2004–5 (Part-I). NSS 61 Round (July 2004–June 2005); No. 515 (61/10/1). Department of Statistics, Government of India, New Delhi (Raw data from this survey was used)

  • National Sample Survey Organisation (NSSO) (2006b) Employment and unemployment situation in India 2004–5 (Part-II). NSS 61 Round (July 2004–June 2005); No. 515 (61/10/1). Department of Statistics, Government of India, New Delhi (Raw data from this survey was used)

  • Oaxaca RL (1973) Male–female wage differentials in urban labour markets. Int Econ Rev 14(3):693–709

  • Organisation for Economic Co-operation and Development (OECD) (2008) Growing Unequal? Income Distribution and Poverty in OECD Countries. OECD, Paris

  • Pyatt G, Chen C, Fei J (1980) The distribution of income by factor components. Q J Econ 95(3):451–473

  • Rodrik D, Subramanian A (2005) From Hindu growth to productivity surge: the mystery of the Indian growth transition. IMF Staff Papers 52(2):193–228

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Sengupta A, Kannan KP, Raveendran G (2008) India’s common people: who are they, how many are they and how do they live? Econ Political Wkly 43(11):49–63

  • Shapley L (1953) A value for \(n\)-person games. In: Kuhn HW, Tucker AW (eds) Contributions to the theory of games, vol 2. Princeton University Press, Princeton

  • Shorrocks AF (1982) Inequality decomposition by factor components. Econometrica 50(1):193–211

  • Shorrocks AF (1984) Inequality decomposition by population sub-groups. Econometrica 52:1369–1385

  • Sundaram K, Tendulkar SD (2003) Poverty in India in the 1990s: revised estimates. Econ Political Wkly 38(46):4865–4872

Acknowledgments

The authors are grateful to Gary Fields for useful comments on an earlier version of this paper. We also thank Ajit Ghose for helpful discussions. The authors would like to thank the referees for their valuable comments which helped to improve the paper substantially.

Author information

Corresponding author

Correspondence to Jaya Krishnakumar.

Appendices

1.1 Appendix A: Shorrocks’ theorem and its six assumptions

Shorrocks’ theorem can be stated as follows:

Let the total income of an individual \(i\,(i=1,\ldots ,N)\) be written as a sum of different components (sources)

$$\begin{aligned} Y_i = \sum _{k=1}^{K}{Y_{ik}} \qquad k=1,\ldots ,K \end{aligned}$$

and let \(I(Y)\) be an inequality measure of the distribution of incomes denoted as \(Y = [Y_1,\ldots ,Y_N]'\). Then the share of each component \(s_k\) to the total inequality is given by:

$$\begin{aligned} s_k = \frac{cov(Y_k,Y)}{\sigma ^{2}(Y)} \end{aligned}$$

such that:

$$\begin{aligned} \sum _k{s_k} = 1 \end{aligned}$$

provided the following six assumptions (see Note 15) are satisfied by \(I(Y)\):

  1. \(I(Y)\) is continuous and symmetric, and it is equal to 0 if and only if all individuals have the same income, \(Y' = [\mu ... \mu ]\).

  2. \(S_k(Y_1,\dots ,Y_K,K)=S_k(Y_k,Y)\), the contribution of factor \(k\), is continuous in \(Y_k\) (continuity), and if \(\pi _1, \dots , \pi _K\) is a permutation of \(1, \dots ,K\), then \(S_k(Y_{\pi _1},\dots ,Y_{\pi _K},K)=S_{\pi _k}(Y_1,\dots ,Y_K,K)\) (symmetric treatment of factors).

  3. The contribution of a factor does not depend on how the others are grouped (independence of the level of disaggregation).

  4. The sum of the contributions is equal to the inequality measure (consistent decomposition): \(\sum _k{S_k(Y_1,\dots ,Y_K,K)}=I(Y)\).

  5. If \(P\) is any permutation matrix, then \(S_k(Y_k,Y)=S_k(Y_{k}P,YP)\) (population symmetry) and if all individuals have the same income for factor \(k\) its contribution is \(S_k(Y_k,Y)=0\) (normalisation for equal factor distribution).

  6. (Two factor symmetry) Suppose that the incomes of factor 2 are a permutation of those of factor 1 \((Y_2 = Y_{1}P)\). Then if there are only these two sources of income, they should receive the same value in the decomposition: \(S_1(Y_1;Y=Y_1+Y_{1}P)=S_2(Y_2=Y_{1}P;Y=Y_1+Y_{1}P).\)

As one can see from the substance of these assumptions inserted in parentheses, they are reasonably basic and self-explanatory. They can also be considered realistic as the majority of inequality measures satisfy them. The assumption of consistent decomposition (i.e. Assumption 4) is the only one that can cause some problems as an additive form cannot be obtained for a few measures and a workaround is needed. An alternative approach is introduced here for Atkinson’s measure which does not satisfy Assumption 4 and it can be replicated for other such measures.
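As an illustration of the share formula \(s_k = cov(Y_k,Y)/\sigma ^2(Y)\), the following minimal sketch computes the shares for simulated income sources and checks that they sum to one. The data-generating choices (three gamma-distributed sources) and the function name `shorrocks_shares` are hypothetical and serve only the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N individuals, K income sources (columns are the Y_k).
N, K = 1_000, 3
components = rng.gamma(shape=2.0, scale=[100.0, 50.0, 20.0], size=(N, K))

def shorrocks_shares(components: np.ndarray) -> np.ndarray:
    """Return s_k = cov(Y_k, Y) / var(Y) for each column Y_k."""
    total = components.sum(axis=1)                 # Y_i = sum_k Y_ik
    centred_total = total - total.mean()
    centred_comp = components - components.mean(axis=0)
    cov_with_total = centred_comp.T @ centred_total / len(total)
    return cov_with_total / centred_total.var()    # population (1/N) variance

shares = shorrocks_shares(components)
print(shares, shares.sum())
```

The shares add up to one by construction, because the covariances of the components with the total sum to the variance of the total.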

1.2 Appendix B: Proof of Theorem 2

Let us recall some important limit results in the form of a lemma which will be useful for future reference (see Note 16).

Lemma 1

  1. \(x_n \overset{d}{\longrightarrow } x, \quad y_n \overset{p}{\longrightarrow } \alpha \Rightarrow x_n + y_n \overset{d}{\longrightarrow } x + \alpha \)

  2. \(x_n \overset{d}{\longrightarrow } x, \quad y_n \overset{p}{\longrightarrow } 0 \Rightarrow y_n x_n \overset{d}{\longrightarrow } 0\)

  3. \(x_n \overset{d}{\longrightarrow } x, \quad A_n \overset{p}{\longrightarrow } A \Rightarrow A_nx_n \overset{d}{\longrightarrow } Ax\)

where \(x_n\), \(x\) and \(y_n\) are random variables, \(A_n\) is a sequence of random matrices, and \(\alpha \) and \(A\) are constants.

We make two basic assumptions on \(x\) and \(\epsilon \):

$$\begin{aligned} \frac{1}{n}{x}'M{x} \overset{p}{\longrightarrow } Q_x, \quad \text {and} \quad \frac{1}{n}{x}'M\epsilon \overset{d}{\longrightarrow } 0 \end{aligned}$$

where \(M=I_n - \frac{1}{n}\iota _n \iota _n'\).

To prove the theorem we need to derive the limiting distribution of the vector of the shares for the factors (excluding the constant term), say \(\hat{s}^*\). From Eq. (6), we have:

$$\begin{aligned} \hat{s}^* = \left[ \begin{array}{c} \hat{s}_2\\ \vdots \\ \hat{s}_K \end{array} \right] = \left[ \begin{array}{c} \frac{\hat{\sigma }_{x_2 y}}{\hat{V}(y)}\hat{\beta _2} \\ \vdots \\ \frac{\hat{\sigma }_{x_K y}}{\hat{V}(y)}\hat{\beta _K} \end{array} \right] = \frac{1}{\hat{V}(y)} \left[ \begin{array}{ccc} \hat{\sigma }_{x_2 y} &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} \hat{\sigma }_{x_K y} \end{array} \right] \left( \begin{array}{c} \hat{\beta }_2 \\ \vdots \\ \hat{\beta }_K \end{array} \right) \equiv \frac{1}{\hat{V}(y)} D \hat{\tilde{\beta }} \end{aligned}$$

using the notation \(D\) for the diagonal matrix in the middle of the above expression. We denote by \(\tilde{\beta }\) the sub-vector of \(\beta \) that excludes the constant term, and by \(\hat{\tilde{\beta }}\) its estimator. Standard estimators of variances and covariances are given by:

$$\begin{aligned} \hat{V}(y)&= \frac{1}{n}\sum _i{\left( y_i - \bar{y}\right) ^2} = \frac{1}{n}y'My \\ \hat{\sigma }_{x_k y}&= \frac{1}{n}\sum _i{\left( x_{k,i} - \bar{x_k}\right) \left( y_i - \bar{y}\right) } = \frac{1}{n}y'Mx_k \end{aligned}$$

We can now write D as:

$$\begin{aligned} D&= \left[ \begin{array}{c@{\quad }c@{\quad }c} \frac{1}{n}y'Mx_2 &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} \frac{1}{n}y'Mx_K\end{array} \right] \\&= \frac{1}{n}\left( \begin{array}{c@{\quad }c@{\quad }c} y'M &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} y'M\end{array} \right) \left( \begin{array}{c@{\quad }c@{\quad }c} x_2 &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} x_K\end{array} \right) = \frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^* \end{aligned}$$

where the notation \(x^*\) is used for the last diagonal matrix in the above expression. The OLS estimator \(\hat{\tilde{\beta }}\) is (see Note 17):

$$\begin{aligned} \hat{\tilde{\beta }} = \left( x'Mx\right) ^{-1}x'My \end{aligned}$$
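A quick numerical check can make the role of the centring matrix \(M\) concrete. The sketch below, on hypothetical simulated data, verifies that \(M\iota _n = 0\) and that the centred formula returns the same slope coefficients as an ordinary least-squares fit that includes a constant. Variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x_tilde = rng.normal(size=(n, 2))                 # regressors without the constant
y = 5.0 + x_tilde @ np.array([1.5, -0.7]) + rng.normal(size=n)

iota = np.ones((n, 1))
M = np.eye(n) - iota @ iota.T / n                 # M = I_n - (1/n) * iota iota'

# Slopes from the centred formula (x'Mx)^{-1} x'My.
beta_centred = np.linalg.solve(x_tilde.T @ M @ x_tilde, x_tilde.T @ M @ y)

# Slopes from a least-squares fit on [constant, regressors].
X_full = np.hstack([iota, x_tilde])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(beta_centred, beta_full[1:])
```

In practice one would subtract column means rather than form the \(n \times n\) matrix \(M\) explicitly; the explicit matrix is used here only to mirror the notation.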

Replacing \(\hat{V}(y)\) and \(D\) in the expression of \(\hat{s}^*\), we get:

$$\begin{aligned} \hat{s}^*= \frac{1}{\hat{V}(y)} D \hat{\tilde{\beta }}=\frac{\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^*\hat{\tilde{\beta }}}{\frac{1}{n}y'My} \end{aligned}$$
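The estimator \(\hat{s}^*\) can be sketched in a few lines. The function below is an illustrative implementation under the assumption of OLS on centred data; the function name and the simulated data are hypothetical. It also returns the residual share, one minus the sum of the factor shares.

```python
import numpy as np

def regression_shares(x_tilde: np.ndarray, y: np.ndarray):
    """Return (share of each regressor, residual share)."""
    n = len(y)
    xc = x_tilde - x_tilde.mean(axis=0)           # M x
    yc = y - y.mean()                             # M y
    beta = np.linalg.solve(xc.T @ xc, xc.T @ yc)  # OLS slopes
    var_y = yc @ yc / n                           # V_hat(y) = y'My / n
    cov_xy = xc.T @ yc / n                        # sigma_hat_{x_k y} = y'M x_k / n
    shares = cov_xy * beta / var_y                # s_hat_k for k = 2, ..., K
    return shares, 1.0 - shares.sum()

rng = np.random.default_rng(2)
x = rng.normal(size=(500, 2))                     # hypothetical regressors
y = 1.0 + x @ np.array([2.0, 1.0]) + rng.normal(size=500)
shares, residual_share = regression_shares(x, y)
print(shares, residual_share)
```

With OLS, the factor shares sum to the regression \(R^2\), so the residual share equals \(1 - R^2\).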

We can now write \(\sqrt{n}\left( \hat{s}^*- s\right) \) as:

$$\begin{aligned} \sqrt{n}\left( \hat{s}^* - s\right)&= \sqrt{n}\left( \frac{\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^* \hat{\tilde{\beta }}}{\frac{1}{n}y'My} - \frac{\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^*\tilde{\beta }}{\frac{1}{n}y'My}\right) \\&= \frac{\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^*}{\frac{1}{n}y'My} \sqrt{n}\left( \hat{\tilde{\beta }}- \tilde{\beta }\right) \end{aligned}$$

Let us now study the different parts of the equation, starting with \(\frac{1}{n}y'My\):

$$\begin{aligned} \frac{1}{n}y'My&= \frac{1}{n}{\left( \beta _1\iota _n + \tilde{x}\tilde{\beta } +\epsilon \right) }'M\left( \beta _1\iota _n + \tilde{x}\tilde{\beta } +\epsilon \right) \\&= 0 + \frac{1}{n}\tilde{\beta }'\tilde{x}'M\tilde{x}\tilde{\beta } + \frac{1}{n}\epsilon 'M\tilde{x}\tilde{\beta } + \frac{1}{n}\tilde{\beta }'\tilde{x}'M\epsilon + \frac{1}{n}\epsilon 'M\epsilon \end{aligned}$$

The asymptotic limit of the first term is given by

$$\begin{aligned} \frac{1}{n}\tilde{\beta }'\tilde{x}'M\tilde{x}\tilde{\beta } \overset{p}{\longrightarrow } \tilde{\beta }' Q_x \tilde{\beta }, \end{aligned}$$

The second and third terms go to 0 and the last term goes to the variance of \(\epsilon \). So at the end we have:

$$\begin{aligned} \frac{1}{n}y'My \overset{p}{\longrightarrow } \tilde{\beta }' Q_x \tilde{\beta } + \sigma _{\epsilon }^2 \end{aligned}$$
(9)

We know the asymptotic distribution (see Note 18) of \(\sqrt{n}\left( \hat{\tilde{\beta }} - \tilde{\beta } \right) \):

$$\begin{aligned} \sqrt{n}\left( \hat{\tilde{\beta }} - \tilde{\beta } \right) \overset{d}{\longrightarrow } \textit{N}\left( 0; \sigma _{\epsilon }^2 Q_x^{-1}\right) \end{aligned}$$
(10)

The last part to be examined is \(\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^*\). To analyse its asymptotic behaviour, let us go back to the matrix notation:

$$\begin{aligned} \frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^* = \left[ \begin{array}{ccc} \frac{1}{n}y'Mx_2 &{} &{} 0 \\ &{} \ddots &{} \\ 0 &{} &{} \frac{1}{n}y'Mx_K\end{array} \right] \end{aligned}$$

and look at the \(k\)-th term for example, \(\frac{1}{n}y'Mx_k\). Using a selection vector \(l_k\) that only selects the \(k^{th}\) element (see Note 19) and substituting for \(y'\), we can write:

$$\begin{aligned} \frac{1}{n}y'Mx_k = \tilde{\beta }'{{\tilde{x}'M\tilde{x}}\frac{1}{n}}l_k + \frac{1}{n} \epsilon 'M\tilde{x}l_k \end{aligned}$$

It can be easily verified that:

$$\begin{aligned} \frac{1}{n}y'Mx_k \overset{p}{\longrightarrow } \tilde{\beta }'Q_xl_k + 0 = \tilde{\beta }'Q_xl_k \end{aligned}$$

Going back to the matrix notation we obtain (see Note 20):

$$\begin{aligned} \frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^* \overset{p}{\longrightarrow } \left( I_{K-1}\otimes \tilde{\beta }'Q_x\right) L \end{aligned}$$
(11)

Putting together the results in Eqs. (10) and (11) and using Lemma 1.3, we get:

$$\begin{aligned} {{\frac{1}{n}\left( I_{K-1} \otimes y'M\right) x^*}} \sqrt{n}\left( \hat{\tilde{\beta }}-\tilde{\beta }\right) \overset{d}{\longrightarrow } \textit{N}\left( 0; \sigma _{\epsilon }^2 \left( I_{K-1}\otimes \tilde{\beta }'Q_x\right) L Q_x^{-1} L'\left( I_{K-1}\otimes Q_x\tilde{\beta }\right) \right) \end{aligned}$$

Now, combining this result with Eq. (9) and applying Lemma 1.3 once more, we find:

$$\begin{aligned} \sqrt{n}\left( \hat{s}^* - s\right) \overset{d}{\longrightarrow } \textit{N}\left( 0; \sigma _{\epsilon }^2 \frac{\left( I_{K-1}\otimes \tilde{\beta }'Q_x\right) L Q_x^{-1} L'\left( I_{K-1}\otimes Q_x \tilde{\beta }\right) }{(\tilde{\beta }' Q_x \tilde{\beta } + \sigma _{\epsilon }^2)^2} \right) \end{aligned}$$

The variance–covariance matrix obtained through the asymptotic results can then be written as:

$$\begin{aligned} Asy.Var(\hat{s}^*) = \Sigma = \frac{1}{n}\sigma _{\epsilon }^2 \frac{\left( I_{K-1}\otimes \tilde{\beta }'Q_x\right) L Q_x^{-1} L'\left( I_{K-1}\otimes Q_x \tilde{\beta }\right) }{(\tilde{\beta }' Q_x \tilde{\beta } + \sigma _{\epsilon }^2)^2} \end{aligned}$$
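The Kronecker expression for \(\Sigma \) can be evaluated directly. The sketch below builds the block-diagonal selection matrix \(L\) from Note 20 and computes \(\Sigma \) for given values of the slope vector, \(Q_x\), \(\sigma _{\epsilon }^2\) and \(n\). The function names and the numerical inputs are illustrative assumptions (two independent regressors with variances 12 and 0.24), not values reported in the paper.

```python
import numpy as np

def selection_matrix(m: int) -> np.ndarray:
    """Block-diagonal L whose m blocks are the unit vectors l_2, ..., l_K (m = K - 1)."""
    L = np.zeros((m * m, m))
    for j in range(m):
        L[j * m + j, j] = 1.0                     # block j holds the j-th unit vector
    return L

def asy_var_shares(beta, Q_x, sigma2, n):
    """Sigma = (1/n) * sigma2 * A Q_x^{-1} A' / (beta'Q_x beta + sigma2)^2."""
    beta = np.asarray(beta, dtype=float)
    m = beta.size
    row = (beta @ Q_x).reshape(1, m)              # beta' Q_x, a 1 x m row
    A = np.kron(np.eye(m), row) @ selection_matrix(m)
    denom = (beta @ Q_x @ beta + sigma2) ** 2
    return sigma2 * A @ np.linalg.inv(Q_x) @ A.T / (n * denom)

# Illustrative inputs only.
beta = np.array([3.0, 40.0])
Q_x = np.diag([12.0, 0.24])
Sigma = asy_var_shares(beta, Q_x, sigma2=100.0, n=1_000)
print(np.sqrt(np.diag(Sigma)))                    # asymptotic standard errors
```

The product \(\left( I_{K-1}\otimes \tilde{\beta }'Q_x\right) L\) reduces to a diagonal matrix with the elements of \(Q_x\tilde{\beta }\) on its diagonal, which avoids forming the Kronecker product when \(K\) is large.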

We can now compute the variance of the full vector of shares \(\hat{s}\) (i.e. including the share of the residuals). We have \(\hat{s}_{\epsilon } = 1 - \sum _{k=1}^K \hat{s}_k = 1-\iota '\hat{s}^*\) and therefore:

$$\begin{aligned} {\hat{s}} = \left[ \begin{array}{c} \hat{s}_2\\ \vdots \\ \hat{s}_K \\ \hat{s}_\epsilon \end{array} \right] = \left[ \begin{array}{c} \hat{s}^* \\ 1-\iota '\hat{s}^* \end{array} \right] = \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] + \left[ \begin{array}{c} I_{K-1} \\ -\iota ' \end{array} \right] \hat{s}^* \end{aligned}$$

We can then easily compute the variance of the vector of all shares as:

$$\begin{aligned} Asy.Var(\hat{s}) = 0 + \left[ \begin{array}{c} I_{K-1} \\ -\iota ' \end{array} \right] Asy.Var{(\hat{s}^*)} \left[ \begin{array}{c@{\quad }c} I_{K-1}&-\iota \end{array} \right] = \left[ \begin{array}{c@{\quad }c} \Sigma &{} -\Sigma \iota \\ -\iota '\Sigma &{} \iota '\Sigma \iota \end{array} \right] \end{aligned}$$

Thus we have:

$$\begin{aligned} \sqrt{n}\left( \hat{s} - s\right) \overset{d}{\longrightarrow } \textit{N}\left( 0; \left[ \begin{array}{c@{\quad }c} \Sigma &{} -\Sigma \iota \\ -\iota '\Sigma &{} \iota '\Sigma \iota \end{array} \right] \right) \end{aligned}$$

\(\square \)
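The step from \(\Sigma \) to the variance of the full vector of shares is a linear transformation and can be sketched as follows. The matrix `Sigma` below contains hypothetical values used only to show the mechanics; the function name is likewise illustrative.

```python
import numpy as np

def full_share_variance(Sigma: np.ndarray) -> np.ndarray:
    """Variance of (s_2, ..., s_K, s_eps) given Sigma = Var(s_2, ..., s_K)."""
    m = Sigma.shape[0]
    A = np.vstack([np.eye(m), -np.ones((1, m))])  # stacked [I_{K-1}; -iota']
    return A @ Sigma @ A.T

Sigma = np.array([[4.0, -1.0], [-1.0, 2.0]]) * 1e-4   # hypothetical values
V = full_share_variance(Sigma)
print(V)
```

The bottom-right element is \(\iota '\Sigma \iota \), the variance of the residual share, and is therefore non-negative; the last column above it equals \(-\Sigma \iota \).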

1.3 Appendix C: Simulation experiment

We simulate a simple model specified as follows:

$$\begin{aligned} y = \beta _1 + x_2 \cdot \beta _2 + d_3 \cdot \beta _3 + \epsilon \end{aligned}$$
(12)

where \(y\) could be the income variable, \(x_2\) a continuous explanatory variable such as age in our case, and \(d_3\) a dummy explanatory variable such as gender. We generate the two explanatory variables and the error term as follows:

$$\begin{aligned}&x_2 \sim N(45,12)\\&d_3 = 1 \text { if } u \ge 0.6 \text { and 0 otherwise, where } u \sim U[0,1]\\&\epsilon \sim N(0, 100) \end{aligned}$$

Drawing on our empirical results, we fix \(\beta _1 = 500\), \(\beta _2=3\) and \(\beta _3=40\) and generate \(y\) according to Eq. (12). Then \(\hat{\beta }\), the shares \(\hat{s}\) and their asymptotic variances are computed. We repeat the procedure 1,000 times for each of the sample sizes \(n=100\), 1,000 and 10,000.

Table 3 presents a summary of the results, including the small sample variance \(V_s(\hat{s})\), the asymptotic variance \(Asy.Var(\hat{s})\) and the p-values of the Skewness–Kurtosis test for normality (see Note 21).

We also explored different values for the coefficients and the results are similar.

Table 3 Summary of results for the simulation
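The data-generating process of this experiment can be sketched as below. This is not the authors' code: it assumes that the second arguments of \(N(45,12)\) and \(N(0,100)\) are variances (the text does not say whether they are variances or standard deviations), it uses fewer replications than the 1,000 described above to keep the run short, and it reports only the Monte Carlo mean and small-sample variance of the two factor shares.

```python
import numpy as np

def estimate_shares(x_tilde: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Regression-based shares s_hat_k = beta_hat_k * cov(x_k, y) / var(y)."""
    n = len(y)
    xc = x_tilde - x_tilde.mean(axis=0)
    yc = y - y.mean()
    beta = np.linalg.solve(xc.T @ xc, xc.T @ yc)
    return (xc.T @ yc / n) * beta / (yc @ yc / n)

def simulate(n: int, reps: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    beta1, beta2, beta3 = 500.0, 3.0, 40.0
    draws = np.empty((reps, 2))
    for r in range(reps):
        x2 = rng.normal(45.0, np.sqrt(12.0), size=n)     # assumes 12 is a variance
        d3 = (rng.uniform(size=n) >= 0.6).astype(float)  # dummy equal to 1 when u >= 0.6
        eps = rng.normal(0.0, np.sqrt(100.0), size=n)    # assumes 100 is a variance
        y = beta1 + beta2 * x2 + beta3 * d3 + eps
        draws[r] = estimate_shares(np.column_stack([x2, d3]), y)
    # Monte Carlo mean and small-sample variance of the shares.
    return draws.mean(axis=0), draws.var(axis=0, ddof=1)

mean_shares, small_sample_var = simulate(n=1_000, reps=200)
print(mean_shares, small_sample_var)
```

Calling `simulate` with `n` equal to 100, 1,000 and 10,000 and `reps=1000` would follow the design described in the text under the same assumptions.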

1.4 Appendix D: Solving the problem of consistent decomposition for Atkinson’s Index

In this appendix we derive the decomposition function for Atkinson’s Index. We start by writing the measure as:

$$\begin{aligned} I_{A}(Y) = 1- \frac{1}{\mu } {\left[ \frac{1}{n}\sum _i {{\left( {y_i}\right) }^{1-\epsilon }}\right] }^\frac{1}{1-\epsilon } , \quad \epsilon \ge 0; \quad \epsilon \ne 1 \end{aligned}$$

Simply rearranging the expression of the inequality measure we obtain:

$$\begin{aligned} \left( 1-I_{A}(Y)\right) ^{1-\epsilon } = \frac{1}{\mu ^{1-\epsilon }}{\frac{1}{n} \sum _i{{\left( {y_i}\right) }^{1-\epsilon }}} \end{aligned}$$

The above expression gives us the transformation function proposed by Shorrocks (1982), i.e. \(f(y)=(1-y)^{1-\epsilon }\). As observed in Sect. 3, this function does not satisfy the condition \(f(0)=0\), so we modify it, by subtracting one, to make it respect this condition. We then obtain the following expression for the transformed index:

$$\begin{aligned} \tilde{I_A}=f(I_A)=\left( 1-I_{A}(Y)\right) ^{1-\epsilon }-1 = \frac{1}{\mu ^{1-\epsilon }}{\frac{1}{n}\sum _i{{\left( {y_i}\right) }^{1-\epsilon }}}-1 \end{aligned}$$

The above expression can be rewritten as:

$$\begin{aligned} \tilde{I_A}= \frac{1}{\mu ^{1-\epsilon }}\frac{1}{n}\sum _i{{\left( {y_i} \right) }^{-\epsilon }\left[ \sum _k{y_{i,k}-\frac{1}{n}}\right] } \end{aligned}$$

Thus we can compute the contribution to inequality of the k-th factor as follows:

$$\begin{aligned} \tilde{S_k}= \frac{1}{\mu ^{1-\epsilon }}\frac{1}{n}\sum _i{{\left( {y_i} \right) }^{-\epsilon }\left[ y_{i,k}-\frac{1}{nK}\right] } \end{aligned}$$

It is easy to show that the two transformed measures (\(\tilde{S}\) and \(\tilde{I}\)) respect the six assumptions of Shorrocks’ theorem and hence the general formula of Theorem 1 can be applied to \(\tilde{I}_A\) to get \(\tilde{s}_k, k=1,\ldots ,K\) which can in turn be reconverted into \(s_k\) using \(f^{-1}(\cdot )\) as in (8).
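To make the transformation concrete, the sketch below computes Atkinson's index, applies the shifted transformation \(f(I)=(1-I)^{1-\epsilon }-1\) and checks that \(f^{-1}\) recovers the original index. The income vector and function names are hypothetical; the factor contributions \(\tilde{S}_k\) are not computed here.

```python
import numpy as np

def atkinson(y, eps: float) -> float:
    """Atkinson index for eps >= 0, eps != 1 (incomes must be positive)."""
    y = np.asarray(y, dtype=float)
    # Equally distributed equivalent income.
    ede = np.mean(y ** (1.0 - eps)) ** (1.0 / (1.0 - eps))
    return 1.0 - ede / y.mean()

def f_shifted(index: float, eps: float) -> float:
    """Shifted transformation f(I) = (1 - I)^(1 - eps) - 1, so that f(0) = 0."""
    return (1.0 - index) ** (1.0 - eps) - 1.0

def f_inverse(value: float, eps: float) -> float:
    """Inverse transformation f^{-1}(v) = 1 - (v + 1)^(1 / (1 - eps))."""
    return 1.0 - (value + 1.0) ** (1.0 / (1.0 - eps))

incomes = np.array([10.0, 20.0, 30.0, 40.0, 100.0])   # hypothetical incomes
eps = 0.5
I_A = atkinson(incomes, eps)
I_tilde = f_shifted(I_A, eps)
print(I_A, I_tilde, f_inverse(I_tilde, eps))
```

For \(0< \epsilon < 1\) and an unequal distribution the transformed index is negative, since \(0< 1-I_A < 1\).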

1.5 Appendix E: Variances and covariances of Atkinson shares

Recall from (8) that

$$\begin{aligned} s_k = \frac{f^{-1}(\tilde{s}_k \tilde{I}_A)}{I_A} \end{aligned}$$

with

$$\begin{aligned} f^{-1}(y) = 1-(y+1)^{\frac{1}{1-\varepsilon }} \end{aligned}$$

Thus

$$\begin{aligned} \frac{\partial {s_k}}{\partial \tilde{s}_k}&= \frac{1}{I_A} \quad \frac{\partial {f^{-1}(\cdot )}}{\partial (\cdot )} \; \tilde{I}_A \\&= \frac{1}{I_A} \quad \frac{- 1}{(1- \varepsilon )} \quad (\tilde{s}_k \tilde{I}_A +1)^\frac{ \varepsilon }{(1- \varepsilon )} \; \tilde{I}_A \end{aligned}$$

and

$$\begin{aligned} \frac{\partial {s_k}}{\partial \tilde{s}_j} =0 \quad \text{ for } \quad k\ne j \end{aligned}$$

Hence,

$$\begin{aligned} V(\hat{s}_k) \cong \left( \frac{\partial {s_k}}{\partial \tilde{s}_k}\right) ^2 \; V(\hat{\tilde{s}}_k) \end{aligned}$$

where \(V(\hat{\tilde{s}}_k)\) is given by our Theorem 2.

Denoting the vectors of shares as

$$\begin{aligned} \hat{\tilde{s}} = \left[ \begin{array}{c} \hat{\tilde{s}}_1 \\ \hat{\tilde{s}}_2 \\ \vdots \\ \hat{\tilde{s}}_K \end{array} \right] ; \quad \quad \hat{s} = \left[ \begin{array}{c} \hat{s}_1 \\ \hat{s}_2 \\ \vdots \\ \hat{s}_K \end{array} \right] \end{aligned}$$

and writing

$$\begin{aligned} \frac{\partial {s}}{\partial \tilde{s}'} = \left[ \begin{array}{cccc} \frac{\partial {s_1}}{\partial \tilde{s}_1} &{} 0 &{} \ldots &{} 0 \\ 0 &{} \frac{\partial {s_2}}{\partial \tilde{s}_2} &{} \ldots &{} 0 \\ &{} &{} \ddots &{} \\ 0 &{} 0 &{} \ldots &{} \frac{\partial {s_K}}{\partial \tilde{s}_K} \end{array} \right] \equiv \text{ say } \; J \end{aligned}$$

we have:

$$\begin{aligned} Asy. V(\hat{s}) \cong \hat{J} [Asy. V(\hat{\tilde{s}})] \hat{J}' \equiv \text{ say } \; Q \end{aligned}$$
(13)

Therefore, the variance of the sum of shares, for instance \(\iota '\hat{s}\), can be derived as follows:

$$\begin{aligned} V(\iota '\hat{s}) \cong \iota ' Q \iota \end{aligned}$$
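The delta-method calculation of this appendix can be sketched numerically. All inputs below (the index value, the transformed shares and their variance matrix) are hypothetical placeholders; in an application they would come from the data and from Theorem 2. The sketch also compares the analytic derivative with a finite-difference approximation.

```python
import numpy as np

def f_inverse(v, eps):
    """f^{-1}(v) = 1 - (v + 1)^(1 / (1 - eps))."""
    return 1.0 - (v + 1.0) ** (1.0 / (1.0 - eps))

def share(s_tilde, I_tilde, I_A, eps):
    """s_k = f^{-1}(s_tilde_k * I_tilde) / I_A."""
    return f_inverse(s_tilde * I_tilde, eps) / I_A

def share_derivative(s_tilde, I_tilde, I_A, eps):
    """Analytic derivative of s_k with respect to s_tilde_k."""
    power = (s_tilde * I_tilde + 1.0) ** (eps / (1.0 - eps))
    return (-1.0 / (1.0 - eps)) * power * I_tilde / I_A

eps = 0.5
I_A = 0.10                                    # hypothetical index value
I_tilde = (1.0 - I_A) ** (1.0 - eps) - 1.0    # transformed index
s_tilde = np.array([0.2, 0.5, 0.3])           # hypothetical transformed shares
V_tilde = np.diag([1e-4, 4e-4, 2e-4])         # hypothetical Asy.V of transformed shares

J = np.diag(share_derivative(s_tilde, I_tilde, I_A, eps))   # diagonal Jacobian
Q = J @ V_tilde @ J.T                                        # Asy.V of the shares
iota = np.ones(len(s_tilde))
var_sum = iota @ Q @ iota                                    # variance of a sum of shares
print(np.diag(Q), var_sum)
```

To obtain the variance of a sum over a subset of shares, for instance all dummies of one factor group, `iota` would be replaced by a 0/1 vector selecting that subset.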

1.6 Appendix F: Data sources, variable names and definitions

The data are based on multiple rounds of the employment–unemployment survey along with the consumption expenditure survey undertaken by the NSSO every five years, covering major Indian States. We use three rounds corresponding to the years 1983 (38th round), 1993–1994 (50th round) and 2004–2005 (61st round). The detailed characteristics of all household members including sex, age, caste/religion, marital status, relation to the household head, education level, employment status, occupation, industry and the region are provided in the survey. The monthly per capita consumption expenditure that is used as a proxy variable for income is obtained for the same set of households from the consumer expenditure survey. The sample is restricted to the age group 15–64 years and the variables are defined below:

  • Income: The income variable is proxied by monthly per capita consumer expenditure.

  • Age: Age of the individual in logarithm.

  • Household size: Number of persons in the household.

  • Gender: Dummy variable, indicating female=1, 0 otherwise.

  • Land ownership: Per capita land possession, obtained as the land owned by the household divided by the number of persons in the household.

  • Social group: Social group consists of scheduled tribes, scheduled castes and others (other backward caste and forward caste). Two dummy variables for scheduled tribes and scheduled castes are constructed with “Others” as the reference category.

  • Religion: Religion comprises Hindus, Muslims, Christians and Others. We have constructed three dummy variables for Hindus, Muslims and Christians separately, with “Others” as the reference category.

  • Education: We classify education into five categories: illiterate, primary, middle, secondary and above secondary. We generate four dummy variables for illiterate, primary, middle and secondary and the reference category is “above secondary”.

  • Employment status: The employment status categories that we consider are self-employment, casual worker, salaried and unemployed. Self-employment comprises own account workers, employers and unpaid family workers; salaried workers comprise regular salaried and waged employees; and casual workers comprise casual labour in public works or other types of work. We create three dummy variables with “unemployed” as the reference category.

  • Industry: We aggregate the industries classified under the National Industrial Classification into six industry groups with similar qualitative characteristics: agriculture (comprises agriculture, forestry and fishing); manufacturing (comprises mining and manufacturing); electricity, gas and water; construction; low-skilled services sector (comprises trade, hotels and restaurants, transport and personal services) and high-skilled services sector (comprises banking and insurance, communication, real estate, business services and public administration). The categorization of the service sector into two groups is justified on the basis of skill and capital requirements. “Agriculture” is used as the reference category and we construct one dummy variable for each of the other five industry groups.

  • State dummies: We have generated state dummies for 15 major States in India and the remaining states are used as the reference category. The 15 major states for which we have generated dummies are Andhra Pradesh, Assam, Bihar, Gujarat, Haryana, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Orissa, Punjab, Rajasthan, Tamil Nadu, Uttar Pradesh and West Bengal.
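The dummy-variable coding with a reference category described in the list above can be sketched as follows, using the education variable as an example. The helper function and the sample records are hypothetical and are not taken from the survey data.

```python
import numpy as np

def dummies(values, categories, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    kept = [c for c in categories if c != reference]
    values = np.asarray(values)
    matrix = np.column_stack([(values == c).astype(float) for c in kept])
    return matrix, kept

education_levels = ["illiterate", "primary", "middle", "secondary", "above secondary"]
# Hypothetical records, one education level per individual.
sample = ["primary", "above secondary", "illiterate", "secondary", "middle"]

X_edu, names = dummies(sample, education_levels, reference="above secondary")
print(names)
print(X_edu)
```

An individual in the reference category has zeros in all columns, so each estimated coefficient measures the difference relative to that category.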

1.7 Appendix G: Shares of inequality and their asymptotic variances for Atkinson’s index

Table 4 Atkinson’s index: disaggregate factor shares, their asymptotic standard errors and contribution to the change of inequality


Cite this article

Bigotta, M., Krishnakumar, J. & Rani, U. Further results on the regression-based approach to inequality decomposition with evidence from India. Empir Econ 48, 1233–1266 (2015). https://doi.org/10.1007/s00181-014-0819-5

  • DOI: https://doi.org/10.1007/s00181-014-0819-5
