RIF regression via sensitivity curves

  • Original Paper
  • Published in Statistical Methods & Applications

A Correction to this article was published on 13 July 2022

This article has been updated

Abstract

This paper proposes an empirical method to implement the recentered influence function (RIF) regression of Firpo et al. (Econometrica 77(3):953–973, 2009), a relevant method to study the effect of covariates on many statistics beyond the mean. In empirically relevant situations where the influence function is not available or is difficult to compute, we suggest using the sensitivity curve (as reported by Tukey in Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1977) as a feasible alternative. The relevance of the proposed strategy derives from the fact that, under general conditions, the sensitivity curve converges in probability to the influence function. Computing the sensitivity curve for every observation may be computationally cumbersome when the sample size is large. To save computational time, we propose computing it for a random subsample, fitting a non-parametric cubic spline to those values, and interpolating to the remaining observations. Monte Carlo simulations show good finite sample properties. We illustrate the proposed estimator with an application to the polarization index of Duclos et al. (Econometrica 72(6):1737–1772, 2004).
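The strategy summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`sensitivity_curve`, `cubic_spline_basis`, `rif_via_sc`), the truncated-power spline basis, the knot placement at subsample quantiles, the subsample size `m` and the simulated design are all assumptions made for the example. The statistic is the variance, chosen because its sensitivity curve is easy to verify.

```python
import numpy as np

def sensitivity_curve(y, stat, idx):
    """SC_j = n * [stat(y) - stat(y with observation j removed)] for j in idx."""
    n = len(y)
    full = stat(y)
    return np.array([n * (full - stat(np.delete(y, j))) for j in idx])

def cubic_spline_basis(x, knots):
    """Truncated-power basis of a cubic regression spline (hypothetical helper)."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def rif_via_sc(y, stat, m=200, n_knots=5, seed=0):
    """Approximate RIF_j = stat(y) + SC_j, evaluating the SC on m points only."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(m, len(y)), replace=False)
    sc_sub = sensitivity_curve(y, stat, idx)
    # Interior knots at equally spaced quantiles of the subsample (an assumption).
    knots = np.quantile(y[idx], np.linspace(0, 1, n_knots + 2)[1:-1])
    coef, *_ = np.linalg.lstsq(cubic_spline_basis(y[idx], knots), sc_sub, rcond=None)
    # Interpolate the fitted curve to every observation and recenter.
    return stat(y) + cubic_spline_basis(y, knots) @ coef

# Simulated design: the variance of y is 1 when x = 0 and 4 when x = 1.
rng = np.random.default_rng(1)
n = 2000
x = rng.integers(0, 2, size=n).astype(float)
y = (1.0 + x) * rng.normal(size=n)

rif = rif_via_sc(y, np.var)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, rif, rcond=None)  # OLS of the RIF on covariates
print(beta)
```

In this design the slope on `x` should be close to the difference in group variances. For statistics whose sensitivity curve is less smooth than that of the variance, the spline may extrapolate poorly outside the range of the subsample, so the number of knots and the subsample size would need more care.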


Notes

  1. Let \(G=2F\). Then \(H_{t,F,G}=(1+t)F=cF\) with \(c=1+t\), and by the invariance to scale

    $$\begin{aligned} \partial v\left( H_{t, F, G}\right) / \partial \left. t\right| _{t=0}=\lim _{t \downarrow 0} \frac{v(c F)-v(F)}{t}=\lim _{t \downarrow 0} \frac{0}{t}=0. \end{aligned}$$

    Moreover,

    $$\begin{aligned} \partial ^{2} v\left( H_{t, F, \Delta _{y}}\right) / \partial \left. t^{2}\right| _{t=0}=0. \end{aligned}$$
  2. A computational example for STATA is available at https://tinyurl.com/3bn63z3x.

References

  • Cowell FA, Flachaire E (2015) Statistical methods for distributional analysis. In: Atkinson AB, Bourguignon F (eds) Handbook of income distribution. Elsevier, Amsterdam
  • Duclos J-Y, Esteban J, Ray D (2004) Polarization: concepts, measurement, estimation. Econometrica 72(6):1737–1772
  • Durrleman S, Simon R (1989) Flexible regression models with cubic splines. Stat Med 8(5):551–561
  • Essama-Nssah B, Lambert PJ (2015) Chapter 6: influence functions for policy impact analysis. In: Bishop JA, Salas R (eds) Inequality, mobility and segregation: essays in honor of Jacques Silber. Emerald Group Publishing Limited, Bingley, UK, pp 135–159
  • Firpo SP, Fortin NM, Lemieux T (2009) Unconditional quantile regressions. Econometrica 77(3):953–973
  • Firpo SP, Fortin NM, Lemieux T (2018) Decomposing wage distributions using recentered influence function regressions. Econometrics 6(3):41
  • Fortin NM, Lemieux T, Firpo SP (2011) Decomposition methods in economics. In: Ashenfelter O, Card D (eds) Handbook of labor economics. Elsevier, Amsterdam
  • Gasparini L, Horenstein M, Molina E, Olivieri S (2008) Income polarization in Latin America: patterns and links with institutions and conflict. Oxf Dev Stud 36:461–484
  • Hampel F (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69(346):383–393
  • Harrell FE Jr (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York
  • Huber P, Ronchetti EM (2009) Robust statistics, 2nd edn. Wiley
  • Jaeckel LA (1972) Estimating regression coefficients by minimizing the dispersion of the residuals. Ann Math Stat 43:1449–1458
  • Lemieux T (2006) Increasing residual wage inequality: composition effects, noisy data, or rising demand for skill? Am Econ Rev 96(3):461–498
  • Nasser M, Alam M (2006) Estimators of influence function. Commun Stat Theor Methods 35(1):21–32
  • Newson RB (2012) Sensible parameters for univariate and multivariate splines. Stata J 12:479–504
  • Orsini N, Greenland S (2011) A procedure to tabulate and plot results after flexible modeling of a quantitative covariate. Stata J 11:1–29
  • Smith PL (1979) Splines as a useful and convenient statistical tool. Am Stat 33:57–62
  • Tukey JW (1977) Exploratory data analysis. Addison-Wesley, Reading, MA
  • von Mises R (1947) On the asymptotic distribution of differentiable statistical functions. Ann Math Stat 18(3):309–348
  • Wegman EJ, Wright IW (1983) Splines in statistics. J Am Stat Assoc 78:351–365

Author information

Corresponding author

Correspondence to Javier Alejo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: By mistake the title included the peer review manuscript number.

Appendices

Appendix 1

1.1 Proof of Proposition 1

Using Eq. (1) with \(F_n\) and \(F_n^{(j)}\) for the case of \(t=1\):

$$\begin{aligned} v\left( F_{n}\right) =v\left( F_{n}^{(j)}\right) +\int \psi _{n}(y) d\left( F_{n}-F_{n}^{(j)}\right) (y)+r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) , \end{aligned}$$
(6)

for some \({\tilde{t}}\in [0,1]\). Note that \(\psi _{n}(y)=I F\left( y, v, F_{n}\right) {\mathop {\rightarrow }\limits ^{p}} \psi (y)\) by continuity of the probability limit.

Now note that \(n\left[ F_{n}(y)-F_{n}^{(j)}(y)\right] =1\left( y_{j} \le y\right) +O_{p}(1)\) because

$$\begin{aligned} \begin{array}{l} {F_{n}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) +\frac{n-1}{n} F_{n}^{(j)}(y)}, \\ {F_{n}(y)-F_{n}^{(j)}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) +\frac{n-1}{n} F_{n}^{(j)}(y)-F_{n}^{(j)}(y)}, \\ {F_{n}(y)-F_{n}^{(j)}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) -\frac{1}{n} F_{n}^{(j)}(y)}. \end{array} \end{aligned}$$

That is,

$$\begin{aligned} n\left[ F_{n}(y)-F_{n}^{(j)}(y)\right] =1\left( y_{j} \le y\right) -a_{n}, \end{aligned}$$
(7)

with \(a_{n}=F_{n}^{(j)}(y) {\mathop {\rightarrow }\limits ^{p}} F(y)\) by the Law of Large Numbers.
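The identity in Eq. (7) holds exactly in finite samples, which can be checked numerically. The following is a small sketch for illustration only; the helper `ecdf`, the sample size, the index `j` and the evaluation grid are arbitrary choices made for the example.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    return (sample[:, None] <= grid[None, :]).mean(axis=0)

rng = np.random.default_rng(0)
n, j = 50, 7
y = rng.normal(size=n)
grid = np.linspace(-3, 3, 25)

F_n = ecdf(y, grid)                  # full-sample empirical CDF
F_n_j = ecdf(np.delete(y, j), grid)  # leave-one-out empirical CDF, a_n in Eq. (7)

lhs = n * (F_n - F_n_j)
rhs = (y[j] <= grid).astype(float) - F_n_j
print(np.max(np.abs(lhs - rhs)))     # zero up to floating-point error
```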

Then,

$$\begin{aligned} n\cdot \left[ v\left( F_{n}\right) -v\left( F_{n}^{(j)}\right) \right]&=\int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) -a_{n}\right) (y)+n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) \nonumber \\&=\int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) \right) (y)-\int \psi _{n}(y) d\left( a_{n}\right) (y)+n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) . \end{aligned}$$
(8)

Since \(1\left( y_{j} \le y\right)\), viewed as a function of \(y\), is the distribution function of a point mass (Dirac measure) at \(y_{j}\), the first term of Eq. (8) is

$$\begin{aligned} \int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) \right) (y)=\psi _{n}\left( y_{j}\right) {\mathop {\rightarrow }\limits ^{p}} \psi \left( y_{j}\right) . \end{aligned}$$

Noting that \(a_{n} {\mathop {\rightarrow }\limits ^{p}} F(y)\) and \(\psi _{n}(y) {\mathop {\rightarrow }\limits ^{p}} \psi (y)\), by continuity of the probability limit, the second term of (8) becomes

$$\begin{aligned} {\text {plim}} \int \psi _{n}(y) d\left( a_{n}\right) (y)=\int \psi (y) d F(y)=0, \end{aligned}$$

because of property (i). Then,

$$\begin{aligned} {\text {plim}} \int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) -a_{n}\right) (y)=\int \psi (y) d\left( 1\left( y_{j} \le y\right) \right) (y)=\psi \left( y_{j}\right) . \end{aligned}$$

It remains to study the third term in (8). From (3),

$$\begin{aligned} n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right)&=n \cdot \frac{{\tilde{t}}^{2}}{2} \iint \phi (y, z) d\left( F_{n}-F_{n}^{(j)}\right) (y) d\left( F_{n}-F_{n}^{(j)}\right) (z) \\&=\frac{1}{n} \cdot \frac{{\tilde{t}}^{2}}{2} \iint \phi (y, z) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (y) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (z) \end{aligned}$$

for some \({\tilde{t}}\in [0,1]\). Then using (7) and property (ii),

$$\begin{aligned} {\text {plim}} \iint \phi (y, z) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (y) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (z)=\phi \left( y_{j}, y_{j}\right) . \end{aligned}$$

Then it follows that

$$\begin{aligned} {\text {plim}}\left\{ n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) \right\} =\left( {\text {plim}} \frac{1}{n}\right) \cdot \frac{{\tilde{t}}^{2}}{2} \phi \left( y_{j}, y_{j}\right) =0. \end{aligned}$$

Then, the result follows,

$$\begin{aligned} {\text {plim}} \frac{v\left( F_{n}\right) -v\left( F_{n}^{(j)}\right) }{1 / n}=\psi \left( y_{j}\right) =I F\left( y_{j}, v, F\right) . \ QED \end{aligned}$$
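The convergence stated in Proposition 1 can be illustrated with a statistic whose influence function is known in closed form. The sketch below uses the variance, for which \(IF(y, v, F)=(y-\mu )^{2}-\sigma ^{2}\), under a standard normal \(F\). The helper `sc_at_point`, the evaluation point `y0` and the sample sizes are assumptions made for the example, and a single simulated draw per sample size is only indicative.

```python
import numpy as np

def sc_at_point(y0, rest, stat):
    """Sensitivity curve of `stat` at y0: n * [stat(rest plus y0) - stat(rest)]."""
    full = np.append(rest, y0)
    return len(full) * (stat(full) - stat(rest))

rng = np.random.default_rng(0)
y0 = 2.0
if_true = y0**2 - 1.0  # IF of the variance at y0 under N(0, 1)

errors = {}
for n in (100, 10_000, 1_000_000):
    rest = rng.normal(size=n - 1)  # the n - 1 observations other than y0
    errors[n] = abs(sc_at_point(y0, rest, np.var) - if_true)
    print(n, errors[n])
```

The absolute gap between the sensitivity curve and the influence function should be small for the largest sample size, in line with the probability limit above.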

Appendix 2

1.1 Polarization index

We motivate the case of a model where the IF is not available: the DER polarization index (Duclos et al. 2004).

Polarization is an important welfare concept in economics and political science. Intuitively, it measures the tension between individuals in a society, which depends positively on how distant individuals in different groups are from one another (alienation) and on how close they are within a group (identification). From this perspective, a standard measure of inequality like the Gini index focuses on just the first component. Duclos et al. (2004) provide a full axiomatic framework that leads to a logically coherent measure of polarization. For a detailed empirical study on polarization for the case of Latin America and the Caribbean, see Gasparini et al. (2008).

Let \(y_1, y_2, \ldots , y_n\) be an iid sample of incomes, ordered from lowest to highest. Duclos et al. (2004) propose the following empirical measure of polarization:

$$\begin{aligned} P_\alpha = \frac{1}{n} \sum _{i=1}^n {{\hat{f}}}(y_i)^{\alpha } {{\hat{a}}} (y_{i}) \end{aligned}$$

where \({{\hat{a}}}(y_i) = {{\hat{\mu }}} + y_i\left( n^{-1} (2i-1) - 1\right) - n^{-1} \left( 2 \sum _{j=1}^{i-1} y_j + y_i \right)\), \({{\hat{\mu }}}\) is the sample mean and \({{\hat{f}}} (y_i)\) is an estimate of the density of incomes. The parameter \(\alpha\) is set exogenously and plays a key role in characterizing polarization. In fact, when \(\alpha =0\) the polarization index reduces to the Gini index (for this particular case the IF is available). Larger values of \(\alpha\) result in the index giving relatively more importance to identification, that is, to how closely individuals are ‘surrounded’ by others of similar income. The axiomatic approach of Duclos et al. (2004) imposes lower and upper bounds on the values \(\alpha\) may take in practice.
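The empirical index above can be computed directly from the sorted sample. The following is a minimal sketch, not the estimator used by Duclos et al. (2004) in every detail: the function name `der_polarization`, the Gaussian kernel and the rule-of-thumb bandwidth for \({{\hat{f}}}\) are assumptions made for the example. With \(\alpha =0\) the density term drops out and the formula equals the mean absolute difference between all pairs of incomes, which is proportional to the Gini coefficient.

```python
import numpy as np

def der_polarization(y, alpha, bandwidth=None):
    """Empirical polarization index P_alpha = mean(f_hat(y_i)**alpha * a_hat(y_i))."""
    y = np.sort(np.asarray(y, dtype=float))  # incomes ordered from lowest to highest
    n = len(y)
    i = np.arange(1, n + 1)
    mu = y.mean()
    cum_before = np.cumsum(y) - y            # sum of y_j for j < i
    a_hat = mu + y * ((2 * i - 1) / n - 1) - (2 * cum_before + y) / n
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-0.2)
    u = (y[:, None] - y[None, :]) / bandwidth
    f_hat = np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return np.mean(f_hat**alpha * a_hat)

rng = np.random.default_rng(0)
incomes = rng.lognormal(size=300)            # simulated incomes, for illustration

p0 = der_polarization(incomes, alpha=0.0)
mean_abs_diff = np.abs(incomes[:, None] - incomes[None, :]).mean()
print(p0, mean_abs_diff)                     # the two values coincide
print(der_polarization(incomes, alpha=0.5))
```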


Cite this article

Alejo, J., Montes-Rojas, G. & Sosa-Escudero, W. RIF regression via sensitivity curves. Stat Methods Appl 32, 329–345 (2023). https://doi.org/10.1007/s10260-022-00649-y
