# RIF regression via sensitivity curves

• Original Paper

## Abstract

This paper proposes an empirical method to implement the recentered influence function (RIF) regression of Firpo et al. (Econometrica 77(3):953–973, 2009), a method for studying the effect of covariates on many statistics beyond the mean. In empirically relevant situations where the influence function is unavailable or difficult to compute, we suggest using the sensitivity curve (Tukey, Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1977) as a feasible alternative. The relevance of the proposed strategy derives from the fact that, under general conditions, the sensitivity curve converges in probability to the influence function. Computing the sensitivity curve may be cumbersome when the sample size is large. To save computational time, we propose computing it for a random subsample, fitting a non-parametric cubic spline, and interpolating to the remaining observations. Monte Carlo simulations show good finite-sample properties. We illustrate the proposed estimator with an application to the polarization index of Duclos et al. (Econometrica 72(6):1737–1772, 2004).
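The subsample-and-interpolate strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sensitivity_curve`, `cubic_spline_basis`, `rif_via_sc`), the truncated-power cubic spline basis fitted by least squares, the knot placement at subsample quantiles, and the simulated data are all hypothetical choices made for the example. The statistic used is the variance, whose sensitivity curve is a smooth function of $$y$$; other statistics need not be as well behaved.

```python
import numpy as np

def sensitivity_curve(y, stat, idx):
    """Leave-one-out sensitivity curve n * (v(F_n) - v(F_n^(j))) for each j in idx."""
    n = len(y)
    full = stat(y)
    return np.array([n * (full - stat(np.delete(y, j))) for j in idx])

def cubic_spline_basis(y, knots):
    """Truncated-power cubic spline basis: 1, y, y^2, y^3, (y - k)_+^3 for each knot k."""
    cols = [np.ones_like(y), y, y**2, y**3]
    cols += [np.clip(y - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def rif_via_sc(y, stat, m=200, n_knots=5, seed=0):
    """Approximate RIF_j = v(F_n) + SC_j, computing SC only on a random subsample."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=m, replace=False)
    sc_sub = sensitivity_curve(y, stat, idx)            # expensive step, m << n
    # Knots at interior quantiles of the subsample (an illustrative choice).
    knots = np.quantile(y[idx], np.linspace(0, 1, n_knots + 2)[1:-1])
    coef, *_ = np.linalg.lstsq(cubic_spline_basis(y[idx], knots), sc_sub, rcond=None)
    sc_all = cubic_spline_basis(y, knots) @ coef        # interpolate to all cases
    return stat(y) + sc_all

# Simulated data (hypothetical): outcome y and a single covariate x.
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

rif = rif_via_sc(y, np.var)                  # RIF of the variance, via the SC
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, rif, rcond=None)   # RIF regression by OLS
print(beta)
```

Only `m` leave-one-out evaluations of the statistic are needed instead of `n`; the spline supplies the remaining values, and the RIF regression is then an ordinary least squares regression of the approximated RIF on the covariates.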


## Notes

1. Let $$G=2F$$, so that $$H_{t,F,G}=(1+t)F=cF$$ with $$c=1+t$$. Then, by invariance to scale,

\begin{aligned} \left. \frac{\partial v\left( H_{t, F, G}\right) }{\partial t}\right| _{t=0}=\lim _{t \downarrow 0} \frac{v(c F)-v(F)}{t}=\lim _{t \downarrow 0} \frac{0}{t}=0. \end{aligned}

Moreover,

\begin{aligned} \left. \frac{\partial ^{2} v\left( H_{t, F, \Delta _{y}}\right) }{\partial t^{2}}\right| _{t=0}=0. \end{aligned}
2. A computational example for STATA is available at https://tinyurl.com/3bn63z3x.

## References

• Cowell FA, Flachaire E (2015) Statistical methods for distributional analysis. In: Atkinson AB, Bourguignon F (eds) Handbook of income distribution. Elsevier, Amsterdam

• Duclos J-Y, Esteban J, Ray D (2004) Polarization: concepts, measurement, estimation. Econometrica 72(6):1737–1772

• Durrleman S, Simon R (1989) Flexible regression models with cubic splines. Stat Med 8(5):551–561

• Essama-Nssah B, Lambert PJ (2015) Chapter 6: influence functions for policy impact analysis. In: Bishop JA, Salas R (eds) Inequality, mobility and segregation: essays in honor of Jacques Silber. Emerald Group Publishing Limited, Bingley, UK, pp 135–159

• Firpo SP, Fortin NM, Lemieux T (2009) Unconditional quantile regressions. Econometrica 77(3):953–973

• Firpo SP, Fortin NM, Lemieux T (2018) Decomposing wage distributions using recentered influence function regressions. Econometrics 6(3):41

• Fortin NM, Lemieux T, Firpo SP (2011) Decomposition methods in economics. In: Ashenfelter O, Card D (eds) Handbook of labor economics. Elsevier, Amsterdam

• Gasparini L, Horenstein M, Molina E, Olivieri S (2008) Income polarization in Latin America: patterns and links with institutions and conflict. Oxf Dev Stud 36:461–484

• Hampel F (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69(346):383–393

• Harrell FE Jr (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York

• Huber P, Ronchetti EM (2009) Robust statistics, 2nd edn. Wiley

• Jaeckel LA (1972) Estimating regression coefficients by minimizing the dispersion of the residuals. Ann Math Stat 43:1449–1458

• Lemieux T (2006) Increasing residual wage inequality: composition effects, noisy data, or rising demand for skill? Am Econ Rev 96(3):461–498

• Nasser M, Alam M (2006) Estimators of influence function. Commun Stat Theor Methods 35(1):21–32

• Newson RB (2012) Sensible parameters for univariate and multivariate splines. Stata J 12:479–504

• Orsini N, Greenland S (2011) A procedure to tabulate and plot results after flexible modeling of a quantitative covariate. Stata J 11:1–29

• Smith PL (1979) Splines as a useful and convenient statistical tool. Am Stat 33:57–62

• von Mises R (1947) On the asymptotic distribution of differentiable statistical functions. Ann Math Stat 18(3):309–348

• Wegman EJ, Wright IW (1983) Splines in statistics. J Am Stat Assoc 78:351–365

## Author information

### Corresponding author

Correspondence to Javier Alejo.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: By mistake the title included the peer review manuscript number.

## Appendices

### 1.1 Proof of Proposition 1

Using Eq. (1) with $$F_n$$ and $$F_n^{(j)}$$ for the case of $$t=1$$:

\begin{aligned} v\left( F_{n}\right) =v\left( F_{n}^{(j)}\right) +\int \psi _{n}(y) d\left( F_{n}-F_{n}^{(j)}\right) (y)+r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) , \end{aligned}
(6)

for some $${\tilde{t}}\in [0,1]$$. Note that $$\psi _{n}(y)=I F\left( y, v, F_{n}\right) {\mathop {\rightarrow }\limits ^{p}} \psi (y)$$ by continuity of the probability limit.

Now note that $$n\left[ F_{n}(y)-F_{n}^{(j)}(y)\right] =1\left( y_{j} \le y\right) +O_{p}(1)$$ because

\begin{aligned} \begin{array}{l} {F_{n}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) +\frac{n-1}{n} F_{n}^{(j)}(y)}, \\ {F_{n}(y)-F_{n}^{(j)}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) +\frac{n-1}{n} F_{n}^{(j)}(y)-F_{n}^{(j)}(y)}, \\ {F_{n}(y)-F_{n}^{(j)}(y)=\frac{1}{n} 1\left( y_{j} \le y\right) -\frac{1}{n} F_{n}^{(j)}(y)}. \end{array} \end{aligned}

That is,

\begin{aligned} n\left[ F_{n}(y)-F_{n}^{(j)}(y)\right] =1\left( y_{j} \le y\right) -a_{n}, \end{aligned}
(7)

with $$a_{n}=F_{n}^{(j)}(y) {\mathop {\rightarrow }\limits ^{p}} F(y)$$ by the Law of Large Numbers.

Then,

\begin{aligned} n\cdot \left[ v\left( F_{n}\right) -v\left( F_{n}^{(j)}\right) \right]&=\int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) -a_{n}\right) (y)+n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) \\&=\int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) \right) (y)-\int \psi _{n}(y) d\left( a_{n}\right) (y)+n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) . \end{aligned}
(8)

Using the fact that $$1\left( y_{j} \le y\right)$$, as a function of $$y$$, is the distribution function of a point mass (Dirac measure) at $$y_j$$, the first term of Eq. (8) is

\begin{aligned} \int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) \right) (y)=\psi _{n}\left( y_{j}\right) {\mathop {\rightarrow }\limits ^{p}} \psi \left( y_{j}\right) . \end{aligned}

Noting that $$a_{n} {\mathop {\rightarrow }\limits ^{p}} F(y)$$ and $$\psi _{n}(y) {\mathop {\rightarrow }\limits ^{p}} \psi (y)$$, by continuity of the probability limit, the second term of (8) becomes

\begin{aligned} {\text {plim}} \int \psi _{n}(y) d\left( a_{n}\right) (y)=\int \psi (y) d F(y)=0, \end{aligned}

because of property (i). Then,

\begin{aligned} {\text {plim}} \int \psi _{n}(y) d\left( 1\left( y_{j} \le y\right) -a_{n}\right) (y)=\int \psi (y) d\left( 1\left( y_{j} \le y\right) \right) (y)=\psi \left( y_{j}\right) . \end{aligned}

It remains to study the third term in (8). From (3),

\begin{aligned} \begin{array}{c} {n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) =n \cdot \frac{{\tilde{t}}^{2}}{2} \iint \phi (y, z) d\left( F_{n}-F_{n}^{(j)}\right) (y) d\left( F_{n}-F_{n}^{(j)}\right) (z)} \\ {n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) =\frac{1}{n} \cdot \frac{{\tilde{t}}^{2}}{2} \iint \phi (y, z) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (y) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (z)} \end{array} \end{aligned}

for some $${\tilde{t}}\in [0,1]$$. Then using (7) and property (ii),

\begin{aligned} {\text {plim}} \iint \phi (y, z) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (y) d\left[ n\left( F_{n}-F_{n}^{(j)}\right) \right] (z)=\phi \left( y_{j}, y_{j}\right) . \end{aligned}

Then it follows that

\begin{aligned} {\text {plim}}\left\{ n \cdot r\left( {\tilde{t}}, F_{n}, F_{n}^{(j)}\right) \right\} =\left( {\text {plim}} \frac{1}{n}\right) \cdot \frac{{\tilde{t}}^{2}}{2} \phi \left( y_{j}, y_{j}\right) =0. \end{aligned}

The result then follows:

\begin{aligned} {\text {plim}} \frac{v\left( F_{n}\right) -v\left( F_{n}^{(j)}\right) }{1 / n}=\psi \left( y_{j}\right) =I F\left( y_{j}, v, F\right) . \qquad \text {QED} \end{aligned}
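The convergence stated in Proposition 1 can be checked numerically for a statistic whose influence function is known. The sketch below is only an illustration under assumptions of our own: it uses the variance, for which $$IF(y, v, F)=(y-\mu )^{2}-\sigma ^{2}$$, simulated standard normal data, and a hypothetical helper `sensitivity_curve`. It compares the leave-one-out quantity $$n\left[ v\left( F_{n}\right) -v\left( F_{n}^{(j)}\right) \right]$$ with the plug-in influence function at two sample sizes.

```python
import numpy as np

def sensitivity_curve(y, stat):
    """n * (v(F_n) - v(F_n^(j))) for every observation j (leave-one-out)."""
    n = len(y)
    full = stat(y)
    return np.array([n * (full - stat(np.delete(y, j))) for j in range(n)])

rng = np.random.default_rng(0)
gaps = {}
for n in (100, 2000):
    y = rng.normal(size=n)
    sc = sensitivity_curve(y, np.var)
    # Plug-in influence function of the variance, evaluated at each y_j.
    if_hat = (y - y.mean()) ** 2 - y.var()
    gaps[n] = np.max(np.abs(sc - if_hat))
    print(n, gaps[n])
```

The largest discrepancy between the sensitivity curve and the influence function shrinks as the sample size grows, in line with the $$1/n$$ factor that drives the remainder term to zero in the proof.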

### 1.2 Polarization index

We motivate the case of a model where the IF is not available: the DER polarization index (Duclos et al. 2004).

Polarization is an important welfare concept in economics and political science. Intuitively, it measures the tension between individuals in a society, which depends positively on how distant individuals are between groups (alienation) and on how close they are within a group (identification). From this perspective, a standard measure of inequality such as the Gini index focuses on the first component only. Duclos et al. (2004) provide a full axiomatic framework that leads to a logically coherent measure of polarization. For a detailed empirical study of polarization in Latin America and the Caribbean, see Gasparini et al. (2008).

Let $$y_1, y_2, \ldots , y_n$$ be an iid sample of incomes, ordered from lowest to highest. Duclos et al. (2004) propose the following empirical measure of polarization:

\begin{aligned} P_\alpha = \frac{1}{n} \sum _{i=1}^n {{\hat{f}}}(y_i)^{\alpha } {{\hat{a}}} (y_{i}) \end{aligned}

where $${{\hat{a}}}(y_i) = {{\hat{\mu }}} + y_i\left( n^{-1} (2i-1) - 1\right) - n^{-1} \left( 2 \sum _{j=1}^{i-1} y_j + y_i \right)$$, $${{\hat{\mu }}}$$ is the sample mean and $${{\hat{f}}} (y_i)$$ is an estimate of the density of incomes. The parameter $$\alpha$$ is set exogenously and plays a key role in characterizing polarization. When $$\alpha =0$$, polarization reduces to the Gini index (for this particular case the IF is available). Larger values of $$\alpha$$ give relatively more importance to identification, that is, to how closely individuals are ‘surrounded’ by others of similar income. The axiomatic approach of Duclos et al. (2004) imposes lower and upper bounds on the values $$\alpha$$ may take in practice.
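As a rough illustration of the formula above, the index can be computed as follows. This is a sketch under assumptions, not the estimator used in the paper: the function name `der_polarization` is hypothetical, and the density estimate $${{\hat{f}}}$$ is taken to be a Gaussian kernel estimator with a Silverman-type rule-of-thumb bandwidth, which is our choice for the example rather than something specified in the text.

```python
import numpy as np

def der_polarization(y, alpha, bandwidth=None):
    """Empirical DER index P_alpha = mean( f_hat(y_i)^alpha * a_hat(y_i) )."""
    y = np.sort(np.asarray(y, dtype=float))      # incomes ordered lowest to highest
    n = len(y)
    i = np.arange(1, n + 1)
    # Sum of y_j over j < i, for each i.
    cum_prev = np.concatenate(([0.0], np.cumsum(y)[:-1]))
    a_hat = y.mean() + y * ((2 * i - 1) / n - 1) - (2 * cum_prev + y) / n
    if bandwidth is None:
        # Silverman-type rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-1 / 5)
    # Gaussian kernel density estimate evaluated at each observation.
    z = (y[:, None] - y[None, :]) / bandwidth
    f_hat = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return np.mean(f_hat**alpha * a_hat)

rng = np.random.default_rng(1)
incomes = rng.lognormal(size=300)                # simulated incomes (hypothetical)
p0 = der_polarization(incomes, alpha=0.0)
p_half = der_polarization(incomes, alpha=0.5)
print(p0, p_half)
```

With $$\alpha =0$$ the density term drops out and the function returns the mean absolute difference between incomes, which equals $$2{{\hat{\mu }}}$$ times the Gini coefficient; the Gini index itself is recovered by dividing by $$2{{\hat{\mu }}}$$ or, equivalently, by normalizing incomes by their mean and halving the result.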

## Rights and permissions

Alejo, J., Montes-Rojas, G. & Sosa-Escudero, W. RIF regression via sensitivity curves. Stat Methods Appl 32, 329–345 (2023). https://doi.org/10.1007/s10260-022-00649-y

• DOI: https://doi.org/10.1007/s10260-022-00649-y