1 Introduction

The ratio of quantiles of a distribution is an important measure of distributional features that finds application in several fields, most notably for the study of economic inequalities. Two popular measures are the ratio of the 80th and 20th income percentiles and the ratio of the 90th and 40th income percentiles. See also Cowell (2011), Chancel and Piketty (2021) and Chancel et al. (2022) for other quantile ratios of interest. The quartile ratio is found in biomedical applications (Feinberg et al. 2015), and the decile ratios in cost-effectiveness analysis (Kumar et al. 2018). Other applications of quantile ratios range from environmental (Rustomji et al. 2009) to ceramic materials studies (Das and Maiti 1999). To preempt any misunderstanding, we remark that this article does not deal with quantile share ratios (Beach and Davidson 1983; Langel and Tillé 2011). One such ratio is the Palma ratio, that is, the ratio of the total income in the hands of the top 10% earners to that of the lowest 40%. As compared to quantile share ratios, quantile ratios are clearly more robust to outliers and, in our opinion, more readily interpretable.

Despite the widespread interest, quantile ratios have been used mostly as descriptive statistics. The few works that deal explicitly with inferential methods for quantile ratios are confined to the unconditional (as opposed to conditional) case. Prendergast and Staudte (2017) developed distribution-free asymptotic confidence intervals for unconditional ratios. In a related work, Prendergast and Staudte (2018) built upon (unconditional) quantile ratios to derive an inequality index that is akin to the Gini coefficient. A general and elegant non-parametric method to study wage inequality by means of an Oaxaca-Blinder-type decomposition is proposed in Firpo et al. (2018). On the other hand, a distribution-free method for inference on conditional quantile ratios seems to be lacking, despite being potentially useful in many applications (particularly for the study of economic inequalities); it is the objective of the present work to make such a proposal.

In general, the use of the nonparametric bootstrap is quite common in the literature on inequality indices (ratios of income shares, Gini and Theil coefficients, and so on) (Biewen 2002; Buhmann et al. 1988). As far as conditional quantile ratios go, a poor man’s solution for parametric estimation would be to fit a quantile regression model on the log-transformed outcome at the two quantile levels of interest and then take the difference of the two fits (see Sect. 2.1 for more details). The calculation of standard errors would easily follow from well-established results for regression quantiles (Koenker and Bassett 1978a). However, this strategy would bring little flexibility to the model’s specification and, perhaps more importantly, it would always result in a loss of efficiency. The model would in general be misspecified and therefore might not provide a good fit (Feng et al. 2023). Quantile crossing might also arise.

Another possible approach to the modeling of inequality would be a fully parametric one, say via (Generalized) Linear Models (GLMs). In one specification, given a distribution from the exponential family (e.g., gamma or normal), the expectation of the outcome is linked to the linear predictor. Conditional quantiles are subsequently estimated from the fitted GLM and their ratios are regressed against the same linear predictor upon a suitable transformation. In this modeling strategy, which we call GLM-based, bias and loss of efficiency may result from an incorrect distributional assumption. In a different specification, when some form of clustering is present (e.g., spatial), a naïve model would regress sample quantile ratios on aggregated covariates, potentially resulting in ecological bias. For an approach related to the modeling of inequality via mean regression, see Olkin and Yitzhaki (1992).

In the present paper, we propose a distribution-free model for conditional quantile ratios and develop inferential methods in a parametric setting. Interpretation of the parameters is simple, while formal hypothesis testing is carried out with classical Wald-type tests. Our work is connected to inference for L-statistics, defined as linear combinations of order statistics (e.g., see Koenker 2005). Indeed, robust measures of location, quantile differences for assessing variability (e.g., interquartile range) (Goldman and Kaplan 2018) as well as other quantile-based measures of shape (Groeneveld and Meeden 1984; Groeneveld 1998; Jones et al. 2011) can ideally be linked to quantile ratios upon a logarithmic transformation. We also mention the studies of Dominici et al. (2005) and Cheng and Wu (2010) as unrelated to ours, since they concern the ratio of the same quantile in two different populations. Naturally, quantile regression is a very active area of research in general. We refer the interested reader to some recent contributions (Sherwood and Li 2022; Farcomeni et al. 2022; Merlo et al. 2022; Redivo et al. 2023) and reviews (Koenker 2017; Waldmann 2018).

The present work is motivated by an original application to economic (income) inequality in Europe. The analysis is based on about two million households sampled in several cross-sectional waves of the European Union Survey on Income and Living Conditions (EU-SILC) database. We looked at inequality in 19 countries and how inequality has changed over a time period spanning the years before and after the Global Financial Crisis. We identified outlying countries that experienced more economic inequality than the rest of the European Union. We also investigated the role of macroeconomic indicators in explaining differences between countries. This is the first study to allow inferential statements about the (adjusted) association between these indicators and household-level income inequality.

In the following, we give a formal definition of the Quantile Ratio Regression (QRR) model (Sect. 2.1) and then propose an estimation procedure to fit QRR (Sect. 2.2), along with a scalable algorithm (Sect. 2.3); we derive standard errors for the parameter estimates (Sect. 2.4); we establish the theoretical properties of our methods (Sect. 3); we investigate the performance and behaviour of the proposed procedure via a simulation study (Sect. 4) and an analysis of the EU-SILC data (Sect. 5); and we conclude with final remarks (Sect. 6).

An R implementation of QRR estimation and related methods is freely available in the Qtools package (Geraci 2016). A brief, yet comprehensive, tutorial is given in the Appendix.

2 Quantile ratio regression

2.1 The model

Let Y denote an absolutely continuous and strictly positive random variable with conditional cumulative distribution function (CDF) \(F_{Y}(y|\varvec{x}) = \Pr (Y \le y \ | \varvec{x})\), where \(\varvec{x}\) is a p-dimensional vector of covariates. Let

$$\begin{aligned} H_{Y}(\tau |\varvec{x}) \equiv F_{Y}^{-1}(\tau |\varvec{x}) \end{aligned}$$
(1)

be the corresponding conditional quantile function. In classical quantile regression, the aim is to find a satisfactory model for H as a function of the covariates \(\varvec{x}\) for some \(0< \tau < 1\). Here our interest lies in the ratio of two conditional quantiles at some fixed levels \(0< \tau _2< \tau _1 < 1\), with the assumption that \(H_{Y}(\tau _1|\varvec{x})/H_{Y}(\tau _2|\varvec{x})>1\). We also assume that, for an appropriate known and monotone link function \(g:(1,\infty ) \rightarrow {\mathbb {R}}\), such a ratio is a linear function of the covariates and a parameter \(\varvec{\beta }\), namely

$$\begin{aligned} g\left\{ \frac{H_{Y}(\tau _1|\varvec{x})}{H_{Y}(\tau _2|\varvec{x})}\right\} = \varvec{x}^{\top }\varvec{\beta }. \end{aligned}$$
(2)

Though different choices for g are possible, our preference in this article is for \(g(u)=\log (u-1)\), which is, up to a change of sign, a logit transform of the reciprocal ratio since \(\textrm{logit}(1/u)=\log \left( \frac{1/u}{1-1/u}\right) = \log \{1/(u-1)\} = -\log (u-1)\). Not only does this transformation ensure that there is no quantile crossing between \(\tau _2\) and \(\tau _1\), but it also allows one to interpret each element of the vector \(\varvec{\beta }\) as the adjusted change, per unit increase of the corresponding covariate, in the log of the quantile ratio in excess of one, so that \(\exp (\beta _{k})\) acts multiplicatively on \(H_{Y}(\tau _1|\varvec{x})/H_{Y}(\tau _2|\varvec{x}) - 1\) (see Sect. 5). We stress that we do not make any parametric assumption on the distribution of Y and that the model assumptions are local, namely, they involve only the specific quantile ratio but leave unspecified the rest of the conditional CDF.
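To fix ideas, the short R lines below (our own illustration; the functions g and ginv are not part of any package) spell out the link, its inverse, and the ratio-scale reading of a coefficient.

```r
## Link g(u) = log(u - 1), mapping ratios in (1, Inf) onto the real line,
## and its inverse ginv(v) = 1 + exp(v)
g    <- function(u) log(u - 1)
ginv <- function(v) 1 + exp(v)

## Under model (2), a unit increase in a covariate with coefficient beta_k
## multiplies the relative inequality H(tau1|x)/H(tau2|x) - 1 by exp(beta_k)
beta_k <- 0.25
exp(beta_k)
```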

We note that model (2) entails the following:

$$\begin{aligned} \begin{aligned} H_{Y}(\tau _{1}|\varvec{x})&= H_{Y}(\tau _{2}|\varvec{x})\cdot g^{-1}(\varvec{x}^{\top }\varvec{\beta }),\\ H_{Y}(\tau _{2}|\varvec{x})&= H_{Y}(\tau _{1}|\varvec{x})\cdot \{g^{-1}(\varvec{x}^{\top }\varvec{\beta })\}^{-1}. \end{aligned} \end{aligned}$$
(3)

More specifically, if \(g(u)=\log (u-1)\), then

$$\begin{aligned} H_{Y}(\tau _{1}|\varvec{x}) = H_{Y}(\tau _{2}|\varvec{x})\cdot \{1 + \exp (\varvec{x}^{\top }\varvec{\beta })\} \end{aligned}$$

and

$$\begin{aligned} H_{Y}(\tau _{2}|\varvec{x}) = H_{Y}(\tau _{1}|\varvec{x})\cdot \{1 + \exp (\varvec{x}^{\top }\varvec{\beta })\}^{-1} \end{aligned}$$

are two nonlinear conditional quantile equations (Koenker and Park 1996) that share the same parameter \(\varvec{\beta }\).

To appreciate how our proposal differs from a crude approach where the numerator and denominator are obtained from a common ‘global’ model for the conditional quantile function (1), it is perhaps most natural to consider an exponential model of the type \(H_{Y}(\tau |\varvec{x}) = \exp \{\varvec{x}^{\top }\varvec{\gamma }(\tau )\}\), in which case the transformation equivariance property of quantiles implies that

$$\begin{aligned} \log \left\{ \frac{H_{Y}(\tau _1|\varvec{x})}{H_{Y}(\tau _2|\varvec{x})}\right\} = H_{\log (Y)}(\tau _1|\varvec{x}) - H_{\log (Y)}(\tau _2|\varvec{x}) = \varvec{x}^{\top }\tilde{\varvec{\beta }}, \end{aligned}$$

where \(\tilde{\varvec{\beta }} = \varvec{\gamma }(\tau _{1})-\varvec{\gamma }(\tau _{2})\) is the parameter of interest. Hence, it would be extremely easy to estimate \(\tilde{\varvec{\beta }}\) by fitting the same model on \(\log {Y}\) for \(\tau _{1}\) and \(\tau _{2}\) using standard estimation routines for linear conditional quantiles. However, this model entails twice as many parameters as model (2). Standard errors of the final parameters would invariably be larger than those obtained with our proposal. A smaller issue with this crude approach is that it does not guarantee monotonicity of the quantiles. Should quantile crossing arise, it would need to be addressed using ad hoc solutions (e.g., He 1997; Chernozhukov et al. 2010) that may have limitations in terms of scope, and induce further bias.
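As an illustration of this crude approach (a minimal sketch on simulated data, not the authors' code), one can fit linear quantile regressions to \(\log Y\) at the two levels with quantreg and difference the coefficient vectors:

```r
library(quantreg)

set.seed(1)
n <- 1000
x <- runif(n, -0.5, 0.5)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.6))  # a strictly positive outcome

tau1 <- 0.8; tau2 <- 0.2
fit <- rq(log(y) ~ x, tau = c(tau2, tau1))

## Crude estimate of beta-tilde = gamma(tau1) - gamma(tau2), i.e., the
## coefficients of the log conditional quantile ratio
coef(fit)[, 2] - coef(fit)[, 1]
```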

2.2 Estimation

Let \(\varvec{y} = (y_1,\ldots ,y_n)^{\top }\) denote a vector of n independent observations, associated with an \(n \times p\) matrix of covariates \(\varvec{x}\) whose generic ith row vector is \(\varvec{x}_i\), \(i=1,\ldots ,n\). Also, define \(\xi _{j} \equiv H_{Y}(\tau _{j}|\varvec{x})\), \(j=1,2\), the true conditional quantile of Y and \(\hat{\xi }_{j}\) its corresponding prediction. The parameter \(\varvec{\beta }\), common to the two equations in (3), can be estimated by fitting either \(H_{Z}(\tau _{1}|\varvec{x}) = g^{-1}\{\varvec{x}^{\top }\varvec{\gamma }(\tau _{1})\}\), where \(Z=Y/\hat{\xi }_{2}\), or \(H_{Z}(\tau _{2}|\varvec{x}) = \{g^{-1}(\varvec{x}^{\top }\varvec{\gamma }(\tau _{2}))\}^{-1}\), where \(Z=Y/\hat{\xi }_{1}\). Say we use the former. At first, the predictions \(\hat{\xi }_{2}\) can be obtained by means of an appropriate method for estimating the conditional quantile function (1) such as linear, nonparametric (Yu and Jones 1998; Racine and Li 2017), or transformation-based (Geraci and Jones 2015) regression. In our experience (see Sect. 4), a linear model provides a satisfactory initial value \(\hat{\xi }_{2}\) in the sense that the resulting estimate \(\hat{\varvec{\gamma }}(\tau _{1})\) from the first model is generally a stationary point for the second model (see further below). If a refinement is needed, i.e., if \(\hat{\varvec{\gamma }}(\tau _{1})\) is not a solution for \(H_{Z}(\tau _{2}|\varvec{x})\), one could fit the latter with \(Z = Y/\hat{\xi }_{1}\), where \(\hat{\xi }_{1} = g^{-1}\{\varvec{x}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}\cdot \hat{\xi }_{2}\) is obtained from the first model. The new predictions \(\hat{\xi }_{2} = \{g^{-1}(\varvec{x}^{\top }\hat{\varvec{\gamma }}(\tau _{2}))\}^{-1}\cdot \hat{\xi }_{1}\) are subsequently fed into the first model and a new estimate \(\hat{\varvec{\gamma }}(\tau _{1})\) is obtained. When a refinement is needed, a single step is usually sufficient.
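The following R sketch mirrors the procedure just described for \(g(u)=\log (u-1)\), so that \(g^{-1}(v)=1+\exp (v)\): a linear quantile fit on \(\log y\) provides the initial predictions \(\hat{\xi }_{2}\), nlrq from quantreg fits the first (nonlinear) equation, and the second equation is fitted once as a check/refinement. The simulated data (which anticipate the Weibull design of Sect. 4) and the starting values are our own choices for illustration.

```r
library(quantreg)

set.seed(2)
n <- 2000
x <- runif(n, -0.5, 0.5)
tau1 <- 0.8; tau2 <- 0.2
## Weibull outcome whose 0.8:0.2 quantile ratio follows model (2) with beta = (0.5, -0.5)
alpha <- log(log(1 - tau1) / log(1 - tau2)) / log(1 + exp(0.5 - 0.5 * x))
y <- rweibull(n, shape = alpha, scale = 1)

## Initial predictions of the tau2 conditional quantile (linear fit on log y)
xi2 <- exp(predict(rq(log(y) ~ x, tau = tau2)))

## First model: H_Z(tau1 | x) = 1 + exp(g0 + g1 * x), with Z = Y / xi2hat
z1   <- y / xi2
fit1 <- nlrq(z1 ~ 1 + exp(g0 + g1 * x), tau = tau1, start = list(g0 = 0, g1 = 0))
gamma1 <- coef(fit1)

## Check/refinement: H_Z(tau2 | x) = 1 / {1 + exp(g0 + g1 * x)}, with Z = Y / xi1hat
xi1  <- (1 + exp(gamma1["g0"] + gamma1["g1"] * x)) * xi2
z2   <- y / xi1
fit2 <- nlrq(z2 ~ 1 / (1 + exp(g0 + g1 * x)), tau = tau2,
             start = list(g0 = unname(gamma1["g0"]), g1 = unname(gamma1["g1"])))
coef(fit1); coef(fit2)   # close agreement indicates gamma1 is already a stationary point
```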

We now illustrate why the solution \(\hat{\varvec{\gamma }}(\tau _{1})\) is, in general, a stationary point for the fitting of \(H_{Z}(\tau _{2}|\varvec{x}) = \{g^{-1}(\varvec{x}^{\top }\varvec{\gamma }(\tau _{2}))\}^{-1}\). We consider a nonlinear g because it is the case most relevant to our discussion and because the demonstration is trivial when g is linear. We first need to briefly expound the nonlinear estimation algorithm of Koenker and Park (1996), as adapted in Koenker (2005, p. 211) and implemented in the R function nlrq of the quantreg package (Koenker 2023). Consider the problem

$$\begin{aligned} \hat{\varvec{\gamma }}_{n}(\tau _{2}) = \arg \min _{\varvec{\gamma } \in \Gamma } \sum _{i=1}^{n} \rho _{\tau _{2}}(r_{i}(\varvec{\gamma })), \end{aligned}$$
(4)

where \(\Gamma \subset {\mathbb {R}}^p\) is compact, \(\rho _{\tau }(r) = r(\tau - I(r < 0))\), \(r_{i}(\varvec{\gamma }) = z_{i} - \{g^{-1}(\varvec{x}_{i}^{\top }\varvec{\gamma })\}^{-1}\) and \(z_{i} = y_{i}/\hat{\xi }_{i,1}\). The solution to (4) is obtained by solving, with respect to \(\varvec{\delta }_{\gamma }\), a sequence of locally linearized problems of the type \(r(\varvec{\gamma }) =\varvec{j}(\varvec{\gamma })\varvec{\delta }_{\gamma } + \epsilon \), where \(\varvec{j}(\varvec{\gamma }) = \partial r(\varvec{\gamma })/\partial \varvec{\gamma }\). The expression for the step \(\varvec{\delta }_{\gamma }\) is given in Koenker (2005). Then, the algorithm updates \(\varvec{\gamma }\) with \(\varvec{\gamma } + \lambda \varvec{\delta }_{\gamma }\), where the step length \(\lambda \in [0,1]\) minimizes \(\sum _{i=1}^n \rho _{\tau _{2}}\left( r_i\left( \varvec{\gamma } + \lambda \varvec{\delta }_{\gamma }\right) \right) \). The algorithm proceeds until convergence (see Koenker and Park 1996, for the conditions and criteria of convergence).

Now, given the estimate \(\hat{\varvec{\gamma }}(\tau _{1})\) and the predictions \(\hat{\xi }_{i,1} = g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}\cdot \hat{\xi }_{i,2}\) obtained from \(H_{Z}(\tau _{1}|\varvec{x}) = g^{-1}\{\varvec{x}^{\top }\varvec{\gamma }(\tau _{1})\}\), at the very first iteration of the algorithm to solve the minimization problem in (4), we have

$$\begin{aligned}&\sum _{i=1}^n \rho _{\tau _{2}}\left( r_i\left( \hat{\varvec{\gamma }}(\tau _{1})+\lambda \varvec{\delta }_{\gamma }\right) \right) \nonumber \\&\quad = \sum _{i=1}^n \rho _{\tau _{2}}\left( y_{i}/\hat{\xi }_{i,1} - 1/g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1}) + \lambda \varvec{x}_{i}^{\top }\varvec{\delta }_{\gamma }\}\right) \nonumber \\&\quad = \sum _{i=1}^n \rho _{\tau _{2}}\left( \dfrac{y_{i}/\hat{\xi }_{i,2}}{g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}} - \dfrac{1}{g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1}) + \lambda \varvec{x}_{i}^{\top }\varvec{\delta }_{\gamma }\}} \right) . \end{aligned}$$
(5)

Note that \(H_{Y/\hat{\xi }_{2}}(\tau _{2}|\varvec{x}) = \xi _{2}/\hat{\xi }_{2}\), hence the empirical marginal quantile \(\tau _{2}\) of \(Y/\hat{\xi }_{2}\) is, in general, equal to 1. This implies that (5) is minimized only when \(\lambda = 0\) since

$$\begin{aligned}{} & {} \sum _{i=1}^n \rho _{\tau _{2}}\left( \dfrac{y_{i}/\hat{\xi }_{i,2}}{g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}} - \dfrac{1}{g^{-1}\{\varvec{x}_{i}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}} \right) \\{} & {} \quad \propto \sum _{i=1}^n \rho _{\tau _{2}}\left( y_{i}/\hat{\xi }_{i,2} - 1\right) , \end{aligned}$$

in which case the algorithm stops immediately and returns \(\hat{\varvec{\gamma }}(\tau _{2}) = \hat{\varvec{\gamma }}(\tau _{1})\). Of course, if the initial estimate \(\hat{\xi }_{2}\) does not yield \({\hat{H}}_{Y/\hat{\xi }_{2}}(\tau _{2}) = 1\), the estimation will need to be refined as discussed above.

As far as asymptotics are concerned, for fixed \(\xi _{2}\) the estimator \(\hat{\varvec{\gamma }}_{n}(\tau _{1})\) and, for fixed \(\xi _{1}\), the estimator \(\hat{\varvec{\gamma }}_{n}(\tau _{2})\) are consistent and asymptotically normal (Koenker 2005). With slight abuse of notation, let \(h=g^{-1}\) whenever h refers to an expression involving \(\tau = \tau _{1}\) and \(h = 1/g^{-1}\) when it involves \(\tau = \tau _{2}\). These results rest on the Bahadur representation

$$\begin{aligned} \sqrt{n}\left( \hat{\varvec{\gamma }}_n(\tau _j)-\varvec{\gamma }_0(\tau _j)\right)= & {} \varvec{D}_{1}^{-1} \frac{1}{\sqrt{n}} \sum _{i=1}^n {\dot{h}}_i \psi _{\tau _j}\left( r_i(\tau _j)\right) \nonumber \\{} & {} +o_p(1), \end{aligned}$$
(6)

where \(\varvec{D}_{1} = n^{-1} \sum f_i\left( \xi _{i,j}\right) {\dot{h}}_i {\dot{h}}_{i}^{\top }\), \(f_{i}\) is the density of \(Z_{i}\) conditional on \(\varvec{x}_{i}\) and \(\xi _{i,j} = H_{Z_i}(\tau _j|\varvec{x}_i)\), \({\dot{h}}_i=\partial h\left( \varvec{x}_{i}^{\top }\varvec{\gamma }\right) /\left. \partial \varvec{\gamma }\right| _{\varvec{\gamma }=\varvec{\gamma }_0}\), \(r_i(\tau _j)=y_i/\hat{\xi }_{i,j}-h\left( \varvec{x}_{i}^{\top }\varvec{\gamma }_0(\tau _j)\right) \) and \(\psi _{\tau }(r) = \tau - I(r < 0)\).

2.3 A scalable estimation strategy

Given our choice for g, the estimation procedure introduced in the previous section requires solving a nonlinear problem, whose computation may represent a bottleneck. In order to scale inference to millions of observations, we can linearize the problem by applying the equivariance property of quantiles. This is convenient due to the existence of fast computation algorithms for linear problems, such as Frisch–Newton-type algorithms (Koenker 2005) and algorithms based on smoothing (e.g., Bottai et al. 2015; Fernandes et al. 2021; He et al. 2023). In our real data example, which involves a sample of almost two million households, we used the algorithm proposed by He et al. (2023).

For the linearly transformed estimation problem, we propose an estimation algorithm based on fitting the linear quantile model \(H_{g(Z)}(\tau _{1}|\varvec{x}) = \varvec{x}^{\top }\varvec{\gamma }(\tau _{1})\), where \(Z=Y/\hat{\xi }_{2}\) given the current value of \(\hat{\xi }_{2}\), to obtain an estimate \(\hat{\varvec{\gamma }}(\tau _{1})\) and predictions \(\hat{\xi }_{1} = g^{-1}\{\varvec{x}^{\top }\hat{\varvec{\gamma }}(\tau _{1})\}\cdot \hat{\xi }_{2}\). As with the estimation procedure in Sect. 2.2, one should check (and possibly refine) the starting solution by fitting the linear quantile model \(H_{g(Z)}(1 - \tau _{2}|\varvec{x}) = \varvec{x}^{\top }\varvec{\gamma }(1 - \tau _{2})\), where \(Z=\hat{\xi }_{1}/Y\) given the current value of \(\hat{\xi }_{1}\), with estimates \(\hat{\varvec{\gamma }}(1-\tau _{2})\) and predictions \(\hat{\xi }_{2} = \{g^{-1}(\varvec{x}^{\top }\hat{\varvec{\gamma }}(1-\tau _{2}))\}^{-1}\cdot \hat{\xi }_{1}\). For the latter estimation we have used the property \(1/H_{Y}(\tau ) = H_{1/Y}(1-\tau )\) for a positive, continuous Y. Note that existence of the estimators is not guaranteed, as it could happen that \(Z \le 1\), in which case g(Z) is undefined. Luckily, population quantiles are invariant to censoring from below up to the quantile of interest. Therefore, we can censor observations for which \(Y/\hat{\xi }_{2} \le 1\) in the first problem and those for which \(\hat{\xi }_{1}/Y \le 1\) in the second. An implicit assumption here is that the (population) quantiles at \(\tau _{1}\) and \(\tau _{2}\) are sufficiently separated, which is the rule in empirical applications.
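A minimal R sketch of this linearized strategy on simulated data follows. The flooring of Z just above 1 before applying g is our own device to operationalize the censoring argument, and the choice of method = "pfn" (a preprocessing Frisch–Newton variant) in quantreg is one convenient option for large n.

```r
library(quantreg)

set.seed(3)
n <- 1e5
x <- runif(n, -0.5, 0.5)
tau1 <- 0.8; tau2 <- 0.2
alpha <- log(log(1 - tau1) / log(1 - tau2)) / log(1 + exp(0.5 - 0.5 * x))
y <- rweibull(n, shape = alpha, scale = 1)

eps <- 1e-6                                       # floor Z just above 1 (our device)
xi2 <- exp(predict(rq(log(y) ~ x, tau = tau2, method = "pfn")))   # initial predictions

## Fit H_{g(Z)}(tau1 | x) = x'gamma(tau1), with Z = Y / xi2hat and g(z) = log(z - 1)
z1   <- pmax(y / xi2, 1 + eps)
fit1 <- rq(log(z1 - 1) ~ x, tau = tau1, method = "pfn")
xi1  <- drop((1 + exp(cbind(1, x) %*% coef(fit1))) * xi2)

## Check/refine via the reciprocal equation at level 1 - tau2, with Z = xi1hat / Y
z2   <- pmax(xi1 / y, 1 + eps)
fit2 <- rq(log(z2 - 1) ~ x, tau = 1 - tau2, method = "pfn")
coef(fit1); coef(fit2)   # both estimate beta; large discrepancies call for a refinement
```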

The advantage of this approximate estimation resides in the fact that, rather than (4), we now deal with

$$\begin{aligned} \hat{\varvec{\gamma }}_{n}(\tau ) = \arg \min _{\varvec{\gamma } \in {\mathbb {R}}^{p}} \sum _{i=1}^{n} \rho _{\tau }\{g(z_{i}) - \varvec{x}_{i}^{\top }\varvec{\gamma }\}, \end{aligned}$$
(7)

where \(z_{i} = y_{i}/\hat{\xi }_{i,2}\) for \(\tau = \tau _{1}\) or \(z_{i} = \hat{\xi }_{i,1}/y_{i}\) for \(\tau = 1-\tau _{2}\). Solving (7) can be accomplished by means of fast interior point algorithms. When this is computationally prohibitive, an alternative route is smoothing the loss function \(\rho _{\tau }\) to achieve differentiability and thus apply gradient-based methods. For example, He et al. (2023) use convolution smoothing, which, among other things, has the advantage of yielding a strongly convex function (as opposed to other smoothing techniques such as that proposed by, e.g., Horowitz 1998).
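To illustrate the smoothing route in a self-contained way (our own sketch with a Gaussian kernel; the closed-form smoothed loss below is the convolution of the check function with a Gaussian density and is not taken from He et al. 2023), one can replace the objective in (7) by a differentiable surrogate and use a quasi-Newton optimizer:

```r
## Gaussian-kernel convolution smoothing of the check loss: the smoothed loss is
## l_b(u) = u * {tau - pnorm(-u/b)} + b * dnorm(u/b), with derivative tau - pnorm(-u/b)
smooth_loss <- function(gamma, v, X, tau, b) {
  u <- v - drop(X %*% gamma)
  sum(u * (tau - pnorm(-u / b)) + b * dnorm(u / b))
}
smooth_grad <- function(gamma, v, X, tau, b) {
  u <- v - drop(X %*% gamma)
  -colSums(X * (tau - pnorm(-u / b)))
}

## Toy example: v is a stand-in for g(z) in (7)
set.seed(4)
n <- 5000
x <- runif(n, -0.5, 0.5); X <- cbind(1, x)
v <- 0.5 - 0.5 * x + rnorm(n)
b <- ((log(n) + ncol(X)) / n)^0.4      # bandwidth, as in Sect. 2.4
optim(c(0, 0), fn = smooth_loss, gr = smooth_grad,
      v = v, X = X, tau = 0.8, b = b, method = "BFGS")$par
```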

If common regularity conditions for the linear quantile model hold (e.g., see Koenker 2005, p. 120), then

$$\begin{aligned} \sqrt{n}\{\hat{\varvec{\gamma }}_{n}(\tau _{1}) - \varvec{\gamma }_{0}(\tau _{1})\} {\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}(0, \varvec{\Omega }_{1}), \end{aligned}$$
(8)

where \(\varvec{\Omega }_{1} = \tau _{1}(1-\tau _{1}) \varvec{D}_{1}^{-1}\varvec{D}_{0}\varvec{D}_{1}^{-1}\), \(\varvec{D}_{0} = \lim _{n \rightarrow \infty } n^{-1} \sum \varvec{x}_i \varvec{x}_i^{\top }\), \(\varvec{D}_{1} = \lim _{n \rightarrow \infty } n^{-1} \sum f_i\left( \xi _{i,1}\right) \varvec{x}_i \varvec{x}_i^{\top }\), \(f_{i}\) is the density of \(Z_{i}\) conditional on \(\varvec{x}_{i}\) and \(\xi _{i,1} = H_{Z_i}(\tau _{1}|\varvec{x}_i)\), with \(Z_{i}=Y_{i}/\hat{\xi }_{i,2}\). Similarly, for \(\hat{\varvec{\gamma }}(1-\tau _{2})\) we have

$$\begin{aligned} \sqrt{n}\{\hat{\varvec{\gamma }}_{n}(1-\tau _{2}) - \varvec{\gamma }_{0}(1-\tau _{2})\} {\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}(0, \varvec{\Omega }_{2}), \end{aligned}$$
(9)

where \(\varvec{\Omega }_{2} = \tau _{2}(1-\tau _{2}) \varvec{D}_{1}^{-1}\varvec{D}_{0}\varvec{D}_{1}^{-1}\), \(\varvec{D}_{0} = \lim _{n \rightarrow \infty } n^{-1} \sum \varvec{x}_i \varvec{x}_i^{\top }\), \(\varvec{D}_{1} = \lim _{n \rightarrow \infty } n^{-1} \sum f_i\left( \xi _{i,2}\right) \varvec{x}_i \varvec{x}_i^{\top }\), \(f_{i}\) is the density of \(Z_{i}\) conditional on \(\varvec{x}_{i}\) and \(\xi _{i,2} = H_{Z_i}(1-\tau _{2}|\varvec{x}_i)\), with \(Z_{i}=\hat{\xi }_{i,1}/Y_{i}\).

2.4 Standard errors

Standard error estimation can be carried out by means of an ordinary xy-pair bootstrap, a method we investigated in the simulation study of Sect. 4 and which proved to work well even at smaller sample sizes. Alternatives include smoother approaches, such as the Bayesian bootstrap (e.g., Hahn 1997). However, the bootstrap may not be feasible when the sample size is on the order of hundreds of thousands or more. We derive a method for direct estimation of the standard errors that scales well and performs satisfactorily, especially at larger sample sizes, as illustrated in Sect. 4.
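A bare-bones sketch of the xy-pair bootstrap is given below; fit_qrr stands for whichever QRR estimation routine from Sects. 2.2–2.3 is used (it is our own placeholder, not a function of any package) and is assumed to return the vector of coefficient estimates.

```r
## xy-pair bootstrap: resample (y, x) pairs with replacement, refit, and take
## the standard deviation of the replicated estimates as the standard error
boot_se <- function(y, x, fit_qrr, B = 200) {
  est <- replicate(B, {
    id <- sample(length(y), replace = TRUE)
    fit_qrr(y[id], x[id, , drop = FALSE])
  })
  apply(est, 1, sd)
}
```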

Starting from the linearized version of our algorithm (Sect. 2.3), we smooth the objective functions through an appropriate kernel so as to make them differentiable and, ultimately, lay the groundwork for scalability. Here we show expressions based on a Gaussian kernel, though in principle other kernels can be used. Conditionally on \(\hat{\xi }_{2}\), we have that the gradient of the smoothed objective function is given by

$$\begin{aligned}{} & {} \dfrac{\partial }{\partial \varvec{\gamma }}\rho ^{(s)}_{\tau _{1}}(Z - \varvec{x}^{\top }\varvec{\gamma }) \nonumber \\{} & {} \quad =\varvec{x}^{\top }\Phi \left( \frac{\varvec{x}^{\top }\varvec{\gamma }(\tau _{1})-\log (Y/\hat{\xi }_{2}-1)}{b}\right) , \end{aligned}$$
(10)

where \(\Phi \) denotes the CDF of the standard normal and b is a smoothing parameter (or bandwidth). Analogously, the gradient of the smoothed objective function conditionally on \(\hat{\xi }_{1}\) is given by

$$\begin{aligned}{} & {} \dfrac{\partial }{\partial \varvec{\gamma }}\rho ^{(s)}_{1-\tau _{2}}(Z - \varvec{x}^{\top }\varvec{\gamma })\nonumber \\{} & {} \quad =\varvec{x}^{\top }\Phi \left( \frac{\varvec{x}^{\top }\varvec{\gamma }(1-\tau _{2})-\log (\hat{\xi }_{1}/Y-1)}{b}\right) . \end{aligned}$$
(11)

The fact that \(\hat{\varvec{\gamma }}\) is a stationary point for both equations allows us to approximate the smoothed gradient as the sum of (10) and (11). The information matrix is then obtained through numerical differentiation of this approximated p-dimensional gradient, which is rather quick. Standard errors are obtained as the square root of the diagonal elements of its inverse. As for the bandwidth, in our implementation we set \(b=((\log (n)+p)/n)^{0.4}\), which minimizes the Gaussian approximation error at each step according to He et al. (2023).
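A sketch of this computation is given below (our own illustration, not the Qtools implementation): it takes the estimate \(\hat{\varvec{\gamma }}\) and the predictions \(\hat{\xi }_{1}\), \(\hat{\xi }_{2}\) from the linearized fit as given, builds the combined gradient from (10) and (11), and differentiates it numerically with the numDeriv package; the pmax() guard against nonpositive arguments of the logarithm is our own device, in the spirit of the censoring of Sect. 2.3.

```r
library(numDeriv)

## Sum of the smoothed gradients (10) and (11) evaluated at gamma;
## X is the n x p design matrix, xi1 and xi2 the predicted quantiles, b the bandwidth
qrr_grad <- function(gamma, y, X, xi1, xi2, b, eps = 1e-6) {
  lp <- drop(X %*% gamma)
  colSums(X * pnorm((lp - log(pmax(y / xi2 - 1, eps))) / b)) +
    colSums(X * pnorm((lp - log(pmax(xi1 / y - 1, eps))) / b))
}

## Approximate standard errors: numerically differentiate the gradient at the
## estimate, invert the resulting matrix, and take square roots of its diagonal
qrr_se <- function(gamma_hat, y, X, xi1, xi2) {
  b <- ((log(length(y)) + ncol(X)) / length(y))^0.4
  J <- jacobian(qrr_grad, gamma_hat, y = y, X = X, xi1 = xi1, xi2 = xi2, b = b)
  sqrt(diag(solve(J)))
}
```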

3 Theoretical properties

In this section we show consistency and asymptotic normality of the estimator. With the same abuse of notation as before, we let \(h=g^{-1}\) whenever h refers to an expression involving \(\tau = \tau _{1}\) and \(h = 1/g^{-1}\) when it involves \(\tau = \tau _{2}\). Also, let \(\varvec{X}\) denote the design matrix for all observations. For reasons of space we restrict our theoretical results to the nonlinear algorithm in Sect. 2.2; the linear approximation with convolution smoothing in Sect. 2.3 can be accommodated with slight changes.

Theorem 1

Assume the usual regularity conditions for cross-sectional quantile regression models, and that \(\varvec{X}^\top \varvec{X}/n\) is positive definite and converges to a positive definite matrix \(\varvec{C}\). Assume initial predictions are consistent. Then, \(\hat{\varvec{\gamma }}\) converges in probability to \(\varvec{\beta }\).

Proof

By definition,

$$\begin{aligned} \varvec{\beta }= & {} \lim _n \arg \min _{\varvec{\gamma }} \frac{\sum _{i=1}^n \rho _{\tau _2}\left( \frac{y_i}{\xi _{i,1}}-h(\varvec{x}_i,\varvec{\gamma })\right) }{n}\nonumber \\= & {} \lim _n \arg \min _{\varvec{\gamma }} \frac{\sum _{i=1}^n \rho _{\tau _1}\left( \frac{y_i}{\xi _{i,2}}-h(\varvec{x}_i,\varvec{\gamma })\right) }{n}. \end{aligned}$$
(12)

Similarly, for all n,

$$\begin{aligned} \hat{\varvec{\gamma }}= & {} \arg \min _{\varvec{\gamma }} \frac{\sum _{i=1}^n \rho _{\tau _2}\left( \frac{y_i}{\hat{\xi }_{i,1}}-h(\varvec{x}_i,\varvec{\gamma })\right) }{n}\nonumber \\= & {} \arg \min _{\varvec{\gamma }} \frac{\sum _{i=1}^n \rho _{\tau _1}\left( \frac{y_i}{\hat{\xi }_{i,2}}-h(\varvec{x}_i,\varvec{\gamma })\right) }{n}. \end{aligned}$$
(13)

To see consistency, note that the regularity conditions imply that \(n^{-1}\sum _{i=1}^n \rho _{\tau _2}\left( \frac{y_i}{\hat{\xi }_{i,1}}-h(\varvec{x}_i,\hat{\varvec{\gamma }})\right) \) uniformly converges in probability to a finite constant. The uniform convergence theorem then guarantees that there exists a vector \(\varvec{\eta }\) such that \(\hat{\varvec{\gamma }}\) converges in probability to \(\varvec{\eta }\).

Suppose now that \(\varvec{\eta }\ne \varvec{\beta }\). This would imply that \(\exists \ {\tilde{n}}\) such that for \(n>{\tilde{n}}\)

$$\begin{aligned} \sum _{i=1}^n \rho _{\tau _2}\left( \frac{y_i}{\hat{\xi }_{i,1}}-h(\varvec{x}_i,\varvec{\beta })\right) > \sum _{i=1}^n \rho _{\tau _2}\left( \frac{y_i}{\hat{\xi }_{i,1}}-h(\varvec{x}_i,\hat{\varvec{\gamma }})\right) , \end{aligned}$$

which is not possible due to consistency of initial predictions, (12) and (13). Therefore, \(\varvec{\eta }=\varvec{\beta }\), which completes the proof. \(\square \)

Theorem 2

Assume the usual regularity conditions for quantile regression models, and that \(\varvec{X}^\top \varvec{X}/n\) is positive definite and converges to a positive definite matrix \(\varvec{C}\). Assume initial predictions are consistent. Then, there exists a positive definite matrix \(\varvec{\Omega }\) such that \(\sqrt{n}(\hat{\varvec{\gamma }}-\varvec{\beta })\) converges in distribution to a zero-centered multivariate Gaussian random variable with covariance matrix \(\varvec{\Omega }\).

Proof

The statement follows from the fact that, for \(j,k=1,2\) and \(j \ne k\), \((\hat{\varvec{\gamma }}(\tau _{j}),\hat{\xi }_{i,j})\) are asymptotically jointly normal conditional on \(\hat{\xi }_{i,k}\), as argued in the previous section. Equality of \(\hat{\varvec{\gamma }}(\tau _1)\) and \(\hat{\varvec{\gamma }}(\tau _2)\) guarantees that \(\hat{\varvec{\gamma }}\) is asymptotically normal also unconditionally. \(\square \)

As a general comment, it can be seen that theoretical properties of quantile ratio regression, including the rate of convergence, are very similar to those of classical quantile regression.

4 Simulation study

We conducted a simulation study to investigate the finite-sample properties of our methods. We considered the conditional quantile ratio model (2) with two response distributions: Weibull and lognormal. The former has quantile function \(H_{Y}(\tau ) = \sigma \{-\log (1-\tau )\}^{1/\alpha }\) for \(Y \ge 0\), \(\sigma > 0\), \(\alpha > 0\), while the latter has \(H_Y(\tau ) = \exp \left\{ \mu + \sigma \Phi ^{-1}(\tau )\right\} \) for \(Y \ge 0\), \(\mu \in {\mathbb {R}}\), \(\sigma > 0\). Each Weibull response \(Y_{i}\), \(i=1,\ldots ,n\), was generated conditionally on \(\eta _{i} = \beta _{0} + \beta _{1}x_{i} + \beta _{2}w_{i}\), with \(x_{i} \sim \textrm{Unif}(-0.5,0.5)\) and \(w_{i} \sim \textrm{Bin}(1, 0.4)\), by sampling from a Weibull distribution with shape parameter \(\alpha _{i}\) satisfying

$$\begin{aligned} \dfrac{H_{Y_{i}}(\tau _{1})}{H_{Y_{i}}(\tau _{2})} = \left\{ \dfrac{\log (1-\tau _{1})}{\log (1-\tau _{2})}\right\} ^{1/\alpha _{i}} = 1 + \exp (\eta _{i}); \end{aligned}$$

while \(\sigma \) was set equal to 1. Analogously, the ith lognormal response was obtained with a value of \(\sigma _{i}\) so as to satisfy

$$\begin{aligned} \dfrac{H_{Y_{i}}(\tau _{1})}{H_{Y_{i}}(\tau _{2})} = \exp \left\{ \sigma _{i}\left( \Phi ^{-1}(\tau _{1})-\Phi ^{-1}(\tau _{2})\right) \right\} = 1 + \exp (\eta _{i}), \end{aligned}$$

with \(\mu \) set equal to 0. Hence, since \(\alpha _{i}\) and \(\sigma _{i}\) depend on \(\eta _{i}\), the location, scale and shape of either distribution are affected by the covariates \(\varvec{x}_{i}\). We used two sets of values for the regression coefficients, \(\varvec{\beta } = (0.5,0,0)^{\top }\) and \(\varvec{\beta } = (0.5,-0.5,0.5)^{\top }\), four quantile ratios, \(0.8\!:\!0.2\), \(0.9\!:\!0.1\), \(0.9\!:\!0.4\) and \(0.99\!:\!0.5\), and three sample sizes, \(n \in \{100,1000,10000\}\), for a total of 24 combinations for each distribution. Both the Weibull and the lognormal distributions are right-skewed for any of the values \(\eta _{i}\) resulting from these settings. Each combination was replicated 500 times.
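For concreteness, the R lines below (our own sketch of the data-generating step) produce one sample under each distribution for the \(0.8\!:\!0.2\) ratio and \(\varvec{\beta } = (0.5,-0.5,0.5)^{\top }\).

```r
set.seed(5)
n <- 1000
tau1 <- 0.8; tau2 <- 0.2
beta <- c(0.5, -0.5, 0.5)

x   <- runif(n, -0.5, 0.5)
w   <- rbinom(n, 1, 0.4)
eta <- beta[1] + beta[2] * x + beta[3] * w

## Weibull: shape alpha_i solves {log(1-tau1)/log(1-tau2)}^(1/alpha) = 1 + exp(eta)
alpha <- log(log(1 - tau1) / log(1 - tau2)) / log(1 + exp(eta))
y_wei <- rweibull(n, shape = alpha, scale = 1)

## Lognormal: sigma_i solves exp{sigma * (qnorm(tau1) - qnorm(tau2))} = 1 + exp(eta)
sigma <- log(1 + exp(eta)) / (qnorm(tau1) - qnorm(tau2))
y_lnorm <- rlnorm(n, meanlog = 0, sdlog = sigma)
```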

The estimation algorithms discussed in Sects. 2.2 and 2.3 were assessed in terms of bias and mean squared error (MSE) of \(\hat{\varvec{\beta }}\), while the performance of the approximate standard error estimation approach of Sect. 2.4 was evaluated by calculating the ratio between the average estimated standard errors and the standard deviation of the estimates (empirical standard errors), along with the observed coverage of the corresponding 95% confidence intervals. For comparison, we also estimated standard errors by means of ordinary bootstrap with 200 replications. Given the large number of results, we report detailed results obtained from the linearized algorithm in Sect. 2.3 for the quantile ratios \(0.8\!:\!0.2\) and \(0.9\!:\!0.4\) only, and briefly summarize the others.

Due to the lack of direct competitors to our proposal, we considered two parametric alternatives as briefly introduced in Sect. 1, namely (i) a GLM-based estimation and (ii) a naïve linear regression model applied to artificially grouped data. In the former case (i), the conditional quantiles \({\hat{H}}_{MLE}(\tau _{1})\) and \({\hat{H}}_{MLE}(\tau _{2})\) were obtained from the inversion of \({\hat{F}}_{MLE}\), a parametric distribution in the GLM family fitted by means of maximum likelihood. Then \(\log ({\hat{R}}_{MLE} - 1)\), where \({\hat{R}}_{MLE} = {\hat{H}}_{MLE}(\tau _{1})/{\hat{H}}_{MLE}(\tau _{2})\), was regressed on \(\varvec{x}\) to estimate \(\varvec{\beta }_{MLE}\), which can be considered a parametric estimate of \(\varvec{\beta }\). In our simulation study, we used gamma and normal GLMs. In the second case (ii), by partitioning the observations into 20 groups of size n/20, each with index set \({\mathcal {G}}_{g}\), \(g = 1,\ldots ,20\), the sample was aggregated so as to regress \(\log ({\hat{R}}_{g} - 1)\) on \(\bar{\varvec{x}}_{g}\), where \({\hat{R}}_{g} = {\hat{H}}_{g}(\tau _{1})/{\hat{H}}_{g}(\tau _{2})\) is the ratio of the sample quantiles for group g and \(\bar{\varvec{x}}_{g} = 20 \sum _{i \in {\mathcal {G}}_{g}}\varvec{x}_{i}/n\). An ecological regression of this sort is clearly bound to be unsatisfactory, but it is arguably a ‘quick and dirty’ expedient one could put together to obtain an indirect estimate of \(\varvec{\beta }\), say \(\varvec{\beta }_{g}\). Note, however, that our real data example in Sect. 5 does have a grouping structure.
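A short R sketch of the naïve comparator (ii) is given below (our own illustration; the sequential assignment of observations to groups is an arbitrary device of ours, since the partition is not otherwise specified).

```r
## Naive comparator (ii): partition the sample into G groups, compute the sample
## quantile ratio within each group, and regress log(R_g - 1) on the group means
## of the covariates
naive_fit <- function(y, X, tau1, tau2, G = 20) {
  g    <- rep(seq_len(G), length.out = length(y))
  Rg   <- tapply(y, g, function(v) quantile(v, tau1) / quantile(v, tau2))
  Xbar <- apply(X, 2, function(col) tapply(col, g, mean))
  coef(lm(log(Rg - 1) ~ Xbar))
}

## e.g., with the simulated data from the previous snippet:
## naive_fit(y_wei, cbind(x, w), 0.8, 0.2)
```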

As shown in Tables 1 and 2, the bias and MSE of the QRR estimator under the Weibull distribution were reasonably small at the smallest sample size and practically null at higher sample sizes in all considered scenarios. Approximate-to-empirical average standard error ratios were also close to 1 (Table 3), with observed confidence interval coverage close to the nominal 95% or, at most, slightly conservative for the 0.8:0.2 quantile ratio. Tables 4, 5 and 6 confirm the good behaviour of QRR also under the lognormal distribution, with the exception of the 0.9:0.4 quantile ratio. For this scenario, the approximate standard errors were slightly below the empirical standard errors and, consequently, the observed confidence interval coverage was below the nominal level. The weaker performance at this quantile ratio is most likely due to the markedly lower density that the lognormal distribution has at its 90th percentile for a given \(\eta _{i}\), compared to the density of the Weibull at its 90th percentile for the same value of \(\eta _{i}\). On the other hand, the bootstrap standard errors were accurate estimates of the empirical standard errors, consistently under both distributions.

Table 1 Weibull—Bias for crude generalized linear model (gamma, GLMg, and normal, GLMn), naïve linear regression (LM), and quantile ratio regression (QRR)
Table 2 Weibull—Mean squared error for crude generalized linear model (gamma, GLMg, and normal, GLMn), naïve linear regression (LM), and quantile ratio regression (QRR)
Table 3 Weibull—Standard errors for quantile ratio regression (QRR)
Table 4 Lognormal—Bias for crude generalized linear model (gamma, GLMg, and normal, GLMn), naïve linear regression (LM), and quantile ratio regression (QRR)
Table 5 Lognormal—Mean squared error for crude generalized linear model (gamma, GLMg, and normal, GLMn), naïve linear regression (LM), and quantile ratio regression (QRR)
Table 6 Lognormal—standard errors for quantile ratio regression (QRR)

The results (not shown) for the quantile ratios \(0.9\!:\!0.1\) and \(0.99\!:\!0.5\) were in line with those for the other quantile ratios in terms of bias and MSE. However, the performance of the approximate estimation of the standard errors tended to deteriorate, especially in the most extreme case \(0.99\!:\!0.5\).

As expected, neither the GLM-based estimator nor the naïve linear regression estimator performed well, as evidenced by the much larger MSE values in all settings. It is worth noting that the apparent optimality of GLM-based estimation in some instances of the scenario with \(\varvec{\beta } = (0.5,0,0)^{\top }\) is actually the result of parallel estimated quantiles \({\hat{H}}_{MLE}(\tau )\) that give constant ratios across the values of \(\varvec{x}\). This behavior persists in the other scenario, \(\varvec{\beta } = (0.5,-0.5,0.5)^{\top }\), resulting in an evident bias that shows no sign of vanishing as the sample size increases.

The nonlinear estimation procedure for QRR, as described in Sect. 2.2, yielded bias and MSE comparable to those reported above. As anticipated, the initial value \(\hat{\xi }_{2}\), which was obtained using a standard linear model, resulted in an estimate of \(\varvec{\gamma }(\tau _{1})\) that was also a stationary point for the estimation of \(\varvec{\gamma }(\tau _{2})\) in 90–100% of the cases across scenarios. In less than 2% of the cases, a single refinement step was not sufficient.

Table 7 Number of households (n), four sample quantiles of equivalized disposable income (EDI) and two quantile ratios by country for the EU-SILC data

In summary, QRR has little bias and MSE in all the scenarios and under all distributions considered in this simulation, whereas approximate standard error estimation as proposed in Sect. 2.4 performs best when the upper quantile of the ratio is not too high on the tail. In general, bootstrap standard errors are a valid alternative.

5 Data analysis

We describe in this section an original application of our methods to economic inequality in Europe. We pooled data from several waves of the cross-sectional component of the European Union Survey on Income and Living Conditions (EU-SILC), where the equivalized disposable income (EDI) was collected at household level. For this analysis, we extracted observations on EDI for the 14-year period from 2004 to 2017. We then linked these data to several country-time specific macroeconomic indicators obtained from the World Bank data repository, namely the unemployment rate and trade (sum of the values of imports and exports) as a percentage of GDP; and from the Fraser Institute data repository, namely standardized indexes of hiring and firing regulations, and of government transfers and subsidies. The latter two variables range between zero and ten, with larger values corresponding to fewer regulations on hiring and dismissal of workers and to fewer government transfers and subsidies, respectively. For details see Gwartney and Lawson (2003). There were several other variables available from these data sources that we considered, but did not include in the reported analysis, as they were either irrelevant to inequality or too strongly associated with other, more readily interpretable variables. Moreover, we had to exclude some EU countries as they did not have income data (Cyprus, Estonia, Latvia, Lithuania, Luxembourg, Malta, Slovenia) or had income data limited to only 1 year (Greece). Finally, out of the remaining 19 countries included in the analysis, only 5 had data for the entire time period, while 5 had data for 8 years or less, thus making the assessment of temporal trajectories somewhat limited.

Table 8 Estimates, standard errors (SE) and p-values from the 0.8:0.2 quantiles ratio regressions for the EU-SILC data
Table 9 Estimates, standard errors (SE) and p-values from the 0.9:0.4 quantiles ratio regressions for the EU-SILC data

It is important to remark that the data set comes from a repeated cross-sectional survey: each household was sampled only once (in a specific year). Since no household was sampled repeatedly, the assumption that the data are independent is reasonable.

Number of households and selected sample quantiles for the outcome EDI, along with the sample quantile ratios 0.8:0.2 (quintile ratio) and 0.9:0.4 (which, for practical purposes, we refer to as the Palma ratio), are shown for each country and overall in Table 7. After exclusions, the sample consisted of 1,876,367 households in total. Croatia, Poland, Serbia and Portugal were among the countries with stronger inequality according to either quantile ratio, while Denmark, Norway, the Netherlands and Czechia were among those with lower inequality. The mean (standard deviation) of unemployment rate, trade, hiring and dismissal regulations, and transfers and subsidies were 8.1 (3.4), 97.1 (40.4), 4.1 (1.4), and 4.5 (1.1), respectively.

We ran a QRR analysis on the quintile and Palma ratios using Eq. (2) with link function \(g(u) = \log (u - 1)\). For each ratio, we fitted two models: Model 1, which included 18 dummy variables \(x_{j}\), \(j = 1,\ldots ,18\), for country (reference: Belgium) and a second degree polynomial for year t (centered at 2004); Model 2, which, in addition to the covariates of the first model, included the macroeconomic indicators introduced above, \(w_{j}\), \(j = 1, \ldots ,4\). Let \(\varvec{x}\) denote the vector collecting all the aforementioned variables. Then Model 2 can be written as

$$\begin{aligned}{} & {} \log \left\{ \frac{H_{Y}(\tau _1|\varvec{x})}{H_{Y}(\tau _2|\varvec{x})}-1\right\} \nonumber \\{} & {} \quad =\beta _{0} + \sum _{j=1}^{18} x_{j}\beta _{j} + \beta _{19}t+\beta _{20}t^{2} + w_{1}\beta _{21} \nonumber \\{} & {} \qquad + w_{2}\beta _{22} + w_{3}\beta _{23} + w_{4}\beta _{24}, \end{aligned}$$

while Model 1 can be specified as above with \(\beta _{k}=0\), \(k=21,\ldots ,24\). The quantity \(\exp (\beta _{k})\), \(k = 1, \ldots , 24\) thus represents the adjusted multiplicative effect of the kth covariate on the relative inequality \(H_{Y}(\tau _1|\varvec{x})/H_{Y}(\tau _2|\varvec{x}) - 1\), while \(\exp (\beta _{0})\) is the baseline relative inequality. Standard errors were estimated as detailed in Sect. 2.4. (We also computed bootstrap standard errors and there were no appreciable differences with approximate standard errors, probably due to the very large sample size.)

The results are reported in Tables 8 (quintile ratio) and 9 (Palma ratio). We will now make general statements about the results with no specific reference to either ratio, as the results are often similar. We will then point out and discuss differences.

There were clear differences between countries according to Model 1, with higher inequality in Croatia, Portugal, Romania, and Serbia, but lower inequality in Czechia, the Netherlands, the Slovak Republic and the Nordic countries. Some differences persisted after adjusting for the macroeconomic indicators (e.g., Ireland), while other differences decreased in magnitude (e.g., Italy and Poland), or increased in magnitude (e.g., Denmark and Sweden). That is, differences in income inequality among countries are partly confounded by country-level fundamentals.

Unemployment is significantly associated with inequality, as one would expect. On the other hand, higher levels of trade seem to be associated with lower income inequality. Interestingly, higher levels of government transfers and subsidies and stricter regulations for hiring and dismissal of employees are both independently associated with lower income inequality.

As for the time trend, we note that both models suggest that inequality initially declined and then regained ground. To determine, approximately, at what point in time this occurred, we can simply consider that

$$\begin{aligned} \dfrac{\partial }{\partial t}\frac{H_{Y}(\tau _1|\varvec{x})}{H_{Y}(\tau _2|\varvec{x})} = (\beta _{19} + 2\beta _{20}t)\exp (\varvec{x}^{\top }\varvec{\beta }). \end{aligned}$$

If we use the estimates from Model 1 in Table 8, the expression above is zero when \(\beta _{19} + 2\beta _{20}t = 0\), that is, at \(t = -{\hat{\beta }}_{19}/(2{\hat{\beta }}_{20}) = 7.75\), which corresponds to the end of 2011 (recall that t is centered at 2004). So while income inequality as measured by these quantile ratios had been improving in the EU area as of 2004, the conditions somewhat worsened after 2011, which we speculate to be partly explained by the lagged effects of the financial and economic crisis that started in 2007–2008.

6 Final remarks

To our knowledge, this work represents the first formal proposal for Quantile Ratio Regression (QRR). The methods developed herein are distribution free and can be scaled up to analyse large collections of data, such as those obtained from multi-country surveys. We believe that QRR is sufficiently versatile to find application and generate interest in many research areas.

We have proposed two strategies for estimation. The first, presented in Sect. 2.2, starts from the nonlinear specification of the model and works well for small to moderately sized data sets. An alternative strategy based on linearization is presented in Sect. 2.3 and is suitable for very large data sets (with a sample size in the order of millions). Both strategies performed well in a simulation study.

In our motivating application the data were obtained from a series of cross-sectional surveys, and the assumption that observations are independent is tenable. A possible avenue for future research is the extension of our methods to panel data, as are the extensions to censored and discrete outcomes (e.g., Geraci and Farcomeni 2022). Given the recent interest and developments in causal inference for conditional quantiles (e.g., see Koenker et al. 2018), we note that instrumental variables (Chernozhukov and Hansen 2005, 2006) seem to us a natural candidate for conducting causal analysis in QRR, although we believe that the identification of causal effects in this highly non-linear setting might not be elementary.

We conclude by circling back to our opening statements and noting that another open route for further work is devising a methodology for quantile share ratio regression, where the summary statistic is not a ratio of quantiles (as in this work), but a ratio of shares above specific quantiles.