## Abstract

We propose a new method to approximate income distribution dynamics at the micro level using only macro data on aggregate moments of the income distribution. Under the assumption that individual incomes follow a lognormal autoregressive process, we show that the evolution of the mean and standard deviation of log income across individuals provides sufficient information to bound the degree of mobility. We estimate mobility bounds for 46 countries, using time series data on aggregate moments of the income distribution available in the World Inequality Database and the World Bank’s PovcalNet database. This new data allows us to study the correlates of mobility, and to document churning in the top and bottom of the income distribution, in a much larger set of countries than was previously possible.

## Notes

See for example Basu (2013).

The different datasets we work with in this paper have different units of observation at the micro level, including individuals, households, and tax units. They also differ in whether they measure consumption or income. For terminological convenience we will refer to *income* distribution dynamics at the *individual* level wherever it is possible to do so without confusion.

By way of comparison, the largest existing study we are aware of covers 26 European economies, based on household panel data between 2005 and 2006 (Van Kerm and Alperin (2013)). In contrast, our method allows us to cover 47 countries, of which 28 are developing countries, and of which 36 are not included in Van Kerm and Alperin (2013). Measures of intergenerational mobility (often approximated by education mobility), on the other hand, are available for many more countries; see for example Hertz et al. (2007) and Narayan et al. (2018).

We are by no means the first to notice that anonymous and non-anonymous growth rates can in principle diverge widely in the presence of mobility. See Jenkins and Van Kerm (2006, 2016), Grimm (2007), Van Kerm (2009), and Bourguignon (2011) for discussions of the difference between anonymous growth incidence curves and their non-anonymous counterparts. We are however the first to provide estimates of the gap between the two for a large cross-section of countries.

A survey summarizes 18 leading specifications for individual income dynamics that have been proposed in the literature since 1978 (Meghir and Pistaferri (2011), Table 2). Our specification is most closely related to Holtz-Eakin et al. (1988) who model hourly earnings as an autoregressive process with individual effects.

See Lopez and Serven (2006) for cross-country evidence that the lognormal distribution matches well the reported data on quintile shares for a large compilation of household surveys across countries and over time. Battistin, Blundell and Lewbel (2009) focus on US microdata and show that the distribution of income is close to, but not exactly, lognormal, and that the distributions of permanent income and consumption are very close to lognormal. See also Cowell and Flachaire (2015), Sect. 6.3.1.2.

See for example Aitchison and Brown (1966). As discussed in more detail below, in one of our two datasets, we have time series data on the income share of the bottom \(p\) percent, denoted \({s}_{t}(p)\), for various values of \(p\), instead of data on the Gini coefficient. Again relying on the properties of the lognormal distribution, we can infer that \({\sigma }_{t}\left(p\right)={\Phi }^{-1}\left(p\right)-{\Phi }^{-1}\left({s}_{t}\left(p\right)\right)\), where \({\sigma }_{t}\left(p\right)\) is the time series of the standard deviation of log income implied by the time series of the income share of the bottom \(p\) percent.
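As an illustration of how these lognormal properties can be used, the following minimal Python sketch recovers the standard deviation of log income either from a Gini coefficient or from a bottom income share. It assumes the standard lognormal relations \(G = 2\Phi(\sigma/\sqrt{2}) - 1\) and \(s(p) = \Phi(\Phi^{-1}(p) - \sigma)\); the function names are illustrative and not taken from the paper, and \(p\) is expressed as a fraction rather than a percentage.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

def gini_from_sigma(sigma):
    """Gini coefficient of a lognormal distribution with log-sd sigma."""
    return 2.0 * _STD_NORMAL.cdf(sigma / sqrt(2.0)) - 1.0

def sigma_from_gini(gini):
    """Invert G = 2*Phi(sigma/sqrt(2)) - 1 for sigma."""
    return sqrt(2.0) * _STD_NORMAL.inv_cdf((1.0 + gini) / 2.0)

def bottom_share(p, sigma):
    """Lognormal Lorenz curve: income share of the bottom fraction p."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - sigma)

def sigma_from_bottom_share(p, share):
    """sigma = Phi^{-1}(p) - Phi^{-1}(s(p)), as in the note above."""
    return _STD_NORMAL.inv_cdf(p) - _STD_NORMAL.inv_cdf(share)
```

Both inversions are exact only if incomes are lognormal; with real data they should be read as approximations.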

There is a variety of measures of relative and absolute mobility in the literature. Since in our setting the joint distribution of log income in periods \(t\) and \(t-1\) is normal, \({\beta }_{t-1}\) together with the mean and variance of log income in the two periods fully characterizes the entire joint distribution of incomes in the two periods. This means that any measure of mobility can be calculated given \({\beta }_{t-1}\) and the means and variances of log incomes.
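To make this concrete, here is a minimal Python sketch of one such measure: the probability that an individual at rank \(p\) in period \(t-1\) ends up above rank \(q\) in period \(t\). It relies only on joint normality, under which the correlation of log incomes is \(r = {\beta }_{t-1}{\sigma }_{t-1}/{\sigma }_{t}\); the means drop out because the measure is rank-based. The function names are illustrative assumptions, not the paper's.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def log_income_correlation(beta, sd_prev, sd_curr):
    """Correlation of log income across periods implied by the OLS slope beta."""
    return beta * sd_prev / sd_curr

def prob_rank_above(q, p, beta, sd_prev, sd_curr):
    """P(rank at t > q | rank at t-1 = p) under bivariate normality.

    Ranks p and q are fractions in (0, 1); requires |r| < 1.
    """
    r = log_income_correlation(beta, sd_prev, sd_curr)
    z_prev = _STD_NORMAL.inv_cdf(p)  # standardized log income at t-1
    z_cut = _STD_NORMAL.inv_cdf(q)   # standardized threshold at t
    # Conditional on z_prev, standardized log income at t is N(r*z_prev, 1-r^2)
    return 1.0 - _STD_NORMAL.cdf((z_cut - r * z_prev) / sqrt(1.0 - r * r))
```

For example, `prob_rank_above(0.9, 0.9, beta, sd_prev, sd_curr)` gives the chance that someone exactly at the 90th percentile stays above it, a simple indicator of churning at the top.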

In the working paper version of this paper we also took advantage of the fact that the cross-sectional variance of log income follows an AR(1) process with autoregressive coefficient \({\rho }^{2}\) to obtain a second estimate of \(\rho \), which we combined with the estimate based on the dynamics of the mean of log income in Eq. (3). While in principle appealing, this additional information did not yield significantly more precise estimates of \(\rho \) and in addition requires further assumptions on the dynamics of \({\sigma }_{\varepsilon t}^{2}\) that are difficult to justify. For these reasons, here we adopt the simpler approach of using only Eq. 3 to estimate \(\rho \), and we refer the reader to the working paper version for the alternative estimates based on both Eqs. 3 and 4.

The bias correction in Andrews (1993) requires i.i.d. normal innovations, as we have specified in Assumption A1.

Grant: 5-R01AG040213-10 and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (Grants: 1-R03HD091871-01, 1-R03HD100924-01). While the CNEF guided our search for long-running panels, we retrieved the microdata directly from each of the individual survey websites. We are grateful to an anonymous referee for drawing this dataset to our attention.

In the data we received from the Swiss panel survey the sampling weights are equal to zero for approximately 30 percent of the observations. The Survey of Labour and Income Dynamics (SLID) for Canada could only be accessed remotely and for a fee. Mexico stands out as the only country for which the survey is a rotating panel, in which individual incomes are recorded for five consecutive quarters but not beyond that.

Note that in this context we do not face the familiar econometric difficulties that arise when estimating dynamic panel data models with unobserved individual effects that have spawned a large literature of econometric tools to address these difficulties (Holtz-Eakin, Newey and Rosen (1988), Arellano and Bond (1991) and Bun and Carree (2005) to name just a few). This is because we are not trying to separately identify the autoregressive coefficient \(\rho \) in the panel micro data. Instead we need only estimate the overall mobility parameter \({\beta }_{t-1}\) which is by definition the slope coefficient in a simple OLS regression of income on lagged income, as noted in the discussion of Proposition 1 in the main text.

For selected panels we do not use the maximum number of rounds available. In the case of Germany for example, where the survey started in 1984, we use data from 1992 onwards to avoid the structural break due to the fall of the Berlin Wall. In the case of the United States we work with the annual data up until 1997. After 1997, the PSID switches to a biennial frequency. We considered working with a biennial version of the PSID from 1977 until the latest round available. Over this extended period, however, there is clear evidence of a structural break in the time series for the standard deviation of log income. Accommodating this trend break in a series of only 17 biennial observations led to estimates that were considerably less precise.

To assess the sensitivity of our macro estimates with respect to trimming choices, we considered a variety of different thresholds ranging between 0% (no trimming) and 1.5% of observations at which we trim the bottom tail. There is less trimming at the top end, which is why we focused this sensitivity analysis on the bottom-end of the income distribution (where there is greater need for trimming).

Our default is to use the bottom 90 percent share when available. We use data on bottom 99 percent for five countries (India, Japan, Mauritius, Singapore, and United Kingdom), for which data on bottom 90 percent is either limited or not available in the WID database.

Two countries in our sample, China and Indonesia, have separate surveys for rural and urban populations, resulting in a total of 30 household survey based time-series of aggregate moments from PovcalNet. Most of the countries we selected from the PovcalNet database have annual household surveys, but a few have regular surveys once every two or three years. We annualize our estimates of \(\rho \) and \({\beta }_{t-1}\) for these countries to make them comparable to those based on annual data. Expressions for the irregularly-spaced versions of our main results are detailed in Appendix A.

While the top incomes data in the WID are based on tabulated tax records, mean income is an estimate of income of all individuals including those who do not file tax returns, often based on national accounts measures of household income (see e.g. Atkinson et al. (2011) for details). The WID report data for income shares higher than the top 10% that we use here, and the very top income shares are based on fitting a Pareto distribution to the highest observed income groups. We use the top 10% share since it is least likely to reflect the Pareto imputation of the top tail of the income distribution, and therefore is more likely to be consistent with our lognormality assumption.

Three countries in our sample (France, Germany, and the United States) also have time series data prior to World War II. We include this data in our estimation sample and treat the pre-1939 data as one distinct period with a separate trend. For the post-World War II period, we allow the data to select a single structural break in the time trends.

The study by Van Kerm and Alperin (2013) stands out with a comparatively large cross-sectional coverage, providing estimates of mobility for up to 26 European countries using a rotating panel carried out by the EU between 2003 and 2007.

Note that the data in Tables 2 and 3 consist of 49 surveys covering 46 countries (counting the separate rural and urban surveys for China and Indonesia, as well as the additional WID data point for China, as separate observations). For the cross-country analysis, we (a) average together the rural and urban mobility estimates for China and Indonesia, and (b) include the estimates of mobility based on panel microdata for the nine countries discussed in Sect. 3 resulting in a total of 56 observations, covering 46 countries (as we have panel microdata for Korea and the Netherlands which are not included in our WID sample).

We are by no means the first to notice this distinction – see Jenkins and Van Kerm (2006, 2016), Grimm (2007), Van Kerm (2009), and Bourguignon (2011) for discussions of the difference between anonymous growth incidence curves and their non-anonymous counterparts. The novelty in this section of our paper is that we are able to compute estimates of the difference between anonymous and non-anonymous growth rates for a large sample of countries, using our estimates of mobility based only on aggregate moments of the income distribution.

This feature of our data generating process is nothing more than Galtonian “regression to the mean”, and is a property of any two correlated random variables.

This is because the positive semi-definiteness of the covariance matrix in Eq. (2) requires \({\sigma }_{t}^{2}\ge {\beta }_{t-1}^{2}{\sigma }_{t-1}^{2}\).
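The step behind this inequality can be sketched as follows, assuming the covariance matrix in Eq. (2) is that of \(({y}_{it},{y}_{it-1})\) with off-diagonal element \({\beta }_{t-1}{\sigma }_{t-1}^{2}\) (which is what the definition of \({\beta }_{t-1}\) as an OLS slope implies; Eq. (2) itself is not reproduced here):

```latex
\[
\det\begin{pmatrix}
\sigma_{t}^{2} & \beta_{t-1}\sigma_{t-1}^{2} \\
\beta_{t-1}\sigma_{t-1}^{2} & \sigma_{t-1}^{2}
\end{pmatrix}
= \sigma_{t-1}^{2}\left(\sigma_{t}^{2}-\beta_{t-1}^{2}\sigma_{t-1}^{2}\right)\ge 0
\quad\Longleftrightarrow\quad
\left|\beta_{t-1}\right|\le \frac{\sigma_{t}}{\sigma_{t-1}} .
\]
```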

## References

Aaberge, R., Bjorklund, A., Jäntti, M., Palme, M., Pedersen, P., Smith, N., & Wennemo, T. (2002). Income inequality and income mobility in the Scandinavian countries compared to the United States. *Review of Income and Wealth*, *48*(4), 443–460.

Aitchison, J., & Brown, J. A. C. (1966). *The Lognormal Distribution*. Cambridge University Press.

Andrews, D. (1993). Exactly median-unbiased estimation of first order autoregressive/unit root models. *Econometrica*, *61*, 139–165.

Antman, F., & McKenzie, D. (2007). Earnings mobility and measurement error: A pseudo-panel approach. *Economic Development and Cultural Change*, *56*, 125–161.

Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. *Review of Economic Studies*, *58*, 277–297.

Armingeon, K., Wenger, V., Wiedemeier, F., Isler, C., Knopfel, L., Weisstanner, D., & Engler, S. (2018). *Comparative Political Data Set 1960–2016*. Institute of Political Science, University of Berne.

Atkinson, A., & Bourguignon, F. (1982). The Comparison of Multidimensional Distributions of Economic Status. *Review of Economic Studies*, *49*, 183–201.

Atkinson, A., & Piketty, T. (Eds.). (2007). *Top incomes over the twentieth century: A contrast between continental European and English speaking countries*. Oxford University Press.

Atkinson, A., & Piketty, T. (Eds.). (2010). *Top incomes: A global perspective*. Oxford University Press.

Atkinson, A., Piketty, T., & Saez, E. (2011). Top incomes in the long run of history. *Journal of Economic Literature*, *49*, 3–71.

Attanasio, O., Hurst, E., & Pistaferri, L. (2015). The evolution of income, consumption, and leisure inequality in the United States, 1980–2010. In *Improving the Measurement of Consumer Expenditures*. University of Chicago Press.

Auten, G., & Gee, G. (2009). Income mobility in the United States: New evidence from income tax data. *National Tax Journal*, *62*, 301–328.

Auten, G., Gee, G., & Turner, N. (2013). Income inequality, mobility, and turnover at the top in the US, 1987–2010. *American Economic Review: Papers & Proceedings*, *103*, 168–172.

Ayala, L., & Sastre, M. (2008). The structure of income mobility: Empirical evidence from Five EU Countries. *Empirical Economics*, *35*, 451–473.

Banerjee, A., & Piketty, T. (2005). Top Indian incomes, 1922–2000. *World Bank Economic Review*, *19*, 1–20.

Basu, K. (2013). Shared prosperity and the mitigation of poverty. World Bank Policy Research Department Working Paper No. 6700.

Battistin, E., Blundell, R., & Lewbel, A. (2009). Why is consumption more lognormal than income? Gibrat’s Law Revisited. *Journal of Political Economy*, *117*(6), 1140–1154.

Bourguignon, F. (2010). Non-anonymous growth incidence curves, income mobility, and social welfare dominance. *Journal of Economic Inequality*, *9*, 605–627.

Bourguignon, F., & Moreno, H. (2015). On the Construction of Synthetic Panels. Manuscript, Paris School of Economics.

Bun, M., & Carree, M. (2005). Bias-Corrected Estimation in Dynamic Panel Data Models. *Journal of Business and Economic Statistics*, *23*(2), 200–211.

Burkhauser, R., Holtz-Eakin, D., & Rhody, S. (1997). Labour Earnings Mobility and Inequality in the United States and Germany During the Growth Years of the 1980s. *International Economic Review*, *38*(4), 775–794.

Burkhauser, R., & Poupore, J. (1997). A cross-national comparison of permanent inequality in the United States and Germany. *Review of Economics and Statistics*, *79*, 10–17.

Carr, M., & Wiemers, E. (2017). Recent trends in the variability of men’s earnings: Evidence from administrative and survey data. Mimeo.

Caselli, F., & Ventura, J. (2000). A representative consumer theory of distribution. *American Economic Review*, *90*(4), 909–926.

Chanda, A., & Unel, B. (2021). Do attitudes toward risk taking affect entrepreneurship? Evidence from second-generation Americans. *Journal of Economic Growth*, *26*, 385–413.

Chen, W.-H. (2009). Cross-national difference in income mobility: evidence from Canada, the United States, Great Britain, and Germany. *The Review of Income and Wealth*, *55*(1), 75–100.

Chetty, R., Hendren, N., Kline, P., & Saez, E. (2014). Where is the land of opportunity? The geography of intergenerational mobility in the United States. *Quarterly Journal of Economics*, *129*, 1553–1623.

Chetty, R., Grusky, D., Hell, M., Hendren, N., Manduca, R., & Narang, J. (2017). The fading American dream: Trends in absolute income mobility since 1940. *Science*, *356*, 398–406.

Collado, D. (1997). Estimating dynamic models from time series of independent cross-sections. *Journal of Econometrics*, *82*, 37–62.

Cook, R. D. (1979). Influential Observations in Linear Regression. *Journal of the American Statistical Association*, *74*(365), 169–174.

Corak, M. (2013). Income inequality, equality of opportunity, and inter-generational mobility. *Journal of Economic Perspectives*, *27*, 79–102.

Cowell, F., & Flachaire, E. (2015). Statistical Methods for Distributional Analysis. In F. Bourguignon & A. Atkinson (Eds.), *The Handbook of Income Distribution*. Elsevier.

Creedy, J. (1974). Income changes over the life cycle. *Oxford Economic Papers*, *26*, 405–423.

Creedy, J. (1993). *Dynamics of Income Distribution*. Basil Blackwell.

Dang, H.-A., Lanjouw, P., Luoto, J., & McKenzie, D. (2014). Using repeated cross-sections to explore movements in and out of poverty. *Journal of Development Economics*, *107*, 112–128.

Deaton, A. (1985). Panel data from time series of cross-sections. *Journal of Econometrics*, *30*, 109–126.

Dynan, K., Elmendorf, D., & Sichel, D. (2012). The evolution of household income volatility. *The B.E. Journal of Economic Analysis and Policy*, *12*(2).

Falk, A., Becker, A., Dohmen, T., Enke, B., Huffman, D., & Sunde, U. (2018). Global evidence on economic preferences. *Quarterly Journal of Economics*, *133*(4), 1645–1692.

Feenstra, R. C., Inklaar, R., & Timmer, M. P. (2015). The Next Generation of the Penn World Table. *American Economic Review*, *105*(10), 3150–3182. Available for download at www.ggdc.net/pwt.

Forni, M., & Lippi, M. (1997). *Aggregation and the Microfoundations of Dynamic Macroeconomics*. Oxford University Press.

Forni, M., & Lippi, M. (1999). Aggregation of linear dynamic microeconomic models. *Journal of Mathematical Economics*, *31*, 131–158.

Gottschalk, P., & Spolaore, E. (2002). On the evaluation of economic mobility. *Review of Economic Studies*, *69*(1), 191–208.

Gottschalk, P., & Moffitt, R. (2009). The rising instability of U.S. earnings. *Journal of Economic Perspectives*, *23*, 3–24.

Granger, C. (1980). Long memory relationships and the aggregation of dynamic models. *Journal of Econometrics*, *14*(2), 227–238.

Grimm, M. (2007). Removing the anonymity axiom in assessing pro-poor growth. *Journal of Economic Inequality*, *5*, 179–197.

Hacker, J., & Jacobs, E. (2008). The rising instability of American family incomes, 1969–2004: Evidence from the Panel Study of Income Dynamics. EPI Briefing Paper 213.

Hertz, T., Jayasundera, T., Piraino, P., Selcuk, S., Smith, N., & Verashchagina, A. (2007). The inheritance of educational inequality: International comparisons and fifty-year trends. *The B.E. Journal of Economic Analysis & Policy*, *7*, 1–48.

Holtz-Eakin, D., Newey, W., & Rosen, H. (1988). Estimating vector autoregressions with panel data. *Econometrica*, *56*(6), 1371–1395.

Inoue, A. (2008). Efficient estimation and inference in linear pseudo-panel data models. *Journal of Econometrics*, *142*, 449–466.

Jäntti, M., & Jenkins, S. (2015). Income Mobility. In F. Bourguignon & A. Atkinson (Eds.), *The Handbook of Income Distribution*. Elsevier.

Jenkins, S., & Van Kerm, P. (2006). Trends in income inequality, pro-poor income growth, and income mobility. *Oxford Economic Papers*, *58*, 531–548.

Jenkins, S., & Van Kerm, P. (2016). Assessing individual income growth. *Economica*, *83*, 679–703.

Kopczuk, W., Saez, E., & Song, J. (2010). Earnings inequality and mobility in the United States: Evidence from social security data since 1937. *Quarterly Journal of Economics*, *125*, 91–128.

Krebs, T., Krishna, P., & Maloney, W. (2019). Income mobility, income risk, and Welfare. *World Bank Economic Review*, *33*(2), 375–393.

Lane, J.-E., McKay, D., & Newton, K. (1997). *Political Data Handbook: OECD countries*. Oxford University Press.

Leigh, A. (2007). How closely do top income shares track other measures of inequality? *Economic Journal*, *117*, 619–633.

Levy, S. (2009). Can social programs reduce productivity and growth? A hypothesis for Mexico. In R. Kanbur & J. Svejnar (Eds.), *Labor Markets and Economic Development*. Routledge.

Lewbel, A. (1994). Aggregation and Simple Dynamics. *American Economic Review*, *84*(4), 905–918.

Long, J., & Ferrie, J. (2013). Intergenerational occupation mobility in Great Britain and the United States Since 1850. *American Economic Review*, *103*(4), 1109–1137.

Lopez, H., & Servén, L. (2006). A Normal Relationship? Poverty, Growth, and Inequality. World Bank Policy Research Department Working Paper No. 3814.

Maasoumi, E., & Trede, M. (2001). Comparing income mobility in Germany and the United States using generalized entropy mobility measures. *Review of Economics and Statistics*, *83*, 551–559.

Moffitt, R. (1993). Identification and estimation of dynamic models with a time series of repeated cross-sections. *Journal of Econometrics*, *59*, 99–123.

Moffitt, R., & Zhang, S. (2018). The PSID and income volatility: Its record of seminal research and some new findings. *The Annals of the American Academy*, *680*, 48–81.

Narayan, A., Van der Weide, R., Cojocaru, A., Lakner, C., Redaelli, S., Mahler, D., Ramasubbaiah, R., & Thewissen, S. (2018). *Fair progress? Economic mobility across generations around the world*. Equity and Development, Washington DC: World Bank.

Ravallion, M., & Chen, S. (2003). Measuring pro-poor growth. *Economics Letters*, *78*, 93–99.

Roine, J., & Waldenstrom, D. (2008). The evolution of top incomes in an egalitarian society: Sweden, 1903–2004. *Journal of Public Economics*, *92*, 366–387.

Roine, J., & Waldenstrom, D. (2011). Common trends and shocks to top incomes: A Structural Breaks Approach. *Review of Economics and Statistics*, *93*(3), 832–846.

Shin, D., & Solon, G. (2011). Trends in men’s earnings volatility: What does the Panel Study of Income Dynamics show? *Journal of Public Economics*, *95*, 973–982.

Sinha, R. (2017). Closer, But No Cigar: Intergenerational Mobility Across Caste Groups in India. Manuscript, World Bank.

Van Kerm, P., & Alperin, M. N. P. (2013). Inequality, growth, and mobility: the intertemporal distribution of income in European Countries 2003–2007. *Economic Modelling*, *35*, 931–939.

Van Kerm, P. (2009). Income mobility profiles. *Economics Letters*, *102*, 93–95.

Verbeek, M., & Vella, F. (2005). Estimating dynamic models from repeated cross-sections. *Journal of Econometrics*, *127*, 83–102.

Zaffaroni, P. (2003). Contemporaneous aggregation of linear dynamic models in large economies. *Journal of Econometrics*, *120*(1), 75–102.

## Acknowledgements

We are grateful to Aureo de Paula, Stephen Jenkins, David McKenzie, Bob Rijkers, and Luis Servén for helpful comments. An earlier version of this paper was circulated with the title “Approximating Income Distribution Dynamics Using Aggregate Data”. The views expressed here are the authors’ and do not reflect those of the World Bank, its Executive Directors, or the countries they represent.

## Appendices

### 1.1 Appendix A: Proofs

In the main text we assumed for notational convenience that survey data are available in consecutive periods \(t\) and \(t-1\). In reality however, survey data often are available at irregular frequencies that differ over time and across countries, and in the empirical part of the paper we work with such irregularly-spaced data. In this appendix we provide proofs for the case of irregularly-spaced data, i.e. for two surveys available in periods \(t\) and \(t-k\). The propositions as stated in the main text obtain for the special case of \(k=1\).

### 1.2 Preliminaries

Iterating the data generating process in Eq. 1 backwards for \(k\) periods results in:
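Assuming Eq. 1 takes the form \({y}_{it}=\rho {y}_{it-1}+{\delta }_{t}+{\lambda }_{i}+{\varepsilon }_{it}\) (an assumption, since Eq. 1 is not reproduced in this section, although it is consistent with the expression for \({\beta }_{t-1}\) derived below), the iterated expression referred to as Eq. 9 can be sketched as:

```latex
\[
y_{it} = \rho^{k}\, y_{it-k} + \tilde{\delta}_{t}
       + \frac{1-\rho^{k}}{1-\rho}\,\lambda_{i} + \tilde{\varepsilon}_{it}
\tag{9}
\]
```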

where \({\tilde{\delta }}_{t}\equiv \sum_{s=0}^{k-1}{\rho }^{s}{\delta }_{t-s}\) and \({\tilde{\varepsilon }}_{it}\equiv \sum_{s=0}^{k-1}{\rho }^{s}{\varepsilon }_{it-s}\). Evaluating this expression at \(t-k\), iterating back \(t-k\) periods to the initial period zero, and using the assumption on initial income in Assumption A1, Eq. (9) becomes:

### 1.3 Proof of Proposition 1

Given Eq. 10 which states that \({y}_{it-k}\) is a linear combination of \({\varepsilon }_{i0}, \dots ,{\varepsilon }_{it-k}\) and \({\lambda }_{i}\), and Assumption A1 that these shocks are jointly normally distributed, it follows that \({y}_{it-k}\) is normally distributed for all \(t-k\ge 0\). Equation 10 also implies that \(COV\left[{y}_{it-k},{\lambda }_{i}\right]=\frac{{\sigma }_{\lambda }^{2}}{1-\rho }\). To complete the proof of Proposition 1 we need to find \(COV\left[{y}_{it},{y}_{it-k}\right]\). Using Eq. 9 we have:
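Assuming Eq. 9 reads \({y}_{it}={\rho }^{k}{y}_{it-k}+{\tilde{\delta }}_{t}+\frac{1-{\rho }^{k}}{1-\rho }{\lambda }_{i}+{\tilde{\varepsilon }}_{it}\), and that the shocks in \({\tilde{\varepsilon }}_{it}\) are uncorrelated with \({y}_{it-k}\), the covariance and the implied \(k\)-period slope coefficient can be sketched as:

```latex
\[
COV\left[y_{it},y_{it-k}\right]
  = \rho^{k}\sigma_{t-k}^{2}
  + \frac{1-\rho^{k}}{1-\rho}\cdot\frac{\sigma_{\lambda}^{2}}{1-\rho},
\qquad
\beta_{t-k,k} \equiv \frac{COV\left[y_{it},y_{it-k}\right]}{\sigma_{t-k}^{2}}
  = \rho^{k} + \frac{1-\rho^{k}}{\left(1-\rho\right)^{2}}\,
    \frac{\sigma_{\lambda}^{2}}{\sigma_{t-k}^{2}}
\tag{11}
\]
```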

Setting \(k=1\) and adopting the more compact notational convention that \({\beta }_{t-\mathrm{1,1}}={\beta }_{t-1}\) gives \({\beta }_{t-1}=\rho +\frac{{\sigma }_{\lambda }^{2}}{1-\rho }\frac{1}{{\sigma }_{t-1}^{2}}\) as in the main text.
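The expression for \({\beta }_{t-1}\) can be checked numerically. The following Python sketch simulates two periods of log income and compares the OLS slope with the formula. It rests on two assumptions not stated in this section: that the data generating process is \({y}_{it}=\rho {y}_{it-1}+{\delta }_{t}+{\lambda }_{i}+{\varepsilon }_{it}\) (with \({\delta }_{t}=0\) here), and that initial income loads on \({\lambda }_{i}\) with coefficient \(1/(1-\rho )\), which is what \(COV\left[{y}_{it-k},{\lambda }_{i}\right]=\frac{{\sigma }_{\lambda }^{2}}{1-\rho }\) requires. Parameter values and names are illustrative.

```python
import random

def simulate_beta(rho=0.6, sd_lambda=0.3, sd_eps=0.5, n=20000, seed=0):
    """Return (OLS slope of y_t on y_{t-1}, slope implied by the formula)."""
    rng = random.Random(seed)
    y_prev, y_curr = [], []
    for _ in range(n):
        lam = rng.gauss(0.0, sd_lambda)  # individual effect lambda_i
        # Assumed initial condition: COV[y_{t-1}, lambda] = sd_lambda^2 / (1 - rho)
        y0 = lam / (1.0 - rho) + rng.gauss(0.0, sd_eps)
        # Assumed form of Eq. 1 with delta_t set to zero
        y1 = rho * y0 + lam + rng.gauss(0.0, sd_eps)
        y_prev.append(y0)
        y_curr.append(y1)
    m0 = sum(y_prev) / n
    m1 = sum(y_curr) / n
    var0 = sum((a - m0) ** 2 for a in y_prev) / n
    cov = sum((a - m0) * (b - m1) for a, b in zip(y_prev, y_curr)) / n
    beta_ols = cov / var0
    beta_formula = rho + sd_lambda ** 2 / ((1.0 - rho) * var0)
    return beta_ols, beta_formula
```

The two numbers should agree up to sampling error, and both exceed \(\rho \) whenever \({\sigma }_{\lambda }^{2}>0\): persistent individual effects make incomes look less mobile than the autoregressive coefficient alone would suggest.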

### 1.4 Proof of Proposition 2

Taking unconditional expectations of both sides of Eq. 9 gives the following irregularly-spaced analog of Eq. 3.
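A sketch of this expression, writing \({\mu }_{t}\equiv E\left[{y}_{it}\right]\) and assuming Eq. 9 has the form \({y}_{it}={\rho }^{k}{y}_{it-k}+{\tilde{\delta }}_{t}+\frac{1-{\rho }^{k}}{1-\rho }{\lambda }_{i}+{\tilde{\varepsilon }}_{it}\) with mean-zero shocks:

```latex
\[
\mu_{t} = \rho^{k}\mu_{t-k} + \tilde{\delta}_{t}
        + \frac{1-\rho^{k}}{1-\rho}\,E\left[\lambda_{i}\right]
\tag{12}
\]
```

The last term drops out if the individual effects are normalized to have mean zero, which is an assumption here.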

Taking unconditional variances of both sides of Eq. 9 gives the following irregularly-spaced analog of Eq. 4:
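Under the same assumed form of Eq. 9, and using \(COV\left[{y}_{it-k},{\lambda }_{i}\right]=\frac{{\sigma }_{\lambda }^{2}}{1-\rho }\), the variance recursion can be sketched as follows, with \({\tilde{\sigma }}_{\varepsilon t}^{2}\) denoting the variance of \({\tilde{\varepsilon }}_{it}\):

```latex
\[
\sigma_{t}^{2}
 = \rho^{2k}\sigma_{t-k}^{2}
 + \left[\left(\frac{1-\rho^{k}}{1-\rho}\right)^{2}
 + \frac{2\rho^{k}\left(1-\rho^{k}\right)}{\left(1-\rho\right)^{2}}\right]\sigma_{\lambda}^{2}
 + \tilde{\sigma}_{\varepsilon t}^{2}
 = \rho^{2k}\sigma_{t-k}^{2}
 + \frac{1-\rho^{2k}}{\left(1-\rho\right)^{2}}\,\sigma_{\lambda}^{2}
 + \tilde{\sigma}_{\varepsilon t}^{2}
\tag{13}
\]
```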

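where \({\tilde{\sigma }}_{\varepsilon t}^{2}\equiv \sum_{s=0}^{k-1}{\rho }^{2s}{\sigma }_{\varepsilon t-s}^{2}\) is the variance of \({\tilde{\varepsilon }}_{it}\), given that the shocks are independent over time. Setting \(k=1\) retrieves the result in the main text.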

### 1.5 Proof of Proposition 3

To prove Proposition 3 we first show that the anonymous growth rate of group average incomes is a weighted average of the anonymous growth incidence curves within that group:
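A sketch of this decomposition, assuming growth rates are measured as proportional changes and writing \({g}_{t}^{A}\left(s\right)\) for the anonymous growth incidence curve at percentile \(s\):

```latex
\[
g_{t}^{A}\left(p,q\right)
 \equiv \frac{Y_{t}\left(p,q\right)}{Y_{t-k}\left(p,q\right)} - 1
 = \int_{p}^{q} w_{t-k}\left(s\right)\, g_{t}^{A}\left(s\right)\, ds
\tag{14}
\]
```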

where \({Y}_{t}\left(p,q\right)\) is mean income between the \({p}^{th}\) and \({q}^{th}\) percentile of the income distribution for \(p\le q\), and \({w}_{t-k}(s)\equiv \frac{1}{q-p}\frac{{Y}_{t-k}\left(s\right)}{{Y}_{t-k}\left(p,q\right)}\) is the share of percentile \(s\) in the total income of the group at time \(t-k\). Similarly, the non-anonymous growth rate of group average incomes is the same weighted average of the non-anonymous growth incidence curves:

The expression for the difference between the anonymous and non-anonymous group average growth rates in the main text follows from subtracting Eq. 15 from Eq. 14.

To complete the proof we need to evaluate and sign the integral

To evaluate this remaining integral, recall that \({Y}_{t-k}\left(p,q\right)\equiv \frac{1}{q-p}\underset{p}{\overset{q}{\int }}{e}^{{y}_{t-k}\left(s\right)}ds=\frac{1}{q-p}\underset{p}{\overset{q}{\int }}{e}^{{\mu }_{t-k}+{\sigma }_{t-k}{\Phi }^{-1}\left(s\right)}ds\). Differentiating \({Y}_{t-k}\left(p,q\right)\) with respect to \({\sigma }_{t-k}\) yields:
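Carrying out the differentiation under the integral sign, and using the definition of \({w}_{t-k}\left(s\right)\) with \({Y}_{t-k}\left(s\right)={e}^{{\mu }_{t-k}+{\sigma }_{t-k}{\Phi }^{-1}\left(s\right)}\), the result can be sketched as:

```latex
\[
\frac{\partial Y_{t-k}\left(p,q\right)}{\partial \sigma_{t-k}}
 = \frac{1}{q-p}\int_{p}^{q}\Phi^{-1}\left(s\right)
   e^{\mu_{t-k}+\sigma_{t-k}\Phi^{-1}\left(s\right)}\, ds
 = Y_{t-k}\left(p,q\right)\int_{p}^{q} w_{t-k}\left(s\right)\Phi^{-1}\left(s\right) ds
\tag{17}
\]
```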

This means that we can find the integral we need simply by differentiating group average income with respect to \({\sigma }_{t-k}\). Combining Eqs. 16 and 17 and using the property of the truncated lognormal distribution that \({Y}_{t-k}\left(p,q\right)={e}^{{\mu }_{t-k}+\frac{{\sigma }_{t-k}^{2}}{2}}\left(\Phi \left({\Phi }^{-1}\left(q\right)-{\sigma }_{t-k}\right)-\Phi \left({\Phi }^{-1}\left(p\right)-{\sigma }_{t-k}\right)\right)/(q-p)\), we obtain:
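Assuming the integral in question is \(\int_{p}^{q}{w}_{t-k}\left(s\right){\Phi }^{-1}\left(s\right)ds\), so that it equals the derivative of \(\mathrm{ln}\,{Y}_{t-k}\left(p,q\right)\) with respect to \({\sigma }_{t-k}\), differentiating the truncated lognormal expression gives the following sketch, where \(\phi \) is the standard normal density:

```latex
\[
\int_{p}^{q} w_{t-k}\left(s\right)\Phi^{-1}\left(s\right) ds
 = \sigma_{t-k}
 - \frac{\phi\left(\Phi^{-1}\left(q\right)-\sigma_{t-k}\right)
        -\phi\left(\Phi^{-1}\left(p\right)-\sigma_{t-k}\right)}
        {\Phi\left(\Phi^{-1}\left(q\right)-\sigma_{t-k}\right)
        -\Phi\left(\Phi^{-1}\left(p\right)-\sigma_{t-k}\right)}
\tag{18}
\]
```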

For the particular case of average incomes in the bottom \(q\) percent this expression simplifies to

while for the particular case of average incomes in the top \(p\) percent this expression simplifies to
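If the integral being evaluated is \(\int_{p}^{q}{w}_{t-k}\left(s\right){\Phi }^{-1}\left(s\right)ds\), differentiating the truncated lognormal formula suggests that the bottom-\(q\) case reduces to \({\sigma }_{t-k}-\frac{\phi \left({\Phi }^{-1}\left(q\right)-{\sigma }_{t-k}\right)}{\Phi \left({\Phi }^{-1}\left(q\right)-{\sigma }_{t-k}\right)}\) and the top-\(p\) case to \({\sigma }_{t-k}+\frac{\phi \left({\Phi }^{-1}\left(p\right)-{\sigma }_{t-k}\right)}{1-\Phi \left({\Phi }^{-1}\left(p\right)-{\sigma }_{t-k}\right)}\), where \(\phi \) is the standard normal density. The Python sketch below checks the general closed form against direct numerical integration; the function names are illustrative and not taken from the paper.

```python
from math import exp
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def weighted_mean_z_closed(p, q, sigma):
    """Closed form for the income-weighted mean of Phi^{-1}(s) over [p, q]."""
    # Limits at p = 0 and q = 1 are handled explicitly (inv_cdf is undefined there)
    lo_pdf = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p) - sigma) if p > 0 else 0.0
    lo_cdf = _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - sigma) if p > 0 else 0.0
    hi_pdf = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(q) - sigma) if q < 1 else 0.0
    hi_cdf = _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(q) - sigma) if q < 1 else 1.0
    return sigma - (hi_pdf - lo_pdf) / (hi_cdf - lo_cdf)

def weighted_mean_z_numeric(p, q, sigma, n=20000):
    """Midpoint-rule evaluation of the same integral; mu cancels in the weights."""
    step = (q - p) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        z = _STD_NORMAL.inv_cdf(p + (i + 0.5) * step)
        weight = exp(sigma * z)  # proportional to income at percentile s
        num += weight * z
        den += weight
    return num / den
```

The closed form is negative for groups at the bottom of the distribution and positive for groups at the top, which is what determines the sign of the gap between anonymous and non-anonymous growth rates.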

Finally, note that \({g}_{t}^{i}\left(p,q\right)\), \(i=A,NA\), can also be represented as: \({g}_{t}^{i}\left(p,q\right)={g}_{t}^{i}\left(z(p,q,{\sigma }_{t-1})\right)\), where \(z(p,q,\sigma )\) satisfies: \(p\le z\left(p,q,\sigma \right)\le q\). This follows directly from the fact that \({g}_{t}^{i}\left(z\right)\) is a monotonic function of *z*.

### Appendix B: Details of OLS, bias-corrected, and MSE-minimizing estimates of \({\varvec{\rho}}\)

This appendix describes our approach to estimating the autoregressive coefficient for log individual incomes using aggregate data. Let \({\widehat{\rho }}\) denote the OLS estimator of \(\rho \) based on Eq. 3. Given that the available time series is short for many of the countries in our sample, this estimator will exhibit downwards finite-sample bias. We therefore also generate a corresponding bias-corrected estimator \({\widehat{\rho }}_{BC}\) using the procedure suggested in Andrews (1993). At the core of this procedure is the fact that the distribution of the OLS estimator is exclusively a function of the true autoregressive parameter and the sample size, and thus is independent of the parameters that describe the distribution of the error term and the time trend. We refer the interested reader to Andrews (1993) for a proof. We take advantage of this result, as suggested by Andrews (1993), by computing the median bias of the OLS estimator as a function of \(\rho \) for each sample size separately (using numerical simulations), and then inverting this function to obtain a median-unbiased estimator for \(\rho \).
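The inversion step can be illustrated with a short Python sketch. It is a simplified stand-in for the procedure in Andrews (1993), not a reproduction of it: the regression includes only a constant (no time trend), the median function is simulated on a coarse grid of \(\rho \) values and inverted by linear interpolation, and all names, grid choices and replication counts are assumptions made for illustration.

```python
import random
from statistics import median

def ols_ar1(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    x, z = y[:-1], y[1:]
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    return sxz / sxx

def simulated_median(rho, n_obs, reps, rng):
    """Median of the OLS estimator when the true coefficient is rho."""
    draws = []
    for _ in range(reps):
        # Stationary start when |rho| < 1, zero start in the unit root case
        start = rng.gauss(0.0, 1.0) / (1.0 - rho * rho) ** 0.5 if abs(rho) < 1 else 0.0
        y = [start]
        for _ in range(n_obs - 1):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        draws.append(ols_ar1(y))
    return median(draws)

def median_unbiased(rho_ols, n_obs, reps=400, seed=0):
    """Invert the simulated median function at the observed OLS estimate."""
    rng = random.Random(seed)
    grid = [i / 10 for i in range(-9, 11)]  # candidate rho values, -0.9 to 1.0
    med = [simulated_median(r, n_obs, reps, rng) for r in grid]
    if rho_ols <= med[0]:
        return grid[0]
    for i in range(len(grid) - 1):
        if med[i] < rho_ols <= med[i + 1]:
            frac = (rho_ols - med[i]) / (med[i + 1] - med[i])
            return grid[i] + frac * (grid[i + 1] - grid[i])
    return grid[-1]  # estimate above the median at rho = 1: corrected value is 1
```

Because the OLS estimator is biased downwards in short samples, the corrected value lies above the OLS estimate; handling a time trend would require adding a trend regressor to both the estimation and the simulation.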

Comparing the OLS and bias-corrected estimators highlights a tradeoff between bias and variance: while the OLS estimator is substantially downward biased, the bias-corrected estimator is much less precisely estimated. We address this tradeoff by taking a weighted average of the OLS and bias-corrected estimators with weight \(\omega \) on the OLS estimator:
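Written out, with \({\widehat{\rho }}_{OLS}\) denoting the OLS estimator and \({\widehat{\rho }}_{\omega }\) an illustrative label for the combined estimator, the weighted average is:

```latex
\[
\widehat{\rho}_{\omega} = \omega\,\widehat{\rho}_{OLS} + \left(1-\omega\right)\widehat{\rho}_{BC}
\tag{21}
\]
```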

The MSE of a weighted average of the two estimators is:
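Using the standard decomposition of mean squared error into squared bias plus variance, applied to \(\omega {\widehat{\rho }}_{OLS}+\left(1-\omega \right){\widehat{\rho }}_{BC}\), this can be sketched as:

```latex
\[
MSE\left(\omega\right)
 = \left(\omega\, Bias\left[\widehat{\rho}_{OLS}\right]
 + \left(1-\omega\right) Bias\left[\widehat{\rho}_{BC}\right]\right)^{2}
 + \omega^{2} V\left[\widehat{\rho}_{OLS}\right]
 + \left(1-\omega\right)^{2} V\left[\widehat{\rho}_{BC}\right]
 + 2\omega\left(1-\omega\right) COV\left[\widehat{\rho}_{OLS},\widehat{\rho}_{BC}\right]
\tag{22}
\]
```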

Setting the derivative of this expression with respect to \(\omega \) equal to zero, and using the fact that \(Bias\left[{\widehat{\rho }}_{BC}\right]=0\) results in this expression for the MSE-minimizing weight:
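Assuming the MSE is the usual squared bias plus variance of \(\omega {\widehat{\rho }}_{OLS}+\left(1-\omega \right){\widehat{\rho }}_{BC}\), the first-order condition solves to:

```latex
\[
\omega^{*}
 = \frac{V\left[\widehat{\rho}_{BC}\right] - COV\left[\widehat{\rho}_{OLS},\widehat{\rho}_{BC}\right]}
        {Bias\left[\widehat{\rho}_{OLS}\right]^{2}
       + V\left[\widehat{\rho}_{OLS}\right]
       + V\left[\widehat{\rho}_{BC}\right]
       - 2\,COV\left[\widehat{\rho}_{OLS},\widehat{\rho}_{BC}\right]}
\tag{23}
\]
```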

We approximate the bias of the OLS estimator as \(Bias\left[{\widehat{\rho }}_{OLS}\right]\approx {\widehat{\rho }}_{OLS}-{\widehat{\rho }}_{BC}\). Since the bias-corrected estimator is a function of the OLS estimator, we can linearize to approximate \(V\left[{\widehat{\rho }}_{BC}\right]\approx {\nabla }^{2}V\left[{\widehat{\rho }}_{OLS}\right]\) and \(COV\left[{\widehat{\rho }}_{OLS},{\widehat{\rho }}_{BC}\right]\approx \nabla V\left[{\widehat{\rho }}_{OLS}\right]\), where \(\nabla \) is the gradient of the bias corrected estimator as a function of the OLS estimator evaluated at the OLS estimate, that we compute numerically. Inserting these into Eq. (23) results in this MSE-minimizing weight:
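If the first-order condition yields \({\omega }^{*}=\left(V\left[{\widehat{\rho }}_{BC}\right]-COV\right)/\left(Bias{\left[{\widehat{\rho }}_{OLS}\right]}^{2}+V\left[{\widehat{\rho }}_{OLS}\right]+V\left[{\widehat{\rho }}_{BC}\right]-2COV\right)\), then substituting the three approximations gives \({\omega }^{*}\approx \frac{\left({\nabla }^{2}-\nabla \right)V\left[{\widehat{\rho }}_{OLS}\right]}{{\left({\widehat{\rho }}_{OLS}-{\widehat{\rho }}_{BC}\right)}^{2}+{\left(1-\nabla \right)}^{2}V\left[{\widehat{\rho }}_{OLS}\right]}\). A minimal Python sketch of this calculation follows; the function and argument names are illustrative, and the formula is a reconstruction from the definitions above rather than a quotation.

```python
def mse_minimizing_weight(rho_ols, rho_bc, var_ols, grad):
    """Weight on the OLS estimator that minimizes approximate MSE.

    grad is the numerical derivative of the bias-corrected estimator with
    respect to the OLS estimator, evaluated at the OLS estimate.
    The denominator is zero only if rho_ols == rho_bc and grad == 1.
    """
    bias_sq = (rho_ols - rho_bc) ** 2             # squared approximate OLS bias
    numerator = (grad ** 2 - grad) * var_ols      # V[BC] - COV[OLS, BC]
    denominator = bias_sq + (1.0 - grad) ** 2 * var_ols
    return numerator / denominator

def combined_estimate(rho_ols, rho_bc, var_ols, grad):
    """Weighted average of the OLS and bias-corrected estimates."""
    w = mse_minimizing_weight(rho_ols, rho_bc, var_ols, grad)
    return w * rho_ols + (1.0 - w) * rho_bc
```

When `grad` equals one the bias correction is a pure shift, so the corrected estimator is no noisier than OLS and the weight on OLS is zero; larger values of `grad` shift weight towards OLS.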

See Tables

7 and

## About this article

### Cite this article

Kraay, A., Van der Weide, R. Measuring intragenerational mobility using aggregate data.
*J Econ Growth* **27**, 273–314 (2022). https://doi.org/10.1007/s10887-021-09200-2

DOI: https://doi.org/10.1007/s10887-021-09200-2