1 Introduction

It is a known problem that household surveys do not give an accurate picture of the rich. To correct for the missing rich in surveys, top income methodologies have been developed. These methods complement surveys with information from tax data at the top of the distribution. In all their forms these methods are found to have a significant upward effect on inequality indicators (Aitken and Weale 2018; Alvaredo 2011; Anand and Segal 2015, 2017; Atkinson and Jenkins 2019; Blanchet et al. 2018; Bourguignon 2018; Lustig 2018; Medeiros et al. 2018). In the UK, the Family Resources Survey (FRS) and the Survey of Personal Income (SPI), a household survey and a tax administrative source respectively, have frequently been used to implement such top income methodologies (Atkinson and Jenkins 2019; Burkhauser et al. 2017a, b; Jenkins 2017). These studies have so far not focused on capital income measurement in particular. Medeiros et al. (2018) have recently identified the problem that, even after the application of top income methodology to surveys, inequality indicators still underestimate the role of non-labour income. The objective of this paper is to explore the underestimation of capital incomes in top income methodologies further.

The UK makes for a compelling case study to take this argument further. This research finds that the FRS has reported a significant decline in capital income measurement over the past 20 years, compared to tax administrative data. Differences between the aggregate level of capital income reported in the survey and the tax data are referred to as missing income throughout the paper. Conceptually, capital income missing from household surveys is expected to produce a downward bias on survey-based estimates of inequality for two reasons. First, this income flow is disproportionately held by the rich, who are not accurately captured in household surveys (under-coverage). Second, researchers have long acknowledged that household surveys do not give a full picture of capital incomes at any point of the distribution (under-reporting).

Empirically, it remains an open question to what extent the observed increase in missing capital income affects measures of economic inequality. The methodological contribution of the paper is the following. First, it examines how the existing top income methodology used in the UK corrects for capital income over the period 1997–2016, and explains why existing top income methodologies fail to account fully for the missing capital income. Second, in response, a multi-step procedure is proposed that corrects for the remainder of the capital income by targeting different types of underestimation error (under-coverage and under-reporting). Finally, I show how inequality indicators (Gini) are affected when this missing capital income is imputed onto the household survey by applying these multiple steps.

The proposed methodology has a wider application outside the UK context. First, as the research community continues to develop top harmonisation methodologies, there are two different approaches to better incorporate capital incomes in inequality indicators. These are to incorporate estimation error directly into novel top income methodologies, or to use existing methodologies and add capital income corrections. The latter approach is taken in this paper. Methodological choices are motivated by the latest debates arising from the top income harmonisation literature and the structure of the UK data. Data availability and the nature of the underestimation error for capital incomes (under-coverage and under-reporting) will vary from country to country. The multiple stages allow for flexibility and a tailored approach when applying the methodology in different national contexts. Wage inequality is known to be the key driver of overall levels of inequality. Depending on the purpose of the analysis, it might not always be desirable to carry out multiple corrections, as in their current form they have only marginal effects on overall inequality. In the UK case, where capital income measurement in surveys has been declining rapidly, it is worth exploring this option further. In addition, it is becoming increasingly known that the tax administrative benchmark taken in this analysis is very much a lower bound (Piketty et al. 2018). There has been a surge in research exploiting novel databases which broaden the definition of capital incomes to include, for example, capital gains, undistributed profits and earnings on capital held offshore (Advani and Summers 2020; Alstadsaeter et al. 2017; Alstadsæter et al. 2016). A broadening of the capital income definition, compared to the benchmark used in this paper, has a significant upward impact on non-survey-based measures of inequality.
To my knowledge, no attempts have been made so far to integrate these components into survey-based inequality measures.

Second, at present top income methodologies are often applied to total income and are not decomposable. We know little about the role of capital incomes in driving overall inequality using top adjusted statistics. This analysis shows there is more work to be done in integrating capital incomes into this debate by allowing for decomposability in harmonisation methodologies and examining the role of capital incomes in these adjustments. In many countries, Gini coefficients calculated from surveys are used by government statistics offices and policy makers, and any bias in these indicators is therefore expected to have a societal impact. What is not picked up by inequality indicators structurally falls out of policy debates, which has arguably been the case for capital incomes.

2 Underestimation Error in Household Surveys

This problem of the missing rich is the main cause of underestimation error at the top of the distribution in household surveys (Atkinson and Piketty 2007, 2010; Lustig 2020). As a consequence, all inequality indicators constructed using only household surveys are known to underestimate levels of inequality (Bourguignon 2018). Underestimation takes place through both under-sampling and under-reporting (Anand and Segal 2015; Blanchet et al. 2018; Bourguignon 2018; Jenkins 2017; Lustig 2020). The former relates to households which are not sampled. The latter relates to sampled households not responding or misreporting income levels.

Surveys rely on a sample to draw broader inferences for the entire population. Among the households sampled and responding to the survey, underestimation errors can also arise through the data collection process. Participants may decide not to reply to all variables (item non-response) and responses might be misreported. After the information has been gathered, data preparation such as anonymisation, rounding and truncation often introduces additional underestimation error in the reported income estimates (Blanchet et al. 2018; Bourguignon 2018; Jenkins 2017; Lustig 2020). A schematic overview of these errors is presented in Fig. 1.

Fig. 1

Underestimation error in household surveys arises through under-sampling and under-reporting. Note: Based on references (Anand and Segal 2015; Blanchet et al. 2018; Bourguignon 2018; Jenkins 2017; Lustig 2020). This paper uses the terminology adopted in these papers

There are various reasons why measurement error at the top is a particular problem. There is evidence that high-worth individuals in particular are typically under-sampled in surveys and that misreporting of income increases as one moves higher up the income distribution (Bourguignon 2018; Schröder et al. 2018). In addition, top incomes sampled in the survey are often volatile because of the small number of households/individuals sampled at the very top (Burkhauser et al. 2017c, b).

Top income adjustments use information from tax administrative data to supplement the shortcomings of household surveys at the top of the income distribution. Tax administrative sources provide better coverage of top incomes as it is a legal requirement to fill in tax returns. Top income methodology exists in various forms; a comprehensive overview can be found in Lustig (2020). The most common parameters applied to the top income correction are the eligibility threshold (e.g. top 1%), income replacements and weight calibrations. Income replacements and weight calibrations can be applied equally to all individuals within the eligibility threshold. Usually, however, individuals within the eligibility threshold are divided into bins, allowing for a more refined analysis (Burkhauser et al. 2017a; Medeiros et al. 2018). Conceptually, income replacements are used to correct for under-reporting and weight calibrations correct for under-sampling (under-coverage). This recalibration places more weight on high income individuals who typically fall out of the household survey.

Tax legislation determines what is observed in fiscal sources, creating a gap between what is ideally observed and what is actually observed (Atkinson 2007; Jenkins 2017). Since household surveys and tax sources both provide an incomplete picture of the distribution, harmonisation methods are always an approximation (Lustig 2020). Without an identifiable link between the household survey and individuals observed in the tax data, assumptions have to be made. Depending on the type of estimation error in the income sources, methodologies require the use of all three parameters or a combination of them. In practice, surveys and tax administrative data are not fully comparable and data limitations differ across countries. The preferred methodology is dependent on these constraints (Medeiros et al. 2018).

The current analysis brings to the centre the underestimation of capital incomes (dividends, interest and rents). There are only a few studies looking at capital incomes and inequality in surveys (Aitken and Weale 2018; Fräßdorf et al. 2011; García-Peñalosa and Orgiazzi 2013; Green et al. 2008). The topic has been largely neglected, most likely because of the severe underestimation of this income source in surveys. Given that household surveys are still often the dominant source used by governments to construct Gini coefficients, the issue deserves closer scrutiny.

The general underestimation errors described above are expected to disproportionally impact capital income measurement. First, under-sampling of rich individuals poses a particular problem to capital incomes because these individuals are disproportionally the recipients of capital income. Second, top income adjustments are generally applied to total income and often lack the property of decomposability. As a consequence, existing top income adjustments do not give a clear picture of how well they correct for income sub-components. Third, under-reporting of capital income takes place across the entire distribution not only among those above the eligibility threshold (e.g. top 1%).

Medeiros et al. (2018) are among the few who allow for decomposition in their methodology and conclude that a better collection of non-labour incomes is needed to fully come to grips with how to adequately correct for underestimation of capital income in household surveys. The next section provides a more in-depth analysis of non-labour income in UK sources and proposes a methodological advancement to tackle this issue.

3 UK Case Study

This section provides a description of the survey and the tax administrative data, followed by an analysis of capital income measurement in the survey compared to the fiscal source. This improves our understanding of which underestimation bias arises and how it changes at different parts of the distribution. Subsequently, the survey and tax data are used to carry out the existing UK top harmonisation methodology (Burkhauser et al. 2017a) to investigate how much capital income is still missing from the survey-based estimates of inequality after applying top corrections. This provides the basis for the methodological contribution of this paper presented in the next section.

3.1 Data

The household survey used in this analysis is the Family Resources Survey (FRS). The FRS is a representative sample of over 20,000 private households in the UK (and individuals within the households), carried out on a yearly basis. The current analysis uses total income, which includes wage, self-employment, pension and capital income. The capital income variable is aligned with the Canberra Handbook (Canberra Group U 2011) to the best extent possible. The capital income variable includes interest, dividends, rent, silent partners income and IP rights (e.g. land). Capital income reporting in household surveys is generally considered poor; variables suffer from, for example, misreporting of incomes and item non-response. The FRS statisticians make some imputations for misreporting of capital incomes. Dividends and interest are grouped together and are hard to disentangle. The FRS includes information on non-taxable interest and dividends received from tax efficient savings vehicles such as National Savings and Investments (NS&I) and Individual Savings Accounts (ISAs).

The tax administrative data used for the analysis is the Survey of Personal Income (SPI) Public Tape, available through the UK Data Service. This dataset comprises a stratified sample of tax records from HMRC’s self-assessment (SA) and payroll (PAYE) administrative systems. The UK tax unit is the individual, so it is not possible to carry out household-level analysis, and the SPI is not available for the tax year 2008. Total fiscal income, minus benefits, is used for the current analysis. Fiscal data is heavily influenced by changes in the tax code. The UK income tax year starts in April, so a tax year does not give a full picture of a calendar year. Fiscal reporting is sensitive to income-forestalling or income-delaying in response to changes in the tax rate. A small number of records in the public tape data at the very top of the distribution have been anonymised using a standard procedure documented in the SPI documentation. Artificial fluctuations in income reporting in response to the tax system are large enough to visibly affect reported inequality trends after applying top income methodologies. Several tax reforms are known to have had a measurable impact on capital income measurement in the tax administrative source (Atkinson 2012; Atkinson and Ooms 2015; OBR 2017; Pope and Waters 2016; Seely 2015). First, a sequence of tax reforms, taking place in April 2000 and April 2002, gradually lowered the corporate tax rate to 10% and 0%, respectively. These reforms produced an incentive for the self-employed to register as a company to reduce tax liabilities and to pay themselves in dividends rather than in wages. Second, in March 2009 the Labour Government announced a rise in the top marginal rate from 40 to 50% to take place in April 2010. High income individuals above £500,000 brought forward their tax liability from 2010–11 to 2009–10, to legally avoid the 50% top tax rate on incomes above £150,000.
Third, in March 2012 the Conservative Government announced a reduction in the top rate from 50 to 45% in April 2013. This created an incentive for high income individuals (above £500,000) to delay the receipt of income from 2012–13 to 2013–14. Around £16–18 billion has been brought forward, about 0.5% of total income, of which £6 billion (35%) was dividends among incomes above £500,000. Fourth, in April 2016 the Dividend Tax Credit was replaced by a dividend allowance, which allows the first £5,000 of dividend income to remain untaxed. This scheme raises the overall tax on dividends, resulting in dividend payments being brought forward to 2015–16. It is estimated that £7.6–£10.7 billion of income has been brought forward. Behavioural responses triggered by these reforms create fluctuations in capital income measurement in the fiscal source.

The academic community largely relies on the FRS and the SPI to apply top income methodologies (Atkinson and Jenkins 2019; Burkhauser et al. 2016, 2017c; Jenkins 2017). The unit of measurement used for these reconciliation exercises is gross individual income, which can be observed in both sources for the period 1996–97 to 2016–17 for the population aged 15 and over (Atkinson and Jenkins 2019; Burkhauser et al. 2016, 2017c; Jenkins 2017). Because capital incomes are mainly held at the top of the distribution, benefits, which have a more prominent effect further down the distribution, have been excluded from both data sources. Appendix I provides an overview of key differences in capital income measurement in the FRS and SPI. These differences are not enough to account for the increases in capital income underreporting described shortly. This analysis uses capital income reported in the SPI as the benchmark to estimate the amount of income missing from inequality indicators. The SPI benchmark is in fact a lower bound. Both sources exclude information on capital gains (both realised and unrealised) and capital incomes not reported because of tax avoidance and evasion. As a result, all estimates presented in this paper are lower bounds and more research in this area is encouraged.

3.2 Underestimation of Capital Incomes in the UK

As discussed in Sect. 2, there are two main reasons, under-coverage and under-reporting, for capital income to go missing from inequality indicators. The current analysis refers to underestimation as the sum of under-coverage and under-reporting, since in practice it is hard to disentangle their relative magnitude. In line with existing literature, this section looks at the underlying data structure to find out how this underestimation arises (Medeiros et al. 2018). Figure 2 shows an increased underestimation of capital incomes in the survey over the past 20 years, taking the SPI data as the benchmark. At no point in time does the survey capture all the capital income observed in the fiscal source. In 1997, the survey captured half of the capital income observed in the fiscal source. This capture has declined over time, reaching an all-time low in recent years where only one third is captured.

Fig. 2

Ratio of the aggregate level of capital income observed in the FRS to the aggregate level of capital income observed in the SPI for the period 1997–2016. Note: Author’s estimates from the Family Resources Survey (FRS) and Survey of Personal Income (SPI), the UK tax administrative data. The SPI Public Tape micro files are not released for 2008. Population shares are fixed in both databases, using the WID control totals, to improve comparability between both databases as described in Burkhauser et al. (2017a). The peaks observed in 2010 and 2016 reflect income retiming strategies in response to changes to top marginal tax rates affecting the years 2009/2010 and changes to dividend taxation affecting the years 2015/2016.
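The ratio plotted in Fig. 2 can be illustrated with a minimal sketch. It assumes that each source provides a capital income variable and a grossing weight per record; the function names and the toy numbers are hypothetical and are not taken from the FRS or the SPI.

```python
import numpy as np

def weighted_total(income, weight):
    """Weighted aggregate: sum of income times grossing weight."""
    income = np.asarray(income, dtype=float)
    weight = np.asarray(weight, dtype=float)
    return float(np.sum(income * weight))

def capture_ratio(frs_capital, frs_weight, spi_capital, spi_weight):
    """Aggregate survey capital income divided by aggregate fiscal capital income."""
    return weighted_total(frs_capital, frs_weight) / weighted_total(spi_capital, spi_weight)

# Toy records (hypothetical values, for illustration only).
frs_capital = [0.0, 100.0, 400.0]
frs_weight = [2.0, 2.0, 1.0]
spi_capital = [0.0, 200.0, 1000.0]
spi_weight = [2.0, 2.0, 1.0]

ratio = capture_ratio(frs_capital, frs_weight, spi_capital, spi_weight)
missing_share = 1.0 - ratio  # share of fiscal capital income not captured by the survey
```

In this toy example the survey aggregate is 600 and the fiscal aggregate is 1400, so well under half of the fiscal capital income is captured.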

Figure 3 shows that the underestimation takes place across the entire capital income distribution. The underestimation increases with income and is a particular problem for the top 0.1%, where almost none of the capital income observed in the fiscal source is picked up by the survey. The downward trend observed in the aggregate is largely driven by the top 10% (minus top 0.1%). Increasing underestimation is not just an issue at the top of the distribution. It even takes place among the bottom 90%, although this group holds relatively little capital income in the aggregate. Top income adjustments are often applied to the top 1–5%, but even below these eligibility thresholds more than half of the capital income is missing, and increasingly so.

Fig. 3

Declining capital income across the distribution for gross individual income. Note: Author’s estimates from the Family Resources Survey (FRS) and Survey of Personal Income (SPI), the UK tax administrative data. The SPI Public Tape micro files are not released for 2008. Population shares are fixed in both databases, using the WID control totals, to improve comparability between both databases as described in Burkhauser et al. (2017a). The peaks observed in 2010 and 2016 reflect income retiming strategies in response to changes to top marginal tax rates affecting the years 2009/2010 and changes to dividend taxation affecting the years 2015/2016.

Figure 4 shows that this problem is largely a capital income phenomenon. The decline only takes place for capital income and not for the other income components grouped together (wage, self-employment and pensions). The income definition used in household surveys is broader than the fiscal definition of income, as it includes non-taxpayers and forms of non-taxable income. In an ideal case, over 100% of fiscal income should therefore be picked up by the survey. Appendix I provides a comparison of capital income variables (definitions, measurement etc.), but given the scale of the underestimation and its increase it is unlikely that definitional differences alone drive the story.

Fig. 4

FRS/SPI gaps for capital income and other components of total income. Note: FRS/SPI aggregate capital income and other components of total income (wage, self-employed and pensions). Author’s estimates from the Family Resources Survey (FRS) and Survey of Personal Income (SPI), the UK tax administrative data. The SPI Public Tape micro files are not released for 2008. Population shares are fixed in both databases, using the WID control totals, to improve comparability between both databases as described in Burkhauser et al. (2017a). The peaks observed in 2010 and 2016 reflect income retiming strategies in response to changes to top marginal tax rates affecting the years 2009/2010 and changes to dividend taxation affecting the years 2015/2016.

With the existing data it is not possible to accurately determine which part of this underestimation is driven by under-coverage and which by under-reporting. Under-reporting is expected to arise for all survey-based measures of inequality through, for example, item non-response, misreporting and data preparation. It is not clear if this bias arises at the same magnitude across the entire distribution. In terms of under-coverage, it is known that the rich are often under-sampled in surveys. A less studied fact is the income composition of this under-sampled group. The World Inequality Database (WID) control totals (footnote 1) can be used to compare both data sources at the top of the distribution in some form. In line with Burkhauser et al. (2017a), I group individuals in bins representing 0.1% of the total (weighted) population in both the FRS and the SPI. The results are presented in Table 1.

Table 1 Average individual capital shares per bins J = 50 (p95.0-p95.1), J = 10 (p99-p99.1) and J = 1 (p99.9-p100) in the FRS and the SPI

For example, bin 1 represents the top 0.1% or (p99.9-p100), bin 2 represents the top 0.2% (minus 0.1%) or (p99.8-p99.9) etc. Bin 50 (p95.0-p95.1) has an average capital share of 3% in the FRS compared to 8% in the SPI. For bin 1 these figures are 4% and 19% respectively. This exercise reveals that the average capital share per bin is substantially higher in the SPI than in the FRS, and that the gap increases as we move up the distribution. This provides an initial indication that the income composition of people observed at the top might differ between the survey and the tax admin data. The next section returns to this point.
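The binning and the per-bin capital shares can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used to produce Table 1: records are ranked from richest to poorest, each record is placed in the bin containing the midpoint of its cumulative weighted population share (a simplification, since a record's weight may straddle a bin boundary), and the average capital share is taken as the weighted mean of individual capital-to-total-income ratios. The function names and toy data are hypothetical.

```python
import numpy as np

def assign_top_bins(total_income, weight, population_total, bin_width=0.001):
    """Return a bin index per record: 1 = top bin, 2 = next bin, and so on."""
    income = np.asarray(total_income, dtype=float)
    w = np.asarray(weight, dtype=float)
    order = np.argsort(-income, kind="stable")  # richest first
    cum_w = np.cumsum(w[order])
    # Midpoint of each record's cumulative population share, measured from the top.
    midpoint_share = (cum_w - 0.5 * w[order]) / population_total
    bins_sorted = np.floor(midpoint_share / bin_width).astype(int) + 1
    bins = np.empty_like(bins_sorted)
    bins[order] = bins_sorted  # map back to the original record order
    return bins

def mean_capital_share_by_bin(capital, total_income, weight, bins):
    """Weighted mean of individual capital shares within each bin."""
    capital = np.asarray(capital, dtype=float)
    total = np.asarray(total_income, dtype=float)
    w = np.asarray(weight, dtype=float)
    # Records with zero total income get a capital share of zero (assumption).
    share = np.divide(capital, total, out=np.zeros_like(total), where=total > 0)
    return {int(j): float(np.average(share[bins == j], weights=w[bins == j]))
            for j in np.unique(bins)}

# Toy data: four records, bin_width=0.5 so that two bins are formed.
total_income = [10.0, 40.0, 20.0, 30.0]
capital = [1.0, 8.0, 2.0, 3.0]
weight = [1.0, 1.0, 1.0, 1.0]
bins = assign_top_bins(total_income, weight, population_total=4.0, bin_width=0.5)
shares = mean_capital_share_by_bin(capital, total_income, weight, bins)
```

With real data, `population_total` would be the external control total and `bin_width` would stay at 0.001, so that each bin represents 0.1% of the weighted population in both sources.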

In sum, underestimation of capital incomes has increased over the past 20 years. This implies capital income is increasingly missing from household survey-based estimates of inequality. In practice it is hard to determine which part of the story is driven by under-reporting and which by under-coverage. Problems of under-reporting of capital incomes (item non-response, misreporting and data preparation) are most likely taking place across the entire distribution. As capital incomes are concentrated among high income individuals, it is plausible that these errors become stronger moving up the distribution. The figures show that the survey is notoriously bad at capturing the capital incomes of the top 0.1%. At the same time, capital income underestimation is increasingly observed further down the distribution, among the top 10% (minus 0.1%). One potential explanation for the observed increases in underestimation is that the survey has failed to pick up the growth in capital income among this group. Comparing the downward trend to other income components included in total income, it is clear that this underestimation is very particular to capital incomes. There are definitional differences and timing differences between the FRS and the SPI (Appendix I), but these are unlikely to explain the entire picture. There appear to be differences in income composition between the survey and the fiscal source: capital shares, as measured at various points in the distribution, are substantially higher in the SPI than in the FRS. This provides a preliminary indication that the income composition of people observed at the top differs between the survey and the tax admin data.

3.3 Capital Incomes in Top Income Methodology

The harmonisation methodology proposed in the UK context has been developed by Burkhauser et al. (2017a, b) (footnote 2). This methodology assumes under-reporting is the main cause of income underestimation and uses income replacements to correct for this under-reporting. A visualisation of the methodology is presented in Fig. 5. As mentioned in Medeiros et al. (2018), there is no particular rule to establish the size of replacement bins or eligibility thresholds. These are established in accordance with observations of the underlying data structure.

Fig. 5

Visualisation of the Burkhauser et al. (2017a) top income methodology. Note: Each of the figurines can be seen individually but can also represent a group of individuals. Income is indicated on the vertical axis and increases as one moves up the axis. The lower horizontal dotted line, indicated by α, represents the eligibility threshold, i.e. the income threshold at the population cut-off (e.g. 1% or 5%). The horizontal axis represents the population, where the final vertical dotted line represents the total population. For those eligible for the adjustment, the income correction depends on the location within the income distribution. The figure shows the intuition of the adjustment for two bins, with individuals ranked by their position in the income distribution in the FRS and SPI. The income level of individuals within each bin in the FRS is scaled up (or down) to the income mean of that bin from the SPI. In the example this is \(y^{*}_{BH1}\) and \(y^{*}_{BH2}\) for bin 1 and bin 2, respectively.

Individuals in the survey and tax admin data are ranked according to their position in the distribution of total income. To ensure comparability at the top of the distribution, population control totals taken from the World Inequality Database (WID) (footnote 3) are used to construct bins representing 0.1% of the total (weighted) population. Based on their ranking within the survey and the fiscal source, individuals are allocated to the corresponding bin. For example, if the eligibility threshold is the top 1%, income replacements are made for 10 bins each representing 0.1% of the weighted population. Individual income observed in the FRS within each bin is replaced by the mean income of the corresponding bin in the SPI. For instance, individuals within the survey in bin 1 are all given the mean income observed in the SPI in bin 1.
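The replacement rule described above can be written down compactly. The sketch below is a simplified illustration, not the Burkhauser et al. (2017a) code: it assumes bin indices have already been assigned in both sources (bin 1 being the top 0.1%), and the function names and toy values are hypothetical.

```python
import numpy as np

def spi_bin_means(spi_income, spi_weight, spi_bins):
    """Weighted mean income per bin in the tax data."""
    income = np.asarray(spi_income, dtype=float)
    w = np.asarray(spi_weight, dtype=float)
    bins = np.asarray(spi_bins)
    return {int(j): float(np.average(income[bins == j], weights=w[bins == j]))
            for j in np.unique(bins)}

def replace_top_incomes(frs_income, frs_bins, bin_means, n_eligible_bins):
    """Replace survey incomes in bins 1..n_eligible_bins with the tax-data bin mean."""
    adjusted = np.asarray(frs_income, dtype=float).copy()
    bins = np.asarray(frs_bins)
    for j in range(1, n_eligible_bins + 1):
        adjusted[bins == j] = bin_means[j]
    return adjusted

# Toy tax records: two bins with weighted means of 700 and 250.
means = spi_bin_means(
    spi_income=[1000.0, 600.0, 300.0, 200.0],
    spi_weight=[1.0, 3.0, 2.0, 2.0],
    spi_bins=[1, 1, 2, 2],
)

# Toy survey records: bins 1 and 2 are eligible, bin 3 is left untouched.
adjusted = replace_top_incomes(
    frs_income=[500.0, 400.0, 280.0, 100.0, 50.0],
    frs_bins=[1, 1, 2, 3, 3],
    bin_means=means,
    n_eligible_bins=2,
)
```

With a top 1% eligibility threshold and 0.1% bins, `n_eligible_bins` would be 10. Survey weights are left unchanged, since this step is an income replacement only.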

A general limitation of top income methodology is that decomposability is not always possible. This has implications for the study of capital incomes. For the purpose of the current analysis, decomposability is needed in order to understand how well the existing top harmonisation methodology corrects for capital incomes. I assume fixed income composition within each population bin eligible for the top income correction. The SPI mean income for bin 1, can be decomposed into the sub-components of total income used in the analysis (wage, capital income, self-employment and pensions). This implicitly assumes the percentage share of income of these sub-components is the same for all individuals within the bin. When applying the top correction to the sub-components of total income, levels of inequality as measured by the Gini coefficient remain the same. This assumption can be refined in future analysis.
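The fixed-composition assumption can be illustrated with a short sketch. It assumes that the component shares of a bin are computed from weighted SPI component totals and then applied to the replaced total income of every survey individual in that bin; the dictionary layout, function names and toy values are hypothetical.

```python
import numpy as np

COMPONENTS = ("wage", "capital", "self_employment", "pension")

def spi_bin_composition(spi_components, spi_weight):
    """Share of each income component in the weighted bin total of the tax data."""
    w = np.asarray(spi_weight, dtype=float)
    totals = {name: float(np.sum(np.asarray(spi_components[name], dtype=float) * w))
              for name in COMPONENTS}
    grand_total = sum(totals.values())
    return {name: totals[name] / grand_total for name in COMPONENTS}

def decompose_replaced_income(replaced_income, composition):
    """Split a replaced total income using the same shares for everyone in the bin."""
    return {name: replaced_income * composition[name] for name in COMPONENTS}

# Toy tax records for one bin (two records with equal weights).
spi_components = {
    "wage": [600.0, 200.0],
    "capital": [300.0, 100.0],
    "self_employment": [50.0, 50.0],
    "pension": [50.0, 50.0],
}
composition = spi_bin_composition(spi_components, spi_weight=[1.0, 1.0])

# Every survey individual in this bin receives the bin mean (700) after replacement.
parts = decompose_replaced_income(700.0, composition)
```

Because the components sum back to the replaced total for every individual, the distribution of total income, and hence its Gini coefficient, is unaffected by the decomposition.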

It is now possible to calculate the percentage of capital income which is adjusted for, compared to fiscal data. There is a clear downward trend in capital income measurement under all eligibility thresholds. This is not surprising given that capital income is starting to reach further down the distribution. Table 2 provides an indication that top income corrections alone are not enough to correct for underestimation of capital income in household surveys. As an example, after adjusting for the top 0.1%, only 50% of total capital income as reported in the tax data is included in top adjusted inequality indicators in 2016. In other words, 50% of total capital income is missing and not incorporated in survey-based measures of inequality.

Table 2 Capital income adjusted compared to tax administrative data, under different eligibility thresholds

4 Proposed Capital Income Correction

We have seen that top adjustments do not fully correct for the missing capital income, and that their effect has weakened over the past 20 years as capital income reaches further below the eligibility thresholds. This provides the foundation for the proposed methodology, a multi-step procedure which can be applied to existing top harmonisation methodology. The proposed extension is informed by three observations which have been made so far. First, capital incomes have gained importance further down the distribution, among the top 10% (minus top 0.1%), which is not fully picked up by the existing top income methodology. Second, capital shares are substantially higher in the SPI than in the FRS. This provides a strong indication that the individuals in the FRS are not representative in terms of income composition at the top compared to the SPI. There might be a particular under-coverage of rich individuals with substantial capital income. Third, underestimation occurs across the entire distribution and is not purely a top phenomenon. There is a need to apply a correction factor across the entire distribution to correct for 100% of the capital income as observed in the fiscal source.

The proposed methodological advancement is exploratory in nature. The method is a mix of income replacement and weight re-calibration, since both under-reporting and under-coverage are problems for capital income measurement. The multiple steps included in the methodological advancement are:

  • Step 1: apply the existing top income methodology, which corrects for the missing rich in surveys (income replacement)

  • Step 2: correct for under-coverage of high capital share individuals (weight re-calibration)

  • Step 3: correct for under-reporting below the eligibility threshold (income replacement)

The results section includes a discussion of how much missing income is corrected for in each step and the effects on the Gini coefficient in each step. I also comment on how changing the ordering of these steps affects the results and overlap between the steps that need more scrutiny in future work.
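To compare the Gini coefficient before and after each step, a weighted Gini can be computed from the microdata. The following is a minimal sketch rather than the estimator used in this paper: it applies the trapezoid rule to the weighted Lorenz curve, assumes non-negative incomes, and uses a hypothetical function name and toy incomes.

```python
import numpy as np

def weighted_gini(income, weight):
    """Gini coefficient from weighted microdata via the Lorenz curve (trapezoid rule)."""
    y = np.asarray(income, dtype=float)
    w = np.asarray(weight, dtype=float)
    order = np.argsort(y, kind="stable")  # poorest first
    y, w = y[order], w[order]
    # Cumulative population and income shares, both starting at zero.
    pop_share = np.concatenate(([0.0], np.cumsum(w) / np.sum(w)))
    inc_share = np.concatenate(([0.0], np.cumsum(y * w) / np.sum(y * w)))
    # Twice the area under the Lorenz curve.
    area = np.sum(np.diff(pop_share) * (inc_share[1:] + inc_share[:-1]))
    return float(1.0 - area)

# Toy comparison: raising the top income, as a top correction would, raises the Gini.
weights = [1.0, 1.0, 1.0, 1.0]
gini_before = weighted_gini([10.0, 20.0, 30.0, 40.0], weights)
gini_after = weighted_gini([10.0, 20.0, 30.0, 80.0], weights)
```

Calling the function on the income vector and weights produced by each step gives a step-by-step comparison. Steps that re-calibrate weights change the `weight` argument rather than the incomes.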

As the research community continues to develop top harmonisation methodologies, there are two different approaches to better incorporate capital incomes in inequality indicators. The first approach is to better integrate capital income underestimation directly into novel top income harmonisation methodologies. The second approach, used in this paper, is to use existing top income methodologies and add a capital income correction. A limitation of this approach is that the results are dependent on assumptions imposed by existing top income methodologies. In addition, it is not possible to precisely determine the relative magnitude of under-reporting and under-coverage in capital income measurement. It should be noted that the empirical application varies according to the methodology. There is a trade-off in the methodology between applicability and accuracy. More sophisticated methodologies can provide more accurate results but they can have higher data requirements. The current methodology is more ad hoc and determined by the underlying data structure, and is more easily adopted by non-academic actors such as governments. The purpose of this analysis is to improve survey-based estimates of inequality used by such actors, which justifies the current approach.

Despite these limitations, the analysis has two broader implications: (1) it highlights the need for decomposability of top adjusted inequality indicators to bring out the role of capital incomes, and (2) its discussion of the various sources of capital income underestimation underpinning the multi-step procedure allows for a tailored approach in different national contexts.

  • Step 1: Application of existing top income methodology

    The Burkhauser et al. (2017a) methodology is applied to the UK data as described in Sect. 3.3.

  • Step 2: Correcting for the under-coverage of high capital share individuals eligible for the top income correction

    Table 1 provided some indication that the income composition of people observed at the top differs between the survey and the tax administrative data. This is now explored further by allowing for variability within this group. I use the average capital share per bin (j) per year (t) found in the top adjusted data under step 1 (\(CSagg_{j,t}\)) to identify high and low capital share individuals in the tax administrative data and the household survey. Individuals (x) in both the FRS and the SPI are classified as either high or low capital share individuals by comparing their individual capital share (\(CSind_{j,x,t}\)) with this average, in the following way:

    $$High\,capital\,share\,individual: CSind_{j,x,t} \ge CSagg_{j,t}$$
    $$Low\,capital\,share\,individual: CSind_{j,x,t} < CSagg_{j,t}$$
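The classification can be illustrated with a minimal Python sketch. It assumes that "high" means an individual capital share at or above the bin-year average, and that the average is the weighted ratio of capital income to total income within the bin. The column names (`year`, `bin`, `capital_income`, `total_income`, `weight`) are hypothetical, and for brevity the average is computed from the same data frame, whereas the text takes it from the top adjusted data of step 1.

```python
import pandas as pd


def classify_capital_share(df: pd.DataFrame) -> pd.DataFrame:
    """Flag individuals whose capital share is at or above the bin-year average.

    Assumes strictly positive total_income; all column names are hypothetical.
    """
    out = df.copy()
    # Individual capital share (CSind): capital income over total income.
    out["cs_ind"] = out["capital_income"] / out["total_income"]
    # Weighted capital and total income, summed within each (year, bin) cell.
    weighted_capital = out["capital_income"] * out["weight"]
    weighted_total = out["total_income"] * out["weight"]
    cells = [out["year"], out["bin"]]
    # Aggregate capital share (CSagg) per bin and year, broadcast to each row.
    out["cs_agg"] = (
        weighted_capital.groupby(cells).transform("sum")
        / weighted_total.groupby(cells).transform("sum")
    )
    out["high_cs"] = out["cs_ind"] >= out["cs_agg"]
    return out
```

Applying the same function to the FRS and SPI samples would give the weighted counts of high and low capital share individuals per bin that the calibration step compares.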

Table 3 provides an indication of the percentage of (weighted) high capital share individuals found in the FRS and the SPI for different bins. In the survey, fewer high capital share individuals are observed as we move up the income distribution. In the tax data, by contrast, more high capital share individuals are observed as we move up the distribution. In other words, it appears that the FRS under-samples high capital share individuals and that this gap increases as we move up the distribution. Note that high capital share individuals, as defined in this section, still largely receive their income through earnings.

Table 3 Percentage high capital share individuals/total weighted population within bin for bins (j) 50, 10 and 1

From this it can be concluded that the survey suffers from under-coverage of high capital share individuals compared to the fiscal source. A weight calibration factor is calculated which inserts more high capital share individuals into the survey (\(Nhi_{frs,j}\)) so that the equation below holds:

$$\frac{{Nhi_{frs, j} }}{{Nlo_{frs,j} }} = \frac{{Nhi_{spi,j} }}{{Nlo_{spi,j} }}$$

Here \(Nhi_{frs,j}\) and \(Nhi_{spi,j}\) represent the number of weighted individuals with a high capital share in bin j in the FRS and the SPI respectively. Low capital share individuals are indicated with \(Nlo_{frs,j}\) and \(Nlo_{spi,j}\). Table 4 presents an overview of the applied calibration ratios within different bins (j). A high calibration factor indicates that high capital share individuals are barely observed in the FRS for that year, compared to the SPI.
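One way to read the condition above is as a multiplicative factor applied to the survey weights of high capital share individuals in a bin. The sketch below assumes that reading; the function names are hypothetical, and the factors reported in Table 4 may be expressed on a different scale.

```python
def calibration_factor(nhi_frs, nlo_frs, nhi_spi, nlo_spi):
    """Multiplicative factor k such that (k * nhi_frs) / nlo_frs == nhi_spi / nlo_spi.

    Returns None when the ratio is undefined, for example when no high
    capital share individuals are observed in the survey bin.
    """
    if nhi_frs <= 0 or nlo_frs <= 0 or nlo_spi <= 0:
        return None
    return (nhi_spi / nlo_spi) / (nhi_frs / nlo_frs)


def recalibrate_weights(weights, is_high, factor):
    """Scale the weights of high capital share individuals by the factor."""
    return [w * factor if high else w for w, high in zip(weights, is_high)]


# Illustrative numbers only: 10 high vs 90 low (weighted) in the survey bin,
# 20 high vs 80 low in the tax data bin.
k = calibration_factor(10, 90, 20, 80)
new_weights = recalibrate_weights([5, 5, 45, 45], [True, True, False, False], k)
```

Under this definition a factor above one scales high capital share individuals up, and a factor below one would scale them down.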

Table 4 Weight calibration factors used to scale up high capital share individuals in the survey

Blank observations indicate that no high capital share individuals are observed in the survey in the corresponding bin and year. This problem starts to occur after 2001 and becomes more pronounced from 2011 onwards. To correct for this, I use the average weight calibration factor per bin (j) and per year (t) and apply it to the individuals with the highest individual capital shares within the affected bin in the FRS, in the following way. I rank individuals within the FRS bins according to the size of their individual capital share as observed in the unadjusted survey data. If, on average, 10 observations within bin 1 have a high capital share in years without a missing value, then 10 people are classified as high capital share individuals in years with blank observations. There are some fluctuations, which are most likely driven by outliers and sample variation at the top, paired with the volatility of capital incomes. On rare occasions, more high capital share individuals are observed in the survey than in the tax data; the calibration factor is negative in these cases.
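The fallback for blank bins can be sketched as a simple ranking rule. This is an illustration only: the column name `cs_ind` and the argument `n_high` (the average number of high capital share observations in non-blank years) are hypothetical, and a unique row index is assumed.

```python
import pandas as pd


def flag_top_n_by_capital_share(bin_df: pd.DataFrame, n_high: int) -> pd.DataFrame:
    """Flag the n_high individuals with the largest individual capital share.

    bin_df holds the unadjusted survey records of one bin in one year.
    """
    out = bin_df.copy()
    # Rank individuals from the highest to the lowest individual capital share.
    ranked = out.sort_values("cs_ind", ascending=False)
    out["high_cs"] = False
    # The first n_high rows of the ranking are treated as high capital share.
    out.loc[ranked.index[:n_high], "high_cs"] = True
    return out
```

The flagged individuals would then receive the average calibration factor in place of the missing one.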

Using the calibration factors in Table 4, the weights of high capital share individuals in the survey are scaled up to match the proportion of high capital share individuals observed within the corresponding bin of gross income in the tax data. The capital income correction places a higher weight on individuals with a relatively high capital share (per bin and per year), so that the survey matches the information from the tax administrative data. A visualisation is given in Fig. 6. The "before" picture is the starting point after applying step 1.

Fig. 6

Visual representation of the capital income correction. Note: The capital income adjustment classifies people within each bin as high or low capital share individuals. High capital share individuals within each bin are indicated with a *. The dotted horizontal line is at the height of the replaced bin mean in the BH method (y*BH1 and y*BH2). The weight of the high capital share individuals (*) is recalibrated using the appropriate recalibration factor for each bin. The weight recalibration accounts for the under-sampling of individuals with relatively high individual capital shares at the top of the distribution. Put differently, the effect of the recalibration on the figure is that the individuals marked with a * expand in width. This width varies per bin and pushes up the population total of the bin, so that the total weighted population expands beyond the boundary of the total population. After applying the weight calibrations, the population increases by no more than approximately 1% under the highest eligibility threshold of 5%, which is in line with previous research (Medeiros et al. 2018).

Step 3: Correcting for underestimation of capital income across the entire distribution

Capital incomes have become increasingly important for the top 10% (minus the top 0.1%), so they reach further down the distribution than the eligibility threshold for existing top adjustments. Underestimation is a problem across the entire income distribution. There are various ways to allocate this income. For simplicity, in the current exercise the remaining missing capital income is scaled up proportionately across the entire distribution among individuals with positive capital income. This approach is commonly adopted in the literature (Bourguignon 2018).
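A minimal sketch of such a proportionate allocation is given below. It assumes that the benchmark is a target weighted total for positive capital incomes, so that a single scaling factor closes the remaining gap. The function and argument names are hypothetical.

```python
import numpy as np


def scale_capital_income(capital, weights, target_total):
    """Scale positive capital incomes so their weighted sum equals target_total.

    Zero and negative capital incomes are left unchanged (an assumption of
    this sketch). Returns the scaled incomes and the scaling factor.
    """
    capital = np.asarray(capital, dtype=float)
    weights = np.asarray(weights, dtype=float)
    positive = capital > 0
    current_total = np.sum(capital[positive] * weights[positive])
    if current_total <= 0:
        raise ValueError("no positive capital income to scale")
    factor = target_total / current_total
    scaled = capital.copy()
    # Every positive capital income is multiplied by the same factor.
    scaled[positive] *= factor
    return scaled, factor
```

Because the factor is uniform, the relative ranking of capital income recipients is unchanged; only the level of capital income moves.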

5 Results

Table 5 shows how much capital income is adjusted for and the effect on overall inequality as measured by the Gini. For illustrative purposes the results are presented for the top 5%. Under different eligibility thresholds, similar patterns are observed in the relative contribution of each step to reducing missing capital income and increasing the Gini.

Table 5 Top 5% capital income corrected and Gini estimates under different steps

The income replacements and weight calibrations have an effect on income and population totals. Top methodologies differ in how they deal with this side effect. For the current purposes, I point out these changes and describe how reordering the steps influences these variables. Income totals mechanically increase to correct for the missing income, and the average increase includes some form of overcorrection, as described below. Figure 7 presents a visualisation of Gini point increases after applying each of the steps.

Fig. 7

Gini point increase after applying each of the steps. Note: Gini point increases after applying the different capital income correction steps using the SPI to complement the household survey-based estimates of inequality

Step 1 (the existing top adjustment) has the largest effect on overall inequality, as seen in Figure 7. Capital incomes play an important role in this step: it reduces missing capital income by the largest amount, 34 percentage points on average (73-39). However, it leaves on average 27% of total capital income uncorrected for. Top adjustments applied further down the distribution correct for a larger fraction of missing capital income; for example, an eligibility threshold of 1% will leave more capital income unaccounted for. Conceptually, this step corrects for the missing rich in the survey. The increase in total income as a result of the top adjustment (income replacements) is 4% on average, which includes a correction for both capital and other income components included in total income.

Step 2 represents a correction for the under-coverage of high capital share individuals and takes the form of weight replacements. This step raises inequality by a further 0.01 Gini point on average, and reduces missing income by a further 10 percentage points. Population totals are increased by 0.6% on average, which is within an acceptable range (Medeiros et al. 2018). Placing more weight on high capital share individuals in the already top adjusted series raises income totals by a further 5%. The ordering of the steps is likely to produce some form of overcorrection, as high capital share individuals are added in after applying the income replacements in step 1.

Step 3 allocates the remainder of the income according to the proportionate allocation formula. This allocation reduces missing capital income to 0% and increases total income by a further 2% after applying steps 1 and 2. This step has become more important over time as capital income underestimation has grown.
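To make the step-by-step comparison concrete, the Gini coefficient can be recomputed on the adjusted incomes and weights after each step. The sketch below uses a standard weighted Lorenz curve (trapezoid) estimator. It is not necessarily the estimator used for Table 5, and the function name is hypothetical.

```python
import numpy as np


def weighted_gini(income, weights):
    """Gini coefficient from a weighted Lorenz curve (trapezoid rule).

    Assumes non-negative incomes and strictly positive weights.
    """
    income = np.asarray(income, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(income)
    y, w = income[order], weights[order]
    # Cumulative population and income shares, both starting at zero.
    pop_share = np.concatenate(([0.0], np.cumsum(w) / np.sum(w)))
    inc_share = np.concatenate(([0.0], np.cumsum(y * w) / np.sum(y * w)))
    # Twice the area under the Lorenz curve.
    area = np.sum(np.diff(pop_share) * (inc_share[1:] + inc_share[:-1]))
    return 1.0 - area
```

Calling the function on the incomes and weights before and after a step, and taking the difference, gives the Gini point change attributable to that step under this estimator.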

The results fluctuate over the entire period. The upward effect of the capital income correction is largest after tax changes that trigger behavioural responses raising reported capital income, such as the changes in top marginal rates affecting the 2009 and 2013 observations and dividend payments being brought forward to 2015 after changes to dividend taxation. A better incorporation of capital incomes observed in fiscal sources into indicators of economic inequality, along the lines proposed in this work, raises the question of the extent to which artificial fluctuations are imported into inequality statistics. The fluctuations in the capital adjusted series are larger than in the top adjusted series: because of their transformative nature, capital incomes are a useful tool in retiming strategies to minimise the personal tax burden. Capital incomes are much more sensitive to retiming strategies than earnings.

Various sensitivity checks have been carried out, without substantial changes to the results. In step 2, after applying the weight calibration, the total population within each bin is adjusted upwards. I have readjusted the bin size to represent 0.1% of the re-calibrated total population. This exercise does not change the results, although the two sets of bins are conceptually different.

Given the overestimation in step 2, the reordering and overlap of steps is one issue to be examined more closely. The underlying idea of the ordering of the steps is to choose them such that they minimise carrying over inaccuracies produced by assumptions in previous steps. I reorder the steps to first allow for the re-weighting of under-covered high capital share individuals (step 2), followed by the top income correction (step 1) and the proportionate allocation (step 3). This does not change the relative contribution of each step in raising overall inequality. Inevitably, because of data limitations we cannot distinguish between under-coverage and under-estimation error in capital income measurement, and it is hard to determine the overestimation driven by the top adjustments through income replacements (step 1) and by inserting high capital share individuals (weight replacements). More research is needed on how to combine components used in different top income methodologies to refine these assumptions.

The multiple step adjustment adds only marginally to overall estimates of inequality. However, the broader picture is more nuanced. Despite the overcorrection in step 2, the estimates presented in this paper are very much lower bounds when the fiscal source is used as benchmark. The SPI benchmark does not include components such as capital gains, which are estimated to raise the UK top 1% share by 3 percentage points (Advani and Summers 2020). Furthermore, tax exempt income and income not reported as a consequence of tax avoidance and evasion are not accounted for in the current analysis. Quantifying the effect of such items is gaining more attention in the literature on economic inequality. If these items were included in the analysis, the effect of the additional steps would be higher.

A key area where the multiple step adjustment can add value is the policy debate. The UK capital income narrative told in cross-country comparative perspective has long been that of a low (and stable) contribution to inequality (Fräßdorf et al. 2011; OECD 2011). The lack of decomposability of top adjusted statistics has not allowed this indicator to enter public debate. The relative contribution of capital incomes to overall inequality is highly sensitive to the methodology adopted (Medeiros et al. 2018). Despite limitations in the current analysis, it is likely that the shortcomings in the UK household survey highlighted in this analysis have produced a biased picture of inequality in cross-country comparative analysis, as shown in Fig. 8. The UK capital income narrative should be changed to one of capital incomes having a moderate (and increasing) effect on inequality.

Fig. 8

Relative contribution of capital incomes to overall inequality (% contribution to the Gini coefficient). Note: relative contribution (%) of capital incomes to overall inequality as measured by the Gini, using the unadjusted FRS series and each of the steps, which apply the different capital income adjustments imputing missing income with the SPI as benchmark

6 Conclusion

This analysis has shown that existing top income methodologies do not correct entirely for capital incomes and has proposed a methodological extension to existing methods. Because capital incomes are disproportionately held at the top of the distribution, capital income adjustments have an upward effect on indicators of economic inequality. Capital income measurement in the UK has deteriorated over the past 20 years, making the UK a good case study for exploring how to carry out this type of correction.

First, the top income adjustment is correcting for less capital income over time. This is driven by the fact that capital income increasingly reaches further down the top 10% (minus the top 0.1%), where it is missed by the top correction. Second, the correction needs to adjust for under-coverage of high capital share individuals in the household survey compared to tax data. High capital share individuals have increasingly gone missing from the survey, and this under-coverage grows when moving up the distribution. Third, capital incomes are increasingly reaching individuals below the eligibility threshold of top income corrections, and an adjustment needs to be made for under-reporting of capital incomes among those not eligible for the top income correction.

These findings suggest that, in the UK case, a capital income correction needs to adjust for both under-coverage and under-reporting of capital incomes. This implies the correction should include both income and weight replacements. In practice, it is hard to distinguish the relative importance of under-coverage and under-reporting, which both contribute to the underestimation. Hence, the proposed multiple-step procedure can give rise to some form of overestimation. It is advisable to apply the steps in the order which carries over the least estimation error, leaving assumption-heavy steps such as the proportionate allocation for last. As the research community continues to develop top harmonisation methodologies, there are different approaches: to integrate capital income estimation error directly into the top income harmonisation approaches, or to use existing top methodologies and add a capital income correction. In the latter case results are dependent on assumptions imposed by the existing methodologies.

Capital income underestimation is a known problem in surveys, and it is very likely that similar biases arise in other countries to varying degrees. A multi-step procedure can be used to tailor the capital income corrections to a different country context with different data and methodological limitations. More research in different country case studies is encouraged to come to grips with the relative importance of under-estimation and under-coverage in survey-based capital income estimates and how this varies across the distribution.

A novel finding in the UK context is that the additional capital income correction pushes the average Gini coefficient up by approximately 0.01 points compared to the already top adjusted series. The approach further allows for decomposability of top adjusted statistics, arguably an indicator missing from public debate. In the UK case, where capital income underestimation is of growing concern, the multi-step imputations can have potentially large implications for policy informed by decomposed statistics, even if the effect on the Gini coefficient is only marginal. The findings suggest that the UK capital income narrative should be adjusted from one of a low (and stable) contribution to inequality to one of a moderate (and increasing) effect on inequality. There is reason to believe that the shortcomings in the household survey, as described in this analysis, have led to a biased picture in cross-country comparative analysis and a distorted policy message in this area.

The results presented in this analysis are lower bound estimates, despite the data and methodological limitations. The tax administrative data used as benchmark in this analysis largely capture income included in the Personal Income Tax (PIT), which excludes non-taxpayers, tax exempt income, capital gains, retained earnings, and tax avoidance and evasion practices. The emerging literature quantifying the impact of these capital income flows on tax administrative measures of inequality reports a significant increase in top shares after accounting for the income flows not included in the current analysis. Furthermore, fluctuations are driven by specific tax changes which have triggered large behavioural responses in capital income reporting in the tax data. This finding suggests that top harmonisation methods can make it harder to separate real trends in inequality from artificial fluctuations imported through the tax system, especially when accounting for capital incomes.

A growing body of literature has been arguing for the replacement of household surveys by fiscal sources in the analysis of economic inequality. In the meantime, household surveys are still used to produce official inequality indicators that provide the basis for policy and decision making in many countries, including the United Kingdom. What is not captured by inequality indicators structurally falls out of policy debates; this has arguably been the case for capital incomes in UK household survey-based measures of inequality. The drastic decline of capital income measurement, together with the lack of current alternatives needed to adequately inform policy, makes correcting for missing capital incomes a timely matter. In future work it is worth exploring in greater detail how to refine the proposed lower bound estimates and comparing the results across the different top income methodologies developed in recent years.