This section describes the survey and the tax administrative data, followed by an analysis of how capital income is measured in the survey compared to the fiscal source. The aim is to improve our understanding of which underestimation bias arises and how it changes at different parts of the distribution. Subsequently, the survey and tax data are used to apply the existing UK top income harmonisation methodology (Burkhauser et al. 2017a) and to investigate how much capital income is still missing from survey-based estimates of inequality after applying top corrections. This provides the basis for the methodological contribution of this paper, presented in the next section.
Data
The household survey used in this analysis is the Family Resources Survey (FRS). The FRS is a representative sample of over 20,000 private households in the UK (and the individuals within those households), carried out on a yearly basis. The current analysis uses total income, which includes wage, self-employment, pension and capital income. The capital income variable is aligned with the Canberra Handbook (Canberra Group U 2011) to the best extent possible. It includes interest, dividends, rent, silent partners' income and IP rights (e.g. land). Capital income reporting in household surveys is generally considered poor: the variables suffer from, for example, misreporting of incomes and item non-response. The FRS statisticians make some imputations for misreporting of capital incomes. Dividends and interest are grouped together and are hard to disentangle. The FRS includes information on non-taxable interest and dividends received from tax-efficient savings vehicles such as National Savings and Investments (NS&I) and Individual Savings Accounts (ISAs).
The tax administrative data used for the analysis is the Survey of Personal Income (SPI) Public Tape, available through the UK Data Service. This dataset comprises a stratified sample of tax records from HMRC’s self-assessment (SA) and payroll (PAYE) administrative systems. Because the UK tax unit is the individual, it is not possible to carry out household-level analysis. The SPI is not available for the tax year 2008. The current analysis uses total fiscal income minus benefits. Fiscal data is heavily influenced by changes in the tax code. The UK income tax year starts in April and therefore does not give a full picture of the calendar year. Fiscal reporting is sensitive to income forestalling or income delaying in response to changes in the tax rate. A small number of records at the very top of the distribution in the public tape data have been anonymised using a standard procedure described in the SPI documentation. Artificial fluctuations in income reporting in response to the tax system are large enough to visibly affect reported inequality trends after applying top income methodologies. Several tax reforms are known to have had a measurable impact on capital income measurement in the tax administrative source (Atkinson 2012; Atkinson and Ooms 2015; OBR 2017; Pope and Waters 2016; Seely 2015). First, a sequence of tax reforms, taking place in April 2000 and April 2002, gradually lowered the corporate tax rate to 10% and 0%, respectively. These reforms gave the self-employed an incentive to register as a company to reduce tax liabilities and to pay themselves in dividends rather than in wages. Second, in March 2009 the Labour Government announced a rise in the top marginal rate from 40 to 50%, to take place in April 2010. High-income individuals (above £500,000) brought forward their tax liability from 2010–11 to 2009–10 to legally avoid the 50% top rate on incomes above £150,000.
Third, in March 2012 the Conservative Government announced a reduction in the top rate from 50 to 45% in April 2013. This created an incentive for high-income individuals (above £500,000) to delay the receipt of income from 2012–13 to 2013–14. Around £16–18 billion has been brought forward, about 0.5% of total income, of which £6 billion (35%) was dividends among incomes above £500,000. Fourth, in April 2016 the Dividend Tax Credit was replaced by a dividend allowance, which leaves the first £5,000 of dividend income untaxed. This scheme raises the overall tax on dividends, which resulted in dividend payments being brought forward to 2015–16. It is estimated that £7.6–£10.7 billion of income has been brought forward. Behavioural responses triggered by these reforms create fluctuations in capital income measurement in the fiscal source.
The academic community largely relies on the FRS and the SPI to apply top income methodologies (Atkinson and Jenkins 2019; Burkhauser et al. 2016, 2017c; Jenkins 2017). The unit of measurement used for these reconciliation exercises is gross individual income, which can be observed in both sources for the period 1996–97 to 2016–17 for the population aged 15 and over (Atkinson and Jenkins 2019; Burkhauser et al. 2016, 2017c; Jenkins 2017). Because capital incomes are mainly held at the top of the distribution, benefits, which have a more prominent effect further down the distribution, have been excluded from both data sources. Appendix I provides an overview of key differences in capital income measurement between the FRS and the SPI. These differences are not enough to account for the increases in capital income underestimation described shortly. This analysis uses capital income reported in the SPI as the benchmark to estimate the amount of income missing from inequality indicators. The SPI benchmark is in fact a lower bound: both sources exclude capital gains (realised and unrealised) and capital incomes that go unreported because of tax avoidance and evasion. As a result, all estimates presented in this paper are lower bounds, and more research in this area is encouraged.
Underestimation of Capital Incomes in the UK
As discussed in Sect. 2, there are two main reasons for capital income to go missing from inequality indicators: under-coverage and under-reporting. The current analysis refers to underestimation as the sum of under-coverage and under-reporting, since in practice it is hard to disentangle their relative magnitude. In line with the existing literature (Medeiros et al. 2018), this section looks at the underlying data structure to find out how this underestimation arises. Figure 2 shows an increasing underestimation of capital incomes in the survey over the past 20 years, taking the SPI data as the benchmark. At no point in time does the survey capture all the capital income observed in the fiscal source. In 1997, the survey captured half of the capital income observed in the fiscal source. This capture rate has declined over time, reaching an all-time low in recent years, when only one third is captured.
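The capture rate discussed above can be sketched as the ratio of the weighted survey total to the weighted fiscal total of capital income. The snippet below is a minimal illustration with made-up numbers, not the computation used for Figure 2; the function name `capture_ratio` and the toy values are assumptions introduced here.

```python
def capture_ratio(survey_values, survey_weights, fiscal_values, fiscal_weights):
    """Weighted survey total divided by weighted fiscal total (hypothetical helper)."""
    survey_total = sum(v * w for v, w in zip(survey_values, survey_weights))
    fiscal_total = sum(v * w for v, w in zip(fiscal_values, fiscal_weights))
    if fiscal_total == 0:
        raise ValueError("fiscal total is zero, the ratio is undefined")
    return survey_total / fiscal_total


# Toy data (illustrative only): three records per source, each with a grossing weight.
frs_capital = [0.0, 100.0, 400.0]
frs_weights = [1000.0, 1000.0, 1000.0]
spi_capital = [0.0, 300.0, 1200.0]
spi_weights = [1000.0, 1000.0, 1000.0]

ratio = capture_ratio(frs_capital, frs_weights, spi_capital, spi_weights)
print(round(ratio, 3))
```

A ratio below 1 means the survey captures less capital income than the fiscal benchmark. Restricting both inputs to a given part of the distribution would give group-specific ratios of the kind shown in Figure 3.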
Figure 3 shows that the underestimation takes place across the entire capital income distribution. The underestimation increases with income and is a particular problem for the top 0.1%, where almost none of the capital income observed in the fiscal source is picked up by the survey. The downward trend observed in the aggregate is largely driven by the top 10% (minus the top 0.1%). Increasing underestimation is therefore not just an issue at the very top of the distribution. It even takes place among the bottom 90%, although this group holds relatively little capital income in the aggregate. Top income adjustments are often applied to the top 1–5%, but even below these eligibility thresholds more than half of the capital income is missing, and increasingly so.
Figure 4 shows that this problem is largely a capital income phenomenon. The decline only takes place for capital income and not for the other income components grouped together (wage, self-employment and pensions). The income definition used in household surveys is broader than the fiscal definition of income, as it includes non-taxpayers and forms of non-taxable income. In an ideal case, the survey should therefore pick up over 100% of fiscal income. Appendix I provides a comparison of the capital income variables (definitions, measurement, etc.), but given the scale of the underestimation and its increase, it is unlikely that definitional differences alone drive the story.
With the existing data it is not possible to determine accurately which part of this underestimation is driven by under-coverage and which by under-reporting. Under-reporting is expected to arise for all survey-based measures of inequality through, for example, item non-response, misreporting and data preparation. It is not clear whether this bias has the same magnitude across the entire distribution. In terms of under-coverage, it is known that the rich are often under-sampled in surveys. A less studied fact is the income composition of this under-sampled group. The World Inequality Database (WID) control totals (see Footnote 1) can be used to compare both data sources at the top of the distribution in some form. In line with Burkhauser et al. (2017a), I group individuals in bins representing 0.1% of the total (weighted) population in both the FRS and the SPI. The results are presented in Table 1.
Table 1 Average individual capital shares per bin, for J = 50 (p95.0-p95.1), J = 10 (p99-p99.1) and J = 1 (p99.9-p100), in the FRS and the SPI
For example, bin 1 represents the top 0.1% (p99.9-p100), bin 2 represents the top 0.2% minus the top 0.1% (p99.8-p99.9), and so on. Bin 50 (p95.0-p95.1) has an average capital share of 3% in the FRS compared to 8% in the SPI. For bin 1 these figures are 4% and 19%, respectively. This exercise reveals that the average capital share per bin is substantially higher in the SPI than in the FRS, and that it increases as we move up the distribution. This provides an initial indication that the income composition of people observed at the top might differ between the survey and the tax administrative data. The next section returns to this point.
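The binning and the average capital share per bin can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: records are ranked by total income from the top, each bin holds a fixed fraction of a population control total, a record is allocated to the bin containing the midpoint of its cumulative weight (so weights are not split across bin boundaries), and the average share is taken as the weighted mean of individual capital shares. The field names (`total`, `capital`, `weight`) and function names are hypothetical.

```python
from collections import defaultdict


def assign_bins(records, control_total, n_bins=1000):
    """Rank records from the top by total income and allocate them to bins.

    Each bin holds control_total / n_bins weighted individuals, so n_bins=1000
    gives bins of 0.1% of the control total. Bin 1 is the top bin.
    Simplifying assumption: a record goes to the bin containing the midpoint
    of its cumulative weight; its weight is not split across bins.
    """
    bin_size = control_total / n_bins
    ranked = sorted(records, key=lambda r: r["total"], reverse=True)
    cumulative = 0.0
    binned = []
    for record in ranked:
        midpoint = cumulative + record["weight"] / 2
        binned.append((int(midpoint // bin_size) + 1, record))
        cumulative += record["weight"]
    return binned


def mean_capital_share_by_bin(binned):
    """Weighted mean of individual capital shares (capital / total) per bin."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for bin_id, record in binned:
        if record["total"] > 0:  # the share is undefined for zero total income
            numerator[bin_id] += record["weight"] * record["capital"] / record["total"]
            denominator[bin_id] += record["weight"]
    return {b: numerator[b] / denominator[b] for b in numerator}


# Toy data (illustrative only): a control total of 10,000 gives bins of 10 people.
records = [
    {"total": 400.0, "capital": 20.0, "weight": 5.0},
    {"total": 1000.0, "capital": 200.0, "weight": 5.0},
    {"total": 500.0, "capital": 25.0, "weight": 5.0},
    {"total": 800.0, "capital": 80.0, "weight": 5.0},
]
binned = assign_bins(records, control_total=10000.0)
shares = mean_capital_share_by_bin(binned)
print(shares)
```

Running the same two functions on the survey and on the tax records, with the same control total, gives bin-level shares that can be compared as in Table 1.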
In sum, underestimation of capital incomes has increased over the past 20 years. This implies that capital income is increasingly missing from household survey-based estimates of inequality. In practice it is hard to determine which part of the story is driven by under-reporting and which by under-coverage. Problems of under-reporting of capital incomes (item non-response, misreporting and data preparation) most likely occur across the entire distribution. As capital incomes are concentrated among high-income individuals, it is plausible that these errors become stronger moving up the distribution. The figures show that the survey is particularly poor at capturing the capital incomes of the top 0.1%. At the same time, capital income underestimation is increasingly observed further down the distribution, among the top 10% (minus the top 0.1%). One potential explanation for the observed increase in underestimation is that the survey has failed to pick up the growth in capital income among this group. Comparing the downward trend to the other income components included in total income, it is clear that this underestimation is particular to capital incomes. There are definitional and timing differences between the FRS and the SPI (Appendix I), but these are unlikely to explain the entire picture. There also appear to be differences in income composition between the survey and the fiscal source: capital shares, measured at various points in the distribution, are substantially higher in the SPI than in the FRS. This provides a preliminary indication that the income composition of people observed at the top differs between the survey and the tax administrative data.
Capital Incomes in Top Income Methodology
The harmonisation methodology proposed in the UK context has been developed by Burkhauser et al. (2017a, b) (see Footnote 2). This methodology assumes that under-reporting is the main cause of income underestimation and uses income replacements to correct for it. A visualisation of the methodology is presented in Fig. 5. As mentioned in Medeiros et al. (2018), there is no particular rule to establish the size of replacement bins or eligibility thresholds. These are established in accordance with observations of the underlying data structure.
Individuals in the survey and the tax administrative data are ranked according to their position in the distribution of total income. To ensure comparability at the top of the distribution, population control totals taken from the World Inequality Database (WID) (see Footnote 3) are used to construct bins, each representing 0.1% of the total (weighted) population. Based on their ranking within the survey and the fiscal source, individuals are allocated to the corresponding bin. For example, if the eligibility threshold is the top 1%, income replacements are made for 10 bins, each representing 0.1% of the weighted population. Individual income observed in the FRS within each bin is replaced by the mean income of the corresponding bin in the SPI. For instance, all individuals in bin 1 of the survey are given the mean income observed in bin 1 of the SPI.
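The replacement step can be sketched in a few lines. The snippet assumes that each survey record already carries a bin number (bin 1 being the top 0.1%) and that SPI bin means have been computed beforehand; the field names, the function name `replace_top_incomes` and all figures are hypothetical and for illustration only.

```python
def replace_top_incomes(frs_records, spi_bin_means, n_eligible_bins):
    """Replace survey total income by the SPI bin mean for eligible bins.

    frs_records: list of dicts with keys 'bin' and 'total' (hypothetical layout).
    spi_bin_means: dict mapping bin number to mean total income in the SPI.
    n_eligible_bins: number of 0.1% bins covered by the correction,
                     e.g. 10 for an eligibility threshold at the top 1%.
    Returns new records; the input list is left unchanged.
    """
    adjusted = []
    for record in frs_records:
        new_record = dict(record)
        if record["bin"] <= n_eligible_bins:
            new_record["total"] = spi_bin_means[record["bin"]]
        adjusted.append(new_record)
    return adjusted


# Toy data (illustrative only).
frs = [
    {"bin": 1, "total": 300000.0},
    {"bin": 2, "total": 200000.0},
    {"bin": 11, "total": 50000.0},  # below a top 1% threshold, left untouched
]
spi_means = {1: 900000.0, 2: 350000.0}

adjusted = replace_top_incomes(frs, spi_means, n_eligible_bins=10)
print([r["total"] for r in adjusted])
```

Because every record in an eligible bin receives the same value, within-bin dispersion in the survey is removed by construction; only records below the eligibility threshold keep their reported income.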
A general limitation of top income methodology is that decomposability is not always possible. This has implications for the study of capital incomes. For the purpose of the current analysis, decomposability is needed in order to understand how well the existing top harmonisation methodology corrects for capital incomes. I assume a fixed income composition within each population bin eligible for the top income correction. The SPI mean income for bin 1, for example, can be decomposed into the sub-components of total income used in the analysis (wage, capital income, self-employment and pensions). This implicitly assumes that the percentage share of each sub-component in total income is the same for all individuals within the bin. When the top correction is applied to the sub-components of total income, levels of inequality as measured by the Gini coefficient remain the same. This assumption can be refined in future analysis.
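A minimal sketch of the fixed-composition assumption: the SPI component means of a bin define shares, and these shares are applied to the replaced total of every survey individual in that bin. The component labels, function names and figures below are illustrative assumptions, not values from the SPI.

```python
COMPONENTS = ("wage", "self_employment", "pension", "capital")


def component_shares(spi_component_means):
    """Share of each component in the SPI mean total income of one bin."""
    total = sum(spi_component_means[c] for c in COMPONENTS)
    return {c: spi_component_means[c] / total for c in COMPONENTS}


def decompose(replaced_total, shares):
    """Split a replaced total income into components using fixed bin shares."""
    return {c: replaced_total * shares[c] for c in COMPONENTS}


# Toy SPI means for bin 1 (illustrative only).
spi_bin1 = {
    "wage": 500000.0,
    "self_employment": 150000.0,
    "pension": 50000.0,
    "capital": 300000.0,
}
shares = component_shares(spi_bin1)

# Every survey individual in bin 1 receives the SPI mean total, split by the same shares.
replaced_total = sum(spi_bin1.values())
parts = decompose(replaced_total, shares)
print(parts["capital"])
```

The components sum back to the replaced total, so the distribution of total income, and hence any inequality index computed on it, is the same as under the undecomposed correction.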
It is now possible to calculate the percentage of capital income that is adjusted for, compared to the fiscal data. There is a clear downward trend in capital income measurement under all eligibility thresholds. This is not surprising, given that capital income is starting to reach further down the distribution. Table 2 provides an indication that top income corrections alone are not enough to correct for the underestimation of capital income in household surveys. As an example, after adjusting the top 0.1%, only 50% of total capital income as reported in the tax data is included in top-adjusted inequality indicators in 2016. In other words, 50% of total capital income is missing and not incorporated in survey-based measures of inequality.
Table 2 Capital income adjusted compared to tax administrative data, under different eligibility thresholds