Drawing a Line: Comparing the Estimation of Top Incomes between Tax Data and Household Survey Data


The paper uses the flexibility of household survey data to align their income categories and recipient units with the income categories and units found in data produced by tax authorities. Our analyses, based on a standardized definition of fiscal income, allow us to locate, for top-income groups, the sources of discrepancy. We find, using the cases of the United States, Germany, and France, that the results from survey-based data and tax data correspond extremely well (in terms of total income, mean income, composition of income, and income shares) above the 90th percentile and up to the top 1% of the distribution. Information about income composition, available for the US, allows us to investigate the determinants of the tax/survey gap within the top 1% in that country. About three-fourths of this gap is due to differences in non-labor incomes, especially self-employment (business) income. The gap itself may be due to tax-induced re-classification of income from corporate to personal and/or to the lower ability of surveys to capture top 1% incomes.

Data Availability Statement

A. The survey data analyzed for the current study are available from “LIS, The Cross-National Data Center in Luxembourg”. More specifically, the data that we used are contained in the “Luxembourg Income Study (LIS) Database”, one of the two large micro-databases available via LIS. The LIS URL is [].

The LIS microdata are publicly accessible, and very widely used, but there are three restrictions: 1) The microdata available from LIS may be used only for research; they may not be used for “commercial purpose”. Applicants specify their intended use of the data, and those applications are reviewed by the staff in accordance with the LIS bylaws created by the participating data providers. Users who are cleared at the application stage are registered and given a LIS ID and password, renewable annually. 2) The microdata available from LIS may not be downloaded; they are accessed via a remote-execution tool (using code written in SAS, SPSS, Stata, or R). Code is submitted via a Job Submission Interface (JSI), and results are returned to the user electronically. 3) All microdata users sign a pledge, committing to make the results of their research publicly available via Working Papers, journal articles, books, and the like.

Note that while the LIS microdata, per se, are subject to these three restrictions, many other LIS products and services are entirely public and open-access. These include two tools (DART and the Key Figures) that provide country-level aggregated indicators based on the microdata, as well as all of the data documentation, the learning tools, and the extensive Working Paper series, which includes the full texts of papers based on the data.

B. The tax data that we used are entirely available in the studies listed in Table A.1.

C. The country-level policy data that we used are also entirely publicly accessible. The source is the website of the Mutual Information System on Social Protection (MISSOC): [].

  Income
  Inequality
  Survey data
  Tax data

