A Cautionary Tale of Using Data From the Tail

Understanding the levels and trends in deep and extreme poverty in the United States is of great interest to both policymakers and researchers. Social safety net programs are designed in part to prevent such extreme deprivation, and evidence that individuals and families slip through the cracks informs debates on how to improve social policy.

To measure deep and extreme poverty, Brady and Parolin (in an article published in this issue of Demography) use data from the Current Population Survey (CPS) to estimate the fraction of individuals in the United States who live in households with income below 20% of median income (about $7,300 in 2016), which they call deep poverty, and below 10% (about $3,600 in 2016), which they call extreme poverty. They estimate that 5.2 to 7.2 million Americans (1.6% to 2.2%) were in deep poverty and 2.6 to 3.7 million (0.8% to 1.2%) were in extreme poverty in 2016, and that the rates of deep and extreme poverty have risen sharply over the past 20 years. In addition, they conclude that the expansion of Supplemental Nutrition Assistance Program (SNAP) benefits has led to declines in deep and extreme poverty for households with children.
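The relative thresholds described above can be sketched in a few lines. This is only an illustration of the definition, not Brady and Parolin's code: the function name and the median value are assumptions, with the median chosen so that the 20% cutoff lands near the quoted $7,300, and any household-size adjustment the authors apply is left out.

```python
def poverty_flags(income: float, median_income: float) -> dict:
    """Flag deep (<20% of median) and extreme (<10% of median) poverty.

    Extreme poverty is a subset of deep poverty under this definition.
    """
    return {
        "deep": income < 0.20 * median_income,
        "extreme": income < 0.10 * median_income,
    }


# Hypothetical median, picked so that 20% of it is about $7,300.
HYPOTHETICAL_MEDIAN = 36_500.0

for income in (2_000.0, 5_000.0, 12_000.0):
    print(income, poverty_flags(income, HYPOTHETICAL_MEDIAN))
```

Because both cutoffs are tied to reported income, any income that goes unreported pushes a household toward or below them, which is the concern developed in the rest of this comment.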

Brady and Parolin make an important contribution by bringing greater attention to this issue of extreme deprivation in the United States. However, a large and growing literature using linked survey and administrative data has shown quite convincingly that income is significantly underreported in large national surveys, particularly for individuals and families in the far left tail of the reported income distribution.^{Footnote 1} This underreporting leads to an overestimation of extreme and deep poverty and an underestimation of the impact of government programs. Brady and Parolin acknowledge concerns about measurement error and address them by imputing some income from government programs. Efforts to address underreporting using microsimulation models, however, do not accurately allocate imputed benefits to true recipients. Moreover, recent studies that relied on linked survey and administrative data have shown that many income sources besides government program income are underreported for those at the bottom of the reported income distribution (Meyer et al. 2019). Accounting for these other income sources would lead to lower estimates of extreme poverty.

Evidence on Underreporting of Transfer Income

Many studies over the past 25 years have documented with both indirect and direct evidence that survey income is significantly underreported. We know from these studies that reported family income is often far below reported family expenditures for those with few resources, even for those with little or no assets or debts (Meyer and Sullivan 2003, 2011). Other studies have shown that the total dollars of transfer income reported in large national surveys fall well short of the total dollars actually distributed according to administrative program data, and this problem has only worsened over time (Meyer et al. 2015). For the CPS, the ratio of survey-reported dollars to administrative totals is below 0.6 for SNAP and below 0.5 for Temporary Assistance for Needy Families (TANF) in recent years (Meyer et al. 2015).^{Footnote 2}
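The reporting ratio referred to here is the weighted survey total divided by the administrative total. A minimal sketch follows; the function name, the benefit amounts, the survey weights, and the administrative total are all made up for illustration and are not taken from Meyer et al. (2015).

```python
def reporting_ratio(survey_amounts, weights, admin_total):
    """Weighted survey-reported dollars divided by administrative dollars."""
    weighted_total = sum(a * w for a, w in zip(survey_amounts, weights))
    return weighted_total / admin_total


# Hypothetical respondents: reported annual benefit dollars and survey weights.
reported_dollars = [1_200.0, 0.0, 2_400.0, 0.0]
survey_weights = [1_000.0, 1_500.0, 500.0, 2_000.0]
admin_dollars = 4_000_000.0  # hypothetical program outlays for the same population

print(f"reporting ratio: {reporting_ratio(reported_dollars, survey_weights, admin_dollars):.2f}")
```

A ratio below 1 signals aggregate underreporting, but it says nothing about which respondents are responsible for the shortfall, a distinction that matters for the tail of the distribution.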

Recent studies with a credible measure of true transfer income (i.e., those that have administrative data on actual receipt linked to individuals in surveys) have corroborated this evidence on severe underreporting of government transfers. For example, using New York administrative data on transfer income linked to the CPS, Meyer and Mittag (2019) showed that for those with reported income below 50% of the federal poverty line, only one-half of actual transfer dollars (SNAP, TANF, general assistance, and housing assistance) are reported. The unreported transfer dollars account for a large fraction of total reported cash income for those in the left tail. As shown in Fig. 1, which reproduces results from Meyer and Mittag (2019), this fraction rises sharply as income relative to the poverty line falls. For those with income below 50% of the federal poverty line, these unreported transfer dollars are more than double the reported cash income, but they are only 28% of the reported cash income for those between 50% and 100% of the poverty line.^{Footnote 3} This pattern is not unique to New York. Other studies have used administrative data for SNAP and other programs linked to survey data to show severe underreporting in the left tail in many other states (Meyer et al. 2018; Shantz and Fox 2018; Stevens et al. 2018).

Addressing Underreporting Using Microsimulation Models

Brady and Parolin acknowledge the concerns with underreporting of transfer income in surveys, and they take steps to address it by imputing some government program transfers, such as SNAP and TANF, using a microsimulation model (TRIM3).^{Footnote 4} They argue that this approach improves accuracy, noting that the weighted totals of these imputed values come close to matching the amount of assistance provided as reported in administrative data. However, in addition to making adjustments to allocate the correct amount of these benefits in total (microsimulations such as TRIM3 do this by construction), it is critically important for our understanding of deep and extreme poverty and the effects of these transfers on poverty to allocate imputed benefits to the right people. The evidence presented here clearly shows that these imputations misallocate a substantial share of benefits to the wrong parts of the distribution. In particular, they overallocate to those with very low income.

Microsimulation models such as TRIM3 adjust for underreporting by allocating imputed benefits to those families with the highest predicted probability of receipt based on observable characteristics. This imputation relies on the relationship between observable characteristics and the reported receipt of benefits, but the ideal approach is to impute benefits to families according to the actual probability of true receipt conditional on reporting no receipt (Mittag 2019). This sort of imputation cannot be done without information on actual receipt of benefits for survey respondents. In other words, one would need to be able to link the survey data to administrative microdata on who actually receives benefits from SNAP or other programs.
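A stylized example may help show why matching the aggregate total does not guarantee a correct allocation. The sketch below is not TRIM3's algorithm: it assumes a simple rule that spreads the aggregate shortfall across non-reporters in proportion to each group's reported receipt rate, and the two groups and all counts are hypothetical. The groups are constructed so that reporting is worse in the higher-income group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    size: int             # people in the group
    true_recipients: int  # receipt according to hypothetical administrative records
    reported: int         # recipients who report receipt in the survey


def impute_by_reported_rate(groups, admin_total):
    """Raise receipt to the administrative total, allocating the shortfall to
    non-reporters in proportion to each group's *reported* receipt rate."""
    shortfall = admin_total - sum(g.reported for g in groups)
    scores = {g.name: (g.size - g.reported) * (g.reported / g.size) for g in groups}
    total_score = sum(scores.values())
    return {g.name: g.reported + shortfall * scores[g.name] / total_score for g in groups}


groups = [
    Group("low income", size=100, true_recipients=60, reported=40),
    Group("higher income", size=100, true_recipients=40, reported=10),
]
adjusted = impute_by_reported_rate(groups, admin_total=100)

print(f"adjusted total: {sum(adjusted.values()):.0f}")  # matches the target by construction
for g in groups:
    print(f"{g.name}: adjusted/actual = {adjusted[g.name] / g.true_recipients:.2f}")
```

In this toy setup the adjusted total equals the administrative total, yet the low-income group ends up with roughly 127% of its actual receipt and the higher-income group with roughly 59%. Allocating by true receipt conditional on reporting no receipt would instead assign 20 and 30 additional recipients, respectively, which requires the linked administrative information discussed in the text.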

Fortunately, several recent studies do just that. Mittag (2019) compared the total dollars of SNAP benefits that CPS respondents from New York report receiving to the actual dollars of SNAP benefits received by them according to administrative records, as well as to the imputed value of SNAP dollars based on the TRIM3 microsimulation. As shown in Fig. 2, which summarizes these results, SNAP benefits are significantly underreported in the CPS, but TRIM3-adjusted SNAP benefits do not accurately correct for this underreporting. They underadjust the dollar amounts for those above 200% of the poverty line: for this group, TRIM3-adjusted SNAP benefits are only about one-fifth of actual benefits received. They also sharply overadjust benefits for lower-income households: for those below 50% of the poverty line, TRIM3-adjusted SNAP benefits are 176% of actual receipt based on the administrative microdata. This pattern strongly suggests that the already troubling problem of overallocating that is evident for those below 50% of the poverty line would only be more troubling for those in the bottom 1% or 2% of the reported income distribution, which is where analyses of deep and extreme poverty are focused. Mittag (2019:161) summarized the key takeaway succinctly: “TRIM sharply overcorrects below the poverty line, making it particularly problematic for studies of extreme poverty.”

Two closely related studies (Shantz and Fox 2018; Stevens et al. 2018) provided evidence very consistent with that from Mittag (2019) using CPS data linked to administrative records for SNAP and TANF for seven states.^{Footnote 5} Findings of Shantz and Fox (2018) are particularly relevant. They showed results not just for those below 50% of the poverty line but also for those with zero reported income—that is, those likely to be classified as extremely poor. They showed that SNAP receipt for individuals in the CPS with zero reported household income is significantly underreported and that TRIM3 significantly overadjusts imputed SNAP receipt for these households. According to the administrative data, the actual rate of SNAP receipt for those in zero reported income households is 58%. The reported rate in the survey falls well short of this (33%), but the TRIM3-adjusted rate is much higher (80%).
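The three receipt rates for zero-reported-income households can be put on a common scale by dividing each by the administrative rate. The snippet below simply does that arithmetic on the figures quoted above; the function and variable names are illustrative.

```python
def relative_to_actual(rate: float, actual_rate: float) -> float:
    """Express a receipt rate as a share of the administrative (actual) rate."""
    return rate / actual_rate


# SNAP receipt rates for zero-reported-income households, as quoted in the text.
ACTUAL, REPORTED, TRIM3_ADJUSTED = 0.58, 0.33, 0.80

print(f"survey reports: {relative_to_actual(REPORTED, ACTUAL):.0%} of actual receipt")
print(f"TRIM3-adjusted: {relative_to_actual(TRIM3_ADJUSTED, ACTUAL):.0%} of actual receipt")
```

On these figures the survey captures a little under three-fifths of actual receipt, while the TRIM3-adjusted rate overshoots it by more than a third.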

Shantz and Fox (2018) also showed that TRIM3 overimputes SNAP dollar amounts conditional on receipt, and this overstatement is sharpest for those at the bottom: for those in zero-income households that are recorded as receiving SNAP both by TRIM3 and in the administrative data, the average SNAP dollar amount from TRIM3 is 45% greater than the average amount from the administrative data. They also showed that TRIM3 overstates both receipt and dollar amounts of TANF for these households. For those with positive TANF benefits in both the administrative data and the TRIM3-adjusted data but zero reported income in the CPS, TRIM3 overstates TANF amounts by 235%.

Other Concerns With Outliers in the Distribution of Survey Income

Concerns about the accuracy of estimates of deep or extreme poverty that rely on survey data extend well beyond the underreporting of transfer income or the overallocation of imputed benefits. In fact, evidence from linked administrative and survey data has indicated that underreported earnings are by far the most important reason why survey-based estimates of extreme poverty are biased upward.^{Footnote 6} Using the Detailed Earnings Records (DER) from the Social Security Administration, Meyer et al. (2019) demonstrated that earnings are significantly underreported for households with very low reported income. Their results showed that using administrative data to correct for underreported earnings cuts estimates of extreme poverty by more than half. This effect on extreme poverty is far greater than the adjustment for underreported SNAP benefits. Meyer et al. (2019) also showed, using administrative data, that many other sources of income are underreported for those at the very bottom of the reported income distribution in surveys: adjusting for underreported retirement income and Social Security, for example, both lead to further reductions in the estimates of extreme poverty.^{Footnote 7} Moreover, there are several other programs for which Meyer et al. (2019) did not have administrative data, including unemployment insurance and workers’ compensation, and these programs have been shown to be significantly underreported in surveys (Meyer et al. 2015). These results indicate that if Brady and Parolin were to account for underreporting of earnings and other income sources, their estimates of extreme poverty would be lower.
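The mechanics of an earnings correction can be sketched as follows. This is a simplified illustration and not the procedure of Meyer et al. (2019): the households, the cutoff, and the rule of taking the larger of survey and administrative earnings when a link exists are all assumptions made for the example.

```python
from typing import Optional

EXTREME_THRESHOLD = 3_650.0  # hypothetical annual cutoff, about 10% of an assumed median


def total_income(survey_earnings: float, admin_earnings: Optional[float],
                 other_income: float, use_admin: bool) -> float:
    """Household income, optionally correcting earnings with linked records."""
    earnings = survey_earnings
    if use_admin and admin_earnings is not None:
        # Assumed rule: keep whichever earnings figure is larger.
        earnings = max(survey_earnings, admin_earnings)
    return earnings + other_income


def extreme_poverty_share(households, use_admin: bool) -> float:
    """Share of households whose income falls below the extreme poverty cutoff."""
    flagged = [total_income(s, a, o, use_admin) < EXTREME_THRESHOLD
               for s, a, o in households]
    return sum(flagged) / len(flagged)


# Hypothetical households: (survey earnings, administrative earnings or None, other income).
households = [
    (0.0, 15_000.0, 0.0),
    (0.0, None, 1_000.0),   # no administrative link, so survey earnings are kept
    (2_000.0, 1_500.0, 500.0),
    (0.0, 0.0, 0.0),
    (3_000.0, 9_000.0, 0.0),
]

print("survey only:", extreme_poverty_share(households, use_admin=False))
print("with admin earnings:", extreme_poverty_share(households, use_admin=True))
```

All five hypothetical households fall below the cutoff on survey income alone, and two are reclassified once linked earnings are used. The size of the effect here is an artifact of the made-up data; the point is only that the correction can move households out of the tail without any change in transfer income.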

The observable characteristics of households with very low reported income provide additional evidence that survey income fails to accurately identify those households that are the worst off. If survey income data accurately identified the most economically disadvantaged, then the extreme poor should look clearly worse off than households with more resources. When one examines the observable characteristics of the extreme poor based on survey income, however, this group does not appear to be worse off than all poor households. Indeed, for some characteristics, the extreme poor appear to be better off. For example, Meyer et al. (2019) showed that, on average, the education of the household head is greater among the extreme poor than it is for all poor households, which is surprising given that education is one of the strongest indicators of permanent income. Similarly, these authors found that the extreme poor reported fewer material hardships than the official poor, although the difference was not significant. Further, the extreme poor who were reclassified as no longer extreme poor after underreported earnings were accounted for reported material hardships at rates comparable with the U.S. average.

Conclusion

Understanding the extent of, and changes in, deep and extreme poverty in the United States is critically important for designing effective social safety net policies. Recent studies have estimated deep and extreme poverty based on survey income, but a large and growing literature shows that because of underreporting at very low levels of income, survey data can lead to very misleading results. Underreporting of income at the bottom of the distribution could lead to an overstatement of deep and extreme poverty, a mischaracterization of changes in deep and extreme poverty if the reporting rates change over time, and an understatement of the impact of safety net programs. To address concerns about underreporting, researchers often use microsimulation models to impute the value of missing or underreported income. However, with the benefit of micro-level administrative data, multiple recent studies have shown that adjusting for underreported income through imputation can be inaccurate. Moreover, recent studies relying on linked survey and administrative data showed that many income sources besides government program income are underreported for those at the bottom of the reported income distribution. Accounting for these other income sources would lead to lower estimates of extreme poverty.

Despite these challenges, obtaining accurate measures of income in the far left tail is not hopeless. Administrative microdata for many key sources of income are now accessible to researchers, and efforts are well underway to link these data to many of the large national surveys that are our primary source of income statistics. These administrative microdata have the potential to provide more accurate measures of income in the left tail of the income distribution. However, these data are often limited in scope (i.e., only for a few states or a narrow period) and require special arrangements to access. One attractive feature of using microsimulation models and survey data is that they can be available for many years and for a representative sample of the entire country. As noted earlier, a key limitation of these models is that they typically assign predicted income components to those who report no receipt without any information on who actually receives benefits. With survey data linked to administrative microdata, one could improve the quality of these simulations by using information on true receipt conditional on reporting no receipt. Mittag (2019) proposed a methodology for using administrative data to improve microsimulations (for an application, see Davern et al. 2019). Hokayem et al. (2015b) proposed a similar approach. As administrative microdata become easier to access and available for more components of income in more states and for longer periods, researchers will be able to provide a more comprehensive understanding of the extent of extreme deprivation in this country, how it has changed over time, and the degree to which our social safety net mitigates such deprivation.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

1. Other studies that relied on extreme outliers of survey income data include Fox et al. (2015) and Edin and Shaefer (2015).
2. Underreporting of income is evident in many national surveys, including the CPS, the Consumer Expenditure Survey, the Survey of Income and Program Participation (SIPP), the Panel Study of Income Dynamics (PSID), and the American Community Survey (ACS). See Meyer et al. (2015) for a summary.
3. For these results, 5.5% of individuals have reported income below 50% of the federal poverty line. See Meyer and Mittag (2019: table 2).
4. Many studies have used TRIM3 to address concerns about underreporting of transfers in the CPS. For example, see Sherman and Trisi (2015) or Shaefer and Edin (2018).
5. In section 5 of the online appendix to their article, Brady and Parolin conduct a sensitivity analysis to account for the possibility that TRIM3 overcorrects SNAP benefits at the bottom. In this exercise, they adjust down SNAP receipt for some zero-income households. For example, they note that for 2015, using the TRIM3-adjusted SNAP benefits, zero-income households account for 5% of all SNAP households in the CPS. Using the unadjusted amount of SNAP benefits, this rate is 3%. The authors then assume that the rate based on unadjusted SNAP benefits represents a lower bound and the rate based on TRIM3-adjusted benefits is an upper bound, and they adjust benefit receipt for the zero-income households downward to match the midpoint of this range. This approach, however, fails to fully correct for the overstatement of TRIM3-adjusted benefits at the bottom. The rate based on unadjusted SNAP benefits need not be a lower bound because although SNAP receipt is underreported at the bottom in the CPS, the reporting rate is even lower further up the distribution, as reported in Fig. 2. In fact, the results from Shantz and Fox (2018) indicated that based on administrative data, zero-income SNAP recipients account for 2.8% of all SNAP recipients, which is very close to the rate using their unadjusted CPS data for SNAP receipt (2.7%).
6. Other studies have found similar evidence on underreported earnings at the bottom of the distribution based on comparisons of survey and administrative data (Davies and Fisher 2009). Brady and Parolin cite several studies that concluded that earnings are overreported at the bottom of the income distribution (Bollinger 1998; Bollinger et al. 2014; Hokayem et al. 2015a). However, to assess the evidence on reported earnings, it is important to clarify whether the evidence is for the bottom of the reported income distribution or the bottom of the administrative earnings distribution. For their analyses, Brady and Parolin use the far left tail of the distribution of reported income. For this group, it would be nearly impossible to overreport earnings, particularly for those classified as extreme poor, because most of these individuals have zero reported earnings. Moreover, the evidence on overreporting at the bottom in the works that Brady and Parolin cite rests on the fact that many households report positive income in the CPS but have zero or low income according to the Detailed Earnings Records (DER), so the overreporting is for those at the bottom of the earnings distribution in the DER. These studies treat the DER as the truth in these cases. Recent evidence from Meyer et al. (2020) showed that the DER misses substantial earnings that are reported on W-2s, on 1040s, or in the CPS because the DER often misses entire jobs, misses the millions of unauthorized workers who file taxes, and includes only a portion of self-employment income.
7. Using IRS tax data and Social Security Administration data, Bee and Mitchell (2017) showed that retirement income is also significantly underreported; more than 40% of those who receive pension income fail to report it. The problem of underreported income is not unique to U.S. surveys. Studies of survey income data in Canada and Great Britain have also found evidence of underreporting of income (Brewer et al. 2006, 2017; Brzozowski and Crossley 2011).

References

Bee, A., & Mitchell, J. (2017). Do older Americans have more income than we think? (SEHSD Working Paper #2017-39). Washington, DC: U.S. Census Bureau.
Bollinger, C. R. (1998). Measurement error in the Current Population Survey: A nonparametric look. Journal of Labor Economics, 16, 576–594.
Bollinger, C., Hirsch, B., Hokayem, C., & Ziliak, J. (2014, May). Trouble in the tails? Earnings nonresponse and response bias across the distribution using matched household and administrative data. Paper presented at the annual meetings of the Society of Labor Economists, Arlington, VA.
Brewer, M., Etheridge, B., & O’Dea, C. (2017). Why are households that report the lowest incomes so well-off? Economic Journal, 127, F24–F49. https://doi.org/10.1111/ecoj.12334.
Brewer, M., Goodman, A., & Leicester, A. (2006). Household spending in Britain: What can it teach us about poverty? Bristol, UK: Policy Press.
Brzozowski, M., & Crossley, T. F. (2011). Viewpoint: Measuring well-being of the poor with income or consumption: A Canadian perspective. Canadian Journal of Economics, 44, 88–106.
Davern, M. E., Meyer, B. D., & Mittag, N. K. (2019). Creating improved survey data products using linked administrative-survey data. Journal of Survey Statistics and Methodology, 7, 440–463.
Davies, P. S., & Fisher, T. L. (2009). Measurement issues associated with using survey data matched with administrative data from the Social Security Administration (Social Security Bulletin, Vol. 69, No. 2). Washington, DC: Social Security Administration.
Edin, K. J., & Shaefer, H. L. (2015). $2.00 a day: Living on almost nothing in America. New York, NY: Mariner Books.
Fox, L., Wimer, C., Garfinkel, I., Kaushal, N., Nam, J., & Waldfogel, J. (2015). Trends in deep poverty from 1968 to 2011: The influence of family structure, employment patterns, and the safety net. Russell Sage Foundation Journal of the Social Sciences, 1(1), 14–34.
Hokayem, C., Bollinger, C., & Ziliak, J. P. (2015a). The role of CPS nonresponse in the measurement of poverty. Journal of the American Statistical Association, 110, 935–945.
Hokayem, C., Raghunathan, T., & Rothbaum, J. (2015b). Sequential regression multivariate imputation in the Current Population Survey Annual Social and Economic Supplement. In JSM Proceedings, Survey Research Methods Section (pp. 1329–1343). Alexandria, VA: American Statistical Association. Retrieved from http://www.asasrms.org/Proceedings/y2015f.html.
Meyer, B. D., & Mittag, N. (2019). Using linked survey and administrative data to better measure income: Implications for poverty, program effectiveness, and holes in the safety net. American Economic Journal: Applied Economics, 11(2), 176–204.
Meyer, B. D., Mittag, N., & Goerge, R. M. (2018). Errors in survey reporting and imputation and their effects on estimates of food stamp program participation (NBER Working Paper No. 25143). Cambridge, MA: National Bureau of Economic Research.
Meyer, B. D., Mok, W. K. C., & Sullivan, J. X. (2015). Household surveys in crisis. Journal of Economic Perspectives, 29(4), 199–226.
Meyer, B. D., & Sullivan, J. X. (2003). Measuring the well-being of the poor using income and consumption. Journal of Human Resources, 38(Special issue), 1180–1220.
Meyer, B. D., & Sullivan, J. X. (2011). Viewpoint: Further results on measuring the well-being of the poor using income and consumption. Canadian Journal of Economics, 44, 52–87.
Meyer, B. D., Wu, D., & Medalia, C. (2020). Understanding poverty by linking survey, tax and program data (Working paper). Chicago, IL: Harris School of Public Policy, University of Chicago.
Meyer, B. D., Wu, D., Mooers, V. R., & Medalia, C. (2019). The use and misuse of income data and extreme poverty in the United States (NBER Working Paper No. 25907). Cambridge, MA: National Bureau of Economic Research.
Mittag, N. (2019). Correcting for misreporting of government benefits. American Economic Journal: Economic Policy, 11(2), 142–164.
Shaefer, H. L., & Edin, K. (2018). Welfare reform and the families it left behind. Pathways, 2018(Winter), 22–27.
Shantz, K., & Fox, L. E. (2018). Precision in measurement: Using state-level Supplemental Nutrition Assistance Program and Temporary Assistance for Needy Families administrative records and the Transfer Income Model (TRIM3) to evaluate poverty measurement (SEHSD Working Paper #2018-30). Retrieved from https://www.census.gov/content/dam/Census/library/working-papers/2018/demo/SEHSD-WP2018-30.pdf.
Sherman, A., & Trisi, D. (2015). Safety net more effective against poverty than previously thought: Correcting for underreporting of benefits reveals stronger reductions in poverty and deep poverty in all states (Center on Budget and Policy Priorities report). Retrieved from https://www.cbpp.org/research/poverty-and-inequality/safety-net-more-effective-against-poverty-than-previously-thought.
Stevens, K., Fox, L. E., & Heggeness, M. L. (2018). Precision in measurement: Using state-level SNAP administrative records and the Transfer Income Model (TRIM3) to evaluate poverty measurement (SEHSD Working Paper #2018-15). Retrieved from https://www.census.gov/content/dam/Census/library/working-papers/2018/demo/SEHSD-WP2018-15.pdf.

Acknowledgments

I thank Liana Fox, Jeehoon Han, Bruce Meyer, and Nikolas Mittag for helpful comments and suggestions.

Author information

Authors and Affiliations

Department of Economics and the Wilson Sheehan Lab for Economic Opportunities (LEO), University of Notre Dame, Notre Dame, IN, 46556, USA
James X. Sullivan

Corresponding author

Correspondence to James X. Sullivan.

Ethics declarations

Conflict of Interest

The author declares that he has no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sullivan, J.X. A Cautionary Tale of Using Data From the Tail. Demography 57, 2361–2368 (2020). https://doi.org/10.1007/s13524-020-00926-z

Published: 15 October 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s13524-020-00926-z
