1 Introduction

The non-take-up of benefits is an important aspect of state support programs; if eligible households do not take up transfers, such programs are ineffective and the basic needs of the target population may remain unmet. Eurofound (2015) shows that benefit non-take-up is internationally pervasive and frequently affects more than 40% of the eligible population. This has motivated a long-standing discussion of the extent and determinants of take-up behavior in the international literature (e.g., Moffitt 1983; Blundell et al. 1988; Hernandez et al. 2007). We contribute to that literature and ask how measurement error in reported benefit receipt biases survey-based analyses of take-up behavior. Our unique data provide unusually precise information on true program participation.

A key challenge in the empirical analysis of non-take-up is its measurement. A correct measurement of non-take-up requires valid information on both program eligibility and program take-up. Most non-take-up studies must rely on survey data to measure both concepts. However, several factors can generate measurement errors in survey data and lead to imprecise estimates of program eligibility as well as program take-up. Several studies have discussed the relevance of misreporting for the reliability of survey data (see, e.g., Meyer et al. 2015; Meyer and Goerge 2011; Taeuber et al. 2004; Card et al. 2001). Regarding mismeasurement in program take-up, Meyer et al. (2015) point out that major household surveys in the USA miss approximately half of all welfare and food stamp payments. Given the importance of mismeasurement in reported program take-up, our paper focuses on its consequences for the analysis of benefit non-take-up.

Several factors contribute to the mismeasurement of benefit receipt (Bound et al. 2001). Surveys often ask respondents whether they have received benefits during a certain period in the past. Respondents may have completely forgotten past benefit receipt (recall bias), or they may not remember the exact dates of receipt. For example, events can be reported as more recent than they actually occurred, which is known as “forward telescoping bias” in the survey literature (Bradburn et al. 1994) and results in misreported benefit take-up. Additionally, if several benefits are available or can be claimed simultaneously, beneficiaries might confuse programs, reporting benefits they did not receive while inadvertently omitting those they did receive (Hancock and Barker 2005; Krafft et al. 2015). Another source of misreporting is “social desirability bias” (Bound et al. 2001): the receipt of means-tested social welfare benefits is often perceived as stigmatizing, and respondents may therefore underreport their receipt of these benefits.

The literature on program non-take-up and program participation has proposed several approaches to deal with potential measurement biases. We contribute to the set of studies that apply external validation samples for the study of measurement errors in benefit receipt. Other approaches rely on structural modeling (e.g., Duclos 1995, 1997; Pudney 2001; Bollinger and David 2001; Hernandez and Pudney 2007) or focus on careful data handling (Hancock and Barker 2005).

External validation samples can be used to assess the extent of measurement errors in survey data and to mitigate their effects in analyses. Studies that employ this approach are scarce because it requires linking survey data to administrative data, and linked data are often unavailable or available only for specific groups, periods, or regions. One example of this approach is Bollinger and David (1997), who take advantage of data with information on both true participation and survey responses. Information on response errors in the validation sample is then incorporated in the likelihood function for the primary sample. The authors find that modeling response errors generates large differences in the estimation of program participation even when the validation data are gathered on a sample that differs from the survey data. Other examples are Mittag (2016) and Meyer and Mittag (2017a). Both investigate methods to account for misclassification in survey data when linked data are not available to the researcher, and both use validation data to evaluate the effectiveness of their bias-reduction formulas. The studies closest to our approach are Meyer and Mittag (2017b) and Meyer et al. (2018). The former use linked administrative data for the state of New York to correct reported benefit receipt from several programs in the Current Population Survey (CPS). They find that the poverty-reducing effect of benefit programs nearly doubles with the corrected data, but they do not study take-up behavior. In contrast, Meyer et al. (2018) focus only on the food stamp program but link data from three different surveys (American Community Survey, Current Population Survey, and Survey of Income and Program Participation) with administrative data covering the states of Illinois and Maryland. The authors find underreporting rates of between 23% and 50%, depending on the survey. They study the reasons for misreporting and investigate the impact of misreporting on estimates of program take-up. In particular, single parents, non-whites, and the elderly understate participation, which biases program receipt regressions: underreporting is part of the explanation for low measured benefit receipt among the elderly.

In this study, we similarly link survey data to administrative data, which informs us about the true program take-up of survey respondents. This allows us to determine precisely when survey information differs from actual benefit receipt. Thus, we can measure the presence of measurement errors and misreporting directly. We consider a general income support program that is available to the working-age population in Germany (Unemployment Benefit II). This general benefit is less subject to the risk of benefit confusion than more specific transfers, e.g., those available only to retirees, which have been discussed in the literature (e.g., Duclos 1997; Hancock and Barker 2005). We investigate whether the extent and pattern of program non-take-up differ after correcting for the misreporting of survey respondents. To the best of our knowledge, our study is the first to use linked data to investigate the impact of mismeasurement on program non-take-up.

Relative to the extant literature, our contribution adds value in several respects. First, we offer evidence from a European country where social norms and the cost–benefit calculus with respect to public transfers may differ from those in the USA, where the previous literature originates. Second, we can take advantage of a nationwide program with nationwide validation data from administrative sources, whereas prior US studies are typically based on evidence from regional units. Third, we refer to prior studies on take-up behavior and their common empirical specification and investigate to what extent the results may be biased. This differs from the existing literature in that we limit our analysis to the eligible population identified using simulation techniques. Thus, our analysis does not examine program participation within the general population, i.e., program targeting, but rather the claiming behavior of the eligible population. Finally, we offer evidence on the direction of the bias introduced by misreporting and study the correlation patterns of misreporting itself.

We find that correcting for misreporting in the data modifies the results of take-up regressions. The marginal effects of characteristics associated with benefit take-up often deviate by more than 30% after correcting for measurement error in the outcome. These results are robust to various changes in procedure. Furthermore, we find evidence that the patterns determining misreporting of program take-up are reflected in the sensitivity of marginal effects to the data correction. In particular, we observe that reporting behavior varies with age, income, and immigrant status, as in Meyer et al. (2018); those who are more likely to rely on benefits in a less permanent way are most likely to underreport. Therefore, uncorrected estimations may overestimate the stability of welfare receipt and misleadingly suggest that welfare is less a protection against income shocks and volatility and more a source of subsistence income.

Our results are important for several reasons. First, they show that survey data can yield biased results in the study of take-up behavior based on self-reported information; this confirms prior studies that investigated the role of misreporting in program take-up in the overall population (e.g., Meyer et al. 2018). Second, our findings are more reliable than prior contributions that are not based on linked survey and administrative data. Third, we cover a general and well-known nationwide benefit program. Fourth, we show that the patterns of misreporting and the estimation biases are related: the coefficients that are most biased in take-up equations are those associated with the characteristics of the misreporting groups. Thus, even though misreporting rates may be lower than in other settings studied, the biases are nevertheless sizeable.

We structure our paper in six sections. In the next section, we briefly characterize the benefit program considered in our analysis. We lay out our empirical approach in Sect. 3. Section 4 describes the nature of our data and provides descriptive statistics. Section 5 presents our empirical results and robustness tests. We draw conclusions in Sect. 6.

2 Institutional background

We study the take-up of the German minimum income support program Unemployment Benefit II (UB II). The transfer is available to working-age individuals who are able to work and to their families. Alternative programs cover persons who have reached retirement age or are unable to work. A household is eligible for UB II if its net income falls below the legally determined minimum; in this case, the benefit covers the difference. The benefits are means tested, and almost all types of income are considered in the means test. The minimum income deemed sufficient to guarantee an acceptable minimum living standard for a household is calculated based on the number of household members and, for minors, their age. In 2018, the standard benefit for a single adult is 416 euros per month. Expenses for rent, heating, and health care are paid in addition to the standard benefit; benefits can be higher in special circumstances (e.g., for single-parent families, pregnant women, or those with special food requirements). Households with wealth above a maximum amount are not eligible; wealth comprises financial assets plus the value of owned property minus mortgage liabilities. Owner-occupied residential property is ignored in the wealth check in most cases. Eligibility is not conditional on unemployment: in 2016, approximately 41% of regular benefit recipients were unemployed and 28% received the benefit to top up (insufficient) earnings from employment. Others were temporarily unable to work, e.g., because of child care obligations (Statistik der Bundesagentur für Arbeit 2018).
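
To fix ideas, the basic entitlement logic can be sketched as follows. This is a deliberately stylized illustration that ignores earnings disregards, the wealth test, and special needs supplements; all numbers other than the 2018 standard rate of 416 euros are hypothetical.

def ub2_entitlement(standard_needs, housing_costs, countable_income):
    # Stylized rule: the benefit tops up countable income to the legal minimum need
    minimum_need = standard_needs + housing_costs
    return max(0.0, minimum_need - countable_income)

# Illustration: single adult at the 2018 standard rate of 416 euros, with
# hypothetical housing costs of 350 euros and countable net income of 500 euros
print(ub2_entitlement(416.0, 350.0, 500.0))  # -> 266.0 euros per month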

The UB II program follows federal regulations and is administered either by the employment services of the Federal Employment Agency or by the municipality. In 2016, the program covered approximately 6.2 million individuals in 3.3 million households and paid out approximately 35.2 billion euros (STBA 2018). Thus, UB II is a well-known program that is generally available to the entire working-age population that is able to work. As a large part of the German population receives UB II, the program is comparable to programs such as Medicaid and SNAP (see Table 14), which are the subject of recent studies on benefit misreporting based on linked data (Meyer et al. 2018; Davern et al. 2019). In contrast to these programs, UB II provides a means-tested basic income that is the most important source of income for many households. The UB II benefit level is thus also significantly higher than that of SNAP, for example (Table 14). This may lead to a relatively small measurement error for UB II in survey data if the household is well informed about the benefit.

Recent studies on the take-up of UB II using survey data from the German Socioeconomic Panel (SOEP) show that, based on monthly data, between 46 and 58% of eligible households did not take up the benefit in the years 2005–2007 (e.g., Bruckmeier and Wiemers 2012). The authors find that take-up varies with the potential benefit amount and the expected duration of eligibility, proxied by variables such as education and region of residence.

3 Empirical approach

In recent decades, a large number of empirical studies on the determinants of (non-)take-up have been conducted for a wide range of means-tested benefits (see, e.g., Blundell et al. 1988; Blank and Ruggles 1996; Riphahn 2001; Wilde and Kubis 2005; Whelan 2010; Bruckmeier and Wiemers 2012). All survey-based studies of take-up behavior have to address the problem that the data do not provide information about benefit eligibility. The studies therefore simulate welfare eligibility for every household in the dataset using a microsimulation model. Then, given a model of welfare eligibility, the literature typically defines benefit non-take-up as being eligible according to the simulation model while reporting non-receipt of the benefit in the survey data.
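
Under this convention, the outcome studied in the literature is a simple function of simulated eligibility and reported receipt. A minimal sketch with hypothetical column names and toy data:

import pandas as pd

hh = pd.DataFrame({
    "simulated_eligible": [True, True, True, False],
    "reported_receipt":   [True, False, False, True],
})

# Non-take-up is defined only within the simulated-eligible population
eligible = hh[hh["simulated_eligible"]].copy()
eligible["non_take_up"] = ~eligible["reported_receipt"]
print(eligible["non_take_up"].mean())  # share of eligible households reporting non-receipt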

Following Blundell et al. (1988), we model the take-up decision in a discrete choice framework to analyze determinants of take-up behavior. This approach assumes that benefits are taken up if the household’s net utility from claiming exceeds the utility when not claiming the benefit. The net utility takes into account that claiming a benefit is associated with non-monetary costs, which arise, e.g., from insufficient knowledge of entitlement rules, the claiming process, and the administrative procedures, as well as stigma costs, i.e., the fear of stigmatization and negative societal attitudes toward welfare dependence (see van Oorschot 1991).
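
In our notation (a shorthand for the setup described above, not taken verbatim from the original model), with \( B_i \) the household’s benefit entitlement, \( x_i \) the proxy variables for the costs and benefits of claiming, and \( \varepsilon_i \) the unobserved component, the take-up decision can be written as a latent-variable model:

\( y_i^{*} = \gamma B_i + x_i^{\prime} \beta + \varepsilon_i, \qquad y_i = 1\{ y_i^{*} > 0 \}, \qquad \Pr(y_i = 1 \mid B_i, x_i) = \Phi(\gamma B_i + x_i^{\prime} \beta), \)

where the last equality uses the standard normal error assumption introduced below and \( \Phi \) denotes the standard normal distribution function.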

Since neither the utility nor the costs associated with claiming a benefit can be directly observed, empirical analyses of take-up behavior use proxy variables to capture both. To enhance the external validity of our analysis, we build on the existing empirical literature in choosing the proxy variables that influence the take-up decision (Riphahn 2001; Wilde and Kubis 2005; Frick and Groh-Samberg 2007; Whelan 2010; Bruckmeier and Wiemers 2012). The most obvious factor affecting utility when claiming UB II is the household’s benefit entitlement (see, e.g., Blundell et al. 1988). We follow the standard procedure in the literature and simulate the level of benefit entitlements for all households in the survey data, i.e., for households reporting benefit receipt and for households not reporting benefit receipt. This ensures that the level of entitlement for both types of households is determined according to a uniform measurement concept.

In addition, utility and costs of claiming may vary by household type, i.e., whether it is a single or a couple household and whether or not children are present. Therefore, we consider indicators of household type. As general sociodemographic indicators, we account for age, education, an indicator for first- or second-generation immigrants, and disability of the household head. We also control for actual rent paid and home ownership. We expect higher take-up rates in Eastern Germany due to higher unemployment there and thus control for residence in Eastern Germany.

We follow the literature (e.g., Blundell et al. 1988; Bollinger and David 1997, 2001; Duclos 1995; Pudney 2001) in assuming an i.i.d. standard normal error term to capture the unobserved part of the take-up decision. Furthermore, we account for the potential endogeneity of the simulated benefit. The simulated benefit amount is endogenous if unobserved factors that influence the take-up decision are correlated with unobserved factors determining the household’s labor supply, because the simulated benefit depends on the household’s simulated net income, which is in turn a function of earned income.

Thus, our first specification for the empirical take-up equation is an IV probit estimation (see, e.g., Whelan 2010 and Bruckmeier and Wiemers 2012). The estimation of the IV probit requires the choice of credible instruments for the level of entitlement. Following Bruckmeier and Wiemers (2012), we use (a) the level of the regular benefit entitlement before own income is deducted, (b) non-earned public transfers, i.e., income independent of current labor supply including public and private pensions, and (c) private transfers from persons outside of the household. These are important determinants of the simulated UB II entitlement and thus satisfy the requirement that instruments be (strongly) partially correlated with the endogenous variable.
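
As an illustration of this estimation problem, the sketch below uses a two-step control-function (Rivers–Vuong) approximation to the IV probit in Python with statsmodels. It is a simplified stand-in for the maximum-likelihood IV probit with cluster-robust standard errors used in the paper; all variable names and the synthetic data are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def iv_probit_control_function(df, y, endog, exog, instruments):
    """Two-step Rivers-Vuong approximation to an IV probit."""
    # Step 1: OLS of the endogenous entitlement on instruments and exogenous covariates
    X1 = sm.add_constant(df[exog + instruments])
    first = sm.OLS(df[endog], X1).fit()
    df = df.assign(v_hat=first.resid)
    # Step 2: probit of take-up including the first-stage residual; a significant
    # coefficient on v_hat indicates endogeneity of the simulated benefit
    X2 = sm.add_constant(df[[endog] + exog + ["v_hat"]])
    second = sm.Probit(df[y], X2).fit(disp=0)
    return first, second

# Synthetic demonstration data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # unobserved factor driving endogeneity
benefit = 1.0 + 0.8 * z + 0.5 * u
take_up = (0.5 * benefit + u + rng.normal(size=n) > 0).astype(int)
data = pd.DataFrame({"take_up": take_up, "benefit": benefit, "z": z})

first, second = iv_probit_control_function(data, "take_up", "benefit", [], ["z"])
print(second.params["v_hat"])               # control-function (exogeneity) term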

In a second specification, we utilize the panel structure of our data by adding random effects to the IV probit model (RE IV); this accounts for unobserved heterogeneity at the household level.

The endogeneity of the benefit level for the take-up decision is often ignored in the literature on empirical models of take-up behavior, either because of the lack of credible instruments or because accounting for endogeneity complicates the estimation of the model substantially. Thus, to demonstrate the impact of misreporting on take-up models that do not consider the benefit level as endogenous, we also estimate a pooled probit model and a RE probit model (see Table 9).

4 Data and sample

4.1 Data

We use data from the household panel study “Labour Market and Social Security” (“Panel Arbeitsmarkt und soziale Sicherung,” PASS), a survey designed for research on unemployment and poverty (Trappmann et al. 2010, 2013; see Berg et al. 2014 for technical documentation). The first wave of this study interviewed more than 12,000 respondents in 2006–2007. The seventh wave was completed in 2013 (for interview periods, see Table 5). Because the survey instruments and interview program were revised after the first wave (Gebhardt et al. 2009), we only use waves 2–7.

The data consist of two subsamples. The first subsample covers UB II recipients, while the second covers the overall German population, oversampling those with low socioeconomic status. The UB II sample is randomly drawn from the administrative records of the Federal Employment Agency. To remain representative of the population of UB II recipients, subsample one is refreshed each year to include new recipients of UB II (benefit-inflow sample).

The general population sample is a random draw from a database of addresses of private households in Germany. The addresses are obtained from a commercial provider in wave one, and a refreshment sample is drawn from municipal population registers in wave five. For a detailed description of the sampling design, see Gebhardt et al. (2009). The final weights we use in the descriptive analysis balance distortions arising from the sample design and reflect the entire German population. All regression analyses use unweighted data; a robustness test discussed in Sect. 5.3 evaluates whether this affects the results.

The PASS data are particularly suitable for our analyses because they focus on potential beneficiaries living in low-income households. Beste et al. (2018) find that the income distribution in the PASS data (starting with wave 2) is similar to that of two other data sources (SOEP and Mikrozensus). Furthermore, PASS interviews respondents about their current welfare receipt and allows us to link the survey data with administrative records on welfare receipt. The interviewers ask the head of household whether and for how long the household has received UB II since the last survey or in the last two years, and whether the household currently still receives UB II. The head of household is determined during the household’s first participation in the survey as the person who is best informed about the household finances. The PASS gathers information on UB II receipt via “dependent interviewing,” i.e., interviewers remind the head of the household of the answer in the previous interview prior to asking about current receipt (Berg et al. 2012). This form of interviewing should reduce misreporting of program take-up (Lynn et al. 2012).

In our analysis sample, we consider household observations with realized personal interviews. We drop household observations whose heads are above age 65, receive retirement benefits, are students, or pursue apprenticeship training, because these groups benefit from alternative transfer programs. We require that the household responds to the question on current welfare receipt, that there is only one benefit-receiving unit (“community of need”) in the household, and that there is valid information on earnings (see Appendix 1 for details). Across waves 2–7, our sample covers 30,878 annual household-level observations overall and approximately 5000 observations per year. For each household-year observation, we simulate UB II benefit eligibility and calculate potential benefit amounts. This yields 17,585 UB II eligible household-year observations.

4.2 Data linkage

The opportunity to link survey with administrative data is rare in the literature. We are able to link the PASS survey data to the administrative records of the Federal Employment Agency. These data, originally collected at local job agencies (“job centers”), contain information on claims for UB II and perfectly reflect official payments. The Institute for Employment Research (IAB) and the Research Data Center (FDZ) of the German Federal Employment Agency (BA) at the IAB have access to these administrative data and are responsible for processing, anonymizing, and providing them for empirical research. For our analysis, we link the survey data to administrative data of the “Unemployment Benefit II Recipient History” (“Leistungshistorik Grundsicherung,” LHG, version 11.01.01-150220) of the IAB (Antoni et al. 2016). The administrative UB II data contain information on socioeconomic variables of eligible individuals and regional variables. To measure UB II over- and underreporting, we compare the information the respondents provide on UB II receipt in the month of the interview with the information on UB II receipt in that month from the LHG.
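
Conceptually, this comparison amounts to merging the two sources at the household-month level and flagging disagreements. A minimal sketch with hypothetical identifiers and toy data:

import pandas as pd

# Survey reports at the interview month vs. registered receipt from the LHG (toy data)
survey = pd.DataFrame({"hh_id": [1, 2, 3],
                       "month": ["2011-05", "2011-06", "2011-06"],
                       "reported_receipt": [True, False, True]})
admin = pd.DataFrame({"hh_id": [1, 2],
                      "month": ["2011-05", "2011-06"],
                      "admin_receipt": [True, True]})

linked = survey.merge(admin, on=["hh_id", "month"], how="left")
linked["admin_receipt"] = linked["admin_receipt"].eq(True)   # missing spell = no receipt
linked["misreport"] = linked["reported_receipt"] != linked["admin_receipt"]
print(linked)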

Because of legal constraints, the survey information can only be linked to the administrative data if the participant consented to the linkage in the survey. Therefore, interviewers ask participants during the course of the interview for consent to merge their survey data with the administrative data available at the IAB (for details, see Appendix 2). The consent rate in the PASS is approximately 80%, which is comparable to other survey studies (Berg et al. 2014; Sakshaug and Kreuter 2012). In our sample of simulated eligible households, we have a consent rate of 83.4% (see Table 6). Because respondents who do not agree to the data linkage are asked again in the next wave, the proportion of observations for which approval is available is significantly higher. Overall, only 4% of all household-year observations of eligible households could not be used because of missing consent to data linkage. This leaves us with 16,874 household-year observations of simulated eligible respondents who agreed to the linkage.

Next, we merge these 16,874 observations with a key file generated by the German Record Linkage Center (Antoni and Schnell 2019). This key file is based on the identification of the PASS respondents in administrative data of the IAB. To identify respondents in the administrative data, harmonized information on addresses and personal characteristics from different administrative data sources collected by the Federal Employment Agency is used. Individuals who never worked in dependent employment, who are exempt from social security contributions (e.g., civil servants), or who have never been registered as unemployed or as benefit recipients are not in the data. The record linkage is based on multilevel deterministic and probabilistic methods for linking datasets (see Sakshaug et al. 2017 for a detailed description and Appendix 2). From our sample of 16,874 household-year observations of respondents who agreed to data linkage, we identified 15,925 observations in the administrative data, which amounts to a linkage rate of 94%. Of the 15,925 matches, 15,095 were unique matches and 830 were duplicates, which were corrected following a procedure described in Appendix 2. As a robustness check, we verify our main findings for a sample without duplicates in Sect. 5.3. From the 15,925 linked observations, we keep 14,834 observations with no missing values in the covariates for our descriptive results and the regression analysis. Finally, we drop 270 observations for which the gender and age information in the survey data and in the LHG do not match. After these steps, the analysis sample comprises 14,564 observations, which represents 83% of the simulated eligible population sample (17,585 observations).

A potential problem of the data linkage is that results may be biased because of selectivity in either non-consent or non-identifiability in the administrative data. With respect to non-consent, misreporting of benefit receipt might be biased downwards if non-consent to the data linkage is positively correlated with the misreporting of benefit receipt: households who do not want to admit to receiving UB II might also be reluctant to agree to the data linkage if they fear that their misreporting might be discovered. Column 1 of Table 8 indicates the correlation patterns underlying the probability of not giving consent to the data linkage. We find some statistically significant but small effects: immigrants, younger persons, non-disabled persons, those living in single-person households, and those living in Western Germany are more likely to refuse consent. This suggests that our analysis yields a rather conservative estimate of misreporting, because individuals with a low linkage probability, who are therefore underrepresented in the linked data, tend to have a higher propensity to misreport.

With respect to non-identifiability in the administrative data, column 2 of Table 8 shows the correlation patterns behind the probability that a household cannot be linked to the administrative data for the sample of simulated eligible households that consented to the data linkage. Here, we find no significant marginal effects for most sociodemographic and household characteristics; exceptions are some age groups, migration background, home ownership, and the subsample two indicator.

Overall, these results indicate small systematic effects; thus, if anything, we underestimate misreporting and the effect of correcting for it. We provide two robustness checks in Sect. 5.3 concerning the potential selectivity of the data linkage.

4.3 Over- and underreporting

Next, we compare UB II program take-up as reported in the survey with the information in the administrative data. In our data, 11,106 respondents reported benefit receipt in the survey and—based on administrative records—actually received benefits in the month of the interview (take-up households). Additionally, 2277 respondents reported not claiming the benefit, which is confirmed by the information from the administrative data (non-take-up households). A group of 851 respondents did not indicate receipt in the survey but actually received benefits in the month of the interview according to the administrative data (underreporting households). This results in an underreporting rate of 7.1% (7.6%) relative to all true recipients without (with) survey weights applied. Compared to other benefit programs, this rate is rather low. For example, Meyer et al. (2018) find underreporting rates for food stamp program take-up between 23 and 50% in US surveys. Parolin (2019) confirms high underreporting of the US Food Stamp Program (SNAP) of about 38%: the share of SNAP recipients in the CPS is 11.1% compared to almost 18% in the administrative data. Comparing EU Statistics on Income and Living Conditions (EU-SILC) with administrative data, Tasseva (2016) reports that the number of benefit recipients of important programs in Bulgaria is underreported by between 3 and 40%. For the UK, Brewer et al. (2017) show that tax credits, means-tested benefits, and disability or health benefits in particular have low coverage rates in the Living Costs and Food Survey (LCFS).
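
The four groups and the underreporting rate follow directly from the two receipt indicators. A small sketch with hypothetical column names; the rate computed at the end reproduces the paper’s unweighted count arithmetic:

import numpy as np
import pandas as pd

linked = pd.DataFrame({"reported_receipt": [True, False, False, True],
                       "admin_receipt":    [True, True,  False, False]})

conditions = [linked.reported_receipt & linked.admin_receipt,    # take-up
              ~linked.reported_receipt & ~linked.admin_receipt,  # non-take-up
              ~linked.reported_receipt & linked.admin_receipt,   # underreporting
              linked.reported_receipt & ~linked.admin_receipt]   # overreporting
labels = ["take-up", "non-take-up", "underreporting", "overreporting"]
linked["group"] = np.select(conditions, labels)

# Unweighted underreporting rate relative to all true (administrative) recipients
print(851 / (11106 + 851))   # ~= 0.071, i.e., 7.1%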

In addition to underreporting, 330 benefit-eligible respondents claimed to receive UB II in the survey but did not receive benefits according to the administrative records (overreporting households). Various mechanisms may cause overreporting. First, it may result if respondents are mistaken about the period in which they actually received the benefit. The administrative data show that for 83 observations (25%) the household received UB II not during the month of the interview but in the previous or the following month. Hence, the telescoping bias could be one important source of overreporting in our data. Second, the method of dependent interviewing can also contribute to overreporting when former recipients incorrectly confirm the preloaded information on UB II receipt in the last year (Eggs and Jäckle 2015). Third, overreporting could result from incorrect linkage. However, a closer look at the match types used in the data linkage across the three groups does not reveal large differences: while 96.5% of the observed take-up households are linked by a “gold-standard” or deterministic match, this share is 96.4% for the underreporting households and 94.2% for the overreporting households. This indicates that inaccurate answers in the survey may be more common than errors in the matching process.

Table 1 shows descriptive statistics of the explanatory variables used in the regression analysis for the full sample and separately for households with and without benefit program take-up and for those underreporting and overreporting benefit program take-up. The four subgroups differ in their characteristics. Interestingly, we find some similarities between non-take-up households (column 3) and underreporting households (column 4). Compared to take-up households, the latter two groups have significantly lower simulated benefit entitlements, younger household heads, a lower (higher) share of household heads with lower secondary education (upper secondary education), and a larger share of families with children. This similarity suggests that a take-up regression erroneously classifying underreporting households as non-take-up households overestimates the heterogeneity between the take-up and non-take-up groups, i.e., after correcting underreporting, we expect the take-up regression to yield coefficients of smaller magnitude. An exception is the immigration status of the head of household: while the share of immigrants is significantly lower among non-take-up households than among take-up households, first-generation immigrants are significantly overrepresented among the heads of underreporting households. For the overreporting households, we find several significant differences compared to take-up households; however, their characteristics make them comparable neither to underreporting households nor to non-take-up households.

Table 1 Descriptive statistics: covariate means.

The next section describes our analysis results. First, we describe the extent to which benefit program take-up as reported in survey data must be corrected based on information from administrative data. Then, we look at the effect of correcting the dependent variable on the correlation patterns of non-take-up behavior and investigate the robustness of these results. We describe the characteristics of those misreporting benefit receipt in the last step.

5 Results

5.1 Descriptive effects of data correction

In Table 2, we report the simulated group-specific UB II non-take-up rates for the sample that could be linked to administrative records. Column 1 shows the shares before considering corrections for UB II program take-up misreporting, column 2 shows the rates after correction for underreporting only, and column 3 shows the rates after corrections for under- and overreporting. Initially, we observe an overall weighted non-take-up rate of 40% (see the bottom of column 1 of Table 2) with substantial heterogeneity across subgroups: we observe the highest rate of non-take-up for couples without children (64%), while single-parent households feature the lowest rate of benefit non-take-up (30%). The size of the non-take-up rate and its variation across subgroups are in line with findings based on other data (see, e.g., Bruckmeier and Wiemers 2012 and the literature cited there).

Table 2 Group-specific non-take-up rates before and after correction of misreporting.

We use our administrative data on actual benefit receipt to correct for misreporting of program take-up in the survey. The correction of underreporting results in a non-take-up rate of 35%, i.e., a reduction by five percentage points (see column 2 of Table 2). If we additionally reclassify the overreporting households as non-take-up households, the non-take-up rate increases again to 37% (see column 3 of Table 2). Hence, correcting program take-up misreporting reduces the overall non-take-up rate from approximately 40% to approximately 37%. Thus, misreporting caused us to overestimate the UB II non-take-up rate by approximately 3 percentage points, or 8%. The extent of the correction in the non-take-up rate varies across subgroups (see the last two columns of Table 2). The relative decline in non-take-up rates ranges from 2% for households with a disabled head to approximately 17% for households whose head is a first-generation immigrant.

Figure 1 depicts the distribution of simulated benefit entitlements for those taking up the benefit according to administrative data and reporting it (take-up households) or not reporting it (underreporting households), and for those who did not take up the benefit according to administrative data and did not report it in the survey (non-take-up households) or reported it in the survey (overreporting households). As expected, we observe the highest benefits among households who claim their benefits, with a median value of 722 euros (see the top panel of Fig. 1). The distributions of benefit entitlements for the non-take-up and underreporting households show a large share of households with small claims and median claims of 415 and 563 euros, respectively. The median of the overreporting households lies in between at 658 euros.

Fig. 1 Distribution of simulated monthly benefit entitlements by type of household (HH). Notes: Fifty-seven outlier observations with monthly entitlements above 1700 euros excluded (51 take-up HH, 3 non-take-up HH, 2 underreporting HH, 1 overreporting HH). Weighted values using cross-sectional sample weights.

5.2 Patterns of benefit take-up and the effects of data correction

Table 3 presents the estimation results of our take-up model. We regress a binary indicator of benefit take-up on household characteristics in the sample of 14,564 pooled observations of benefit-eligible households. In column 1, we present the estimated marginal effects of an IV probit estimation with cluster-robust standard errors; in column 5, we show the estimates of a random effects IV probit (RE-IV) estimation; and in columns 2 and 6, both estimation approaches are repeated, now using the corrected dependent variable.

Table 3 Take-up regression: marginal effects before and after correction of misreporting.

First, the correlations between the first-stage and the second-stage error terms (“\( \rho_{1,2} \)”) are statistically highly significant for all estimated models. Accordingly, the corresponding Wald tests (“Wald test of exogeneity: \( \chi^{2} \left( 1 \right) \)”) strongly reject the null hypothesis of exogeneity of the simulated benefit. Thus, we prefer the IV models in Table 3 to the non-IV models presented in Table 9 for all specifications (uncorrected and corrected models). Second, a comparison of the pooled and random effects probit models in columns 1 and 2 versus 5 and 6 reveals the importance of controlling for unobserved heterogeneity at the household level. The results of the random effects estimation in columns 5 and 6 allow us to reject the pooled model of take-up in columns 1 and 2: the share of the total variance contributed by the household-level variance (“panel variance share \( \rho \)”) reaches 54 and 71% in columns 5 and 6, respectively, and is highly statistically significant at the one percent level in both cases. Therefore, the RE-IV probit is our overall preferred model when interpreting the marginal effects of the determinants of take-up.
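
For reference, with a household-level random effect \( u_h \sim N(0, \sigma_u^{2}) \) and a standard normal idiosyncratic error, the panel variance share reported in Table 3 is the usual random-effects probit quantity, typically computed as

\( \rho = \frac{\sigma_u^{2}}{\sigma_u^{2} + 1}, \)

so values of 54 and 71% indicate that more than half of the unexplained variation in take-up is due to time-invariant household-level heterogeneity.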

We start by briefly discussing the estimated marginal effects for the models prior to correcting the dependent variable for UB II program take-up misreporting (see columns 1 and 5). In general, the results across the two estimation approaches are somewhat similar, and the signs of the estimated marginal effects meet our expectations. The marginal effect of the simulated benefit entitlement in the uncorrected IV probit model (column 1) implies that raising the entitlement by 100 euros per month increases the probability of program take-up by 5.1 percentage points. The RE-IV probit (column 5) results in a smaller marginal effect (4.7 percentage points). The propensity of benefit program take-up is positively correlated with lower education, advanced age of the head of household, and the presence of young children in the household. The size of the effects is broadly in line with other studies on the determinants of taking up UB II (e.g., Bruckmeier and Wiemers 2012, 2017). Those residing in East Germany also have a higher propensity to take up benefits than their counterparts in the West, which may reflect the relatively poor labor market situation and lower earnings expectations there. Bruckmeier and Wiemers (2017) find similar relationships based on data from the SOEP.

Interestingly, disability status is not correlated with benefit program take-up. We observe significant marginal effects of household composition: ceteris paribus, the likelihood of claiming the benefit is significantly lower for couples with or without children compared to single-person households, i.e., the reference group. This result might reflect the importance of a potential second earner for the program take-up decision in couple households. Additionally, single parents have a higher probability of program take-up.

Next, we study the results when we consider an indicator of benefit take-up corrected for misreporting. We now recode all eligible households who report no receipt in the survey data but who actually receive benefits according to administrative records as take-up households. Conversely, households reporting the benefit in the survey but who do not receive UB II according to the administrative data are reclassified as non-take-up households.
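
A minimal sketch of this recoding (hypothetical column names; note that the corrected indicator coincides with administrative receipt):

import pandas as pd

df = pd.DataFrame({"reported_receipt": [True, False, False, True],
                   "admin_receipt":    [True, True,  False, False]})

df["take_up_survey"] = df["reported_receipt"]            # uncorrected outcome
df["take_up_corrected"] = df["take_up_survey"]
df.loc[~df.reported_receipt & df.admin_receipt, "take_up_corrected"] = True    # underreporters
df.loc[df.reported_receipt & ~df.admin_receipt, "take_up_corrected"] = False   # overreporters
print(df)  # the corrected indicator equals admin_receipt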

Columns 2 and 6 of Table 3 present the estimated marginal effects; columns 3 and 7 show the differences between the corrected and uncorrected marginal effects and their statistical significance; and columns 4 and 8 present the relative change in individual marginal effects when the corrected instead of the original take-up measure is used.

In the pooled IV and the RE-IV estimation, some of the marginal effects change in economically and statistically significant ways when we correct the dependent variable (columns 3 and 7). For both estimators, the largest absolute change results for the indicators of advanced age versus young age and for highly educated versus low educated heads of households. The majority of the marginal effects of the RE-IV estimation that are statistically significant in the uncorrected as well as the corrected model change by more than 30%. Meyer and Mittag (2017a) study the impact of misclassification corrections on probit coefficient estimates in food stamp take-up regressions. They find that estimates are mostly attenuated due to non-random misreporting. We do not confirm this pattern: The marginal effects in columns 1 and 5 of Table 3 appear to be generally larger than the marginal effects obtained after data correction.

Thus, correcting for misreporting affects not only the level of non-take-up but also the estimated impact of the correlates of the take-up decision. We thereby empirically corroborate the Monte Carlo findings of Pudney (2001) that even moderate measurement errors in reported benefit receipt can lead to strong biases in coefficient estimates.

5.3 Robustness tests

We offer several types of robustness tests. First, we repeat the estimations in Table 3 using sample weights. The direction of all statistically significant marginal effects is robust to adding sample weights (Table 10). For most variables, the magnitude of the effects and their significance also remain at similar levels. Our key interest is the effect of correcting the dependent variable. The comparison of the changes in the marginal effects in Table 10 (see columns 3 and 7) with those in Table 3 shows that most of the results are robust.

Second, we address concerns about potential selectivity due to (a) missing consent to the data linkage and (b) failed linkage, because not all households that agreed to the data linkage could actually be linked (see Sect. 4.2 and Appendix 2). We reestimate our take-up equation for the full sample of eligible households before data linkage and thus without corrections. Table 11 shows the take-up estimation results for the full sample of households that are simulated as eligible for UB II before data linkage. Thus, the results in Table 11 are independent of the linkage procedure and are comparable to the typical situation in which administrative data are not available. Columns 1 and 3 show the results for all eligible households, while columns 2 and 4 again show the results of the linked sample from Table 3. With the exception of the marginal effect for the family without children indicator in the RE-IV model, the significance of the marginal effects does not change. Additionally, the magnitude of all marginal effects changes only slightly. Thus, we find no evidence of selection into linkage based on reported receipt.

In two further robustness checks, we analyze whether false matches in the linked data may bias our estimates. In the third robustness check, we keep only observations that were identified in the matching procedure by the gold-standard or deterministic procedure, i.e., the most reliable matches (see Appendix 2). False matches based on these two procedures are very unlikely. We show the results of the take-up regression for the restricted sample of gold-standard/deterministic matches in Table 12. The comparison with our main findings in Table 3 shows that the statistical significance, signs, and magnitudes of the marginal effects, as well as the effects of the data correction, change only slightly.

In the fourth robustness check, we drop all non-unique matches, which were corrected after matching (see Sect. 4.2 and Appendix 2). We present the results of our take-up estimation for the sample without these observations in Table 13. Again, we find only minor differences compared to the main results, which are based on all matched observations (Table 3). Thus, the inclusion of the corrected duplicates does not affect our results.

5.4 Patterns of misreporting

Finally, going beyond the descriptive statistics of Tables 1 and 2, we offer a multivariate characterization of the households that misreport their benefit program take-up. Using our baseline specification, we estimate probit models for the outcomes “underreporting” and “overreporting” with pooled and random effects models. Patterns of underreporting are analyzed within the sample of households with UB II program take-up according to the administrative data. For the determinants of overreporting, we rely only on observations without registered UB II take-up in the administrative data.
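
A compact sketch of this step with hypothetical variable names, using a pooled probit via statsmodels as a stand-in (the random-effects versions are omitted):

import statsmodels.api as sm

def misreporting_probit(df, outcome, covariates):
    # Pooled probit of a misreporting indicator on household characteristics
    X = sm.add_constant(df[covariates])
    return sm.Probit(df[outcome], X).fit(disp=0)

# Underreporting is analyzed among administrative recipients only,
# overreporting among eligible households without registered receipt:
# under = misreporting_probit(df[df.admin_receipt], "underreporting", covariates)
# over = misreporting_probit(df[~df.admin_receipt], "overreporting", covariates)
# print(under.get_margeff().summary())   # average marginal effects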

Table 4 presents the marginal effects of the pooled models (columns 1 and 3) and the RE models (columns 2 and 4). We find under- as well as overreporting to be correlated with several characteristics. The effects are particularly large and statistically significant for the indicators of the age of the household head: younger heads are the most likely to underreport and the least likely to overreport program take-up. Families with children are more likely to underreport and have a lower probability of overreporting than individuals in single-person households. Furthermore, households with smaller entitlements or whose head of household is higher educated are more likely to underreport. In contrast, within the sample of households without registered UB II receipt, households with smaller simulated entitlements and higher educated heads of household are less likely to overreport. The amount of the monthly rent is positively (negatively) correlated with the probability of underreporting (overreporting). Finally, we find a lower probability of underreporting benefit program take-up for households from East Germany and a higher probability of underreporting for households whose head is a first-generation immigrant. Overall, we find that misreporting is not random. This agrees with the findings of Meyer et al. (2018), who show that misreporting is systematically related to household characteristics.

Table 4 Regression of misreporting on household characteristics: marginal effects.

A comparison of the characteristics of underreporting and overreporting households (Table 4) with the misreporting-induced bias in the marginal effects of given characteristics (Table 3) yields similar patterns. We find both a strong age dependence in misreporting and a strong response of the marginal effect of age to the misreporting correction. Similarly, we observe statistically significant changes in the marginal effects of the monthly simulated entitlement, monthly rent, first-generation immigrant status, upper secondary education, and East Germany after correction for mismeasurement of UB II program take-up. These characteristics are also directly correlated with the misreporting of benefit take-up. Clearly, the estimation of take-up regressions is more reliable for those groups for whom the outcome is measured correctly. Thus, the bias in survey-based estimations of take-up equations varies with the extent to which the characteristics included in the analysis correlate with the propensity to misreport program take-up.

6 Conclusions

This study contributes to the literature on benefit program non-take-up behavior. Because this literature relies on survey data, it suffers from measurement error if respondents do not reveal their true use of welfare benefits. We inspect this issue for the case of a general welfare program using linked representative survey and administrative data. Given that our welfare-receiving respondents are aware of the administrative origin of the sampling, misreporting might be smaller here than in other data settings. For households with linked survey and administrative data, we simulate a non-take-up rate of 40% based on survey information, which is in line with results found for comparable benefit programs in other countries (see, e.g., Eurofound 2015). The data linkage reveals both under- and overreporting of benefit program take-up in the survey, with a weighted underreporting rate of 7.6%. Correcting the survey responses for mismeasurement of benefit program take-up (under- and overreporting) reduces the simulated non-take-up rate to 37%.

We use the information on misreporting of program take-up to test whether the results of take-up regressions differ depending on the treatment of misreporting households, i.e., depending on whether we recode them as take-up (underreporting) or non-take-up (overreporting) households in our outcome measure. We estimate pooled and panel instrumental variable probit models and calculate marginal effects. When we compare the estimation results obtained with corrected and uncorrected dependent variables, we find that the absolute differences in marginal effects are often statistically significant and large. In relative terms, many marginal effects change by at least 30% after the correction. These results hold up to various changes in the empirical approach: we use different estimators (e.g., with and without random effects) and offer a variety of robustness tests to account for potential selectivity problems and sensitivities of the data linkage procedures.

We find that the patterns of the changed marginal effects mostly agree with the correlation patterns underlying misreporting behavior: households whose head is a first-generation immigrant, households with small benefit claims and high monthly rents, and households with young or highly educated heads are particularly likely to underreport their benefit receipt. For overreporting, we find the opposite pattern: most characteristics that correlate positively with underreporting correlate negatively with overreporting, and vice versa.

The marginal effects of the characteristics that are correlated with misreporting also change in a statistically significant way when we correct our take-up outcome measure for misreporting. This agrees well with the literature showing that households close to the labor market or at risk of benefit confusion tend to misreport or to not take up their benefit (Bruckmeier and Wiemers 2012, 2017; Bruckmeier et al. 2014). The mechanisms behind these groups’ underreporting of program take-up differ but are individually plausible. Because German naturalization rules require that applicants be able to support themselves without relying on social transfers or means-tested benefits, immigrants may perceive a high cost of admitting benefit receipt (see, e.g., Riphahn and Saif 2019). Also, households who are close to the labor market, e.g., higher educated and younger persons, may suffer from (perceived) stigma effects and work the hardest to avoid transfer dependence. Those receiving several social transfers at the same time may not be able to keep track of the specific transfer programs from which they benefit, particularly if transfer eligibility changes at short intervals. Overreporting, in contrast, would be more likely among households who confuse different benefit programs or err on the timing of benefit receipt.

In sum, the analyses based on our linked data suggest that research concerned with take-up and its determinants needs to account for potential misreporting of benefit receipt. The marginal effects in regressions of benefit take-up may well be biased unless the outcome measures can be corrected for misreporting. The bias in survey-based estimations of take-up equations depends on the extent of mismeasurement and on how the propensity to misreport program take-up correlates with the characteristics of the analyzed population.

Take-up analyses are often motivated by an interest in the distributional effects of government transfers. If outcomes for certain parts of the distribution are more likely to be mismeasured because of misreported benefit receipt, the results for these groups tend to be biased. We are among the first to show such patterns empirically, which is important for the correct interpretation of distributional analyses of government benefits.