Misreporting of program take-up in survey data and its consequences for measuring non-take-up: new evidence from linked administrative and survey data

The international literature studies non-take-up behavior of eligible populations to evaluate the effectiveness of government programs. A major challenge in this literature is the measurement error regarding benefit take-up. In our data, we observe both actual welfare receipt and respondents’ survey information on their program take-up. This allows us to observe the measurement errors that other researchers must estimate. We describe survey misreporting and investigate how it biases the estimates of the magnitude and patterns of benefit take-up among eligible households. Our findings suggest that the extent of measurement error can be substantial. It varies with the characteristics of the misreporting population and is associated with the drivers of misreporting. This indicates that survey-based analyses of take-up behavior are likely subject to severe biases.


Regina T. Riphahn (regina.riphahn@fau.de). Affiliations: 1 IAB, Nuremberg, Germany; 2 Friedrich-Alexander-University Erlangen-Nürnberg, Lange Gasse 20, 90403 Nuremberg, Germany

Introduction
The non-take-up of benefit programs is an important aspect of state support programs; if eligible households do not take up transfers, such programs are ineffective and the basic needs of population groups may remain unaddressed. Eurofound (2015) shows that benefit non-take-up is internationally pervasive and frequently affects more than 40% of the eligible population. This has motivated a long-standing discussion of the extent and determinants of take-up behavior in the international literature (e.g., Moffitt 1983; Blundell et al. 1988). We contribute to that literature and address the research question of how measurement error in reported benefit receipt biases survey-based analyses of take-up behavior. Our unique data provide unusually precise information on true program participation.
A key challenge in the empirical analysis of non-take-up is its measurement. A correct measurement of non-take-up requires valid information on both program eligibility and program take-up. Most non-take-up studies must rely on survey data to measure both concepts. However, several factors can generate measurement errors in survey data and lead to imprecise estimations of program eligibility as well as program take-up. Several studies have discussed the relevance of misreporting for the reliability of survey data, see, e.g., Meyer et al. (2015), Meyer and Goerge (2011), Taeuber et al. (2004), or Card et al. (2001). Regarding mismeasurement in program take-up, Meyer et al. (2015) pointed out that major household surveys in the USA miss approximately half of all welfare and food stamp payments. Given the importance of mismeasurement in reported program take-up, our paper focuses on its consequences for the analysis of benefit non-take-up.
Several factors contribute to the mismeasurement of benefit receipt (Bound et al. 2001). Surveys often ask respondents whether they have received benefits during a certain period in the past. Respondents may have completely forgotten past benefit receipt (recall bias), or they may not remember the exact dates of receipt. For example, events can be reported as more recent than they actually occurred, which is known as the "forward telescoping bias" in the survey literature (Bradburn et al. 1994). This form of bias could lead to mismeasurement in the form of misreporting of benefit take-up. Additionally, if different benefits are available or benefits can be claimed simultaneously, beneficiaries might report incorrectly, claiming specific benefit(s) that they did not receive while inadvertently omitting the benefit(s) that they did receive, which ultimately leads to misreporting (Hancock and Barker 2005; Krafft et al. 2015). Another source of misreporting is the "social desirability bias" (Bound et al. 2001). In particular, the receipt of means-tested social welfare benefits is often perceived as stigmatizing and thus respondents may underreport their receipt of these benefits.
The literature on program non-take-up and program participation has proposed several approaches to deal with potential measurement biases. We contribute to the set of studies that apply external validation samples for the study of measurement errors in benefit receipt. Other approaches rely on structural modeling (e.g., Duclos 1995, 1997; Pudney 2001; Bollinger and David 2001) or focus on careful data handling (Hancock and Barker 2005).
External validation samples can be used to assess the extent of measurement errors in survey data and to mitigate their effects in analyses. Studies that employ this approach are scarce because this requires linking survey data to administrative data. Linked data are often not available or only available for specific groups, periods, or regions. One example of this approach is Bollinger and David (1997). They take advantage of data with information on true participation and survey responses. Information on response errors in the validation sample is then considered in the likelihood function for the primary sample. The authors find that modeling response errors generates large differences in the estimation of program participation even when the validation data are gathered on a sample that differs from the survey data. Other examples are Mittag (2016) and Meyer and Mittag (2017a). Both investigate different methods to account for misclassification in survey data if linked data are not available to the researcher. They use validation data to evaluate the effectiveness of their formulas for bias reduction. The studies closest to our approach are Meyer and Mittag (2017b) and Meyer et al. (2018). The former use linked data to correct survey data from the Current Population Survey (CPS) on reported benefit receipt from different programs and cover the state of New York. They find that the poverty-reducing effect of benefit programs is nearly doubled using the corrected data but do not study take-up behavior. In contrast, Meyer et al. (2018) only focus on the food stamp program, but link data from three different surveys (American Community Survey, Current Population Survey, and Survey of Income and Program Participation) with administrative data covering the states of Illinois and Maryland. The authors find underreporting rates of between 23% and 50% depending on the survey. They study the reasons for misreporting and investigate the impact of misreporting on program take-up estimations. The authors find that particularly single parents, non-whites, and the elderly understate participation, which generates biases in program receipt regressions: Underreporting is part of the explanation for low benefit receipt among the elderly.
In this study, we similarly link survey data to administrative data, which informs us about the true program take-up of survey respondents. This allows us to determine precisely when survey information differs from actual benefit receipt. Thus, we can determine the presence of measurement errors and misreporting directly. We consider a general income support program that is available for the working-age population in Germany (Unemployment Benefit II). This general benefit is less subject to the risk of benefit confusion than specific transfers available, e.g., for retirees only, which have been discussed in the literature (e.g., Duclos 1997; Hancock and Barker 2005). We investigate whether the extent and pattern of program non-take-up differ after correcting for the misreporting of survey respondents. To the best of our knowledge, our study is the first to use linked data to investigate the impact of mismeasurement on program non-take-up.
Relative to the extant literature, the value added by our contribution relates to several aspects. First, we offer evidence from a European country where social norms and the cost-benefit setting with respect to public transfers may differ from that in the USA, the origin of the previous literature. Second, we can take advantage of a nationwide program with nationwide validation data from administrative sources. Prior studies in the USA are typically based on evidence from regional units. Third, we refer to prior studies on take-up behavior and their common empirical specification and investigate to what extent the results may be biased. This differs from the existing literature in that we limit our analysis to the eligible population identified using simulation techniques. Thus, our analysis does not examine program participation within the general population, i.e., program targeting, but rather the claiming behavior of the eligible population. Finally, we offer evidence on the direction of the bias introduced by misreporting and study the correlation patterns of misreporting itself.
We find that correcting for misreporting in the data modifies the results of take-up regressions. The marginal effects of characteristics associated with benefit take-up often deviate by more than 30% after correcting for measurement error in the outcome. These results are robust to various changes in procedure. Furthermore, we find evidence that the patterns determining misreporting of program take-up are reflected in the sensitivity of marginal effects to the data correction. In particular, we observe that reporting behavior varies with age, income, and immigrant status as in Meyer et al. (2018); those who are more likely to rely on benefits in a less permanent way are most likely to underreport. Therefore, uncorrected estimations may misleadingly lead us to overestimate the stability of welfare receipt and suggest that welfare is less a protection against income shocks and volatility and more a source of subsistence income.
Our results are important for several reasons. First, they show that survey data can yield biased results in the study of take-up behavior based on self-reported information; this confirms prior studies which investigated the role of misreporting on program take-up in the overall population (e.g., Meyer et al. 2018). Second, our findings are more reliable than prior contributions that are not based on linked survey and administrative data. Third, we cover a general and well-known nationwide benefit program. Fourth, we show that the patterns of misreporting and the estimation biases can be related. The coefficients that are the most biased in take-up equations are those associated with the misreporting groups' characteristics. Thus, even though misreporting rates may be lower than in other settings studied, the biases are nevertheless sizeable.
We structure our paper in six sections. In the next section, we briefly characterize the benefit program considered in our analysis. We lay out our empirical approach in Sect. 3. Section 4 describes the nature of our data and provides descriptive statistics. Section 5 presents our empirical results and robustness tests. We draw conclusions in Sect. 6.

Institutional background
We study the take-up of the German minimum income support program Unemployment Benefit II (UB II). The transfer is available for working-age individuals who are able to work and their families. Alternative programs cover persons who have reached retirement age or are unable to work. UB II eligibility exists if a household's net income is below the legally determined minimum; in this situation, the benefit covers the difference. The benefits are means tested. Almost all types of income are considered in the means test. The minimum income deemed sufficient to guarantee an acceptable minimum living standard for a household is calculated based on the number of household members and, for minors, their age. In 2018, the standard benefit for an adult is 416 euros per month. Expenses for rent, heating, and health care are paid in addition to the standard benefit; benefits can be higher in special circumstances (e.g., for single-parent families, pregnant women, or those with special food requirements). Households with more than a maximum amount of wealth are not eligible; wealth comprises financial assets plus the value of owned property minus mortgage liabilities. Self-occupied residential property is ignored in the wealth check in most cases. Eligibility is not conditional on unemployment. In 2016, approximately 41% of regular benefit recipients were unemployed and 28% received the benefit to top up (insufficient) earnings from employment. Others were temporarily unable to work, e.g., because of child care obligations (Statistik der Bundesagentur für Arbeit 2018). The UB II program follows federal regulations and is administered either by the employment services of the Federal Employment Agency or by the municipality. In 2016, the program covered approximately 6.2 million individuals in 3.3 million households and paid out approximately 35.2 billion euros (STBA 2018).
Thus, UB II is a well-known program that is generally available to the entire working-age population which is able to work. As a large part of the German population receives UB II, the program is comparable to programs such as Medicaid and SNAP (see Table 14), which are the subject of recent studies on benefit misreporting based on linked data (Meyer et al. 2018; Davern et al. 2019). In contrast to these programs, UB II provides a means-tested basic income that is the most important source of income for many households. The UB II benefit level is thus also significantly higher than, for example, for SNAP (Table 14). This may lead to a relatively small measurement error for UB II in survey data if the household is well informed about the benefit.
Recent studies on take-up of UB II using survey data from the German Socioeconomic Panel (SOEP) show that based on monthly data, between 46 and 58% of eligible households did not take up the benefit in the years covered (e.g., Bruckmeier and Wiemers 2012). The authors find that take-up varies with the potential benefit amount and the expected duration of eligibility expressed in proxy variables such as education and region of residence.

Empirical approach
In recent decades, a large number of empirical studies on the determinants of (non-)take-up have been conducted for a wide range of means-tested benefits (see, e.g., Blundell et al. 1988; Blank and Ruggles 1996; Riphahn 2001; Wilde and Kubis 2005; Whelan 2010; Bruckmeier and Wiemers 2012). All survey-based studies of take-up behavior have to address the problem that the data do not provide information about benefit eligibility. The studies therefore simulate welfare eligibility for every household in the dataset using a microsimulation model. Then, given a model of welfare eligibility, the literature typically defines benefit non-take-up as being eligible according to the simulation model while reporting non-receipt of the benefit in the survey data.
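The survey-based definition of non-take-up described above can be made concrete with a minimal sketch. The record layout and the names `Household`, `simulated_entitlement`, and `reports_receipt` are hypothetical stand-ins for the output of a microsimulation model and a survey item, the amounts are invented, and treating a positive simulated entitlement as eligibility is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical fields: simulated monthly entitlement in euros (from a
    # microsimulation model) and the survey answer on current receipt.
    simulated_entitlement: float
    reports_receipt: bool


def is_eligible(hh: Household) -> bool:
    # Assumption: a positive simulated entitlement means the household is eligible.
    return hh.simulated_entitlement > 0


def is_non_take_up(hh: Household) -> bool:
    # Survey-based non-take-up: simulated eligible, but no receipt reported.
    return is_eligible(hh) and not hh.reports_receipt


def non_take_up_rate(households: list[Household]) -> float:
    # Share of simulated eligible households that report no receipt.
    eligible = [hh for hh in households if is_eligible(hh)]
    if not eligible:
        raise ValueError("no simulated eligible households")
    return sum(is_non_take_up(hh) for hh in eligible) / len(eligible)


# Invented example: three eligible households, one of which reports no receipt.
sample = [
    Household(320.0, True),
    Household(150.0, False),
    Household(0.0, False),
    Household(410.0, True),
]
rate = non_take_up_rate(sample)
```

Under this definition, any error in the survey answer on receipt translates directly into a misclassified take-up outcome, which is the measurement problem the linked data address.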
Following Blundell et al. (1988), we model the take-up decision in a discrete choice framework to analyze determinants of take-up behavior. This approach assumes that benefits are taken up if the household's net utility from claiming exceeds the utility when not claiming the benefit. The net utility takes into account that claiming a benefit is associated with non-monetary costs, which occur, e.g., because of insufficient knowledge of entitlement rules, the claiming process, the administrative procedures, and stigma costs, i.e., the fear of stigmatization and negative societal attitudes toward welfare dependence (see van Oorschot 1991).
Since neither the utility nor the costs associated with claiming a benefit can be directly observed, empirical analyses of take-up behavior use proxy variables to capture both. We build on the existing empirical literature for choosing the proxy variables that influence the take-up decision (Riphahn 2001; Wilde and Kubis 2005; Frick and Groh-Samberg 2007; Whelan 2010; Bruckmeier and Wiemers 2012) to enhance the external validity of our analysis. The most obvious factor affecting utility when claiming UB II is the household's benefit entitlement (see, e.g., Blundell et al. 1988). We follow the standard procedure in the literature and simulate the level of benefit entitlements for all households in the survey data, i.e., for households reporting benefit receipt and for households not reporting benefit receipt. This ensures that the level of entitlement for both types of households is determined according to a uniform measurement concept. In addition, utility and costs of claiming may vary by household type, i.e., whether it is a single or a couple household and whether or not children are present. Therefore, we consider indicators of household type. As general sociodemographic indicators, we account for age, education, an indicator for first- or second-generation immigrants, and disability of the household head. We also control for actual rent paid and home ownership. We expect higher take-up rates in Eastern Germany due to higher unemployment there and thus control for residence in Eastern Germany.
We follow the literature (e.g., Blundell et al. 1988; Bollinger and David 1997, 2001; Duclos 1995; Pudney 2001) in assuming an i.i.d. standard normal error term to capture the unobserved part of the take-up decision. Furthermore, we account for the potential endogeneity of the simulated benefit. The simulated benefit amount is endogenous if unobserved factors that influence the take-up decision are correlated with unobserved factors determining the household's labor supply. In this case, the simulated benefit amount would be endogenous because it depends on the household's simulated net income, which is a function of earned income.
Thus, our first specification for the empirical take-up equation is an IV probit estimation (see, e.g., Whelan 2010; Bruckmeier and Wiemers 2012). The estimation of the IV probit requires the choice of credible instruments for the level of entitlement. Following Bruckmeier and Wiemers (2012), we use (a) the level of the regular benefit entitlement before own income is deducted, (b) non-earned public transfers, i.e., income independent of current labor supply including public and private pensions, and (c) private transfers from persons outside of the household. These are important determinants of the simulated UB II entitlement, and thus, these instruments satisfy the requirement that the instrument must be (strongly) partially correlated with the endogenous variable.
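As a sketch in generic notation, the IV probit described above can be written as a latent-index take-up equation with a linear first stage for the simulated entitlement. All symbols are assumptions introduced here for illustration and are not taken from the original text: $T_i$ indicates take-up, $B_i$ is the simulated entitlement, $x_i$ collects the exogenous covariates, and $z_i$ collects the instruments (a) to (c). The joint normality of the two error terms is the usual textbook formulation of the IV probit and may differ in detail from the specification actually estimated.

```latex
\begin{align}
T_i^{*} &= \beta B_i + x_i'\gamma + \varepsilon_i, \qquad T_i = \mathbf{1}\,[\,T_i^{*} > 0\,] \\
B_i &= z_i'\pi + x_i'\delta + v_i \\
\begin{pmatrix} \varepsilon_i \\ v_i \end{pmatrix}
&\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho\,\sigma_v \\ \rho\,\sigma_v & \sigma_v^{2} \end{pmatrix} \right)
\end{align}
```

In this formulation, endogeneity of the simulated entitlement corresponds to $\rho \neq 0$. With $\rho = 0$ the model reduces to an ordinary probit, $\Pr(T_i = 1 \mid B_i, x_i) = \Phi(\beta B_i + x_i'\gamma)$, which treats the benefit level as exogenous.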
In a second specification, we utilize the panel structure of our data by adding random effects to the IV probit model (RE IV); this accounts for unobserved heterogeneity at the household level (see footnote 5). The endogeneity of the benefit level for the take-up decision is often ignored in the literature on empirical models of take-up behavior, either because of the lack of credible instruments or because accounting for endogeneity complicates the estimation of the model substantially. Thus, to demonstrate the impact of misreporting on take-up models that do not consider the benefit level as endogenous, we also estimate a pooled probit model and a RE probit model (see Table 9).

Data
We use data from the household panel study "Labour Market and Social Security" ("Panel Arbeitsmarkt und soziale Sicherung," PASS), a survey designed for research on unemployment and poverty (Trappmann et al. 2010, 2013; Berg et al. 2014 for technical documentation). The first survey of this study interviewed more than 12,000 respondents in 2006-2007. The seventh survey wave was completed in 2013 (for interview periods, see Table 5). Because the survey instruments and interview program were revised after the first wave (Gebhardt et al. 2009), we only use surveys 2-7.
The data consist of two subsamples. The first subsample considers UB II recipients, while the second subsample covers the overall German population, oversampling those with low socioeconomic status. The UB II sample is randomly drawn from the administrative records of the Federal Employment Agency. To retain a representative character for the population of UB II recipients, subsample one is refreshed each year to include new recipients of UB II (benefit-inflow-sample).
The general population sample is a random draw from a database of addresses of private households in Germany. It is provided by a commercial provider in wave one and is taken from municipality population registers in wave five (refreshment sample). For a detailed description of the sampling design, see Gebhardt et al. (2009) and footnote 6. The final weights we use in the descriptive analysis balance distortions arising from the sample design and reflect the entire German population. All regression analyses use unweighted data; a robustness test discussed in Sect. 5.3 evaluates whether this affects the results.
The PASS data are particularly suitable for our analyses because they focus on potential beneficiaries living in low-income households. Beste et al. (2018) find that the income distribution in the PASS data (starting with wave 2) is similar to that of two other data sources (SOEP and Mikrozensus). Furthermore, PASS interviews respondents about their current welfare receipt, and it allows us to link survey data with administrative records on welfare receipt. The interviewers ask the head of household whether and for how long the household has received UB II since the last survey or in the last two years, and whether the household currently still receives UB II. Interviewers determine the head of the household during the household's first participation in the survey as the person who is best informed about the household finances. The PASS gathers information on UB II receipt via "dependent interviewing," i.e., interviewers remind the head of the household of the answer in the previous interview prior to asking about current receipt (Berg et al. 2012). This form of interviewing should result in reduced misreporting of program take-up (Lynn et al. 2012).
Footnote 5: All models are estimated with the Stata command "gsem."
Footnote 6: The sampling in wave five involved several steps. Step one draws 300 postcodes (regions) as primary sampling units, i.e., households from both populations (UB II recipients and private households) within each postcode. Based on the number of benefit-receiving households (sample 1) and the number of private households (sample 2) in a postcode, each household receives a uniform selection probability. Design weights for the gross sample reflect the selection probability. Logit models for panel participation are the basis to account for the participation probability and to adjust design weights in the second step (see Gebhardt et al. 2009). Finally, both samples were calibrated to official statistics on UB II recipients and private households in Germany.
In our analysis sample, we consider household observations with realized personal interviews. We drop household observations whose heads are above age 65 years or in receipt of retirement benefits, students, and individuals pursuing apprenticeship training because these groups benefit from alternative transfer programs. We require that the household responds to the question on current welfare receipt, that there is only one benefit-receiving unit ("community of need") in the household, and that there is valid information on earnings (see Appendix 1 for details). Across waves 2-7, our sample covers 30,878 annual household-level observations overall and approximately 5000 observations per year. For each household-year observation, we simulate UB II benefit eligibility and calculate potential benefit amounts. This yields 17,585 UB II eligible household-year observations.

Data linkage
The opportunity to link survey with administrative data is rare in the literature. In particular, we are able to link the PASS survey data to the administrative records of the Federal Employment Agency. The data, originally collected at local job agencies ("job centers"), contain information on claims for UB II. The data perfectly reflect official payments. The Institute for Employment Research (IAB) and the Research Data Center (FDZ) of the German Federal Employment Agency (BA) at the IAB have access to these administrative data and are responsible for processing, anonymizing, and providing them for empirical research. For our analysis, we link the survey data to administrative data of the "Unemployment Benefit II Recipient History" ("Leistungshistorik Grundsicherung," LHG, version 11.01.01-150220) of the IAB (Antoni et al. 2016). The administrative UB II data contain information on socioeconomic variables of eligible individuals and regional variables. To measure UB II over- and underreporting, we compare the information the respondents provide on UB II receipt in the month of the interview with the information on UB II receipt in this month obtained from the LHG.
Because of legal constraints, the survey information can only be linked to the administrative data if the participant consented to linkage in the survey. Therefore, interviewers ask participants during the course of the interview for consent to merge their survey data with their administrative data that are available at the IAB (for details, see Appendix 2). The consent rate in the PASS is approximately 80%, which is comparable to other survey studies (Berg et al. 2014; Sakshaug and Kreuter 2012). In our sample of simulated eligible households, we have a consent rate of 83.4% (see Table 6). Because respondents who do not agree to the data linkage are asked again in the next wave, the proportion of observations for which an approval is available is significantly higher. Overall, only 4% of all household-year observations of eligible households could not be used because of missing consent to data linkage. This leaves us with 16,874 household-year observations of simulated eligible respondents who agreed to the linkage.
Next, we merge these 16,874 observations with a key file generated by the German Record Linkage Center (Antoni and Schnell 2019). This key file is based on the identification of the PASS respondents in administrative data of the IAB. To identify respondents in the administrative data, harmonized information on addresses and personal characteristics from different administrative data sources collected by the Federal Employment Agency is used. Individuals who never worked in dependent employment, who are exempt from social security contributions (e.g., civil servants), or who have never been registered as unemployed or benefit recipients are not in the data. The record linkage is based on multilevel deterministic and probabilistic methods for linking datasets (see Sakshaug et al. 2017 for a detailed description and Appendix 2). From our sample of 16,874 household-year observations of respondents who agreed to data linkage, we identified 15,925 observations in the administrative data, which amounts to a linkage rate of 94%. Of the 15,925 matches, 15,095 were unique matches and 830 were duplicates, which were corrected following a procedure described in Appendix 2. As a robustness check, we verify our main findings for a sample without duplicates in Sect. 5.3. From our sample of 15,925 linked observations, we keep 14,834 observations with no missing values in the covariates for our descriptive results and the regression analysis. Finally, we drop 270 observations for which the gender and age information of survey data and administrative data in the LHG does not match. After these steps, the analysis sample comprises 14,564 observations, which represents 83% of the simulated eligible population sample (17,585).
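The sample restrictions described in this subsection can be retraced with simple arithmetic. The following sketch uses only the observation counts reported in the text; the step labels and the helper `retention` are illustrative names, not part of the original analysis.

```python
# Counts of household-year observations as reported in the text.
STEPS = [
    ("simulated eligible", 17_585),
    ("consent to linkage", 16_874),
    ("identified in administrative data", 15_925),
    ("complete covariates", 14_834),
    ("consistent gender and age", 14_564),
]


def retention(steps):
    """Return (label, count, share of previous step, share of first step)."""
    rows = []
    first = steps[0][1]
    previous = first
    for label, count in steps:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


rows = retention(STEPS)
linkage_rate = rows[2][2]   # 15,925 / 16,874
final_share = rows[-1][3]   # 14,564 / 17,585
unique_plus_duplicates = 15_095 + 830

# Print the funnel: label, count, share of previous step, share of first step.
for label, count, step_share, total_share in rows:
    print(f"{label:<36}{count:>8,}{step_share:>9.1%}{total_share:>9.1%}")
```

The shares reproduce the linkage rate of 94% and the final sample share of 83% stated above, and the unique matches and duplicates add up to the 15,925 linked observations.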
A potential problem of the data linkage is that results may be biased because of selectivity in either non-consent or non-identifiability in the administrative data. With respect to non-consent, misreporting of benefit receipt might be biased downwards if non-consent to the data linkage is positively correlated with the misreporting of benefit receipt: Households who do not want to admit to receiving UB II might also be reluctant to agree to data linkage if they fear that their misreporting might be discovered. Column 1 of Table 8 indicates correlation patterns underlying the probability of not giving consent to the data linkage. We find some statistically significant and small effects: Immigrants, younger, non-disabled persons, those living in single households, or those living in Western Germany have a higher probability to refuse consent. This suggests that our analysis is a rather conservative estimate of misreporting because individuals with low data linkage probability and who are underrepresented in the data generally tend to have a high propensity to misreport.
With respect to non-identifiability in administrative data, column 2 of Table 8 shows the correlation patterns behind the probability that a household cannot be linked to the administrative data for the sample of simulated eligible households with consent to data linkage. Here, we find no significant marginal effects in most sociodemographic and household characteristics except for some age groups, migration background, home owners, and the subsample two indicator.
Overall, these results indicate small systematic effects; thus, we conservatively underestimate misreporting and its correction. We will provide two robustness checks in Sect. 5.3 concerning the potential selectivity of data linkage.

Over- and underreporting
Next, we compare UB II program take-up as reported in the survey with the information in the administrative data. In our data, 11,106 respondents reported benefit receipt in the survey and, based on administrative records, actually received benefits in the month of the interview (take-up households). Additionally, 2277 respondents reported not claiming the benefit, which is confirmed by the information from the administrative data (non-take-up households). A group of 851 respondents did not indicate receipt in the survey, but actually received benefits in the month of the interview based on administrative data (underreporting households). This results in an underreporting rate of 7.1 (7.6)% relative to all true recipients without (with) survey weights being applied. Compared to other benefit programs, this rate is rather low. For example, Meyer et al. (2018) find underreporting rates for food stamp program take-up between 23 and 50% in US surveys. Parolin (2019) confirms high underreporting of the US Food Stamp Program (SNAP) of about 38%: The share of SNAP recipients in the CPS is 11.1% compared to almost 18% in the administrative data. Comparing EU Statistics on Income and Living Conditions (EU-SILC) with administrative data, Tasseva (2016) reports that the number of benefit recipients of important programs in Bulgaria is underreported by between 3 and 40%. For the UK, Brewer et al. (2017) show that tax credits, means-tested benefits, and disability or health benefits in particular have low coverage rates in the Living Costs and Food Survey (LCFS).
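The comparison of survey reports and administrative records amounts to a two-by-two classification. The sketch below illustrates it and recomputes the unweighted underreporting rate from the counts given above; the function `classify` and its arguments are hypothetical names, and the fourth cell (overreporting) is discussed in the following paragraph.

```python
def classify(reported: bool, received: bool) -> str:
    """Compare the survey report with administrative receipt in the interview month."""
    if reported and received:
        return "take-up"
    if not reported and not received:
        return "non-take-up"
    if not reported and received:
        return "underreporting"
    return "overreporting"


# Unweighted counts from the text.
TAKE_UP = 11_106
UNDERREPORTING = 851

# True recipients are all households with receipt in the administrative data.
true_recipients = TAKE_UP + UNDERREPORTING
underreporting_rate = UNDERREPORTING / true_recipients
print(f"{underreporting_rate:.1%}")  # 7.1%, the unweighted rate in the text
```

The weighted rate of 7.6% cannot be reproduced this way because it requires the survey weights.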
In addition to underreporting, 330 benefit-eligible respondents claimed to receive UB II in the survey, but they did not receive benefits according to the administrative records (overreporting households). Various mechanisms may cause overreporting. First, it may result if respondents are mistaken about the period when they actually received the benefit. The administrative data show that for 83 observations (25%) the household received UB II not during the month of the interview, but in the previous or in the following month. Hence, telescoping bias could be one important source of overreporting in our data. Second, the method of dependent interviewing can also contribute to overreporting when former recipients incorrectly confirm the preloaded information on UB II receipt in the last year (Eggs and Jäckle 2015). Third, overreporting could result from incorrect linkage. However, a closer look at the match types used in the data linkage across the three groups does not reveal large differences: While 96.5% of the observed take-up households are linked by "gold-standard" or deterministic match,⁷ this share is 96.4% for the underreporting households and 94.2% for the overreporting households. This could indicate that inaccurate answers may be more common in the survey than errors in the matching process.
Table 1 shows descriptive statistics of the explanatory variables used in the regression analysis for the full sample and separately for households with and without benefit program take-up and for those underreporting and overreporting benefit program take-up. The four subgroups differ in their characteristics. Interestingly, we find some similarities between non-take-up households (column 3) and underreporting households (column 4). Compared to take-up households, the latter two groups have significantly lower simulated benefit entitlements, younger household heads, a lower (higher) share of household heads with lower secondary education (upper secondary education), and a larger share of families with children. This similarity suggests that a take-up regression erroneously classifying underreporting households as non-take-up households overestimates the heterogeneity between the take-up and non-take-up groups, i.e., after correcting underreporting, we expect that the take-up regression yields coefficients of smaller magnitude. An exception is the immigration status of the head of the household: While we find a significantly lower share of immigrants among the non-take-up households compared to take-up households, significantly more household heads who are first-generation immigrants belong to the underreporting households. For the overreporting households, we find several significant differences compared to take-up households. However, the differences in their characteristics make them comparable neither with underreporting households nor with non-take-up households.
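The four response groups form a two-by-two classification of survey answers against administrative records, and the reported shares can be checked from the counts. The following sketch uses the unweighted counts given in the text; the variable and function names are illustrative assumptions, and the overreporting ratio it computes is an illustrative quantity that is not reported in the text. The weighted rates cannot be reproduced without the survey weights.

```python
# Unweighted household counts as reported in the text, keyed by
# (receipt reported in survey, receipt in administrative data).
counts = {
    (True, True): 11106,   # take-up households
    (False, False): 2277,  # non-take-up households
    (False, True): 851,    # underreporting households
    (True, False): 330,    # overreporting households
}

def underreporting_rate(c):
    """Share of true (administrative) recipients who report no receipt."""
    true_recipients = c[(True, True)] + c[(False, True)]
    return c[(False, True)] / true_recipients

def overreporting_rate(c):
    """Share of eligible non-recipients who report receipt (illustrative ratio)."""
    true_non_recipients = c[(False, False)] + c[(True, False)]
    return c[(True, False)] / true_non_recipients

total = sum(counts.values())
adjacent_month = 83  # overreporters with receipt in the previous or following month

print(f"observations: {total}")
print(f"underreporting rate: {underreporting_rate(counts):.1%}")
print(f"overreporting ratio: {overreporting_rate(counts):.1%}")
print(f"adjacent-month share: {adjacent_month / counts[(True, False)]:.1%}")
```

The four cells add up to the pooled sample of eligible household observations used in the regressions.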
The next section describes our analysis results. First, we describe the extent to which benefit program take-up as reported in survey data must be corrected based on information from administrative data. Then, we look at the effect of correcting the dependent variable on the correlation patterns of non-take-up behavior and investigate the robustness of these results. We describe the characteristics of those misreporting benefit receipt in the last step.

Descriptive effects of data correction
In Table 2, we report the simulated group-specific UB II non-take-up rates for the sample that could be linked to administrative records. Column 1 shows the rates before any correction for UB II program take-up misreporting, column 2 shows the rates after correction for underreporting only, and column 3 shows the rates after correction for both under- and overreporting. Initially, we observe an overall weighted non-take-up rate of 40% (see bottom of column 1 of Table 2) with substantial heterogeneity across subgroups: We observe the highest rate of non-take-up for couples without children (64%), while single-parent households feature the lowest rate of benefit non-take-up (30%). The size of the non-take-up rate and its variation across the subgroups are in line with findings based on other data (see, e.g., Bruckmeier and Wiemers 2012 and the literature cited there).⁸
We use our administrative data on actual benefit receipt to correct for misreporting of program take-up in the survey.⁹ The correction of the underreporting results in a non-take-up rate of 35%, i.e., a reduction by five percentage points (see column 2 of Table 2). If we additionally reclassify the overreporting households as non-take-up households, the non-take-up rate rises again to 37% (see column 3 of Table 2). Hence, correcting for program take-up misreporting reduces the overall non-take-up rate from approximately 40% to approximately 37%. Thus, misreporting caused us to overestimate the UB II non-take-up rate by approximately 3 percentage points, or 8%. The extent of the correction in the non-take-up rate varies across subgroups (see the last two columns of Table 2). The relative decline in non-take-up rates ranges from 2% for households with a disabled head to approximately 17% for households whose head is a first-generation immigrant.
Figure 1 depicts the distribution of simulated benefit entitlements for those taking up the benefit according to administrative data and reporting it (take-up households) or not reporting it (underreporting households), and for those who did not take up the benefit according to administrative data and did not report it in the survey (non-take-up households) or reported it in the survey (overreporting households). As expected, we observe the highest benefits among households who claim their benefits, with a median value of 722 euros (see the top panel of Fig. 1). The distributions of benefit entitlements for the non-take-up and underreporting households show a large share of households with small claims and median claims of 415 and 563 euros, respectively. The median of the overreporting households lies in between at 658 euros.
⁷ (continued) linkage," which uses first name, last name, zip code, city, street name, house number, sex, and the birth cohort indicator. Both gold-standard and deterministic linkage should result in highly reliable results.
Notes to Table 1: Asterisks */**/*** denote significantly different means compared to the group of take-up households (column 2) at the significance level of 0.1/0.05/0.01. Hh stands for head of household. "Subsample two" indicates whether an observation belongs to the second, nationally representative subsample. "Take-up households" report UB II in the survey and actually receive UB II according to the administrative data. "Non-take-up households" report non-receipt in the survey and actually do not receive UB II according to the administrative data. "Underreporting households" report UB II non-receipt in the survey and actually receive UB II according to the administrative data. "Overreporting households" report UB II receipt in the survey and actually do not receive UB II according to the administrative data. Weighted values using cross-sectional sample weights.

Patterns of benefit take-up and the effects of data correction
Table 3 presents the estimation results of our take-up model. We regress a binary indicator of benefit take-up on household characteristics in the sample of 14,564 pooled observations of benefit-eligible households. In column 1, we present the estimated marginal effects of an IV probit estimation with cluster-robust standard errors; in column 5, we show the estimates of a random effects IV probit (RE-IV) estimation; and in columns 2 and 6, both estimation approaches are repeated, now using the corrected dependent variable.
First, the correlations between the first-stage and the second-stage error terms ("ρ₁,₂") are statistically highly significant for all estimated models. Accordingly, the corresponding Wald tests ("Wald test of exogeneity: χ²(1)") strongly reject the null hypothesis of exogeneity for the simulated benefit.¹⁰ Thus, we prefer the IV models in Table 3 to the non-IV models presented in Table 9 for all specifications (uncorrected and corrected models). Second, a comparison of the pooled and random effects probit models in columns 1 and 2 versus 5 and 6 reveals the importance of controlling for unobserved heterogeneity at the household level. The results of the random effects estimation in columns 5 and 6 allow us to reject the pooled model of take-up in columns 1 and 2: The share of the total variance ("panel variance share ρ") contributed by the household-level variance reaches 54 and 71% in columns 5 and 6, respectively. It is highly statistically significant at the one percent level in both cases. Therefore, the RE-IV probit is our overall preferred model when interpreting the marginal effects of the determinants of take-up.
⁸ Although the UB II non-take-up rate is quite similar to the non-take-up rate found in other studies, comparability is limited due to the selection steps described in Appendix 1 and Sect. 4.2, which make our sample no longer representative of the entire population.
⁹ Note that the simulation may erroneously predict benefit eligibility for households that underreport other income than benefits. Because these households are not actual benefit recipients, we would overestimate the non-take-up rate. However, their reported benefit receipt would also not be corrected based on linked administrative data. Therefore, these observations are irrelevant to the evaluation of the take-up correction.
¹⁰ In order to examine the validity of the instruments used in the IV estimations (IV probit and RE-IV probit), we also estimate the take-up models using pooled 2SLS, because some validity tests (underidentification and weak instrument tests, in particular) are only available for the linear probability model. All instruments turn out to be highly statistically significant in the first-stage regression (p < 0.001). The first-stage R² is 0.31, and the partial R² for the instruments is 0.
Notes to Fig. 1: HH stands for household. Fifty-seven outlier observations with monthly entitlements above 1700 euros excluded (51 take-up-HH, 3 non-take-up-HH, 2 underreporting-HH, 1 overreporting-HH). Weighted values using cross-sectional sample weights.
We start by briefly discussing the estimated marginal effects for the models prior to correcting the dependent variable for UB II program take-up misreporting (see columns 1 and 5). In general, the results across the two estimation approaches are somewhat similar and the signs of the estimated marginal effects meet our expectations.
The marginal effect of the simulated benefit entitlement in the uncorrected IV probit model (column 1) implies that raising the entitlement by 100 euros per month increases the probability of program take-up by 5.1 percentage points. The RE-IV probit (column 5) results in a smaller marginal effect (4.7 percentage points). The propensity of benefit program take-up is positively correlated with lower education, the advanced age of the head of household, or the presence of young children in the household. The size of the effects is broadly in line with other studies on the determinants of taking up UB II (e.g., Bruckmeier and Wiemers 2012, 2017). Those residing in East Germany also have a higher propensity to take up benefits than their counterparts in the West, which may reflect the relatively poor labor market situation and lower earnings expectations. Bruckmeier and Wiemers (2017) find similar relationships based on data from the SOEP.
Interestingly, disability status is not correlated with benefit program take-up. We observe significant marginal effects of household composition. Ceteris paribus, the likelihood of claiming the benefit is significantly lower for couples with or without children compared to single-person households, i.e., the reference group. This result might reflect the importance of a potential second earner for the program take-up decision in couple households. Additionally, single parents have a higher probability of program take-up.¹¹
Next, we study the results when we use an indicator of benefit take-up corrected for misreporting. We now recode all eligible households who report no receipt in the survey data but who actually receive benefits according to administrative records as take-up households. Conversely, households reporting the benefit in the survey but who do not receive UB II according to the administrative data are reclassified as non-take-up households.
Columns 2 and 6 of Table 3 present the estimated marginal effects, columns 3 and 7 show the differences between the corrected and uncorrected marginal effects and their statistical significance,¹² and columns 4 and 8 present the relative change in individual marginal effects when the corrected instead of the original take-up measure is used.
¹¹ We investigated the marginal effects for various subgroups (e.g., by gender, migration background and for East and West Germany) and did not find major heterogeneities. The results are available upon request.
¹² We test for the significance of the difference between the marginal effects across specifications by estimating a joint covariance matrix of the "stacked" system of equations, i.e., the joint system consisting of the uncorrected and the corrected probit models. Fundamentally, this is a non-standard application of the robust Huber-White covariance estimator. The exact technical implementation of the test differs with respect to our model specifications. For the models which do not include random effects on the household level (pooled IV probit and pooled probit), we use the Stata command "suest" to estimate the joint covariance matrix. For the models including a random effect (RE-IV probit and RE probit), we perform an ML estimation of the uncorrected and corrected RE probit models as a joint system of equations using the Stata command "gsem." In the final step, the difference in the marginal effects and their standard errors are estimated using Stata's "margins" command. The default prediction of the "margins" command does not account for the endogeneity of the simulated benefits, which leads to biased estimates of the marginal effects of the uncorrected and corrected models as well as biased estimates of the differences of the marginal effects. Therefore, we correct for this bias as suggested by Skeels and Taylor (2015).
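The recoding of the take-up indicator described above can be sketched in a few lines. This is a minimal illustration, assuming each eligible household observation carries a survey-reported receipt flag, an administrative receipt flag, and a cross-sectional weight; the record layout, the function names, and the toy sample are hypothetical and not taken from PASS.

```python
from dataclasses import dataclass

@dataclass
class Household:
    reported: bool       # UB II receipt as reported in the survey
    admin: bool          # UB II receipt according to administrative records
    weight: float = 1.0  # cross-sectional sample weight (hypothetical here)

def take_up(h, correction="none"):
    """Binary take-up outcome under one of three treatments of misreporting."""
    if correction == "none":   # survey answer taken at face value
        return h.reported
    if correction == "under":  # underreporters recoded as take-up
        return h.reported or h.admin
    if correction == "both":   # overreporters additionally recoded as non-take-up
        return h.admin
    raise ValueError(f"unknown correction: {correction}")

def non_take_up_rate(sample, correction="none"):
    """Weighted share of eligible households classified as not taking up."""
    total = sum(h.weight for h in sample)
    non_takers = sum(h.weight for h in sample if not take_up(h, correction))
    return non_takers / total

# Toy sample: 6 take-up, 2 non-take-up, 1 underreporting, 1 overreporting household.
sample = (
    [Household(True, True)] * 6
    + [Household(False, False)] * 2
    + [Household(False, True)]
    + [Household(True, False)]
)
for c in ("none", "under", "both"):
    print(c, round(non_take_up_rate(sample, c), 2))
```

The three settings correspond to the three treatments of misreporting distinguished in Table 2 (no correction, underreporting only, under- and overreporting). With both corrections applied, the outcome coincides with administrative receipt.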
In the pooled IV and the RE-IV estimation, some of the marginal effects change in economically and statistically significant ways when we correct the dependent variable (columns 3 and 7). For both estimators, the largest absolute change results for the indicators of advanced age versus young age and for highly educated versus low educated heads of households. The majority of the marginal effects of the RE-IV estimation that are statistically significant in the uncorrected as well as the corrected model change by more than 30%. Meyer and Mittag (2017a) study the impact of misclassification corrections on probit coefficient estimates in food stamp take-up regressions. They find that estimates are mostly attenuated due to non-random misreporting. We do not confirm this pattern: The marginal effects in columns 1 and 5 of Table 3 appear to be generally larger than the marginal effects obtained after data correction.
Thus, correcting for misreporting affects not only the level of non-take-up but also the estimated effects of the correlates of the take-up decision. We therefore empirically corroborate the finding of Pudney's (2001) Monte Carlo simulations that even moderate measurement errors in reported benefit receipt can lead to strong biases in coefficient estimates.
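A stylized calculation illustrates how non-random underreporting can inflate an estimated group difference in take-up. The sketch assumes two hypothetical groups with made-up true take-up probabilities and misreporting probabilities; none of the numbers are estimates from the linked data, and the function name is an assumption.

```python
def observed_take_up(true_rate, under, over=0.0):
    """Expected survey-reported take-up rate given misreporting probabilities.

    under: probability that a true recipient reports no receipt
    over:  probability that a true non-recipient reports receipt
    """
    return true_rate * (1.0 - under) + (1.0 - true_rate) * over

# Hypothetical groups: younger heads take up less often AND underreport more.
young = {"true_rate": 0.60, "under": 0.15}
old = {"true_rate": 0.80, "under": 0.05}

true_gap = old["true_rate"] - young["true_rate"]
observed_gap = observed_take_up(**old) - observed_take_up(**young)

print(f"true gap in take-up:     {true_gap:.3f}")
print(f"observed gap in take-up: {observed_gap:.3f}")
```

In this constructed case, the survey-based gap exceeds the true gap because the group with lower take-up also underreports more. With a different correlation between misreporting and take-up, the bias could run in the opposite direction.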

Robustness tests
We offer several types of robustness tests. First, we repeat the estimations in Table 3 using sample weights. The direction of all statistically significant marginal effects is robust to adding sample weights (Table 10). For most variables, the extent of the impact and the significance also remain at a similar level. Our key interest is the effect of correcting the dependent variable. The comparison of the changes in the marginal effects in Table 10 (see columns 3 and 7) with Table 3 shows that most of the results are robust.
Second, we address concerns about the potential selectivity due to (a) missing consent to the data linkage and (b) failed linkage because not all households that agreed to the data linkage could actually be linked (see Sect. 4.2 and Appendix 2). We reestimate our take-up equation for the full sample of eligible households before data linkage and thus without corrections. Table 11 shows the take-up estimation results for the full sample of households that are simulated as eligible to UB II before data linkage. Thus, the results in Table 11 are independent of the linkage procedure and are comparable to the typical situation in which administrative data are not available. Columns 1 and 3 show the results for all eligible households, while columns 2 and 4 again show the results of the linked sample from Table 3. With the exception of the marginal effect for the family without children indicator in the RE-IV model, the significance of the marginal effects does not change. Additionally, the magnitude of all marginal effects changes only slightly. Thus, we find no evidence that there is selection into linkage based on reported receipt.
In two further robustness checks, we analyze whether false matches in the linked data may bias our estimates. In the third robustness check, we only keep observations that were identified in the matching procedure based on the gold-standard or deterministic procedure, i.e., the most reliable matches (see Appendix 2). False matches based on these two procedures are very unlikely. We show the results for the take-up regression for the restricted sample of gold-standard/deterministic matches in Table 12. The comparison with our main findings depicted in Table 3 shows that the statistical significance, the signs, and the magnitude of the marginal effects and the effects of the data correction change only slightly.
In the fourth robustness check, we dropped all non-unique matches, which were corrected after matching (see Sect. 4.2 and Appendix 2). We present the results of our take-up estimation for the sample without these observations in Table 13. Again, we find only minor differences compared to the main results, which are based on all matched observations (Table 3). Thus, the inclusion of the corrected duplicates does not affect our results.

Patterns of misreporting
Finally, we offer a multivariate characterization of those households who misreported their benefit program take-up, going beyond the descriptive statistics of Tables 1 and 2. Using our baseline specification, we estimate probit models for the outcomes "underreporting" and "overreporting" with pooled and random effects models. Patterns of underreporting are analyzed within the sample of households with UB II program take-up according to the administrative data. For the determinants of overreporting, we rely only on observations without registered UB II take-up in the administrative data. Table 4 presents the marginal effects of pooled (columns 1 and 3) and RE models (columns 2 and 4), respectively. We find under- as well as overreporting to be correlated with several characteristics. In particular, the effects of the indicators of the age of the household head are large and statistically significant: younger heads are most likely to underreport and less likely to overreport program take-up. Families with children are more likely to underreport and have a lower probability to overreport than individuals in single-person households. Furthermore, households with smaller entitlements or whose head of household is more highly educated are more likely to underreport. In contrast, within the sample of households without registered UB II receipt, households with smaller simulated entitlements and a more highly educated head of household are less likely to overreport. The amount of the monthly rent is positively (negatively) correlated with the probability of underreporting (overreporting). Finally, we find a lower probability of underreporting benefit program take-up for households from East Germany and a higher probability of underreporting for households whose head of household is a first-generation immigrant. Overall, we find that misreporting is not random. This agrees with the finding of Meyer et al. (2018), who show that misreporting is systematically related to household characteristics.
A comparison of the characteristics of underreporting and overreporting households (Table 4) and of the misreporting-induced bias of marginal effects of given characteristics (Table 3) yields similar patterns. We find both strong age dependence in misreporting and a strong response of the marginal effect of age to the misreporting correction. Similarly, we observe statistically significant changes in the marginal effects of the monthly simulated entitlement, monthly rent, first-generation immigrant, upper secondary education, and East Germany after correction for mismeasurement of UB II program take-up. These characteristics are also directly correlated with the misreporting of benefit take-up. Clearly, the estimation of take-up regressions is more reliable for those groups for whom the outcome is measured correctly. Thus, the bias in survey-based estimations of take-up equations varies depending on the extent to which the characteristics under analysis correlate with the propensity to misreport program take-up.
Table 4 (fragment): Panel variance share ρ 0.37*** 0.79***. Notes: Asterisks */**/*** denote statistically significant results (standard errors in parentheses) using cluster-robust standard errors at the significance level of 0.1/0.05/0.01. Hh stands for head of household. "Panel variance share ρ" denotes the share of the total variance contributed by the panel-level variance component. "Subsample two" indicates whether an observation belongs to the second, nationally representative subsample. Survey wave indicators are included in all estimations. Unweighted results.

Conclusions
This study contributes to the literature on benefit program non-take-up behavior. Because this literature relies on survey data, it suffers from measurement error if respondents do not reveal their true use of welfare benefits. We inspect this issue for the case of a general welfare program using linked representative survey and administrative data. Given that our welfare-receiving respondents are aware of the administrative origin of the sampling, misreporting might be smaller here than in other data settings. For households with linked survey and administrative data, we simulate a non-take-up rate based on survey information of 40%, which is in line with results found for comparable benefit programs in other countries (see, e.g., Eurofound 2015). The data linkage shows under- as well as overreporting of benefit program take-up in the survey; the underreporting rate in the survey data is 7.6%. Correcting the survey responses for mismeasurement of benefit program take-up (under- and overreporting) reduces the simulated non-take-up rate to 37%. We use the information on misreporting of program take-up to test whether the results of take-up regressions differ depending on the treatment of misreporting households, i.e., depending on whether we recode them as take-up (underreporting) or non-take-up (overreporting) households in our outcome measure. We estimate pooled and panel instrumental variable probit models and calculate marginal effects. When we compare the estimation results obtained with corrected and uncorrected dependent variables, we find that the absolute difference in marginal effects is often statistically significant and large. In relative terms, many marginal effects change by at least 30% after the correction.
These results hold up to various changes in the empirical approach: We used different estimators (e.g., with and without random effects), and offer a variety of robustness tests to account for potential selectivity problems and sensitivities of the data linkage procedures.
We find that the patterns of the changed marginal effects mostly agree with the correlation patterns underlying misreporting behaviors: Households, whose head of the household is a first-generation immigrant, with small benefit claims and high monthly rents, or with young or highly educated household heads are particularly likely to underreport their benefit receipt. In terms of overreporting, we find an opposite pattern: Most characteristics that positively correlate with underreporting have a negative effect on overreporting and vice versa.
The marginal effects of the characteristics which are correlated with misreporting also changed in a statistically significant way when we corrected our take-up outcome measure for misreporting. This agrees well with the literature showing that households close to the labor market or with a risk of benefit confusion tend to misreport or to not take up their benefit (Bruckmeier and Wiemers 2012, 2017; Bruckmeier et al. 2014). The mechanisms behind these groups' underreporting of program take-up differ and are individually plausible. Because German naturalization rules require that applicants be able to support themselves and not rely on social transfers or means-tested benefits, there may be a high perceived cost connected to admitting benefit receipt for immigrants (see, e.g., Riphahn and Saif 2019). Also, households who are close to the labor market, e.g., higher educated and younger persons, may suffer from (perceived) stigma effects and work the hardest to avoid transfer dependence. Those receiving several social transfers at the same time may not be able to keep track of the specific transfer programs from which they benefit, particularly if transfer eligibility changes in short intervals. Overreporting, in contrast, would be more likely among households who confuse different benefit programs or accidentally err on the timing of benefit receipt.
In sum, the analyses based on our linked data suggest that research concerned with take-up and its determinants needs to account for potential misreporting of benefit receipt. The marginal effects in regressions of benefit take-up may well be biased unless the outcome measures can be corrected for misreporting. The bias in survey-based estimations of take-up equations depends on the extent of mismeasurement and on the correlation of the propensity to misreport program take-up with the characteristics of the analyzed population.
Take-up analyses are often motivated by an interest in the distributional effects of government transfers. If the indicators for certain parts of the distribution are more likely to be mismeasured due to misreporting of benefit receipt, the results for these groups tend to be biased. We are among the first to show such patterns empirically, which are important for the correct interpretation of distributional analyses of government benefits.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix 1: Simulation of welfare benefit eligibility/simulation of entitlement
We base our simulation on waves 2 to 7 of the Panel Study "Labour Market and Social Security" (PASS). In these six waves, 55,069 household interviews were realized. In general, the simulation of welfare benefits using survey data requires several sample selection steps (Table 5).
We first restrict the sample to households for which both household and personal interviews are available. We further drop households headed by individuals over the age of 65 years because they are not eligible according to the rules of the German minimum income support program Unemployment Benefit II (UB II). If households do not answer the question on benefit receipt, we are not able to distinguish between take-up and non-take-up. Therefore, we omit household observations with missing information on benefit receipt. Next, we drop students and apprentices because they must apply for different benefits. Furthermore, we drop households in which not all members participated in the survey. Households in which more than one "community of need" exists are also omitted. The community of need is the legally relevant unit in the means test (for a definition, see Social Code II § 7(2, 3)); in most cases, it is identical with the household. The community of need consists of singles and their partner and their children up to age 24 years. Since we cannot assign household incomes and household wealth across different communities of need, we keep only those households in the sample that consist of exactly one such community.
For the computation of household income, which is relevant to the means test, we need information on household incomes as reported in the household questionnaire and on personal incomes of household members as reported in individual questionnaires. Thus, we must drop households with missing information on these variables. After these sample selection steps, the simulation sample is composed of 30,878 household-year observations. Our simulation procedure implements the eligibility rules for UB II benefits. A household is eligible if the household's total need exceeds the income and the household's wealth remains below the household-specific maximum. Additionally, we simulate eligibility for means-tested housing allowance and supplementary child allowance ("Kinderzuschlag"), which must be claimed prior to claiming UB II benefits. Including these prioritized benefits in the simulation procedure is important to assess UB II eligibility correctly: The household can only claim UB II if benefits from the housing allowance (possibly in combination with the supplementary child allowance) do not cover its basic needs.
Table notes: Respondents were interviewed between December and July (waves 2 and 3) and between February and September (waves 4 to 7). Unweighted results.
Table notes (linkage consent): Respondents who agreed to linkage in wave t were linked in all subsequent waves. Respondents who did not agree to linkage in wave t were asked once again in the subsequent wave. If they declined again, they were not asked about linkage again. Unweighted results.
In the first step of our simulation procedure, we calculate the total needs of each household. We determine total needs as the legally defined regular personal needs of household members, additional needs, and housing costs. Next, we consider an additional national standardized benefit for single parents, which varies with the age of their children. The PASS data provide information on the household type and the age of children in the household.
Households also report their housing costs. We employ the reported monthly rent for tenants and the reported monthly repayment of mortgage loans for homeowners. Furthermore, we consider reported housing costs for all households. We do not consider additional needs that can be claimed by households in "special circumstances." Special circumstances include certain disabilities, pregnancy, and special dietary needs for health reasons. The incidence of these special benefits is very low: In 2007, approximately 5% of all households received one of these benefits, and the average benefit was 50 euros per month (Statistik der Bundesagentur für Arbeit 2008). Therefore, we can assume that ignoring them will not alter our results considerably.
In the second step of our simulation, we determine the household income relevant to the means test. Household income consists of the sum of all the individual incomes of household members. We apply the earned income exemption rules, which depend on family type and vary with gross wages. All other reported types of income-capital income, rental income, public, and private transfers-reduce the benefit by 100% of their amount.
In the third step, we check whether the household's wealth exceeds the allowable wealth. Each household reports its total financial wealth in each survey wave. The answers are coded in wealth brackets. We assume that the mean of the reported wealth bracket represents the household's wealth and compare it with the individual maximum allowable wealth. The calculation of total allowable maximum wealth depends on household structure and age only. We are not able to consider housing wealth in the simulation due to missing information.
In step four, we calculate entitlements to the housing allowance and the supplementary child allowance and compare the results with those calculated for UB II. If the combined amount of the former two benefits exceeds the UB II benefit, the household is ineligible for UB II.
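The four simulation steps can be summarized in a short sketch. It assumes that total needs, exempted earnings, the wealth limit, and the prioritized benefits have already been derived from the legal rules, which are not reproduced here; all function and parameter names and the example amounts are hypothetical, and the income concept is strongly simplified.

```python
def bracket_midpoint(lower, upper):
    """Step 3 input: household wealth proxied by the mean of the reported bracket."""
    return (lower + upper) / 2.0

def simulate_ub2(needs, earnings, exempt_earnings, other_income,
                 wealth, max_wealth, housing_allowance=0.0, child_supplement=0.0):
    """Return (eligible, monthly entitlement in euros) for one household."""
    # Step 2: exempted earnings are disregarded; other income counts in full.
    counted_income = max(earnings - exempt_earnings, 0.0) + other_income
    # Steps 1 and 2 combined: the entitlement is the part of total needs
    # that is not covered by counted income.
    entitlement = max(needs - counted_income, 0.0)
    # Step 3: wealth must not exceed the household-specific maximum.
    wealth_ok = wealth <= max_wealth
    # Step 4: the prioritized benefits must not exceed the UB II benefit.
    prioritized = housing_allowance + child_supplement
    eligible = entitlement > 0.0 and wealth_ok and prioritized <= entitlement
    return eligible, (entitlement if eligible else 0.0)

# Hypothetical household: needs of 800 euros, 400 euros of earnings of which
# 160 euros are exempt, and wealth reported in a 1000 to 2500 euro bracket.
wealth = bracket_midpoint(1000, 2500)
print(simulate_ub2(needs=800, earnings=400, exempt_earnings=160, other_income=0,
                   wealth=wealth, max_wealth=5000))
```

Passing the exemption and the wealth limit as inputs keeps the sketch independent of the specific legal parameters, which vary with family type, gross wages, household structure, and age.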
Following these steps, our simulation classifies 17,585 out of 30,878 households as eligible for UB II. To assess the quality of the simulation, we calculate the type II or beta error, i.e., the share of households simulated as ineligible among all households who report benefit receipt (see, e.g., Bargain et al. 2012; Frick and Groh-Samberg 2007). The share of misclassified households provides an upper bound of the simulation error because beta errors can also be caused by administrative errors in the assessment of eligibility or false answers provided by the respondents in the survey. Our simulation yields a beta error rate of 3.8% (weighted) or 4.6% (unweighted). We interpret these small beta error rates as indicative of a high simulation quality. For comparison, based on the German Socioeconomic Panel, Frick and Groh-Samberg (2007) report a beta error rate of 12.6%.
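The beta error rate defined above can be computed as a weighted share. The following is a minimal sketch on hypothetical toy records; the tuple layout and the function name are assumptions, and the toy data do not reproduce the reported rates.

```python
def beta_error_rate(records):
    """Share simulated as ineligible among households reporting benefit receipt.

    records: iterable of (reports_receipt, simulated_eligible, weight) tuples.
    """
    reporting = [(eligible, w) for reports, eligible, w in records if reports]
    total = sum(w for _, w in reporting)
    if total == 0:
        raise ValueError("no household reports benefit receipt")
    missed = sum(w for eligible, w in reporting if not eligible)
    return missed / total

# Hypothetical toy data: 25 reporting households, one of them simulated as
# ineligible, plus 10 households that report no receipt and are ignored.
toy = [(True, True, 1.0)] * 24 + [(True, False, 1.0)] + [(False, False, 1.0)] * 10
print(f"beta error rate: {beta_error_rate(toy):.1%}")
```

Setting all weights to one gives the unweighted rate; supplying survey weights gives the weighted counterpart.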
In addition to quantifiable errors, our simulations may be subject to non-quantifiable errors. These may result if the simulations mispredict transfer eligibility or if the information on household income and wealth is incorrect. If, e.g., respondents underreport household finances, then a simulated benefit eligibility may be wrong and the observed non-take-up rate is overestimated. However, this will not affect our evaluation of household misreporting on transfer receipt, since these households will not appear in the administrative data as benefit recipients.

Appendix 2: Data linkage
In the person interview, the PASS asks respondents aged 15 to 65 years for their consent to link administrative data from the Federal Employment Agency to their survey data. The question reads as follows (English translation, see Sakshaug and Kreuter 2012): "To keep the interview as brief as possible, the Institute for Employment Research in Nuremberg could merge the study results with data about your times of employment, unemployment or participation in measures by the employment office (Arbeitsamt). For the results of this study it would be a great advantage. For reasons of data protection this cannot be done without your agreement, which I kindly ask you to provide. This is of course just as voluntary as the interview you are so kind as to give us. Of course, you may withdraw your consent at any time. It goes without saying that all rules of data protection and of the de-personalization of the results reported apply to these additional data as well. So, may I write down your answer: Do you agree to the use of these additional data?"
Overall, in the PASS, the share of respondents who agree to merge their data is approximately 80% (Berg et al. 2014). Our sample of 17,585 simulated UB II eligible household-year observations consists of 8318 different individual households. For this sample of 8318 UB II eligible households, Table 6 shows how often the consent question was asked in each wave and how many respondents agreed. The number of times the consent question was asked (9349) exceeds the number of eligible households because participants who did not agree to the data linkage during their initial interview are asked once more in the next wave. We find a high average consent rate of approximately 83% in our sample.
In the sample that we use for the analysis, we can link 16,874 household-year observations of simulated eligible respondents to the administrative data, i.e., 96% of our 17,585 UB II eligible household-year observations. Thus, our rate of linkage is substantially higher than the average consent rate. This happens for two reasons: First, a consent given once holds for all future and past waves of the PASS. Second, participants who do not agree to the data linkage are asked again in the next wave. Only if households refuse to give their consent in two consecutive waves is the question no longer repeated in future waves.
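The two mechanisms described above can be sketched as follows. This is a simplified illustration under stated assumptions: the input `would_agree` (what a respondent would answer in each wave if asked) is hypothetical, and the sketch assumes that the question is not repeated once consent has been given.

```python
def run_consent_protocol(would_agree):
    """Simulate the consent question over the waves a respondent takes part in.

    `would_agree` is a hypothetical list of booleans, one per wave, giving
    the answer the respondent would provide if asked in that wave.
    Returns (times_asked, consent_given).
    """
    times_asked = 0
    refusals_in_a_row = 0
    for answer in would_agree:
        if refusals_in_a_row >= 2:
            break  # after two consecutive refusals the question is dropped
        times_asked += 1
        if answer:
            return times_asked, True  # assumption: not asked again after consent
        refusals_in_a_row += 1
    return times_asked, False


def linkable_observations(would_agree):
    """Number of respondent-wave observations that can be linked.

    Consent given once holds for all future and past waves, so a single
    agreement makes every wave of the respondent linkable.
    """
    _, consent_given = run_consent_protocol(would_agree)
    return len(would_agree) if consent_given else 0
```

A respondent who refuses in the first wave and agrees in the second is asked twice and agrees once, which corresponds to a consent rate of 50% per question asked, yet all of that respondent's waves become linkable. This is why the share of linked observations can exceed the average consent rate.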
In the next step, we merge the observations of respondents who agreed to the linkage to a key file, which identifies respondents in the administrative data. The German Record Linkage Center provides this file, which utilizes several administrative data sources collected by the Federal Employment Agency (BA) (Antoni et al. 2016).
The matching variables used in the linkage are a person's first name, last name, zip code, city, street name, house number, sex, and an indicator for the birth cohort. These variables are available in the sampling data and in the administrative data. For the PASS sample drawn from UB II recipients, an additional household identifier is available. The linkage follows a stepwise procedure that varies the number of matching variables and the record linkage process. Antoni et al. (2017) and Sakshaug et al. (2017) describe the linkage processes. They label a match "gold-standard linkage" if it is based on an exact match of the household identifier, name, sex, and date of birth. This highest quality match is possible only for households in the UB II sample. Observations that cannot be matched by the gold-standard linkage are matched based on "deterministic linkage." This procedure uses first name, last name, zip code, city, street name, house number, sex, and the birth cohort indicator. Both gold-standard and deterministic linkage should yield highly reliable results. For observations that could not be linked using these two procedures, distance-based and probabilistic linkage procedures are used, which match based on comparison functions using first name, last name, zip code, city, street name, house number, sex, and birth cohort.
Table 7 shows the frequency of linkage procedures for our sample of simulated UB II eligible households. Of our 15,925 matched observations, 13,089 observations (82%) are linked by the gold-standard match. Adding the 2073 observations that are linked by the deterministic match, our overall share of highly reliable matches (gold standard and deterministic) exceeds 95%. Because of this high share of reliable matches, we consider the overall match quality to be excellent.
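The stepwise logic of the linkage can be sketched as below. This is not the procedure of the German Record Linkage Center: the field names, the averaging of string similarities, the use of `difflib.SequenceMatcher` as a comparison function, and the threshold of 0.9 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field names for the matching variables named in the text.
GOLD_KEYS = ("household_id", "first_name", "last_name", "sex", "birth_date")
DET_KEYS = ("first_name", "last_name", "zip_code", "city",
            "street", "house_number", "sex", "birth_cohort")


def _exact(a, b, keys):
    """Exact agreement on all keys; missing values never count as a match."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def _similarity(a, b, keys):
    """Mean string similarity over the keys (stand-in comparison function)."""
    scores = [
        SequenceMatcher(None, str(a.get(k, "")), str(b.get(k, ""))).ratio()
        for k in keys
    ]
    return sum(scores) / len(scores)


def link(survey_rec, admin_recs, threshold=0.9):
    """Return (matched admin record or None, label of the linkage step)."""
    for admin in admin_recs:  # step 1: gold-standard, needs household id
        if _exact(survey_rec, admin, GOLD_KEYS):
            return admin, "gold-standard"
    for admin in admin_recs:  # step 2: deterministic, exact on address data
        if _exact(survey_rec, admin, DET_KEYS):
            return admin, "deterministic"
    # step 3: distance-based fallback with an assumed threshold
    best = max(admin_recs,
               key=lambda r: _similarity(survey_rec, r, DET_KEYS),
               default=None)
    if best is not None and _similarity(survey_rec, best, DET_KEYS) >= threshold:
        return best, "distance-based"
    return None, "unlinked"
```

The ordering mirrors the text: the strictest rule is tried first, and a record only reaches the fuzzy step if both exact rules fail. Recording the label of the step that produced the match is what allows a robustness check restricted to gold-standard and deterministic links.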
Only 574 observations, mainly from the population sample, are matched based on the distance-based procedure. Finally, 189 observations are valid matches, but the type of match is recorded as missing in the data. Since there might be concerns about the reliability of these latter two types of matches, we provide a robustness check (Table 12) in which we keep only the gold-standard and deterministic matches in our estimation sample. In some instances, the matching procedure generated duplicate matches, i.e., a survey observation can have more than one valid match in the administrative data and vice versa. In our linked data, duplicate matches are resolved by choosing one of the duplicate observations based on gender, year of birth, and highest level of education. This affected 830 cases for which two survey respondents were assigned to the same person in the administrative data and 77 cases for which two persons in the administrative data were assigned to one survey respondent. As a robustness check, we reestimated the key results presented in this paper based on a sample in which we dropped all observations with ambiguous, i.e., duplicate, matches. All results proved to be robust against this selection step (see Table 13).

Table 7
Linkage procedures for the sample of 15,925 simulated UB II eligible household-year observations with consent to data linkage by sample type (columns 3 and 4) and reported UB II receipt (columns 5 and 6). Unweighted results.
One potential problem with the data linkage is that results may be biased because of selectivity in non-consent and non-identifiability in the administrative data. For the sample of simulated eligible households with consent to data linkage, Table 8 shows the correlates of the probability of not giving consent to data linkage and the probability that a household cannot be linked to the administrative data. The results indicate only a minor selection bias concerning the composition of simulated eligible households.

Table 10
Take-up regression: marginal effects before and after correction of misreporting (weighted results). Source: Own calculation based on PASS waves 2-7.
Asterisks */**/*** denote statistically significant results (standard errors in parentheses) using cluster-robust standard errors at the significance level of 0.1/0.05/0.01. Hh stands for head of household. The correlation of the error terms of the first- and second-stage equations is denoted "$\rho_{1,2}$" and "Wald test of exogeneity: $\chi^2(1)$" shows the test statistic for the null hypothesis $\rho_{1,2} = 0$. For the RE-IV model, "Panel variance share $\rho$" denotes the share of the total variance contributed by the panel-level variance component. "Subsample two" indicates whether an observation belongs to the second, nationally representative subsample. Columns 1 and 3 correspond to columns 1 and 5 in Table 3 and are repeated for convenience. Columns 2 and 4 show the estimates for the 17,585 household-year observations simulated as eligible for UB II and reduced by 1228 observations with missing values in covariates. Unweighted results

Table 12
Take-up regression: marginal effects when using only gold-standard/deterministic links. Source: Own calculation based on PASS waves 2-7.

Table 13
Take-up regression: marginal effects for sample without corrected duplicates. Source: Own calculation based on PASS waves 2-7.
Asterisks */**/*** denote statistically significant results (standard errors in parentheses) using cluster-robust standard errors at the significance level of 0.1/0.05/0.01. Hh stands for head of household. The correlation of the error terms of the first- and second-stage equations is denoted "$\rho_{1,2}$" and "Wald test of exogeneity: $\chi^2(1)$" shows the test statistic for the null hypothesis $\rho_{1,2} = 0$.

Appendix 4: Basic characteristics of different social programs
See Table 14.