1 Introduction and Review of the Literature

The policy relevance of social indicators has risen with the latest financial and economic crisis. They were awarded a prominent status in European politics with the European Commission’s Europe 2020-target for social inclusion (2010) and before that the Laeken indicator set on social inclusion (2001). The European Community Statistics on Income and Living Conditions (EU-SILC) are one of the pillars of social statistics in the European Statistical System and the most relevant household survey at the European level in the field of household income, living standards and poverty. Several indicators of social inclusion, amongst them the Europe 2020-target for the “risk of poverty or social exclusion”, are calculated annually on the basis of this source.

Being so highly recognized, those indicators are expected to fulfil high statistical standards concerning reliability, validity and comparability (both over time and between countries). The evaluation of measurement error in this context is therefore crucial. In our paper, we focus on the measurement of household income in EU-SILC and investigate differences between data collected using surveys and data collected from registers. For this purpose, we take advantage of the fact that, for the Austrian EU-SILC of 2008–2011, both register- and survey-based income data are available for the same observational units.Footnote 1 Using the differences in these measurements for households at the micro level, we aim to provide explanations for changes in different income-based poverty indicators by investigating the underlying changes in the distribution of household income as a consequence of using register data. First, by estimating multinomial logit and linear models with covariates referring to the income and employment structure, the interview situation (e.g. CATI vs. CAPI) and other household characteristics, we try to explain whether certain types of households tend to under- or over-report their household income when asked via the survey method. Second, we ask which component (income type, weighting) contributes most to the change in the poverty measurement if register data are used instead of survey data.

1.1 Differences Between Register and Survey Data, Measurement Error and Its Impact on Poverty

The identification of data errors requires, by definition, a point of reference against which to judge the accuracy of the information. In most cases, administrative data are taken to be the benchmark. Bound et al. (2001) distinguish between micro-level and macro-level validation studies for assessing measurement error. Micro-level validation studies usually define measurement error as the difference between the value recorded in administrative records and the value observed in the survey. Macro-level validation studies, in contrast, compare population parameters, such as income inequality or the sum of earnings, derived from the survey to official reports based on administrative records or to estimates obtained from a comparable survey. Existing studies on measurement error mostly cover the US population and focus on personal or market income.

Mean-reverting errors with low earnings inflated and high earnings underreported are a common finding in such studies (Bound et al. 2001; Gottschalk and Huynh 2010; Kim and Tamborini 2012, 2014). Income volatility and income structure also matter: based on a survey sample for a developing country, Akke (2011) found that prior earnings volatility strongly affects measurement error in the current period. Moreover, there is evidence for a positive correlation between measurement error and the number of different income sources in the household (Moore et al. 2000).

Besides income-related variables, studies have also shown the importance of the survey duration and survey mode. In longitudinal studies, panel participants’ responses may increasingly begin to differ from their initial responses to the same survey questions due to learning effects in answering a complex questionnaire and/or by an improved personal relationship between respondent and (the same) interviewer (Sikkel and Hoogendoorn 2008; Chadi 2013). Such effects have been found for questions on life satisfaction (Frick et al. 2006; Landua 1992) and for subjective mental health (Wooden and Li 2014). For income, however, a longer participation in a panel does not necessarily result in higher accuracy. Measurement errors for income are usually found to be positively serially correlated in such studies for the US population (Pischke 1995; Bound and Krueger 1991). Whether this also applies for representative survey data for a population sample in a European country will be investigated in our paper.

Mode effects refer to the type of interaction between interviewer and interviewee. Existing studies focus on differences between CATI and CAPI and on the relevance of proxy interviews for income measurement error. The literature has shown that respondents to CATI are more likely to present socially desirable responses (Beland and St-Pierre 2007; Groves et al. 2009; Holbrook et al. 2003). A study for Austria found that telephone interviews lead to a larger downward bias concerning income inequality (Fessler et al. 2013). For proxy interviews, however, a more ambivalent picture emerges (Brown et al. 2001; Tourangeau et al. 2000). On the one hand, proxy interviews may enhance data quality because there is less social desirability pressure and thus a lower likelihood of mean-reverting errors. On the other hand, income of other household members can easily be overlooked due to recall error or interview fatigue. Some (and mostly older) studies have found only little proxy bias in earnings (Bound and Krueger 1991), whereas more recent studies show that proxy interviews bias earnings downwards (Reynolds and Wenger 2012) and their effect also interacts with demographic variables (Tamborini and Kim 2013).

Furthermore, differences in income inequality are observed when survey and register data are compared: Gottschalk and Huynh (2010) discuss the implications of measurement error in surveys for earnings inequality. By matching the US Survey of Income and Program Participation to tax data, they find that income inequality is 20% higher in the register data. Based on a random sample from the Danish population, Kreiner et al. (2013) compare a one-shot recall question on total personal income (employment income, pension income, social transfers) with the corresponding tax records of the respondents. The authors find a lower mean and a larger spread for the survey measure.

A smaller number of studies are concerned with total household income and the consequences of using register data instead of survey data for the calculation of household income and poverty indicators. The studies available also differ in their validation methodology. In sum, the measurement error of income has been shown to affect cross-sectional poverty rates (Nordberg et al. 2001; Figari et al. 2012), poverty dynamics (Rendtel et al. 1998; McGarry 1995; Breen and Moisio 2004; Worts et al. 2010) and statistical relationships of poverty indicators with other variables (Lohmann 2011). Nordberg et al. (2001) found that income estimates derived from administrative records are quite reliable and generally higher than surveyed income, except for very low register incomes. They interpreted the differences observed as being mainly due to measurement errors in the interview data. Their results showed that survey data produced higher inequality and poverty estimates than register data. Lohmann (2011) makes use of between-country differences concerning data sources for income variables (register or survey data). Results show that the degree of consistency between earnings and employment status (i.e. no earnings reported if the status is non-working) is on average lower in register countries; this also impacts on the poverty rate conditional on activity status in some countries. The author concludes that the relationship between employment status and poverty status also depends on the data collection approach used. Figari et al. (2012) compare empirical estimates of income distribution and poverty rates based on microsimulation methods with observed survey-data-based estimates. The authors use simulated estimates in their model in accordance with prevailing rules on liability and eligibility in four European countries. 
On the one hand, their results show that poverty rates (defined as the share of people with equivalised incomes below 60% of the national median) based on reported data are slightly higher than those calculated using simulated incomes. On the other hand, there was an overlap of 75% for both approaches.

1.2 Characteristics of Register Data and Implementation in EU-SILC

Assuming register data to be a less error-prone source for validating survey questions on income, however, may not always be justified (Abowd and Stinson 2013; Kapteyn and Ypma 2007) and depends on the context of data production. Since administrative registers are not initially built to answer certain research questions, they should not be expected to provide perfect statistical data (United Nations 2007; Wallgren and Wallgren 2007; Zhang 2011). Abowd and Stinson (2013) identify three potential causes for deviations from survey data which must not be confounded with different levels of measurement error: (a) definitional differences between survey and register data, such as taxable income relevant for a wage-tax register versus actual disposable income from a standard-of-living perspective; (b) errors in the administrative data themselves (e.g. coverage issues and updating intervals); and (c) mistakes in the matching process of multiple data sources.

In the European Statistical System, common definitions and methods have been agreed upon in order to facilitate the comparability of poverty indicators and income between countries. EU-SILCFootnote 2 comprises several variables of personal and household income components and is conducted in all 28 member countries plus several more.Footnote 3 In cooperation with the National Statistical Institutes (NSI), Eurostat aims to maximize the comparability of indicators across the participating countries through the output harmonization of variables (i.e. providing/developing explicit conceptual definitions of what to measure, namely so-called “target variables”, as opposed to specifications of how to measure them) and agreements on various methodological aspects like sampling, weighting and precision requirements. However, whereas detailed rules for the content of variables and the construction of those indicators exist, the source of income data—amongst other parameters—is up to the Member States. As a consequence, some countries mainly use official registers, whereas other countries mostly (have to) rely on survey data to fill specific income components. Thus, the heterogeneity of the data sources is something of an obstacle to their comparability, though it may lead to a good overall level of data quality in the outcome indicators.

When EU-SILC started in 2004, only a few countries were using registers; nowadays, however, ever more Member States are taking the step towards integrating register data into their SILC data collections. Studies investigating the impact of register use on measurement error are therefore vital. Törmälehto (2013) draws four main conclusions for the context of EU-SILCFootnote 4:

1. Integrating register data in a data collection may affect multiple phases of a survey process: sampling and weighting (as new information from registers can be used e.g. to design the sample), non-response analysis, calibration of weights, survey designs (as the potential for dropping questions from the survey may alter the whole “flow”), processing and quality control, imputation, dissemination and documentation.

2. It is challenging to generalise about the quality of registers in a cross-national context.

3. There is a lot of variation concerning the particular data sources for specific variables. Register data may originate from survey-like data collections (e.g. self-administered questionnaires) but also from entirely electronic exchanges of administrative data.

4. The combined use of survey and register data affects the total survey error (Groves et al. 2009), and expands the traditional survey error sources to those related to registers. To explain this, Törmälehto (2013) also cites Zhang (2011), who proposed an addition to Groves’ Total Survey Error model. While Groves’ ideas were designed for the context of (sampled) survey data, Zhang (2011) further develops and applies them to error sources associated with register data (e.g. problems of conceptualization, measurement, and accuracy). Zhang proposes a “two-phase life cycle of integrated statistical micro data” where the first phase concerns the data from each single source, and the second the integration of data from different sources. Register data could be used as a benchmark against which survey data could be compared to estimate the magnitude and predictors of measurement error in a given country. However, it should not be expected that register data themselves are free of (other) sources of error, or that the combination of register and survey data leads to perfect statistical data. On the contrary: “At the present stage, there is still clearly a lack of statistical theories for assessing the quality of such register-based statistics.” (Zhang 2011: 446). In sum, there could be more sources of error when using register data, but usually the expectation is to have a lower total error due to fewer measurement errors.

1.3 Effects of Register Data Use in EU-SILC: A Comparative Perspective

Some countries have a longer history of register data use than others, mostly for legal and administrative reasons: Denmark, Finland, Iceland, the Netherlands, Norway, Sweden and Slovenia are those that started with administrative data in SILC right away (i.e. from 2004/05). Then there are those that joined more recently: Italy gradually since 2004, France since 2008, Austria since 2012, and Spain since 2013. Although the “old” register countries encountered the same challenges,Footnote 5 we focus on those countries that have made the transition from survey income in more recent years and therefore have SILC waves with different income sources to compare.

In Spain, the new methodology, where register and survey income information is combined, is considered a more comprehensive method of collecting income in lower and in higher parts of the distribution (Méndez 2015). Income levels are significantly higher than when using the survey approach but inequality indicators, like the risk of poverty, remain similar. Similarly, the French experience (Burricand 2013) showed that the change in methodology did not have a significant impact on the poverty rate, while other inequality indicators increased. Differences between the two income sources—registers versus surveys—were more important in the extremes of the distribution than in the mid-range, and for some income components (pensions) than others (wages). In Italy (Consolini and Donatiello 2013), the inclusion of register data produced a substantial increase in the estimate of average income among self-employed earners, while the increase for employees was less pronounced. At the same time, the use of a mixed data-collection strategy versus survey data only resulted in a substantial decrease in the risk of poverty and Gini coefficient. Only about half of all persons were at risk of poverty according to both methodologies; the others had a different status with each methodology.

1.4 Conclusions Drawn from the Literature Review

To sum up, the prevailing literature in the field highlights the following problems related to measurement error in income: (a) errors explained by data collection methods (e.g. type of question—yearly vs. current, simple vs. complex; source for income variables—register or survey or any combination of both); (b) problems caused by panel design and relevant for measuring poverty dynamics correctly; and (c) challenges concerning cross-country comparisons. Furthermore, two main conclusions can be derived from existing SILC studies and similar surveys: (1) the effect of register data is generally more visible in the lower and upper extremes of the household income distributions and varies for different income components; (2) the effect of register data use on income inequality and poverty indicators varies between countries.

We add to the literature in several aspects. All of the studies on household income and poverty indicators discussed above use either microsimulation or some variant of Markov modelling to capture measurement error. In this paper, survey data is directly validated against register data at a micro level. Moreover, the consequences of deviations for estimating poverty indicators are investigated. The focus of the analysis lies on equivalised household income. Additionally, studies that report on situations where consent from sampled individuals is necessary to link a survey with register data—as in the US—do not apply in our case, as giving or withdrawing consent introduces a further burden and potential bias. Seeking respondents’ consent is not legally required in Austria for a voluntary survey like SILC. This allows for a more complete comparison of survey and register data across the income distribution and socio-demographic groups. This article thus aims to pave the way to a better understanding of the specifics for Austria and to contribute to a more comprehensive picture in EU-SILC and other large European surveys.

The remainder of the article is structured as follows: Sect. 2 describes the development and specifics of register use for SILC in Austria as well as the context of the re-calculation of household incomes for 2008–2011. It then goes on to illustrate the household income concept and its components. Section 3 describes how the analysis addresses the main research questions. The fourth section is divided into two parts. The first part illustrates the effects of the data switch on the aggregate poverty rate and the underlying statistics of the distribution of household income. In the second part, the results of both cross-sectional (for 2010) and longitudinal (2008–2011) regression models for the observed income differences are discussed. The robustness of the main results is further evaluated against different model specifications and statistical tests. The outcomes of these tests are summarized in the fifth section. Section 6 concludes and describes limitations of the current study and suggestions for further research.

2 Data

2.1 Register Use in the Austrian EU-SILC

SILC Austria was launched in 2003. At that time, income components were exclusively collected through surveys (CAPI, voluntary participation). Since 2008 CATI interviews have been used for the panel part of the survey. Survey data for particular income components were first substituted by register data in 2012 (Heuberger et al. 2013). The main reasons for gradually switching to register data were quality and response burden considerations.Footnote 6 Together with the Federal Ministry for Labour, Social Affairs and Consumer Protection, Statistics Austria decided to recalculate and revise income data for 2008–2011 using administrative registers. One main target was to shift the break in time series of the EU 2020 Social Inclusion Indicators further back to the baseline year 2008, resulting in parallel data for the same respondents for 2008–2011. In this paper, we compare the Austrian EU-SILC data from 2008 to 2011 before and after that revision. The sample size ranges between 13,621 and 14,085 individuals nested in around 5700–6100 households in these years.

Austrian law stipulates that the linkage of personal micro data from surveys with registers be done using an anonymized personal identifier (bPK). However, unlike in many other countries including the USA, it is not necessary to seek consent of the interviewee for the linkage procedure.Footnote 7 In order to identify individuals in the administrative datasets, the personal identifier of people covered by the survey is required. Usually, this information is collected as part of the sampling procedure. The share of identifiers found for the total population in the survey varies over time but generally decreases the farther back the survey year is. For the years analyzed here it ranges between 96% (2008) and 99% (2011). However, there are always individuals who turn out to be actual household members but are not covered by the sampling frame (mainly because they are not officially registered at a particular household’s address). Their linkage key is missing ex ante and must be retrieved by a procedure involving the Federal Ministry for the Interior. For 2008–2011 missing keys most often occur among younger people, persons living in ViennaFootnote 8 (capital) and persons with non-Austrian citizenship. With the exception of EU-SILC 2011, the proportion of missing keys for women was higher than for men among all age groups (by about 1 percentage point). As a consequence, using register data results in an under-reporting of income data for those groups. Most households with a reported household income of zero Euros receive an imputed income during the data editing process.Footnote 9 However, for unlinked individuals in households with linkable people the under-reporting remains, resulting in household incomes that are too low on average.

Register data use affects both (1) the sources of the household income and (2) weighting. For the calculation of sample weightsFootnote 10 the wage tax income (number of recipients of income from employment and pensions) was used as a new marginal distribution to improve the consistency of the results compared to the income tax register (distribution of income) and to even out selective nonresponse bias for certain groups.

2.2 Measuring Household Income and Poverty in EU-SILC

Total household income in EU-SILC is calculated as the sum of earned income, capital gains, pensions and public social transfers minus taxes and social security contributions plus the net flow of (paid/received) alimonies and other (paid/received) private transfers between households (UNECE 2011). Equivalised household income is defined as a household’s disposable income divided by the sum of consumption equivalents of that household using the modified OECD scale to reflect economies of scale: for each household the first adult receives a weight of 1, each additional adult gets a weight of 0.5 and each (additional) child under 14 years receives a weight of 0.3. The poverty indicator is based on the distribution of the equivalised household income. If it is below 60% of the median of its distribution, a household and all of its members are defined as poor. The poverty rate refers to the weighted percentage of people in poverty in a population.
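The definitions above can be illustrated with a minimal sketch. It assumes that every household contains at least one adult, that "adult" means a member aged 14 or older, and that person-level weights equal the household weight times household size; the class, function names and sample figures are hypothetical and are not taken from the EU-SILC production code.

```python
from dataclasses import dataclass


@dataclass
class Household:
    disposable_income: float  # total disposable household income (per year)
    adults: int               # members aged 14 or older (assumed >= 1)
    children: int             # members under 14
    weight: float = 1.0       # household sampling weight (hypothetical)


def equivalence_scale(adults: int, children: int) -> float:
    """Modified OECD scale: 1 for the first adult, 0.5 for each
    additional adult, 0.3 for each child under 14."""
    if adults < 1:
        raise ValueError("sketch assumes at least one adult per household")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def equivalised_income(hh: Household) -> float:
    return hh.disposable_income / equivalence_scale(hh.adults, hh.children)


def weighted_median(values, weights) -> float:
    """Smallest value at which the cumulative weight reaches half the total."""
    if not values:
        raise ValueError("no observations")
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    return max(values)


def poverty_rate(households, threshold_share: float = 0.6) -> float:
    """Weighted share of persons living in households whose equivalised
    income is below threshold_share times the weighted median."""
    incomes = [equivalised_income(h) for h in households]
    # Assumption: each member carries the household weight.
    person_weights = [h.weight * (h.adults + h.children) for h in households]
    threshold = threshold_share * weighted_median(incomes, person_weights)
    poor = sum(w for y, w in zip(incomes, person_weights) if y < threshold)
    return poor / sum(person_weights)


# Invented sample data for illustration only.
sample = [
    Household(21000, adults=2, children=2),
    Household(30000, adults=1, children=0),
    Household(45000, adults=2, children=0),
    Household(40000, adults=2, children=1),
]
rate = poverty_rate(sample)
```

Because the threshold is tied to the median, replacing survey incomes with register incomes can move both the threshold and the position of individual households relative to it, which is why the two data sources need not yield the same poverty rate.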

Table 1 provides an overview of the data source for each income variable for 2010 before and after the revision. Income components highlighted in grey were derived from registers for the revision.Footnote 11 Some variables are exclusively based on survey data due to the unavailability of register data sources or because of methodological reasons (e.g. the time lag for receiving final data for self-employment income PY050 is too long). When speaking of “register based household income” in this paper we refer to the revised household income compiled from this combination of register and survey data. We argue that this label is justified because the components taken from register data account for a high percentage of the total amount (between 86.4% in 2011 and 87.5% in 2010).

Table 1 Calculation of the total household income in EU-SILC 2010

Income from employment and family benefits are most common, with approximately 50% of the population receiving each of these types of income (Table 1).Footnote 12 In general, the total weighted sum of household income captured by surveys is markedly lower than the register data. Furthermore, registers show a noticeably higher share of recipients of unemployment benefits, sickness benefits and employee income.

3 Methods and Hypotheses

3.1 Steps of the Analysis

The analysis comprises two main parts. The first part (Sect. 4.1) examines the factors that explain the observed differences in equivalised household income between register and survey data. Second, as the total household income is the source for calculating the poverty rate we then go on (Sect. 4.2) to investigate how the poverty rate and other income statistics are affected if the data source for income components is changed from surveys to registers.

For 56.5% of individuals (57.5% of all households) a negative difference (survey < register) for equivalised household income is found, whereas a positive deviation (survey > register) occurs for 42.8% of all individuals (41% of the households). Plotting means and medians of these observed differences along twenty income percentiles based on register data shows a clear tendency among higher-income groups to under-report/underestimate their income and vice versa for lower-income groups (Fig. 1). This pattern is more systematic for under-reporting than for over-reporting.Footnote 13

Fig. 1 Weighted data. Median and mean of absolute deviation for equivalised household income for 20 quantiles derived from registers (2010). Persons are units of observation: b N = 8078, c N = 5913. Figure 4 in the “Appendix” contains the difference between data sources if equivalised household incomes are measured in logs. This procedure takes into account the current level of income and illustrates the relative deviation. A similar pattern then occurs, although the deviation in the upper tail of the distribution is markedly lower.

Differences between survey data and register data are regarded as measurement error. We differentiate between two main explanations for measurement error (Bound et al. 2001): social desirability (aka interviewer bias; Groves et al. 2009) and cognitive errors (misunderstanding, retrieval and calculation problems). The multivariate analysis in Sect. 4.1 takes a closer look at this issue. By including (and thus controlling for) other possible determinants, we aim to investigate whether social desirability or cognitive error is more relevant as the mechanism behind the observed income differences between data sources.

The multivariate analysis uses three types of regression models. First, separate multinomial logit models (with alternative-invariant regressors) are estimated (Cameron and Trivedi 2005: 500). The dependent variable has three categories that mirror three groups of households for which odds-ratios, conditional on socio-economic characteristicsFootnote 14 and some survey mode aspects (see the next section), are estimated. It is coded “0” if the relative deviationFootnote 15 of the survey income from the register income lies within the range of 0.95–1.05 (“almost perfect identity”Footnote 16—reference category); “1” if above 1.05 (“over-reporting”) and “2” if below 0.95 (“under-reporting”).
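The coding of the dependent variable can be sketched as follows. The sketch assumes that the relative deviation is the ratio of survey income to register income and that register income is strictly positive; the exact definition is given in a footnote that is not reproduced here, so both the ratio form and the function name are illustrative assumptions.

```python
def classify_reporting(survey_income: float,
                       register_income: float,
                       tolerance: float = 0.05) -> int:
    """Return 0 for 'almost perfect identity', 1 for over-reporting and
    2 for under-reporting, based on the ratio survey / register."""
    if register_income <= 0:
        raise ValueError("sketch assumes a strictly positive register income")
    ratio = survey_income / register_income
    if ratio > 1.0 + tolerance:
        return 1  # survey income exceeds register income by more than 5%
    if ratio < 1.0 - tolerance:
        return 2  # survey income falls short of register income by more than 5%
    return 0      # within the 0.95-1.05 band (reference category)


# Invented example values for illustration only.
codes = [
    classify_reporting(20500, 20000),  # ratio 1.025
    classify_reporting(23000, 20000),  # ratio 1.15
    classify_reporting(15000, 20000),  # ratio 0.75
]
```

The three resulting codes are the outcome categories of the multinomial logit model, with category 0 serving as the reference.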

Second, we also take a direct look at the magnitude of metric differences of equivalised household income between data sources and estimate OLS models with the same set of explanatory variables as in the previous step. This also makes it possible to alter the functional form of the relationship between explanatory variables and income differences. We present specifications where both the income from registers (independent variable) and the difference between surveys and registers (dependent variable) are measured in absolute terms (Table 2; models 2, 5) and natural logs (Table 2; models 3, 6). The latter yields coefficients that represent the percentage change in the dependent variable associated with a 1% change in the independent variable. Wald tests rejected the null hypothesis that the two alternatives (over-reporting vs. under-reporting) can be combined for all pairs of alternatives.Footnote 17 Thus, positive and negative differences are modelled separately. Negative absolute differences (survey < register), however, have been converted to positive values to facilitate the interpretation of coefficients.

Third, the panel dimension of our dataset is exploited by applying panel regression models with household fixed effects and time fixed effects. Due to the focus on within-household change over time (as compared to between-household differences in the cross-section), panel regression models allow controlling for household characteristics that are not observed in the dataset but are constant over time (e.g. cognitive ability of all household members or past experience with surveys). This helps to increase the consistency of the estimates and decrease a (possible) selection bias for the independent variable(s) under investigation (Hsiao 2014).
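One way to write down such a specification is sketched below. The notation is an assumption introduced for illustration: $d_{ht}$ is the income difference between data sources for household $h$ in year $t$, $x_{ht}$ the vector of time-varying covariates, $\alpha_h$ the household fixed effect, $\lambda_t$ the year effect, and $T_h$ the number of waves in which household $h$ is observed.

```latex
% Two-way fixed effects model (illustrative notation)
d_{ht} = x_{ht}'\beta + \alpha_h + \lambda_t + \varepsilon_{ht}

% Subtracting household means over the T_h observed waves removes \alpha_h
d_{ht} - \bar{d}_h
  = (x_{ht} - \bar{x}_h)'\beta
  + (\lambda_t - \bar{\lambda}_h)
  + (\varepsilon_{ht} - \bar{\varepsilon}_h),
\qquad
\bar{d}_h = \frac{1}{T_h} \sum_{t} d_{ht}
```

Since $\alpha_h$ drops out of the second equation, any time-constant household characteristic, observed or not, cannot bias the estimate of $\beta$; the same step also means that coefficients on time-constant covariates are not identified.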

Data for SILC Austria come from a representative probability sample that involves stratification (based on federal states and interviewer regions) and features households as the primary sampling unit (Glaser and Till 2010). We account for these complex sampling design features by applying Stata’s survey procedure to our regression estimation commands (Kreuter and Valliant 2007). This also implies the use of sampling weights for all cross-sectional analyses (descriptive statistics and regression models). Households are the units of observation in all regression models. In the cross-sectional regression analysis, we focus on the year 2010 as this is the most recent year available with a high share of successful links of person-specific IDs to their register entries (97%) and where the data source differs for the maximum number of income components.Footnote 18 Full tables for all years are provided in the online supplementary materials.

3.2 Modelling Household Income Differences: Explanatory Variables and Hypotheses

The main focus of the regression models is on the effect of equivalised household income taken from registers on over-/under-reporting. Based on the social desirability argument, it is expected that households with low incomes tend to report a higher income than they actually have. On the other hand, households with a higher income are expected to make themselves “poorer” in the interview situation. In the multivariate analysis we aim to control for variables that are correlated with income and the observed measurement error. This allows us to adjust the observed effect of income on measurement error (Fig. 1) for those dimensions that are related to cognitive error (e.g. number of income components, job changes) or other interview effects (e.g. no. of years in the panel). If there remains a significant effect of income on over-reporting/under-reporting, this could ceteris paribus be interpreted as evidence for social desirability bias related to the level of income. Furthermore, we test whether effects that are found in the literature for mode variables but for different samples or different types of income (Sect. 1.1) are also present in the SILC data.

The remaining explanatory variables in the models can be differentiated into four groups. The first group is related to the structure of the household income. It is expected that households with a lower number of different income sources (see Table 1 for all components) should have a less complex income situation and thus be less likely to have measurement error (see Sect. 1.1). Moreover, we assume an underlying social norm that makes poverty undesirable (Bosma et al. 2015; Sutton et al. 2014; Walker et al. 2013) and thus hypothesize that a lower level of satisfaction with household income increases the likelihood of over-reporting. As different income components represent different proportions of the total household income (Table 1) and also different magnitudes of measurement error,Footnote 19 we also include a categorical variable which captures the income source with the highest share of total household income.

The second group of variables is related to employment status. Main employment status refers to the selected respondent for household variables (according to the interview guidelines in SILC). Changes in the main employment status during the income reference period are aggregated over all household members. In addition to the number of different income sources, these two variables can serve as a proxy for how much fluctuation occurs in yearly income streams which could make recall problems more probable. Thus, it is expected that retirees and people who mainly do housework have the lowest likelihood of misreporting. Similarly, households with a higher number of changes in labor status are expected to be more likely to have measurement error. The direction of effects for these variables is assumed to be the same, regardless of whether they are positive or negative deviations.

The third group of explanatory factors contains information on the interview situation: the total number of proxy interviews, a binary indicator which indicates whether the household was surveyed with CATI (rather than CAPI), the interview month and how often the household has already participated in SILC. The discussion of existing studies (Sect. 1.1) has shown that the evidence for the effects of proxy interviews on income measurement error is mixed and may depend on whether (lower) social desirability or recall problems are the main driver behind the effect. We thus do not assume any ex-ante hypothesis for proxy interviews. Based on the literature review, it is also unclear whether income measurement error in our data set will decline as the number of panel rounds in which a household has participated increases. However, we expect that a longer distance between the interview month and the last income reference year is generally associated with a higher likelihood of reporting errors due to recall problems. Concerning mode effects, the literature has demonstrated that respondents to CATI are more likely to present socially desirable answers and that this can lead to a downward bias concerning income inequality (Sect. 1.1). Furthermore, CATI does not allow the use of visual aids and may leave the respondent less time to check income records, resulting in an enhanced likelihood of cognitive error. Thus, CATI is expected to increase measurement error in general and to be associated with both more over- and under-reporting than CAPI.

The fourth group of explanatory factors comprises socio-demographic characteristics. These mainly serve as control variables for income. Thus, we do not make any assumptions on their effects. Furthermore, consistent evidence for some socio-demographic variables (e.g. sex, children in the household) is lacking (Bound et al. 2001). Age and sex refer to the response person for household variables in SILC. Education measures the highest completed level of education in the household. Health is measured as the household’s median of an ordinal variable capturing the self-assessed health of all household members over the age of 15. Household structure and population size at the place of residence are measured directly at the household level.

4 Results

4.1 Explaining Household Income Differences

The primary focus of this section is on the effects of income, income-related variables (income satisfaction, income structure, employment status variables) and the interview context (e.g. mode). We do not comment in detail on the outcomes for the remaining socio-demographic variables as they mainly serve as controls in the models (see Sect. 3.2).

4.1.1 Negative Deviations (Under-Reporting)

From Table 2 it is evident that, even after controlling for a variety of other variables, both the likelihood of under-reporting (compared to a close identity of incomes) and the magnitude of under-reporting significantly increase with rising register income. This outcome is stable for all four years. For instance, a one percent increase in equivalised household income raises the odds of reporting a lower income than actually found in administrative records by a factor of 3.667 (column 4). Every additional available Euro increases the difference between data sources in this group by approximately 22 cents (column 5). Similarly, a one percent increase in personal income raises this difference by almost 2 percent (column 6).
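To make the odds interpretation more tangible, the following minimal Python sketch converts a multiplicative change in the odds into a change in probability. It assumes that attention is restricted to the two outcomes being compared (under-reporting vs. a close identity of incomes), so that the probability is conditional on a household falling into one of these two categories. The baseline probability of 0.30 is a purely hypothetical value; only the factor 3.667 is taken from Table 2.

```python
import math

def apply_odds_factor(p_baseline: float, factor: float) -> float:
    """Return the probability after multiplying the baseline odds by `factor`.

    p_baseline is the probability of under-reporting among households that
    either under-report or report approximately correctly.
    """
    odds = p_baseline / (1.0 - p_baseline)   # probability -> odds
    new_odds = odds * factor                 # multiplicative effect on the odds
    return new_odds / (1.0 + new_odds)       # odds -> probability

# Odds factor reported for under-reporting (Table 2, column 4).
factor = 3.667
# The underlying logit coefficient is the natural log of the odds factor.
coefficient = math.log(factor)

# Hypothetical baseline probability (an assumption, not a value from the paper).
p_new = apply_odds_factor(0.30, factor)
```

With these assumed inputs the conditional probability of under-reporting rises from 0.30 to roughly 0.61, which illustrates that an odds factor is not the same as a factor on the probability itself.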

Table 2 Results from cross-sectional regression models (2010)

In the multinomial logit models (column 4), other statistically significant effects are found for households with the main income source coming from self-employment or private sources (as compared with those receiving old-age benefits). Increasing satisfaction with the household income lowers the odds for under-reporting compared to approximate equivalence between data sources. If the main activity status of the person answering the household questionnaire is housework, this raises the odds of under-reporting. A change in the labor status of the household also increases the odds of under-reporting, as does being unemployed.

The evidence for variables related to the interview context is mixed. A higher number of proxy interviews increases the odds of under-reporting, whereas the effects of CATI are statistically not significant.

The results of the OLS models (column 6) for metric differences resemble the outcomes of the multinomial logit models for the most part. The higher the number of different income components in the household, the lower the magnitude of under-reporting. This outcome may also suggest that using a detailed collection of income components to calculate the total household income (vs. a single question) does not affect data accuracy very much. The higher the satisfaction with the household income is, the lower the magnitude of under-reporting. The same applies if the main income source of the household is either self-employment or private resources (in comparison with old-age benefits as reference category). Housework or being in education as the main activity status of the respondent person for the household questionnaire and changes in labor status are associated with higher income differences. Contrary to expectations, the number of household members with more than one employment activity in the survey year has a negative effect on under-reporting. Households where the response person was unemployed for most of the income reference period have a significantly higher amount of under-reporting than households with the response person working full-time. Finally, when looking at variables closely related to the interview procedure itself, we do not find any relationship of the dependent variable with CATI. In contrast to the multinomial logit, the effects of the number of proxy interviews are not significant anymore.

4.1.2 Positive Deviations (Over-Reporting)

Both the likelihood of over-reporting (compared to a close identity of incomes) and the magnitude of over-reporting significantly decrease with rising income. For instance, a one percent increase in equivalised household income lowers the odds of reporting a higher income than actually found in administrative records by a factor of 0.244 (Table 2, column 1). A one percent increase in personal income lowers the difference by roughly 0.5 percent (column 3). In absolute terms (€), the relationship between income from registers and the magnitude of over-reporting is negative and marginally non-linear, i.e. the effect size slightly decreases with rising income (column 2).

In contrast to under-reporting, various significant effects are also found for other income-related variables. A one unit increase in satisfaction with household income is generally associated with a small but statistically significant increase in the odds of over-reporting (column 1) and a significant increase in the magnitude of over-reporting (columns 2, 3). If the main income source of the household is private resources (as compared with old-age benefits as the reference category) this raises the difference between data sources substantially (9869.80 €). Smaller significant positive effects are also found if income from employment or social transfers is the main income source. In contrast to what was expected in Sect. 3.1, a rising number of different income components in the household decreases the absolute difference in the case of over-reporting.

Similar to the models for under-reporting, changes in household members’ labor status during the income reference period slightly increase the odds of over-reporting. Moreover, households where the response person was unemployed or in education most of the time during the income reference period have a significantly lower amount of over-reporting than households with the response person working full-time (columns 2, 3).

Variables related to the interview context do not seem to strongly influence the outcome variable. A statistically significant but rather weak positive effect on the odds of over-reporting was only found for the sum of proxy interviews in the household (columns 1 and 4). CATI is not significant.

Taken together, the model results for over-reporting and under-reporting support three main conclusions. First, income effects are as expected and also mirror the descriptive analysis: the magnitude of under-reporting rises with increasing income, whereas an opposite correlation can be found for over-reporting. However, the latter statistical relation is weaker in magnitude when compared with under-reporting. Second, for the OLS models (col. 2, 3, 5, 6) a generally higher model fit is observed for under-reporting. Third, variables related to the interview context play only a very modest role in explaining the dependent variables.

In a final step, we exploit the panel dimension of our dataset. 28.1% of all households in the unbalanced panel sample (at least 1 participation 2008–2011) have a mix of both positive and negative deviations from their equivalised household income as derived from administrative records. 28.5% have only positive deviations and 41.3% have only negative deviations. 2.1% have complete conformity for their equivalised household income from both data sources over their observation period. All panel regression models are estimated at the household level and control for time-constant unobservable household characteristics. The primary focus is on the effects of income and time (‘learning effect’; i.e. whether the difference between survey data and administrative data on average decreases over time). Due to space limitations we cannot comment in detail on the outcomes for other model variables. The panel models (Table 6, “Appendix”) provide additional evidence for the U-shaped relationship between equivalised household income and both the absolute and the logarithmic differences between survey and register data. Furthermore, we now see evidence for a learning effect, which was less pronounced in the cross-sectional analysis. There are fewer significant effects for other model variables, which may be partly due to the short observation period of only 4 years and the low within variation (resulting in higher standard errors for the estimates).
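The text does not spell out which estimator is used to control for time-constant household characteristics. One standard option is the within (fixed-effects) transformation, sketched below for a single regressor under that assumption. The function name and the toy panel are hypothetical and serve only to show why household-specific levels drop out and why little within-household variation makes estimates imprecise.

```python
from collections import defaultdict

def within_slope(households, x, y):
    """Fixed-effects (within) slope for a single regressor.

    households, x and y are equally long lists; each position is one
    household-year observation.
    """
    # Collect observations per household to compute household means.
    groups = defaultdict(list)
    for h, xi, yi in zip(households, x, y):
        groups[h].append((xi, yi))

    num = 0.0  # sum of demeaned x times demeaned y
    den = 0.0  # sum of squared demeaned x (the within variation)
    for obs in groups.values():
        mean_x = sum(xi for xi, _ in obs) / len(obs)
        mean_y = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            num += (xi - mean_x) * (yi - mean_y)
            den += (xi - mean_x) ** 2
    return num / den

# Hypothetical toy panel: two households with different fixed levels
# (10 and -5) but the same slope of 2 in the time-varying regressor.
hh = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
y = [10 + 2 * v for v in x[:3]] + [-5 + 2 * v for v in x[3:]]
slope = within_slope(hh, x, y)
```

Because only deviations from household means enter the estimate, the denominator shrinks when incomes barely change between waves, which is consistent with the higher standard errors mentioned above.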

4.2 How Changes in the Distribution of Equivalised Income and Its Components Affect the Poverty Rate

Table 3 shows that the poverty rate is persistently underestimated in all four years if based on survey data. Except for 2011, there is also an increase in the poverty gap (distance of the median income of the poor to the poverty threshold as percentage of the threshold) of 3–4 percentage points. Furthermore (not shown in Table 3), longitudinal poverty (4 times poor out of 4 times 2008–2011) also increases from 4.8% to 7.1%. As the poverty rate is calculated based on total household income, changes in its distribution translate directly into changes in the poverty rate. The Gini coefficient and the relation of the income at the 90th percentile to the 10th percentile indicate that total household income inequality increases if registers are used. Both the median and mean incomes are slightly higher. Except for 2008, the total income found in registers also varies more, as indicated by the standard deviation. Furthermore, Fig. 2 shows that there is more probability mass at the lower end of the distribution based on register data. Taken together, this evidence indicates that the rise in the aggregate poverty rate is in part explained by a higher number of households with a very low income than in the survey data.
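Two of the indicators mentioned above can be written down compactly. The sketch below is an unweighted illustration, whereas the reported figures rely on weighted data; the income values are hypothetical, and the threshold of 60% of the median is an assumed convention that is not stated in this section.

```python
from statistics import median

def gini(incomes):
    """Unweighted Gini coefficient based on the mean absolute difference."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)

def poverty_gap(incomes, threshold):
    """Distance of the median income of the poor to the threshold,
    expressed as a percentage of the threshold."""
    poor = [v for v in incomes if v < threshold]
    return (threshold - median(poor)) / threshold * 100

# Hypothetical equivalised incomes in EUR (illustrative values only).
incomes = [6000, 9000, 13000, 20000, 24000, 30000, 45000]
# Assumed convention: threshold at 60% of the median income.
threshold = 0.6 * median(incomes)
gap = poverty_gap(incomes, threshold)
```

In this toy example the two poor households have a median income of 7500 € against a threshold of 12000 €, giving a poverty gap of 37.5%.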

Table 3 Poverty indicators and the distribution of equivalised household income for different data sources
Fig. 2

Weighted data. Histogram for the distribution of equivalised household income (2010). Red line = poverty threshold (€). Top 1% excluded for better readability. Persons are units of observation

By definition, household income, the poverty threshold and the sample weights determine the poverty rate in arithmetical terms. The left panel of Fig. 3 illustrates the effect of changing the data source from survey to register data for either only the poverty threshold (red bar) or both household income and the threshold (green bar). This is compared to a baseline where both income and the calculation of the threshold are based on survey data (blue bar). The red reference line marks the “old” poverty rate calculated with both sample weights and income data from surveys. If nothing else except the source for the sample weights is changed (compare c to b), the poverty rate decreases by less than one percentage point. A similar conclusion but with a different direction of change can be drawn for the poverty threshold (compare blue to red bar within a, b, c). In contrast, irrespective of sampling weights, altering the data source for household income leads to marked upsurges in the poverty rate of 2.5–3 percentage points (compare blue to green within a, b, c).
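The arithmetic behind this decomposition can be sketched as follows. The function names, the toy incomes and the uniform weights are hypothetical, and the 60% share of the weighted median is an assumed threshold convention. The three results correspond to the logic of the blue, red and green bars: survey income with survey threshold, survey income with register threshold, and register income with register threshold.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")

def poverty_threshold(incomes, weights, share=0.6):
    """Threshold as a share of the weighted median (share is an assumption)."""
    return share * weighted_median(incomes, weights)

def poverty_rate(incomes, weights, threshold):
    """Weighted share of persons with income below the threshold."""
    poor_weight = sum(w for v, w in zip(incomes, weights) if v < threshold)
    return poor_weight / sum(weights)

# Hypothetical equivalised incomes for the same five persons in both sources.
survey = [9000, 14000, 20000, 26000, 40000]
register = [7000, 11000, 21000, 27000, 43000]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

baseline = poverty_rate(survey, weights, poverty_threshold(survey, weights))
threshold_only = poverty_rate(survey, weights, poverty_threshold(register, weights))
both = poverty_rate(register, weights, poverty_threshold(register, weights))
```

A change in the source of the sample weights could be examined in the same way by passing a second weight vector to the same functions.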

Fig. 3

Poverty rates (1 = 100%) for 2010 based on income components and/or weights from different data sources. The red reference line represents the poverty rate if both income data and the poverty threshold are derived from surveys (12.2%). Persons are units of observation

The right panel (d) of Fig. 3 simulates the effects of a change from survey data to register data only for single income components. Such a change leads to a different household income and subsequently alters the overall distribution of all household incomes including the median and poverty threshold in a given year. Each bar thus represents the poverty share in the population based on a corresponding new poverty threshold. Changing the source for income from employment clearly has the greatest effect on the poverty rate. Given that 58% receive this type of income (Table 1) this is a predictable outcome. Moreover, family/children-related allowances and old-age benefits also noticeably increase the aggregate poverty rate, whereas sickness benefits and disability benefits decrease the poverty rate if derived from registers.
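The component-wise simulation can be illustrated with a small sketch. It is deliberately simplified: it is unweighted, ignores equivalisation, uses only two hypothetical component names and made-up values, and assumes a threshold of 60% of the median. What it does reproduce is the key step described above, namely that the threshold is recomputed after each swap.

```python
from statistics import median

def poverty_rate_after_swap(survey_hh, register_hh, component, share=0.6):
    """Poverty rate when only `component` is taken from the register."""
    incomes = []
    for s, r in zip(survey_hh, register_hh):
        mixed = dict(s)                   # start from the survey components
        mixed[component] = r[component]   # swap in the register value
        incomes.append(sum(mixed.values()))
    threshold = share * median(incomes)   # threshold follows the new distribution
    return sum(v < threshold for v in incomes) / len(incomes)

# Hypothetical households with two income components (EUR per year).
survey_hh = [
    {"employment": 12000, "old_age": 0},
    {"employment": 0, "old_age": 14000},
    {"employment": 22000, "old_age": 0},
    {"employment": 30000, "old_age": 0},
    {"employment": 18000, "old_age": 9000},
]
register_hh = [
    {"employment": 8000, "old_age": 0},
    {"employment": 0, "old_age": 14500},
    {"employment": 24000, "old_age": 0},
    {"employment": 33000, "old_age": 0},
    {"employment": 19000, "old_age": 9000},
]

rate_employment = poverty_rate_after_swap(survey_hh, register_hh, "employment")
rate_old_age = poverty_rate_after_swap(survey_hh, register_hh, "old_age")
```

In this toy data the employment swap both lowers one household's income and raises the median, so a second household falls below the recomputed threshold even though its own income is unchanged.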

As changes in the poverty rate are strongly driven by differences in employment income, the question arises whether the differences measured between data sources themselves are particularly sizeable with regard to this income component. Rows (a) and (b) of Table 4 contain descriptive statistics on the distribution of differences among particular income components after the change from survey to register data. Differences are grouped with respect to poverty status changes due to the data switch. Those who enter poverty clearly over-report their employment income with a higher magnitude (≈90%) than those who exit poverty (≈−25%), both in absolute and relative terms. Moreover, we find relative differences of similar magnitude for old-age benefits. Those who are newly classified as poor based on register data also have a markedly higher share of family benefits as part of total household income (row c). Furthermore, the median relative distance of the new equivalised income to the new poverty threshold (not shown in Table 4) is 17.5% for households switching from poor to non-poor and −18.8% for households switching from non-poor to poor, whereas the median relative distance of the old equivalised income to the new threshold amounts to −13.6% for the exits and 25.3% for the entries.
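The grouping by poverty status change and the relative distance to the threshold used in this paragraph can be expressed as two small helper functions. The names and the example values are hypothetical; a person counts as poor when income lies strictly below the threshold, which is an assumption about the exact cut-off rule.

```python
def classify_transition(old_income, new_income, old_threshold, new_threshold):
    """Poverty status change when moving from survey (old) to register (new) data."""
    was_poor = old_income < old_threshold
    is_poor = new_income < new_threshold
    if was_poor and not is_poor:
        return "exit"
    if not was_poor and is_poor:
        return "entry"
    return "stays poor" if is_poor else "stays non-poor"

def relative_distance(income, threshold):
    """Distance of income to the threshold in percent of the threshold."""
    return (income - threshold) / threshold * 100

# Hypothetical case: non-poor in the survey data, poor in the register data.
status = classify_transition(15000, 10000, 12000, 12500)
distance_new = relative_distance(10000, 12500)
```

Negative distances indicate incomes below the threshold, matching the sign convention of the figures reported for entries and exits.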

Table 4 Effects of using register data on income statistics for those whose poverty status changes as a result, 2010

4.3 Robustness Checks

Additional checks and analysis beyond our main models deal with (1) the robustness of the cross-sectional models’ results for the other three available years, (2) several specification tests concerning the functional form and (3) a comparison of register data against a single survey question on total household income available in the questionnaire. Detailed results of all these estimations beyond the summary in this section are provided in the online supplementary materials.

Overall, repeating the estimation of the main models (Table 2) for the remaining three years 2008, 2009, 2011 again reveals that both the odds and magnitude of under-reporting generally rise with increasing income, whereas an opposite correlation can be found for over-reporting. However, the latter statistical relation is less robust over time (sometimes insignificant) and weaker in magnitude when compared with under-reporting: there are small negative effects for income which, however, are only statistically significant in 2010 and 2011. Using logarithmic values for the magnitude of measurement error (dependent variable) and income from register data (independent variable) instead of levels does not substantially change the statistical significance and direction of parameter estimates for the explanatory variables. However, the OLS models with logarithmic values of under-reporting and income yield lower fit statistics for all years except in 2011 as compared to the specifications in levels.

A series of specification tests were applied to validate the robustness of the main results reported in Table 2. First, Box-Cox transformations (dependent variable) were used to check whether the OLS regression model for the dependent variable is better in logs than in levels (Cameron and Trivedi (2005), chapter 8.5.2, boxcox command in STATA). Second, J-tests and Cox-Pesaran tests for non-nested OLS models were applied to choose between income on the right-hand side of the equation to be specified in logs or in levels (nnest command in STATA, see Greene (2000): 302–305). This procedure did not yield completely unambiguous results. However, for positive differences (survey > register), it generally indicated that models with the left-hand side variables measured in levels and the right-hand side income variables measured in logs may suit the data better than models with levels for both. We are also aware of potential problems in log–log models due to heteroscedasticity (Silva and Tenreyro 2006). Consequently, we also estimated Poisson regression models, which are suggested as one solution for this problem (Martinez-Zarzoso 2013; Santos Silva and Tenreyro 2011; Silva and Tenreyro 2006). In sum, substantial results of our study are robust to these tests and do not crucially depend on the functional form of the regression model.
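For readers unfamiliar with the Box-Cox approach, the following sketch shows the transformation itself and why it nests the levels and log specifications. It covers only the transformation for a given parameter value, not the estimation of that parameter as carried out by the cited Stata command; the function name is hypothetical.

```python
import math

def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)           # limiting case: log specification
    return (y ** lam - 1.0) / lam    # lam = 1 gives y - 1, i.e. levels up to a shift

# lam = 1 reproduces the level of y minus one; lam = 0 reproduces the log.
level_case = box_cox(3.0, 1.0)
log_case = box_cox(math.e, 0.0)
```

An estimated parameter close to 0 therefore favours a specification in logs, whereas a value close to 1 favours a specification in levels.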

Finally, some studies have shown that measurement error can be of different magnitude depending on whether total household income is calculated based on a single survey question or aggregated based on multiple income questions. Using data for the UK, Hansen and Kneale (2013) find that households with more diverse sources of income, such as the self-employed, part-time employed and those in receipt of means-tested benefits, were more likely to report higher incomes when using multiple income questions compared to using a single question. A study on behalf of Eurostat (Dia et al. 2013) concluded that measurement of yearly income based on several components is more accurate and complete than current monthly income.

The Austrian SILC also contains a single question on current monthly household income.Footnote 20 Thus, we checked whether the observed effects of our main models apply to a lesser or greater extent if register data are compared to this single question on current monthly household income. To construct the dependent variable, the difference between this variable (equivalised for household size) and 1/12 of the equivalised annual household income from register was calculated. For the construction of the dependent variable for the multinomial logit model, we used the same thresholds as for the model in Sect. 4. The median of the relative deviation [(survey minus register)/register*100] amounts to 25% as compared to 10% when measurement error is calculated as described in Sect. 3.1. Overall, the estimated effects of income on measurement error have the same direction and significance as the regression models in Sect. 4.1. A one percent increase in equivalised monthly household income derived from a single question lowers measurement error by roughly 0.7% in the case of over-reporting and increases measurement error by 1.6% in the case of under-reporting. Again, a mean-reverting relationship between income and measurement error is observed.
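The construction of this dependent variable can be sketched in a few lines. Both inputs are assumed to be already equivalised for household size. The tolerance band used for the three-way classification is a hypothetical placeholder, since the actual thresholds are defined elsewhere in the paper and are not repeated in this section.

```python
def relative_deviation(survey_monthly: float, register_annual: float) -> float:
    """(survey minus register) / register * 100, with the annual register
    income scaled to one month."""
    if register_annual <= 0:
        raise ValueError("register income must be positive")
    register_monthly = register_annual / 12
    return (survey_monthly - register_monthly) / register_monthly * 100

def classify(deviation_pct: float, band_pct: float = 5.0) -> str:
    """Three outcome categories for the multinomial logit model.

    band_pct is a hypothetical tolerance, not the threshold used in the paper.
    """
    if deviation_pct > band_pct:
        return "over-reporting"
    if deviation_pct < -band_pct:
        return "under-reporting"
    return "approximately equal"

# Hypothetical household: 1500 EUR reported per month, 24000 EUR per year in registers.
deviation = relative_deviation(1500.0, 24000.0)
category = classify(deviation)
```

Here the monthly register income is 2000 €, so the reported 1500 € corresponds to a deviation of −25% and is classified as under-reporting.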

5 Conclusion and Discussion

The aim of this paper is to investigate the consequences of substituting survey data with register data in Austria for income measurement at the household level and how this affects the poverty rate based on a threshold relative to the median income.

In the multivariate analysis, differences between the two data sources for the same observations were regressed on income variables, socio-demographic variables and variables related to the interview context. On the one hand, both the likelihood of under-reporting and the magnitude (metric differences) of under-reporting significantly increase with rising income. This outcome is also relatively stable for all of the four years. On the other hand, the likelihood and magnitude of over-reporting significantly decrease with rising income. Panel regression results reflect these outcomes and complete the picture of mean-reverting errors (differences) when measuring disposable household income. Furthermore, a generally higher model fit is observed for under-reporting.

A different question is whether these income effects are mainly due to cognitive error or social desirability. Controlling for variables that are correlated with income and the observed measurement error allows us to adjust the observed effect of income on measurement error for those dimensions that are related to cognitive error (e.g. number of income components, job changes) or other interview effects (e.g. number of years in the panel). As there remains a significant effect of income on over-reporting/under-reporting in our models, this could ceteris paribus be interpreted as evidence that it is primarily social desirability bias related to the level of income that underlies the observed pattern. However, effect heterogeneity between income groups is also possible. For instance, even after controlling for the number of different income sources in the household, cognitive errors may become increasingly important as income rises. In consequence, this could render the interpretation of the effect size and effect direction more ambiguous for higher-income groups.

Besides the relationship between income and the measured difference, we also find evidence for a “learning effect”: differences between data sources for both under-reporting and over-reporting decrease with the number of panel waves a household has participated in. Whether this effect occurs because respondents feel less uncomfortable reporting their income over time or due to better preparation and knowledge of one’s income data over time, however, does not have a straightforward answer based on the available data. Among the other variables related to the interview context only the number of proxy interviews (weakly) increases the odds of under- and over-reporting.

Finally, the analysis reveals a quite significant increase in the cross-sectional poverty rates for 2008–2011 and the longitudinal poverty rate if register data are used, whereas central measures of equivalised household income remain rather unchanged. The income distribution becomes more uneven when using register data. At the lower tail of the distribution the median income increases, whereas the opposite is true for the upper tail of the income distribution. Using register data also results in a higher number of households with a very low income as compared to survey data. Overall, the observed changes in the poverty rate are mainly driven by differences in employment income rather than sampling weights and other income components. Solely changing the source for the sample weights has only a very moderate effect.

Further research endeavors could test the implications of these outcomes for poverty dynamics. For instance, under-estimation of the poverty headcount in the cross-section based on questionnaire data may lead to a measured rate of households’ mobility into and out of poverty that is higher than the actual rate. As a consequence, poverty may turn out to be more persistent based on register data. Moreover, it could be investigated if statistical relationships between material deprivation indicators available in SILC and income poverty are altered significantly if register data are used. Further research could also clarify if social desirability is more relevant for some income types than for others (e.g. social benefits vs. market income).