1 Introduction

Over the last three decades, administrative data sets have been “transforming the analysis of economic policy” (Friedman 2010, p. 2). Card et al. (2010, p. 1) highlight that as compared to survey data “[a]dministrative data offer much larger sample sizes and have far fewer problems with attrition, non-response, and measurement error.” In one of the pioneering and most influential examinations of measurement error in survey data, Bound and Krueger (1991) (hereafter: BK) focus on the prevalence and properties of measurement error in longitudinal earnings information. Their analysis compares potentially mismeasured Current Population Survey (CPS) data to administrative Social Security payroll tax records, which BK assume to be free of measurement error. According to the findings by BK, in a cross section survey-based earnings data are relatively reliable. At the same time, reliability ratios are much lower for specifications that rely on survey-based earnings data in first differences. In addition, measurement errors are not “classical” in the sense of being identically and independently distributed and uncorrelated with administrative earnings. Instead, measurement errors are serially correlated over 2 years and negatively correlated with administrative earnings (or “mean-reverting” according to BK’s parlance).

In the first part of this paper, we replicate and extend the exercise by BK with the help of German data that link the administrative “ADIAB” social security records with the “PASS” (“Panel Study Labour Market and Social Security”/“Panel Arbeitsmarkt und soziale Sicherung”) household survey. In order to compare our findings for Germany directly with those already available for the United States, we make the same assumptions as BK. In particular, we assume that administrative ADIAB earnings represent “true earnings” and hereafter designate them as such. In fact, because the underlying administrative data are used to compute social security contributions, the ADIAB earnings information is considered highly reliable. While the data set used by BK only spans two time periods, we are able to exploit cross-sectional and longitudinal variation in earnings over the first four waves of the PASS survey. Our data are also more recent than those of BK. In addition, the provision of internationally comparable evidence on the extent of measurement error in German longitudinal earnings data makes it possible to put the conclusions by BK into an international perspective.

Qualitatively, we confirm many of the main results of BK. In particular, we confirm that in a cross section survey-based earnings data are relatively reliable but that their reliability tends to be much lower when the data are specified in first differences. Quantitatively, in the cross section our findings for Germany are very similar to those of BK for the United States while we find even stronger evidence that first-differencing exacerbates measurement error problems. As yardstick for the reliability of survey-based earnings data, BK consider the “reliability ratio” defined as the ratio of the covariance between mismeasured earnings and true earnings to the variance of mismeasured earnings. A value for this ratio of 1 would indicate that the covariance between mismeasured earnings and true earnings equals the variance of mismeasured earnings. This would imply perfect reliability. Conversely, a value of 0 would imply a complete lack of reliability. Allowing for mean-reverting measurement error, BK estimate that in a cross section, the reliability ratio is 0.97 to 1.02 for men and 0.93 to 0.96 for women. They also estimate that this ratio falls to between 0.78 and 0.86 when the data are specified in first differences. Also allowing for mean-reverting measurement error, we find that in a cross section the reliability ratio is 0.93 to 0.95 for men and 0.90 to 1.00 for women. In first differences, it falls to 0.11 to 0.47 for men and 0.25 to 0.34 for women.

Our other noteworthy findings include (a) that measurement error in our survey data is not classical but mean-reverting, with a high degree of autocorrelation and a strong negative correlation with true earnings, and (b) that the mismeasurement of earnings leads to little bias when survey-based earnings are on the left-hand side of a typical Mincer-type earnings regression. Both these findings are again in line with conclusions reached by BK. Going beyond the exercise by BK, we analyze both first- and higher-order autocorrelations of measurement error. Our results strongly suggest that measurement errors in our survey-based earnings data are positively correlated over an extended period of time. This leads us to conjecture that at least a sizable fraction of the autocorrelation in the measurement error is not actually due to a simple autoregressive process but either to a person fixed effect or to a more complex time series process.

In the second part of this paper, we use our data to estimate the dynamic model of measurement error by Pischke (1995). Pischke (1995) explains the measurement error in earnings data from the Panel Study of Income Dynamics Validation Study (PSIDVS)—which combines earnings information from the Panel Study of Income Dynamics (PSID) questionnaire with information from payroll records at a specific, anonymous firm which he assumes to be free of measurement error—by an individual fixed effect, the misreporting of transitory earnings shocks and a white-noise component. This simple model fits our data surprisingly well. Again under the assumption that administrative earnings information represents true earnings, we find mean-reverting measurement error and, like Pischke (1995), that individuals underreport the transitory component of earnings. Pischke (1995, p. 309) explains that the underreporting of transitory earnings changes is entirely plausible given that “[o]bviously, it is the changes in permanent earnings that are related to the more important events in people’s lives.” Also in line with Pischke (1995), we find that underreporting of transitory earnings leads to downwardly biased estimates of the variance of earnings growth while the white-noise component induces upward bias. In our case, the upward bias more than offsets the downward bias, implying that earnings growth observed in survey data appears to be more spread out than it actually is. This is in contrast to Pischke (1995), who finds upward and downward biases of similar magnitude. As a further contribution, we document an upward bias in estimates of the variance of transitory earnings and a downward bias in estimates of the variance of permanent earnings due to measurement error in the survey data. Thus, earnings inequality is actually more persistent than suggested by survey data.

Our results are robust to various sensitivity checks and arguably exhibit a high degree of external validity as demonstrated through the use of a second linked survey-administrative data set that combines the administrative social security records with the “WeLL” (“Continuing Education and Lifelong Learning”/“Weiterbildung und Lebenslanges Lernen”) household survey. While the PASS survey oversamples poorer households, the WeLL survey includes a disproportionally large number of high earners. Reassuringly, our results are qualitatively robust across the two surveys.

We also make three methodological contributions: (a) we extend the simple model of measurement error suggested by BK as well as the dynamic measurement error model by Pischke (1995) from the two-period to the four-period case, (b) we introduce the methodology for taking account of top-coded administrative earnings information developed by Card et al. (2013) to the measurement error literature and (c) we develop a procedure for merging the correct administrative records to the different waves of the PASS and WeLL surveys.Footnote 1 These methodological contributions will help researchers to further investigate the prevalence and properties of measurement error in longitudinal earnings data and tap the full potential of German linked survey-administrative data.

Our study is relevant for three distinct strands of literature. First, it is relevant for the literature that links administrative and survey data to investigate the extent of measurement error in longitudinal earnings data. This literature is partly surveyed in Bound et al. (2001) and Meyer and Mittag (2021) and almost exclusively focuses on the United States. Together with BK and Pischke (1995), pioneering studies include those by Duncan and Hill (1985) and Bound et al. (1994) which both rely on the PSIDVS. In accordance with the findings by BK, Duncan and Hill (1985) show that in levels the reliability ratio of earnings in the PSID data exceeds 80 percent while Bound et al. (1994) document mean reversion and a positive autocorrelation in measurement error. Other noteworthy studies that link American administrative and survey data sets to investigate the extent of measurement error in earnings data include those by Mellow and Sider (1983), Rodgers et al. (1993), Bollinger (1998) and Stinson (2002).Footnote 2

In addition, our results on measurement error in longitudinal earnings data complement the emerging literature that compares the quality of German administrative and survey-based earnings information in the cross section. Notable contributions to this literature include Oberski et al. (2017), Antoni et al. (2019), Valet et al. (2019), Gauly et al. (2020) and Stüber et al. (2023). One recurrent conclusion of this literature is that while there is a certain tendency for average earnings to be larger in administrative records than in surveys, in the cross section average earnings differ only relatively little between the two types of data sources.

Third and finally, our study is relevant for the literature that investigates whether findings regarding labor market characteristics and impacts of economic policy depend on using either administrative or survey-based earnings data (either in levels or with regard to dynamics). For instance, Gideon et al. (2017), Abowd and McKinney (2017) and Kopczuk et al. (2010) demonstrate that using administrative instead of survey-based earnings data can alter well-established and policy-relevant findings regarding topics as diverse as the extent of race-based wage discrimination, the role of firms in explaining earnings inequality and the degree of earnings mobility over the life cycle. Two recent studies on the effects of the introduction of a nationwide minimum wage in Germany neatly illustrate the advantages of administrative data, but also that these advantages come at a cost. Dustmann et al. (2022) rely on administrative data and convincingly argue that as compared to exploiting survey data this approach improves the measurement of earnings and the precision of estimates. In contrast, Caliendo et al. (2023) follow the more traditional route and employ survey data. While the authors acknowledge that there may be measurement error in survey-based earnings information, they also highlight that some other relevant variables such as working hours are only partially and imprecisely captured in German administrative data sources. Indeed, Dustmann et al. (2022) need to rely on certain assumptions and an imputation procedure to expand their analysis from daily earnings to hourly wages.

The remainder of this paper is structured as follows: the PASS and ADIAB data, our linkage and sampling procedure and summary statistics are presented in the next section. This is followed by a description of our static measurement error model in Sect. 3. In Sect. 4, we characterize the measurement error in the PASS earnings data and the induced bias in survey-based earnings regressions. Section 5 contains the findings of our main empirical analysis using the static model of measurement error, including sensitivity checks. Section 6 outlines our application of the dynamic measurement error model of Pischke (1995), presents relevant estimates and discusses biases in estimates of earnings processes when using error-ridden data. Section 7 concludes.

2 Data, sampling and summary statistics

We base our investigation on the PASS household survey linked with ADIAB administrative social security records prepared and maintained by the Institute for Employment Research (IAB) in Nuremberg, Germany.Footnote 3 PASS is a household panel survey with a focus on poverty and receipt of non-contributory, means-tested unemployment benefits (“Arbeitslosengeld II”). The survey covers approximately 10,000 households and is carried out annually. Households receiving unemployment benefits are oversampled to allow for a detailed analysis of the dynamics of benefits receipt. As unemployment benefit recipients are comparatively more likely to transition into and out of lower-paying jobs, average earnings in PASS are lower than in the German labor market as a whole (cf. Fig. 1 which captures both the actual earnings distribution of PASS for our estimation sample as defined below and the counterfactual PASS earnings distribution approximating the German labor market using probability weights, i.e., where the weight of a respondent is equivalent to the reciprocal value of their inclusion probability).

Fig. 1
figure 1

Distribution of earnings in PASS estimation sample, weighted and unweighted

PASS comprises information on, for example, individuals’ socio-demographic characteristics and subjective well-being. In our study, we incorporate the first four waves of the survey, collected in 2006/2007, 2007/2008, 2008/2009 and 2010, respectively. The PASS survey takes great care to collect data that is reliable, robust and comparable across households and across time. Given the declining prevalence of landline phones in Germany there is a mix of CATI and CAPI interviews. Based on respondents’ preferences, interviews are conducted either in German or in one of three other languages (English, Russian and Turkish). Moreover, there are detailed interviewer training, outreach, engagement, follow-up and quality assurance processes. While there is a survey module regarding the overall household situation that is directed at one knowledgeable household member, the information on individual household members’ characteristics used here is directly solicited from these individual members.Footnote 4

The version of the ADIAB data used here contains the universe of all individuals who were employed subject to social security contributions, received unemployment benefits or were registered as job seekers in the Federal Republic of Germany at least once between 1975 and 2010. All individuals with at least one spell of “marginal” employment, i.e., employment not covered by social security, in 1999 or later are also included. For most individuals in the data set, information on the majority of their labor market biography is available. Only spells of employment not covered by social security—like those as civil servants or family workers—and spells of self-employment are not covered. All in all, the ADIAB data cover more than 80 percent of Germany’s total workforce. They encompass detailed longitudinal information on employment status, earnings, socio-demographic and firm characteristics to the exact day. Because Germany’s social security agencies use the underlying administrative data to compute social security contributions and unemployment benefits, the earnings information in the ADIAB data is considered highly reliable.

In the ADIAB data, precise earnings information is not recorded beyond Germany’s “contribution assessment ceiling”, the maximum level of earnings that are subject to contributions to the country’s various social security programs. To address the resulting top-coding of earnings information for 1.6 percent of observations, we rely on the imputation procedure developed by Card et al. (2013). This procedure uses separate Tobit regressions for different calendar years and for East and West Germany with a series of imputation variables that can be calculated when an establishment’s entire workforce and a time series of earnings are observed.Footnote 5 Importantly, as reported in Sect. 5.3 all our results are robust to instead excluding observations with top-coded ADIAB earnings or simply keeping these observations with the original earnings information.

For the purpose of our study, we disregard spells of unemployment and inactivity and focus on individuals’ spells in employment only (which will include time spent on annual or sick leave as long as the employment spell is not interrupted). Labor market biographies from the PASS and ADIAB data are linked using individuals’ social security numbers. Linkage is only possible for those survey participants who explicitly permitted a match of their survey data to administrative records (depending on the PASS wave, this is the case for 80 to 87 percent of participants). A number of steps are needed to derive comparable information on earnings. PASS refers to gross earnings in the last month before the survey. In the first wave the respondents were to report the earnings from their main job, in the other waves they were to report the total amount of earnings. The administrative data, in turn, encompass all jobs of a person and include the sum of gross earnings over a reporting period of up to a year from a given job. There are also differences in the PASS and ADIAB data regarding whether one-time payments are captured in the definition of earnings and regarding the coverage of non-standard forms of employment. These differences are the subject of robustness checks in Sect. 5.3.Footnote 6

The process of linking PASS with ADIAB data and defining the estimation sample through data cleansing and taking account of the top-coding of the ADIAB earnings data is visualized in simplified form in Fig. 2. First, we identify the ADIAB spells covering the month the respective PASS survey waves refer to. Next, for this month we calculate average daily gross earnings in the ADIAB data. In case of multiple earnings records within the month, for wave one of PASS we select the job with the highest average daily earnings as the main job while for waves two to four we combine earnings from all spells. Finally, we calculate monthly gross earnings by multiplying average daily gross earnings with \((7/12*31+4/12*30+1/12*28.75)\). In practical terms, we are able to successfully link 11,575 spells with information on earnings from both ADIAB and PASS. We label these 11,575 spells our “raw” data.

Fig. 2
figure 2

Linking PASS and ADIAB data and defining the estimation sample. The figure visualizes the process of linking the PASS survey data and ADIAB administrative data and defining the estimation sample. Relevant samples are the raw data without the data cleansing steps implemented for selecting the estimation sample (“raw data”) and the full estimation sample with top-coded earnings information replaced by imputed earnings

As the literature emphasizes the potentially substantial consequences of mismatch between different data sources, we go to great lengths to assure that the earnings information gathered from the PASS and ADIAB data are in fact comparable. Starting with our raw data, we implement four discrete data cleansing steps. First, we exclude data points where earnings information from PASS is unlikely to be directly comparable to earnings as recorded in the ADIAB data. This is evident for 179 cases with a measurement error that in absolute terms exceeds 150 percent. Second, we exclude 851 cases where respondents are only willing to give their earnings in terms of a broad range instead of a precise number. Third, we drop 158 cases where respondents indicate that they are self-employed at least once in addition to or instead of being in dependent employment. Fourth, following BK we exclude some occupations and sectors with untypical pay structures. For this reason, all observations for workers in agriculture and engaged as coachmen, managers, artists, performers, clerks, hairdressers, innkeepers, waiters, cleaners, housekeepers, cab drivers, barkeepers or homeworkers are dropped. The fourth step affects 1404 observations in our raw PASS data. Altogether, data cleansing cuts the estimation sample from 11,575 to 9254 observations.

Table 1 contains separate summary statistics for men and women for (a) the raw linked survey-administrative data, (b) the estimation sample and (c) the estimation sample reduced to a strongly balanced panel (which is required for some of our subsequent analyses). The table focuses on the following individual characteristics: education (whether an individual holds a school leaving certificate from an academic high school, also called “Abitur”), citizenship (German passport or not), marital status, geographic location (East or West Germany), monthly gross earnings according to both the survey and the administrative information, age and weeks in employment in the respective year.

Table 1 Summary statistics by gender and sample

Table 1 makes it possible to compare differences in terms of observable individual characteristics between the raw data and the estimation sample. Almost all of the individuals considered here are German citizens. On average, sampled individuals are employed for 47 to 48 weeks per year (with a minimum of less than 1 week of employment and a maximum of 52 weeks, denoting continuous employment throughout the respective year). This means they are quite closely attached to the labor market. A comparison across genders shows that women tend to be comparatively less likely to be married and to receive lower earnings. Table 1 also shows that apart from minor exceptions like the earnings information for women the different data cleansing steps leave the mean values of almost all observable characteristics practically unchanged. Thus, data cleansing does not appear to induce a significant amount of selectivity. In contrast, reducing the sample to a strongly balanced panel has more marked implications. Not surprisingly, individuals in the strongly balanced panel have higher average earnings both according to the survey and administrative data. On average, they also have a stronger labor market attachment and are older and more likely to be married, among other notable differences. We return to this issue in Sect. 5 below and show that despite the selectivity of the strongly balanced panel our results are robust to either using this sample or the entire estimation sample.

Table 1 also provides a first comparison between earnings as recorded in the administrative data and reported in the PASS survey. The mean average monthly gross earnings differ little between administrative records and survey responses, though there is a slight tendency for average administrative earnings to be larger than survey earnings. In the following sections, we analyze the distribution of measurement error in survey-based earnings information in greater detail.

3 A static model of measurement error

For our simple static model of measurement error, we follow BK and assume that log gross monthly earnings recorded in the PASS survey x in wave \(t=\{1,\ldots ,4\}\) are composed of a true earnings component in logs y and an error term m:

$$x_{t} = y_{t} + m_{t} .$$
(1)

Again following BK, we assume that the log gross monthly earnings recorded in the administrative ADIAB data equal the true earnings y (after adjustment for the top-coding). In addition, we assume that measurement error m is either classical or “general” in nature. As already mentioned, classical measurement error assumes that m is identically and independently distributed, with \(E[m_{t}]=\text {cov}[y_{t},m_{s}]=0 \forall {t,s}\). General measurement error also assumes \(E[m_{t}]=0\) but allows for \(\text {cov}[y_{t},m_{s}] \ne 0\). Below, we compute variance-covariance matrices for y and m. Similar to BK, we find a negative correlation between \(y_{t}\) and \(m_{s}\) for \(t=s\) (and in weaker form also for \(t \ne s\)). The negative correlation between true earnings and measurement error leads us to reject the assumption of classical measurement error. Given that higher earnings tend to be underreported and lower earnings overreported, we follow BK and refer to the general measurement error as mean-reverting measurement error.

As already highlighted in Sect. 1, a suitable statistic for the reliability of the earnings data in the PASS survey is given by the ratio of the covariance between mismeasured earnings and true earnings to the variance of mismeasured earnings. In the case of general measurement error, the resulting reliability ratio \(\lambda ^{L}\) for log earnings in levels is given by

$$\begin{aligned} \lambda ^{L}_{t} = \text {cov}[y_{t}+m_{t}, y_{t}]/\text {var}[y_{t}+m_{t}] = (\sigma ^{2}_{y_{t}}+\sigma _{y_{t}m_{t}})/(\sigma ^{2}_{y_{t}}+\sigma ^{2}_{m_{t}}+2\sigma _{y_{t}m_{t}}) . \end{aligned}$$
(2)

A value for \(\lambda ^{L}\) of 1 would indicate that the covariance between mismeasured earnings and true earnings equals the variance of mismeasured earnings. This would imply perfect reliability. Conversely, a value of 0 would imply a complete lack of reliability.

Correspondingly, the reliability ratio for log earnings in first differences \(\lambda ^{D}\) can be computed as follows:

$$\begin{aligned} \begin{aligned} \lambda ^{D}_{t}&= \text {cov}[\varDelta (y_{t}+m_{t}), \varDelta y_{t}]/\text {var}[\varDelta (y_{t}+m_{t})] \\&= (\sigma ^{2}_{y_{t}}+\sigma ^{2}_{y_{t-1}}-2\sigma _{y_{t},{y_{t-1}}}+\sigma _{y_{t},{m_{t}}}-\sigma _{y_{t-1},{m_{t}}}-\sigma _{y_{t},{m_{t-1}}}+\sigma _{y_{t-1},{m_{t-1}}})/ \\&[\sigma ^{2}_{y_{t}}+\sigma ^{2}_{y_{t-1}}+\sigma ^{2}_{m_{t}}+\sigma ^{2}_{m_{t-1}}+2(\sigma _{y_{t},{m_{t}}}+\sigma _{y_{t-1},{m_{t-1}}}-\sigma _{y_{t},{y_{t-1}}}-\sigma _{m_{t},{m_{t-1}}}-\sigma _{y_{t},{m_{t-1}}}-\sigma _{y_{t-1},{m_{t}}})]. \end{aligned} \end{aligned}$$
(3)

If we allow for mean-reverting measurement error and replace top-coded earnings with imputed earnings based on the procedure described above, we can estimate the reliability ratios in levels and first differences by regressing the ADIAB earnings variable on the earnings information from PASS:

$$y_{t}=c^{L}_{t}+\lambda ^{L}_{t}x_{t}+u^{L}_{t},$$
(4)

or

$$\varDelta y_{t}=c^{D}_{t}+\lambda ^{D}_{t}\varDelta x_{t}+u^{D}_{t},$$
(5)

respectively.

Alternatively, under the assumption of both classical and general measurement error reliability ratios can be estimated by first estimating the variance-covariance matrix of y and m and then inserting the relevant estimates of variances and covariances into Eqs. (2) and  (3). The variance-covariance matrix of y and m is also interesting in itself.

4 Measurement error and induced bias in survey-based earnings regressions

Figure 3 presents two plots of the probability density of measurement error m, calculated by subtracting log earnings as captured in the PASS survey x from log earnings recorded in the administrative ADIAB data y. The figure pools all four waves of the survey. Panel (a) shows the probability density of m for men, Panel (b) for women. Both panels also depict the density of a hypothetical normal distribution, vertical lines for zero measurement error and a number of summary statistics. Among the summary statistics, we care most about the mean measurement error as its sign will indicate whether on average earnings are under- or overreported while its absolute value is a key indicator of the typical extent of measurement error.

Fig. 3
figure 3

Distribution of measurement error by gender. The figure shows the distribution of measurement error for men and women in the PASS survey; top-coded earnings have been replaced by imputed earnings; measurement error \(m_{t}\) is defined as the log difference in earnings between survey-based earnings and administrative earnings; also depicted is the density of a normal distribution as are vertical lines for zero measurement error

Both panels of Fig. 3 show plots that are unimodal and largely symmetric, with maybe a slight positive skew. Although both distributions are more or less bell-shaped, the tails appear thinner than one would expect with a normal distribution. The histograms indicate that in the PASS survey measurement error has a slightly negative mean. This is confirmed by the summary statistics given at the bottom of the panels. Mean measurement error is \(-\)5.4 percent for men and \(-\)7.3 percent for women, indicating that both men and women have a slight tendency to underreport their earnings. This result is somewhat in contrast to findings by BK who report that women but not men have such a tendency.

At 5.3 percent for women and 5.7 percent for men, the magnitude of the error variance is substantial for both genders and does not notably differ across genders. The error variances documented here are of a similar order of magnitude to those found by BK. In the CPS data used by BK, the error variance is about 11.4 percent for men and 5.1 percent for women.

Somewhat in contrast to what BK find for the United States, in our linked survey-administrative data set there are very few observations with measurement error at or very close to zero. In PASS only 13.3 percent of observations for men and 13.7 percent of observations for women are associated with an absolute measurement error in log earnings that is smaller or equal to 2.5 percent. 44.4 percent of observations show absolute measurement error not exceeding ten percent, and deviations of more than 30 but less than 150 percent occur in about 13.7 percent of cases. In contrast, according to BK more than 40 percent of CPS respondents report earnings that are within 2.5 percent of their true earnings.

The comparatively small fraction of observations in our sample where measurement error is non-existent or negligible is likely due to differences in the definition of earnings in the PASS and the ADIAB data. Apparently, this causes some residual uncertainty or fuzziness even after the implementation of our thorough procedure to derive comparable information on earnings from the survey-based and administrative data sources.

In Table 2, we turn our attention to the potential biases measurement error might inflict on the estimation of economic relationships. As mentioned by BK, an error-ridden measure of earnings as a left-hand-side variable will produce biased results if and only if the measurement error is correlated with a right-hand-side regressor. Based on this reasoning, Table 2 presents results of regressing measurement error m on a variety of socio-demographic variables typically found in Mincer-type earnings regressions. For comparison purposes, the table also depicts results from a second set of regressions where log earnings according to the PASS survey data x are regressed on the same socio-demographic variables. These more standard though simplified Mincer-type earnings regressions serve as a useful benchmark—under the assumption that researchers without access to data on true earnings would want to use x as their earnings measure.

Table 2 Individual characteristics associated with measurement error and survey-based earnings by gender

Columns (1) and (2) of Table 2 show that there is little correlation between some of the socio-demographic variables and the measurement error. For both genders, this is for instance the case for the variables capturing whether an individual holds a school leaving certificate from an academic high school or has German citizenship. However, some other socio-demographic variables exhibit a statistically significant correlation with measurement error either for men or for women or for both genders. For instance, the regression results indicate that earnings of men from East Germany tend to be overreported in the PASS survey while earnings for both men and women with stronger labor market attachment as measured by weeks employed are underreported.

BK point out that the magnitude of the coefficients reported in regressions akin to those in columns (1) and (2) of Table 2 equal the magnitude of the bias expected for the relevant socio-demographic variables when the log of earnings is the dependent variable of a regression. Although some of the coefficients in each of the two regressions of measurement error are statistically significant, the estimated bias for each variable is generally small compared to its typical effect in an earnings equation. This is evident from a comparison of the coefficients in columns (1) and (2) with those in columns (3) and (4). For instance, as mentioned earnings of East German men tend to be overreported in the PASS survey. But the relevant coefficient is less than ten percent of that of the “East German dummy” in a simplified Mincer-type earnings regressions for men using PASS survey data.

At 1.3 percent for men and 0.8 percent for women, the coefficients of determination \(R^{2}\) implied by the regressions of measurement error on socio-demographic variables are substantially smaller than the coefficients from the comparable regressions of survey-based earnings (which amount to 32.6 percent and 8.4 percent for men and women, respectively). Together with the estimated bias for each variable being generally small compared to its typical effect in an earnings equation, and confirming what BK find for the United States, this suggests that measurement error leads to little bias when earnings information from the PASS data is on the left-hand side of a regression.

5 Reliability of the data

5.1 Variance-covariance matrices of true earnings and measurement error

In Table 3, we estimate variance-covariance matrices and correlation coefficients of true earnings y and measurement error m separately for men and women. The table gives an indication of the magnitude and covariance structure of measurement error over the first four waves of the PASS survey and allows us to assess the reliability of the survey-based earnings data. The imputation procedure of top-coded administrative earnings developed by Card et al. (2013) makes it possible to derive the variance-covariance matrix of y and m directly from the data. Using the formula \(\text {cor}[X,Y]=\text {cov}[X,Y]/\left( \text {sd}[X]\text {sd}[Y]\right)\) it is then straightforward to calculate the correlation coefficients from the variances and covariances. Covariances are depicted at and below the diagonal, correlations above the diagonal and the two samples are reduced to strongly balanced panels.

Table 3 Variance-covariance matrix by gender

Four noteworthy results emerge from Table 3. First, true earnings exhibit a high degree of first- and higher-order autocorrelation. First-order autocorrelation coefficients of true earnings \(\text {cor}[y_{1},y_{2}]\), \(\text {cor}[y_{2},y_{3}]\) and \(\text {cor}[y_{3},y_{4}]\) are 0.92, 0.96 and 0.96 for men and 0.94, 0.92 and 0.96 for women, respectively. Higher-order autocorrelation coefficients are pronounced as well. This points at a high degree of earnings persistence in Germany.

Second, Table 3 shows large negative correlations between measurement error and true earnings for both men and women. The correlation coefficients \(\text {cor}[y_{1},m_{1}]\), \(\text {cor}[y_{2},m_{2}]\), \(\text {cor}[y_{3},m_{3}]\) and \(\text {cor}[y_{4},m_{4}]\) for men amount to \(-\)0.23, \(-\)0.24, \(-\)0.29 and \(-\)0.31 in waves 1, 2, 3 and 4, respectively. For women, they are \(-\)0.19, \(-\)0.21, \(-\)0.20 and \(-\)0.31. We also document sizable negative correlations between measurement error and true earnings in different years that hardly abate as the distance between years increases. The strong negative correlations between measurement error and true earnings imply that measurement error is not classical as this assumption would require all these correlations to be zero. Instead, higher earnings tend to be underreported and lower earnings overreported, making measurement errors in PASS earnings data mean-reverting. This was also documented by BK for men—but not necessarily for women.Footnote 7

Third, our results strongly suggest that measurement errors in the earnings information available in PASS are positively correlated from one year to the next and even over an extended period of time. BK document a first-order autocorrelation coefficient for m of 0.40 for men and 0.10 for women. We find similar and even somewhat more pronounced first-order autocorrelation coefficients \(\text {cor}[m_{1},m_{2}]\), \(\text {cor}[m_{2},m_{3}]\) and \(\text {cor}[m_{3},m_{4}]\) of 0.44, 0.53 and 0.42 for men and 0.31, 0.30 and 0.36 for women, respectively. This implies that a positive measurement error in 1 year is relatively likely to be followed by another positive measurement error in the next (and similarly for negative measurement errors).

Fourth, going beyond the exercise by BK, our extended panel makes it possible to investigate higher order autocorrelations of m. At 0.31 and 0.35 for men and 0.19 and 0.27 for women, second-order autocorrelation coefficients \(\text {cor}[m_{1},m_{3}]\) and \(\text {cor}[m_{2},m_{4}]\) are only slightly lower than first-order autocorrelation coefficients. At 0.16 for men and 0.31 for women, even third-order autocorrelation coefficients \(\text {cor}[m_{1},m_{4}]\) are pronounced. Apparently, a sizable fraction of the autocorrelation in the measurement error is not actually due to an autoregressive process but rather caused by either individual-specific time-invariant characteristics (i.e., person fixed effects) or other, more complex time series processes.

5.2 Reliability of PASS earnings data

Table 4 summarizes reliability ratios for PASS earnings data. These have been derived by first estimating the variance-covariance matrix of y and m and then inserting the relevant estimates into Eqs. (2) and  (3). Reliability ratios are presented separately for men and women as well as for the pooled sample. The table also differentiates between the first four waves of PASS, between assuming classical measurement error (Panel (a)) and allowing for mean-reverting measurement error (Panel (b)), and between the cross section and first differences.

Table 4 Reliability ratios inferred from variance-covariance matrix by gender and type of measurement error

In a cross section our longitudinal earnings data are relatively reliable: Under the assumption of classical measurement error, the cross-sectional reliability of PASS earnings data is around 0.85 for both genders. This is not too far removed from the value of 1 which would imply perfect reliability. If we allow for mean-reverting measurement error—our preferred specification—this further increases the estimates of cross-sectional reliability. As noted by BK, this is exactly as one would expect in the case of \(\sigma _{m}^{2}<\sigma _{y}^{2}\) which is what we find in our data (as discussed above). In the mean-reverting specification, cross-sectional reliability ratios for male PASS respondents range from 0.93 to 0.95 and for female respondents from 0.90 to 1.00. For the combined sample, they range from 0.93 to 0.98. These estimates are similar to those by BK and other researchers using data sets for the United States. In particular, BK report that in a cross section, the reliability ratio is 0.82 to 1.02 for men and 0.92 to 0.96 for women.

Similar to what is reported by BK for the CPS, measurement error is significantly exacerbated when considering earnings changes. In fact, the reliability ratios for PASS earnings data in first differences are even lower than the comparable statistics for the CPS documented by BK. Their estimates generally range from 0.65 to 0.85. In contrast, when we allow for mean-reverting measurement error we document point estimates for reliability ratios in first differences that range from 0.18 to 0.38. Under the assumption of classical measurement error, estimates range from 0.29 to 0.41. In contrast to what is found by BK, in our case allowing for mean-reverting measurement error does not increase but in fact decreases the reliability ratio in first differences.

As mentioned above, an alternative procedure to estimate the reliability ratios in levels and first differences is to regress the ADIAB earnings variable on PASS earnings information. Results from this procedure are summarized in Table 5. The table again estimates reliability ratios separately by gender and survey wave. All estimates depicted in the table allow for mean-reverting measurement error. The main value-add of the alternative procedure is that it allows us to exploit the full extent of our estimation sample without the need to construct a balanced panel. In addition, the alternative procedure also makes analytical standard errors readily available.

Table 5 Reliability ratios inferred from OLS regressions by gender

Results from the regressions summarized in Table 5 are qualitatively and for the most part also quantitatively very similar to those from the more indirect procedure depicted in Table 4. In fact, reliability ratios both in levels and in first differences estimated from the full samples tend to be somewhat larger than the estimates from the strongly balanced panels. In addition, Table 5 makes it clear that all reliability ratios are statistically significantly different from zero. This means that even in the case of earnings data in first differences, an analysis relying purely on PASS earnings data would be able to capture some meaningful information. Nevertheless, while Table 4 and Table 5 leave us confident to encourage analyses using PASS earnings data in levels, more caution might be in order when analyzing earnings data in first differences.

5.3 Robustness and sensitivity checks

Our main results on the reliability of the PASS earnings data are robust to a wide range of alternative specifications. This is demonstrated through a series of sensitivity checks summarized in Table 6 that estimate reliability ratios \(\lambda\) derived from variance-covariance matrices of true earnings y and measurement error m. For conciseness, all estimates allow for mean-reverting measurement error and are pooled for men and women (but depicted separately for earnings information in levels and first differences and for each of the four waves of PASS).

Table 6 Reliability ratios inferred from variance-covariance matrix by survey and sample restriction

As described above, our preferred “baseline” estimates of \(\lambda\) from Table 4 are derived using an estimation sample where top-coded earnings are imputed using Tobit regressions and various data cleansing steps are implemented to minimize mismatch between different data sources. Columns (1) to (3) of Table 6 deal with the issue of top-coding in the administrative data in alternative ways while in columns (4) to (7) we check if our results are robust to alternative delineations of the estimation sample. Finally, column (8) deals with differences in the definition of earnings in the PASS and ADIAB data.

In column (1) of Table 6, all observations with top-coded earnings are excluded. This exclusion leads to a loss of external validity but not necessarily internal validity.Footnote 8 Estimates of reliability ratios are somewhat lower than the baseline reliability ratios with regard to earnings in levels and somewhat higher when earnings are specified in first differences. Overall, differences between baseline estimates and those of column (1) are not particularly pronounced. In column (2), top-coded earnings information is not replaced by imputed values but kept in its original form. In levels, the resulting—biased—estimates of \(\lambda\) tend to be lower than the baseline estimates. In first differences, time-invariant effects are eliminated and differences between baseline estimates and those of column (2) are small. In column (3), top-coded earnings information is replaced through an alternative imputation procedure proposed by BK. Instead of directly imputing top-coded earnings, BK impute the variance-covariance matrices of earnings and measurement error using a maximum likelihood procedure. Column (3) yields reliability ratios that are very similar to those of our baseline estimates.Footnote 9

In column (4) of Table 6, we do not exclude outliers (defined as observations with absolute measurement error in excess of 150 percent) from the observation sample, in column (5) we do not exclude observations where survey earnings were reported as a range instead of a precise number and in column (6) we do not exclude observations for individuals working in sectors or occupations with untypical pay structures, such as agriculture. In column (7), we restrict our sample to standard employment spells only by excluding spells of student interns, apprentices or those in marginal employment (i.e., with earnings below the social security contribution threshold). The underlying motivation is that the first wave of PASS includes earnings information for apprentices and student interns but excludes those in marginal employment while the ensuing waves include the marginally employed but exclude apprentices and students—whereas in the ADIAB data all these groups in non-standard employment are consistently included. Finally, in column (8) we account for the fact that one-time payments are always included in the ADIAB earnings data but not in the standard PASS earnings variable by adding the value of one-time payments (which is collected separately) to the standard PASS earnings variable.Footnote 10 Although the point estimates for reliability ratios in levels or first differences differ slightly across the different specifications of columns (4) to (8), the general picture and key findings stay very robust.

In addition to the sensitivity checks of Table 6, the appendix replicates all estimation results of this paper using a second linked survey-administrative data set that combines the WeLL household panel with the ADIAB administrative records. The main objective of this replication exercise is to investigate the external validity of our results across the earnings distribution, given that PASS primarily encompasses low-income households whereas WeLL focuses on employees of a random but stratified sample of all German establishments. As compared to the labor market as a whole, WeLL respondents are positively selected in terms of their labor market attachment and also earnings (with average gross monthly earnings of 3,126 Euro as compared to 2,203 Euro for the PASS estimation sample and 2,545 Euro for the PASS sample weighted to approximate the German labor market). As documented in detail in the appendix, qualitatively and for the most part also quantitatively this replication exercise leads to very consistent results.

6 Measurement error and earnings dynamics

6.1 A dynamic model of measurement error

In this section, we assess how measurement error biases parameter estimates in a dynamic model of earnings. Building on Pischke (1995), we assume that true earnings \(y_t\) are composed of a permanent component \(z_t\) with a random walk on \(\epsilon _t\) and a transitory component \(\eta _t\):

$$y_{t}=z_{t}+\eta _{t}; \qquad z_{t}=z_{t-1}+\epsilon _{t}.$$
(6)

Pischke (1995, p. 308–309) notes that “[i]n many applications it is useful to decompose earnings in to a permanent and transitory component” as “[d]ifferent events are associated with these components. Promotions or job loss in a high-paying industry lead to permanent changes in earnings, and overtime and temporary layoffs induce temporary variations.”

A specification of the earnings process defined by Eq. (6) in first differences illustrates that changes in earnings follow a first-order moving average process:

$$\varDelta y_{t}=\epsilon _{t}+\eta _{t}-\eta _{t-1} .$$
(7)

Pischke (1995) observes only two periods of survey-based earnings data. He therefore has to make a number of relatively specific assumptions about the measurement error process and the structure of the variances of permanent and transitory earnings shocks. For reasons of comparability, we impose the exact same assumptions. More specifically, measurement error is assumed to be composed of three additive components, (a) the transitory earnings shock \(\eta _{t}\) with “transmission” factor \(\alpha\), (b) an individual fixed effect \(\mu\) that is uncorrelated with earnings and (c) a white noise error term \(\xi _{t}\):

$$\ m_{t}=\alpha \eta _{t}+\mu +\xi _{t} .$$
(8)

Due to the individual fixed effect the model assumes a time-invariant autocorrelation of measurement error. If \(\alpha\) is smaller than zero then this implies mean-reverting measurement error and underreporting of transitory earnings shocks. In this simple model, measurement error is assumed to be unrelated to permanent earnings shocks. Again following Pischke (1995), we additionally assume that the variance of permanent earnings shocks \(\text {var}[\epsilon _{t}]\) is constant over time while the variance of transitory earnings shocks \(\text {var}[\eta _{t}]\) may vary:

$$\begin{aligned} \text {var}[\epsilon _{t}]=\sigma ^{2}_{\epsilon }; \quad \text {var}[\eta _{t}]=\sigma ^{2}_{\eta t}; \quad \text {cov}[\epsilon _{t},\eta _{t}]=0; \quad \text {var}[\xi _{t}]=\sigma ^{2}_{\xi t}; \quad \text {var}[\mu ]=\sigma ^{2}_{\mu } . \end{aligned}$$
(9)

The resulting model for earnings dynamics and measurement error implies the following moment conditions:

$$\begin{aligned} \begin{aligned} \text {var}[\varDelta y_{t}]&=\sigma ^{2}_{\epsilon }+\sigma ^{2}_{\eta t}+\sigma ^{2}_{\eta t-1}; \\ \text {cov}[\varDelta y_{t}, \varDelta y_{t-1}]&=-\sigma ^{2}_{\eta t-1}; \\ \text {var}[m_{t}]&=\alpha ^{2}\sigma ^{2}_{\eta t}+\sigma ^{2}_{\xi t}+\sigma ^{2}_{\mu }; \\ \text {cov}[\varDelta y_{t}, m_{t}]&=\alpha \sigma ^{2}_{\eta t}; \\ \text {cov}[\varDelta y_{t}, m_{t-1}]&=-\alpha \sigma ^{2}_{\eta t-1}; \\ \text {cov}[m_{t}, m_{t-j}]&=\sigma ^{2}_{\mu }. \end{aligned} \end{aligned}$$
(10)

All other covariances are assumed to be zero.

6.2 Estimation results

We identify the parameters of the dynamic model of measurement error and earnings as described in Eqs. (7) and (8) by fitting the moment conditions of Eq. (10) to the sample covariance matrix of earnings changes and measurement error using Generalized Method of Moments (GMM). For this purpose, we rely on the same two linked survey-administrative data sets also used for the static model of measurement error and earnings and compute estimation results separately for the PASS survey data (as described here) and for WeLL (cf. the appendix).

The earnings process of Eq. (7) can be decomposed into a permanent and a transitory component. The relevant estimates in Table 7 imply that according to PASS earnings data, between 40 percent and 56 percent of the variation in earnings changes is due to the permanent component. For instance, for the second wave of PASS the variation in earnings changes amounts to \(\text {var}[\varDelta y_{2}] = {\hat{\sigma }}^{2}_{\eta 1}\) + \({\hat{\sigma }}^{2}_{\eta 2}\) + \({\hat{\sigma }}^{2}_{\epsilon } = 0.0114 + 0.0083 + 0.0133 = 0.0330\) and the variation in the permanent component to \({\hat{\sigma }}^{2}_{\epsilon } = 0.0133\). In turn, \(0.0133 / 0.0330 = 0.4030\) indicates that for the second wave of PASS 40 percent of the variation in earnings changes is due to the permanent component. Over all four waves of PASS, the importance of the permanent component for explaining the variation in earnings changes is rather sizable. It is also much higher than what is found by Pischke (1995) who only attributes between ten and 25 percent of the variation in earnings changes to the permanent component. According to these results, compared to the United States earnings changes in Germany are relatively more likely to result from events like promotions or job loss in a high-paying industry that lead to permanent changes in earnings rather than occurences like overtime and temporary layoffs that induce only temporary variations.

Table 7 GMM estimates of earnings dynamics and measurement error

The estimates of the measurement error process of Eq. (8) make it possible to account for the contributions of white noise, individual fixed effects and underreporting of transitory earnings. According to our estimates, the variance of the white-noise component is quite large, between 0.0221 and 0.0264. At 0.0128, the variance of the individual fixed effect is sizable as well. In contrast, the variance of underreporting is only between 0.0052 and 0.0114. The contribution of underreporting is further moderated by the transmission factor \(\alpha\) at \(-\)0.6974. Overall, the white-noise component accounts for between 56 percent to 63 percent of the total variance of the measurement error, individual fixed effects for between 31 and 34 percent and the underreporting of transitory earnings for between six and 13 percent.Footnote 11 This means that according to our model and data both random fluctuations and individual-specific time-invariant characteristics account for most of the total variance of the measurement error while the underreporting of transitory earnings is much less important. While Pischke (1995) also finds that underreporting plays a relatively small role in explaining the total variance, he attributes more than 80 percent of this variance to the white-noise component.

In turn, the underreporting of transitory earnings depends on the extent of transitory earnings shocks and the transmission factor \(\alpha\). The negative estimate of \(\alpha\) confirms the finding of mean-reverting measurement error already documented in Sect. 5 and implies that individuals underreport transitory earnings by 70 percent. Pischke (1995) also finds mean-reverting measurement error and underreporting of transitory earnings. As already mentioned, Pischke (1995) regards it as very plausible that people underreport changes in transitory earnings as such changes tend to be related to relatively minor events.Footnote 12

In order to investigate the fit and potential shortcomings of the dynamic model of measurement error, Table 8 uses the PASS data to report both the actual covariance matrix of earnings dynamics and measurement error observed in the data and a fitted covariance matrix that accounts for the constraints imposed by our model. Overall, our simple model fits the data very well: Estimated and fitted values of variances and first-order autocovariances of earnings changes and measurement error are all very similar. In the model, higher-order autocovariances of measurement error are assumed to be identical to first-order autocovariances of measurement error while higher-order autocovariances of earnings changes are assumed to be zero. Both assumptions seem to be relatively good approximations of the data-generating process.

Table 8 Estimated and fitted variance-covariance matrices

The model also does a good job in fitting contemporary and lagged covariances between earnings changes and measurement error. Both in terms of signs and magnitudes, all relevant covariances allowed to be non-zero are fitted very well. The relevant constraints also appear broadly appropriate although some of the covariances constrained to zero in the fitted model—such as \(\text {cov}[\varDelta y_{2}, m_{4}]\), \(\text {cov}[\varDelta y_{3}, m_{4}]\) and \(\text {cov}[\varDelta y_{4}, m_{1}]\)—are estimated to be about 0.002 in absolute terms in the data.

6.3 Implications for the estimation of earnings dynamics

Reconsider the simple model in Eq. (6) and assume that we wanted to estimate the variances of the innovation to the permanent component \(\epsilon _{t}\) and to the transitory component \(\eta _{t}\) but that we only had access to PASS survey data. In other words, assume data were available not on true earnings \(y_{t}\) but only on reported earnings \(x_{t}\). Then, instead of the process for changes in true earnings of Eq. (7), we would need to rely on the following process for changes in reported earnings:

$$\begin{aligned} \varDelta x_{t}=\epsilon _{t}+(1+\alpha )(\eta _{t}-\eta _{t-1})+\xi _{t}-\xi _{t-1} . \end{aligned}$$
(6.6)

In this case, the corresponding moment conditions would be:

$$\begin{aligned} \begin{aligned} \text {var}[\varDelta x_{t}]&=\sigma ^{2}_{\epsilon }+(1+\alpha )^2(\sigma ^{2}_{\eta t}+\sigma ^{2}_{\eta t-1})+\sigma ^{2}_{\xi t}+\sigma ^{2}_{\xi t-1}; \\ \text {cov}[\varDelta x_{t}, \varDelta x_{t-1}]&=-(1+\alpha )\sigma ^{2}_{\eta t-1}-\sigma ^{2}_{\xi t-1}. \\ \end{aligned} \end{aligned}$$
(6.7)

A comparison of Eqs. (10) and (12) gives an indication of the degree to which measurement error biases survey-based estimates of the parameters in a dynamic model of earnings. Estimates of the variance of true earnings changes using Eq. (12) are confounded by two bias terms of opposite sign. On the one hand, the variance is underestimated because of the underreporting of transitory earnings, \(((1+\alpha )^{2}-1)(\sigma ^{2}_{\eta _{t}}+\sigma ^{2}_{\eta _{t-1}})\) with \(\alpha <0\). On the other hand, the white-noise term in measurement error \(\xi _{t}\) results in the overestimation of the variance of true earnings changes due to the additional positive variance component, \(\sigma ^{2}_{\xi _{t}}+\sigma ^{2}_{\xi _{t-1}}\). Two bias terms of opposite sign also arise for the first-order autocovariances of true earnings changes.

Tables 9 and 10 use the GMM estimates from Table 7 to quantify the biases that result from the underreporting of transitory earnings and white noise error. The tables also depict the total biases in estimates of the variance and first autocovariance of earnings growth and, with reference to the fitted variance-covariance matrices from Table 8, the variances of true earnings changes \(\text {var}[\varDelta y_{t}]\). Finally, they list the reliability ratios, i.e., the variance of true earnings changes to the variance of reported earnings changes \(\lambda _{t}=\text {var}[\varDelta y_{t}]/\text {var}[\varDelta x_{t}]\).

Table 9 Bias in estimates of variance of earnings growth

Table 9 focuses on the bias in estimates of the variance of earnings growth. Its first column reports the bias due to underreporting of transitory earnings, the second column gives the bias due to the white-noise component in the measurement error and the third the total bias.Footnote 13 As is evident from these first three columns, in the PASS data the biases due to underreporting of transitory earnings and white noise error are both substantial. In accordance with findings by Pischke (1995), the bias due to white noise error consistently has the opposite sign as the bias due to the underreporting of transitory earnings. In absolute terms, the bias due to white noise error is larger than the bias due to the underreporting of transitory earnings. Therefore, the two biases do not cancel each other out. Instead, relying entirely on the PASS data leads to a substantial overestimation of the overall variance of earnings growth. This is different to what Pischke (1995) finds for the PSID.

At between 0.02 and 0.03, the values for the variance of true earnings changes \(\text {var}[\varDelta y_{t}]\) listed in the fourth column of Table 9 are also slightly larger than the comparable values reported by Pischke (1995), which range from 0.01 to 0.03. This should probably come as no surprise as our linked survey-administrative data span the entire German economy, whereas Pischke ’s (1995) data cover just one single plant. Reliability ratios reported in the fifth column of Table 9 are between 0.38 and 0.54. This is somewhat lower as what is documented by Pischke (1995) who finds a reliability ratio of 0.79 for the first year of his survey and a ratio of 0.54 for the second one.

Table 10 Bias in estimates of first-order autocovariance of earnings growth

Table 10 repeats the exercise underlying Table 9 but for \(\text {cov}[\varDelta y_{t}, \varDelta y_{t-1}]\) instead of \(\text {var}[\varDelta y_{t}]\). The first three columns of Table 10 confirm for \(\text {cov}[\varDelta y_{t}, \varDelta y_{t-1}]\) that the biases due to underreporting of transitory earnings and white noise error are of opposite signs and rather substantial. As in the case of \(\text {var}[\varDelta y_{t}]\), the bias due to white noise error is consistently larger than the bias due to the underreporting of transitory earnings (in absolute terms). As a consequence, the overall bias is negative and substantial.

At below 0.01 in absolute terms, the values for \(\text {cov}[\varDelta y_{t}, \varDelta y_{t-1}]\) reported in Table 10 are relatively small. They are also of a similar order of magnitude to comparable values reported by Pischke (1995) who also documents negative values at or below 0.01 in absolute terms. As a result of the relatively substantial biases and the relatively modest values for \(\text {cov}[\varDelta y_{t}, \varDelta y_{t-1}]\), reliability ratios for the estimated first-order autocovariance of earnings growth tend to be rather low, ranging from 0.18 to 0.33, which is lower than the reliability ratios for the estimated variance of earnings growth. They are also lower than those documented by Pischke (1995) who finds a ratio of 0.49 for the first year of his survey and a ratio of 0.40 for the second one.

Once again following Pischke (1995), Table 11 presents results for the estimation of the variance and first-order autocovariance of earnings changes under the assumption that only PASS survey data potentially contaminated by measurement error are available. In addition to that, the table also depicts results for a decomposition of the variance of earnings changes into a permanent component \(\eta _{t}\) and a transitory component \(\epsilon _{t}\).

Table 11 Implied estimates of the innovation variances of earnings growth

Estimates of the variance and first-order autocovariance of earnings changes (\(\text {var}[\varDelta x_{t}]\) and \(\text {cov}[\varDelta x_{t}, \varDelta x_{t-1}]\), respectively) can be derived from Eq. (12) and are presented in the first and second column of Table 11. The table’s third and fourth columns present the estimates for the variance of the transitory and permanent component of earnings innovations obtained purely from the survey-based data (\(\text {var}[\eta _{t}]^{\text {biased}}\) and \(\text {var}[\epsilon _{t}]^{\text {biased}}\)). To differentiate these variances from the relevant variances obtained from estimating the true process of earnings dynamics, they are labelled by the superscript “biased”. The biased variance of the transitory component \(\text {var}[\eta _{t}]^{\text {biased}}\) is identified from \(\text {cov}[\varDelta x_{t}, \varDelta x_{t-1}]\). Given an estimate for \(\text {var}[\eta _{t}]^{\text {biased}}\), the variance of the biased permanent component \(\text {var}[\epsilon _{t}]^{\text {biased}}\) can be identified from \(\text {var}[\varDelta x_{t}]\):

$$\begin{aligned} \begin{aligned} \text {var}[\eta _{t}^{\text {biased}}]&=-\text {cov}[\varDelta x_{t}, \varDelta x_{t-1}]; \\ \text {var}[\epsilon _{t}^{\text {biased}}]&=\text {var}[\varDelta x_{t}]-2\text {var}[\eta _{t}^{\text {biased}}]. \end{aligned} \end{aligned}$$
(13)

It is illuminating to compare the biased estimates for the variance of the transitory and permanent component of earnings innovations obtained purely from the survey-based data as depicted in Table 11 with the true estimates for the same parameters as summarized in Table 7. This comparison gives an indication whether according to the estimates that rely purely on survey-based earnings the variances of the transitory and permanent component of earnings innovations are over- or underestimated in the PASS survey.

According to the purely survey-based earnings estimates depicted in Table 11, the estimates of \(\text {var}[\eta _{t}]^{\text {biased}}\) range from 0.026 to 0.028. According to the GMM estimates of earnings dynamics and measurement error summarized in Table 7 and already discussed above, the estimates of \(\text {var}[\eta _{t}]\) lie between 0.005 and 0.011. Thus, survey-based earnings data generally overstate the variance of the transitory component of earnings innovations. The opposite is generally the case with regard to the permanent component. According to survey-based earnings data, the estimates of \(\text {var}[\epsilon _{t}]^{\text {biased}}\) are 0.007, 0.008 and 0.013. In contrast, the true estimate of the same parameter \(\text {var}[\epsilon _{t}] = {\hat{\sigma }}^{2}_{\epsilon }\) is 0.013. Thus, similar to what is also found by Pischke (1995), if inferences are made on the basis of survey data with measurement error the transitory component in earnings innovations will be relatively overestimated as compared to the permanent component.Footnote 14

Somewhat in contrast to the relatively encouraging findings by Pischke (1995), our dynamic model of measurement error and earnings presents at best mixed news for the estimation of earnings dynamics with survey data. On the one hand, the negative correlation of measurement error with transitory earnings attenuates the role of white-noise measurement error. On the other hand, reliability ratios are generally low and variance estimates of earnings growth from survey data overstate the true variance. In addition, a decomposition of survey-based earnings data into a permanent and a transitory component overstates the variance of the transitory component of earnings innovations. In our reading, these results suggest caution about using the PASS survey to analyze earnings dynamics.

7 Conclusions

In this paper, we replicate and extend the analyses of the extent of measurement error in longitudinal earnings data by Bound and Krueger (1991) (BK) and Pischke (1995) with German linked administrative and survey data. As compared to BK and Pischke (1995), our findings provide more recent evidence on measurement error in survey data. They also provide first evidence from Germany that makes it possible to put the conclusions by BK and Pischke (1995) into an international perspective. We also make the following methodological contributions: (a) we extend the simple model of measurement error suggested by BK from the two-period to the four-period case, (b) we introduce the methodology for taking account of top-coded administrative earnings information developed by Card et al. (2013) to the measurement error literature and (c) we develop a procedure for merging the correct administrative records to the different waves of the PASS survey (and in a robustness exercise also to the WeLL survey).

Qualitatively, we confirm many of the main results of BK and Pischke (1995). Most crucially, we confirm that in a cross section survey-based earnings data are relatively reliable but that their reliability tends to be much lower when the data are specified in first differences. Other findings include (a) that measurement error in our survey data is not classical but can be more adequately described as mean-reverting (in the sense that measurement errors are serially correlated over 2 years and negatively correlated with true earnings), (b) that the mismeasurement of earnings leads to little bias when survey-based earnings are on the left-hand side of a typical Mincer-type earnings regression, (c) that measurement error in survey-based earnings data is positively correlated from one year to the next and even over an extended period of time, (d) that survey respondents underreport innovations to the transitory component of earnings (i.e., those resulting from transitory events like overtime and temporary layoffs) and (e) that variance estimates of earnings growth from error-ridden data overstate the true variance. Our results are robust to various sensitivity checks and arguably exhibit a high degree of external validity as demonstrated through the use of a second linked survey-administrative data set with a focus on a different part of the earnings distribution.

In order to compare our findings for Germany directly with those already available for the United States, our models impose the same assumptions as BK and Pischke (1995). This approach comes with certain drawbacks. First, BK and Pischke (1995) observe only two periods of survey-based earnings data and therefore need to make relatively specific assumptions regarding measurement error processes and other model features. With data spanning more than two periods, it would in principle be possible to relax some of these assumptions. Second, BK and Pischke (1995) assume that administrative earnings information is free of measurement error and represents true earnings. This is a commonly made assumption. However, it has been challenged by a number of studies, including Kapteyn and Ypma (2007), Abowd and Stinson (2013) and Oberski et al. (2017) who assess the relative reliability of administrative and survey-based earnings data and argue that while reliability is generally higher for administrative data (at least for full-time workers) neither data source represents true earnings. More specifically, Abowd and Stinson (2013) argue that administrative earnings might not always represent true earnings due to filing or processing errors. While the earnings information in the ADIAB data is considered highly reliable, such errors can arguably never be completely ruled out.Footnote 15 The extension of our analysis to models with more general measurement error processes or that allow for measurement error in both survey and administrative data is left for further research.

Further research may also be useful to determine whether our results are generalizable beyond the period covered (2006 to 2010). Germany was hit by the Global Financial Crisis in 2009, which might affect results for the last two periods of our analysis. While our results appear robust across the entire observation period, follow-up research could potentially more explicitly analyze measurement error in earnings data in crisis as compared to non-crisis periods. Finally, while we confirm many of the main results of BK and Pischke (1995) there are also some noteworthy disparities, for instance regarding the role of individual fixed effects in explaining measurement error. An analysis of possible reasons underlying these disparities (e.g. cultural or institutional differences between the United States and Germany) would constitute another promising avenue for follow-up research.

In spite of these caveats and possibilities for extensions, our results leave us confident to encourage empirical analyses using German survey-based earnings data in levels but somewhat more cautious about using the same data in first differences. They also open the door for more research on understanding the precise nature and properties of measurement error in longitudinal earnings data, on the quality of German administrative and survey-based earnings information and on whether findings regarding labor market characteristics and impacts of economic policy depend on using either administrative or survey-based earnings data.