1 Introduction

Tax evasion is a major problem for governments and threatens to undermine the legitimacy of the tax system. But determining the extent of the problem is far from straightforward. Pissarides and Weber (1989) (PW) introduced the expenditure approach to identify income underreporting by the self-employed in Britain. Using food consumption as an indicator of true income, they infer the degree of underreporting among the self-employed by comparing the food-income ratios of wage earners and the self-employed at the same level of reported income. PW find that the observed income of self-employment households in Britain in 1982 must be multiplied by 1.55 in order to arrive at true income.

The PW study has inspired estimates of such scaling factors for several other countries, including Kukk et al. (2019) for many EU countries, Schuetze (2002) for Canada, Johansson (2005) for Finland, Engström and Holmlund (2009) and Engström and Hagen (2017) for Sweden, Martinez-Lopez (2013) for Spain, Paulus (2015) for Estonia, Hurst et al. (2014) for the US, Kim et al. (2017) for Korea and Russia, Nygård et al. (2019) for Norway, and Cabral et al. (2021) for New Zealand.

Given that conventional food expenditure data are usually derived from sample surveys of limited size, which are often affected by non-response bias, it is desirable to establish alternatives to the food trace for measuring income underreporting. The present study relates to a number of post-PW studies that seek traces of true income in consumption items other than food. For example, Duncan and Peter (2014) use electricity consumption, Braguinsky et al. (2014) use cars, and Engström et al. (2021) use the values of pleasure boats.

Here, as in Feldman and Slemrod (2007) (FS), we use donations to charitable organizations for the whole Norwegian population over the period 2012–2017 as an indicator of true income. Access to large-scale administrative register panel data is a clear advantage in studies of income underreporting, as rich administrative data make it possible to address some of the main empirical challenges in this literature. Previous studies have already emphasized that panel data are important for establishing a measure of permanent income (Kim et al., 2017; Engström & Hagen, 2017). For example, Engström and Hagen (2017) demonstrate that not controlling for transitory fluctuations in income introduces attenuation bias in the estimated income elasticity, which leads to overestimation of the degree of underreporting, in their case by around 40 percent.

The present study draws attention to another advantage of panel data, namely that the PW and FS specifications for measuring income underreporting can be estimated with the panel data fixed effects model. It is generally acknowledged (for example, by both PW and FS) that an estimate of income underreporting based on the expenditure approach reflects self-selection.Footnote 1 The self-employed include individuals who have chosen self-employment precisely because it offers opportunities for underreporting and tax evasion. An estimate of income underreporting obtained by the standard expenditure approach and OLS estimation therefore reflects a mix of self-selection and general income underreporting behavior. As fixed effects estimation controls for the time-invariant unobservables that make individuals self-select into self-employment, it measures income underreporting net of self-selection. From this perspective, fixed effects estimation results are informative about the underlying causes of the income underreporting of the self-employed (Kim et al., 2008).

Furthermore, a critical assumption of the expenditure approach is that preferences for the consumption item used as a trace of true income, conditional on disposable income, are the same for self-employed and wage earner households. This assumption is critical for any choice of trace of true income. In the present study, which employs donations or “consumption of generosity” as the consumption item, there could be several reasons for the conditional charity-income ratio to vary by occupational choice (Slemrod, 2019). For example, Glazer and Konrad (1996) argue that charitable donations signal wealth (or integrity), a motive that is arguably more relevant for some self-employed people, which in turn implies that the donation share (ceteris paribus) is higher for the self-employed than for wage earners (Feldman & Slemrod, 2007). It follows that standard OLS estimates are likely upward biased because of such omitted variables. Thus, another main argument for employing the fixed effects estimator in studies of income underreporting is that it controls for systematic differences in time-invariant unobservables that lead to differences in preferences between the two groups.

In the following we demonstrate the advantages of employing fixed effects estimation in studies of income underreporting. We present OLS and fixed effects estimates of the PW scaling factor to get from reported to true income, presenting estimates using both the PW and the FS identification methods for data at both individual level and household level. For the latter, we focus on two-adult households, which is standard in the literature.

The Norwegian administrative donation data used in the present analysis exist because, as in many other countries, support for charitable organizations is encouraged by making donations deductible in the personal income tax system.Footnote 2 In the Norwegian context, this means that recipients of donations (say, the Red Cross) report electronically to the tax administration who has donated to them and how much each donor has given over the calendar year. This third-party reported information is in turn used to generate pre-filled income tax returns. All donations of more than NOK 500 (USD 61; EUR 54) to approximately 400 charities and religious/belief-based organizations on a list of pre-qualified organizationsFootnote 3 are recorded. On average, approximately 350,000 of a total of some 2.3 million households donate each year.

The donation data are linked to several other administrative registers, such as the Register of Income Tax Returns (Statistics Norway, 2019), through a personal ID number. This means that the data include information on several other characteristics of individuals and households, such as income, wealth, age, education, number of children, etc.

As expected, we find that fixed effects estimation yields smaller estimates of income underreporting than OLS. For two-adult households and given our main definition of self-employment, we obtain fixed effects estimates of 1.12 and 1.16, compared to 1.19 for OLS. As fixed effects estimation accounts for self-selection into self-employment by individuals inclined to tax evasion, a lower estimate is anticipated. We may decompose the total effect into a general income underreporting effect of the sector and the effect of self-selection into self-employment; a positive difference between OLS and fixed effects estimates signifies that self-selection plays a role in income underreporting. But as fixed effects estimation controls for time-invariant omitted variables in general, and given that we employ donations as our trace of true income, there are likely other reasons for the lower fixed effects estimates too. If individual fixed effects in donations are positively correlated with self-employment, as follows for example from the visibility argument of Glazer and Konrad (1996), the standard OLS estimate is biased upward.

The paper is organized as follows. In Sect. 2 we present the two main versions of the expenditure approach: the original PW method and the modification by FS. In Sect. 3 we present the administrative donations register that we have had access to for this study. In Sect. 4 we compare the results of OLS and fixed effects estimations. Section 5 concludes the paper.

2 Income underreporting measured by expenditure methods

2.1 The Pissarides and Weber approach

The seminal work of PW demonstrates how information on income underreporting by the self-employed can be obtained using food consumption as an indicator of true income. The basic idea is to use consumer expenditure data to estimate a common Engel curve for food consumption for the self-employed and wage earners, while allowing for a shift in the intercept for the self-employed. The excess food consumption of the self-employed at identical levels of reported income is then attributed to unreported income on the part of the self-employed. PW define a scaling (proportionality) factor, k, by which observed income, y, must be multiplied to obtain the true income, \(y^{*}\), of the self-employed, \(k\equiv \frac{y^{*}}{y}\).Footnote 4

More specifically, the PW approach builds on estimating

$$\begin{aligned} \ln c_{h}=\beta _{0}+Z_{h}^{'}\gamma +\beta \ln y_{h}+\delta q_{h}+\xi _{h}, \end{aligned}$$
(1)

where \(c_{h}\) is food consumption for household h, \(y_{h}\) is income for household h, \(Z_{h}^{'}\) is a set of household control variables, \(\beta _{0}\) is a constant and \(\xi _{h}\) is the error term. As the indicator variable \(q_{h}\) takes the value 1 if there is a self-employed person in the household (otherwise 0),Footnote 5 the parameter \(\delta\) measures the difference in intercepts between the self-employed and the wage earners. The other key parameter is \(\beta\), which measures the slope of the Engel curve for food. Although consumption is usually assumed to depend on permanent income, many applications represent income, \(y_{h}\), by current income. In that case, current (true) income is assumed to fluctuate around permanent income by a factor g, defined by \(y^{*}=gy^{pe}\), where \(y^{pe}\) is permanent income. PW assume that the factors g and k are lognormally distributed, \(\ln g=\mu _{g}+u\) and \(\ln k=\mu _{k}+v\). After some rearranging, this yields an estimate of the adjustment factor k, given by the average factor of income underreporting for the self-employed, \(\overline{k}\):

$$\begin{aligned} \bar{k}=\exp \left[ \mu _{kSE}+\frac{1}{2}\sigma _{vSE}^{2}\right] =\exp \left[ \frac{\hat{\delta }}{\hat{\beta }}+\frac{1}{2}\left( \sigma _{vSE}^{2} +\sigma _{uWE}^{2}-\sigma _{uSE}^{2}\right) \right] , \end{aligned}$$
(2)

where \(\sigma _{vSE}^{2}\) is the variance of the error term v for the self-employed, \(\sigma _{uWE}^{2}\) is the variance of the error term u for the wage earners and \(\sigma _{uSE}^{2}\) is the variance of the error term u for the self-employed. Subscripts SE and WE indicate self-employed and wage earners, respectively.
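Eq. (2) is simple to evaluate once \(\hat{\delta }\), \(\hat{\beta }\) and values for the three variances are available. The Python function below is a minimal sketch of that arithmetic; the function and argument names are our own, and the numbers in the tests are hypothetical, not estimates from this study. Because PW bound \(\overline{k}\) by imposing assumptions on the variances, evaluating the function over a range of assumed variances gives such bounds.

```python
import math


def pw_mean_k(delta_hat, beta_hat, var_v_se=0.0, var_u_we=0.0, var_u_se=0.0):
    """Mean scaling factor k-bar from Eq. (2) (hypothetical helper).

    delta_hat: estimated intercept shift for the self-employed in Eq. (1)
    beta_hat:  estimated slope of the Engel curve in Eq. (1)
    var_v_se:  variance of v (dispersion of ln k) among the self-employed
    var_u_we:  variance of the transitory-income error u for wage earners
    var_u_se:  variance of the transitory-income error u for the self-employed
    """
    if beta_hat == 0:
        raise ValueError("beta_hat must be non-zero")
    exponent = delta_hat / beta_hat + 0.5 * (var_v_se + var_u_we - var_u_se)
    return math.exp(exponent)
```

Setting var_u_we equal to var_u_se, as is natural with a permanent income measure, makes the transitory-income terms cancel and leaves \(\exp [\hat{\delta }/\hat{\beta }+\frac{1}{2}\sigma _{vSE}^{2}]\).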

There are at least two complicating factors. First, the variances in Eq. (2) are unknown. Second, as noted above, \(y_{h}\) in Eq. (1) refers to annual income. Given these challenges, PW treat annual income as endogenous and instrument it. They also impose assumptions on the variances in order to obtain upper and lower bounds for \(\overline{k}\). Using income and expenditure data from the British Family Expenditure Survey of 1982, PW report an overall estimate of k of 1.55, meaning that the disposable household income of the self-employed must on average be multiplied by 1.55 to yield true income.

Recall that a major advantage of having access to panel data is that measures of permanent income can be constructed in a straightforward manner (Engström & Hagen, 2017). When permanent income, based on aggregating income over several years, is entered into the Engel function, Eq. (2) simplifies because \(\sigma _{uWE}^{2}=\sigma _{uSE}^{2}\), so that the transitory-income variances cancel and \(\bar{k}=\exp [\hat{\delta }/\hat{\beta }+\frac{1}{2}\sigma _{vSE}^{2}]\). This weakens the case for instrumental variables, and hence we do not use an IV approach in the following.

Importantly, the focus of the present study is on the key assumption of the PW approach that intrinsic consumption preferences are the same for self-employed and wage earner households (Slemrod, 2019). This assumption can be questioned for any choice of trace of true income, and certainly when information on charitable donations is used. As unobservables that raise donations are likely positively correlated with the self-employment dummy, standard OLS estimates of the parameter \(\delta\) in the PW model become too large, and so does the estimate of income underreporting. Another reason why \(\delta\) may be large is that it reflects self-selection into self-employment, to the extent that agents choose self-employment because of the scope for tax evasion (Kim et al., 2008).Footnote 6 Standard OLS estimates cannot separate this effect from general underreporting. However, the PW model can straightforwardly be extended to allow for estimation by the fixed effects estimator, which holds the promise of producing estimates that account for self-selection and other time-invariant omitted variables.
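The omitted-variable argument can be illustrated with a small simulation. The sketch below, in Python with NumPy, generates a hypothetical panel in which a time-invariant taste for giving is positively correlated with self-employment, and self-employed units report income scaled down by a known factor. It then estimates Eq. (1) by pooled OLS and by the within (fixed effects) transformation and computes \(k=\exp (\hat{\delta }/\hat{\beta })\), omitting the variance correction of Eq. (2). All parameter values are assumptions chosen for illustration; the sketch captures only the preference channel, not self-selection on evasion opportunities, and status switches far more often than in real register data.

```python
import numpy as np


def simulate_panel(n_ind=5000, n_years=6, beta=0.8, k_true=1.10, seed=0):
    """Hypothetical panel of log donations, log reported income and a
    self-employment dummy; the individual effect a_i raises giving and
    makes self-employment more likely."""
    rng = np.random.default_rng(seed)
    n_obs = n_ind * n_years
    ids = np.repeat(np.arange(n_ind), n_years)
    a = rng.normal(0.0, 0.3, n_ind)                    # time-invariant taste for giving
    p_se = 1.0 / (1.0 + np.exp(-(a - 1.0)))            # higher a -> more often self-employed
    q = (rng.random(n_obs) < p_se[ids]).astype(float)  # yearly self-employment dummy
    log_y_true = 12.0 + rng.normal(0.0, 0.4, n_ind)[ids] + rng.normal(0.0, 0.2, n_obs)
    log_y_rep = log_y_true - np.log(k_true) * q        # self-employed report y*/k
    log_g = a[ids] + beta * log_y_true + rng.normal(0.0, 0.3, n_obs)
    return ids, log_g, log_y_rep, q


def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def k_pooled_ols(log_g, log_y, q):
    """Eq. (1) by pooled OLS; k = exp(delta / beta)."""
    X = np.column_stack([np.ones_like(log_y), log_y, q])
    _, beta_hat, delta_hat = ols(X, log_g)
    return float(np.exp(delta_hat / beta_hat))


def demean_by_unit(x, ids):
    """Within transformation: subtract each unit's time average."""
    unit_means = np.bincount(ids, weights=x) / np.bincount(ids)
    return x - unit_means[ids]


def k_fixed_effects(ids, log_g, log_y, q):
    """Eq. (1) with unit fixed effects, estimated on demeaned data."""
    X = np.column_stack([demean_by_unit(log_y, ids), demean_by_unit(q, ids)])
    beta_hat, delta_hat = ols(X, demean_by_unit(log_g, ids))
    return float(np.exp(delta_hat / beta_hat))


ids, log_g, log_y, q = simulate_panel()
k_ols = k_pooled_ols(log_g, log_y, q)
k_fe = k_fixed_effects(ids, log_g, log_y, q)
```

In this set-up the pooled estimate absorbs the correlation between the taste for giving and self-employment and lands well above k_true, whereas the fixed effects estimate, identified only from units that switch status, should be close to k_true.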

2.2 The extension by Feldman and Slemrod

In addition to using register data on donations instead of food consumption, FS use income information from income tax returns directly, instead of relying on an explicit categorization into wage earners and self-employed.Footnote 7 FS assume that a person’s charitable inclinations are unrelated to the sources of their true income, but not necessarily to the income and sources of income that are declared, and back out underreporting from differences in the relationship between charitable contributions and reported income. In the spirit of PW, the relationship between donations and wage and salary income represents the non-evasion benchmark. Accordingly, any differences between this benchmark and the relationship between charitable contributions and income from other sources, such as self-employment, farming and capital, are attributed to income underreporting.

In the FS model, donations, G, are a function of observed income, V, and invisible income, I. Agents decide how much of the latter to report to the tax authorities, denoted R. Assuming a proportional relationship between reported and true invisible income, \(I=k^{FS}R\), we have

$$\begin{aligned} G=G\left( V+\sum _{i}k_{i}^{FS}R_{i},Z^{'}\right) , \end{aligned}$$
(3)

where \(Z^{'}\) is a vector of other household characteristics and subscript i refers to types of invisible income (FS allow several income components to be underreported). FS thus postulate a proportional relationship, with factor k, between reported and true invisible income. But as FS identify income underreporting in terms of the gross value of one or more income components, whereas the k of PW is based on household disposable income, the k of FS differs from the k of PW. This is signified here by the superscript FS on the k of Eq. (3). To obtain comparable measures, estimates of \(k^{FS}\) are converted into PW terms; the technique used is explained in Sect. 4.3. Note also that, in contrast to FS, who report deviations for several income components, only self-employment income is assumed to be underreported in the present analysis. The main reason is that other income components, such as capital income, are predominantly third-party reported in the Norwegian system.Footnote 8

Given that we also have information on household composition, we not only estimate Eq. (3) at the individual level (as FS), but also derive k at the household level by aggregating income and donations across household members. It can be argued that economic decisions are predominantly made at the household level, which makes the household the preferable unit of analysis. For example, one member of the family could be in charge of the family’s donations, while another is self-employed and has scope for underreporting.

Like FS, we adopt a log-log specification, and the estimation equation can then be represented as

$$\begin{aligned} \log G=\alpha _{0}+\alpha _{1}\log \left( V+k^{FS}R\right) +\beta Z^{'}, \end{aligned}$$
(4)

which indicates that we are estimating only one k, the one for self-employment income. It follows from this method that the overall relationship between true income and donations is reflected by the estimate of \(\alpha _{1}\). Note that, unlike FS, we do not include a constructed variable (S) in order to differentiate between two types of individuals reporting no invisible income, \(R=0\): those with and those without the opportunity for misreporting.Footnote 9 Other differences from the analysis of FS are that we have information on more control variables, represented by \(Z^{'}\) in Eq. (4), and that donations are third-party reported in our data.
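Because \(k^{FS}\) enters Eq. (4) inside the logarithm, the equation is nonlinear in k but linear in the remaining parameters for any given k. One simple way to estimate it, sketched below in Python with NumPy, is to profile over a grid of candidate k values, run OLS at each, and keep the k with the smallest sum of squared residuals. This is an illustrative choice, not necessarily the estimator used by FS or in this study; the simulated data, the single control z and all parameter values are hypothetical.

```python
import numpy as np


def simulate_fs(n=100_000, k_true=1.3, alpha1=1.0, seed=1):
    """Hypothetical cross-section: observed income V, reported
    self-employment income R (zero for most units), one control z
    and log donations generated from Eq. (4)."""
    rng = np.random.default_rng(seed)
    v = np.exp(rng.normal(12.5, 0.5, n))
    has_r = rng.random(n) < 0.3
    r = np.where(has_r, np.exp(rng.normal(11.5, 0.7, n)), 0.0)
    z = rng.normal(0.0, 1.0, n)
    log_g = -8.0 + alpha1 * np.log(v + k_true * r) + 0.2 * z + rng.normal(0.0, 0.3, n)
    return log_g, v, r, z


def fit_fs_profile(log_g, v, r, z, k_grid):
    """Profile least squares for Eq. (4): for each candidate k, regress
    log G on [1, log(V + kR), z] and keep the k with the lowest SSR."""
    best_k, best_ssr, best_coef = None, np.inf, None
    for k in k_grid:
        X = np.column_stack([np.ones_like(v), np.log(v + k * r), z])
        coef, *_ = np.linalg.lstsq(X, log_g, rcond=None)
        ssr = float(np.sum((log_g - X @ coef) ** 2))
        if ssr < best_ssr:
            best_k, best_ssr, best_coef = float(k), ssr, coef
    return best_k, best_coef


log_g, v, r, z = simulate_fs()
k_hat, coef = fit_fs_profile(log_g, v, r, z, np.arange(0.8, 2.0, 0.01))
# coef[1] is the estimate of alpha_1, the elasticity of donations w.r.t. true income
```

A general nonlinear least squares routine should give essentially the same point estimate up to grid resolution; the grid version makes the concentrated objective explicit and easy to inspect.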

In contrast to FS, who include the tax price in their empirical specification (one minus the first-dollar marginal tax rate), we do not enter a tax variable in Eq. (4). The reason is that under Norway's dual income tax, the income base from which donations are deductible is taxed at a flat rate, currently 22 percent, so the price of the first krone given to charity is the same (0.78) regardless of whether the donor is self-employed or a wage earner. Regression results (not reported here) confirm that this choice does not affect the results.Footnote 10

As in the PW approach, the FS method can accommodate permanent income and fixed effects estimation. The permanent income modification is obtained simply by letting V in Eq. (4) be represented by a measure of permanent income. As we do not have access to a ready-made fixed effects procedure for the FS approach, we obtain the fixed effects specification by subtracting, for each individual or household, the average over the period from the values of each year. We estimate

$$\begin{aligned} \log G-\log \bar{G}=\lambda _{0}+\lambda _{1}\left[ \log (V+k^{FS}R)-\log (\bar{V} +k^{FS}\overline{R})\right] +\delta \left( Z-\bar{Z}\right) , \end{aligned}$$
(5)

where the bars denote averages over time for each individual or household. We thus estimate on deviations of year-specific values from these averages, which, as in the standard fixed effects model, removes time-invariant unobserved heterogeneity.
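The transformation in Eq. (5) can be combined with a grid search over k. The Python sketch below (NumPy) expresses each variable relative to the unit's period average as in Eq. (5), i.e. \(\log x_{it}-\log \bar{x}_{i}\), and compares the result with pooled estimation on a hypothetical panel in which a time-invariant taste for giving is correlated with having self-employment income. Controls Z are omitted for brevity, and all data-generating values are assumptions. Eq. (5) subtracts logs of averages rather than averages of logs; both versions remove a time-invariant multiplicative individual effect.

```python
import numpy as np


def simulate_fs_panel(n_ind=10_000, n_years=6, k_true=1.3, seed=2):
    """Hypothetical panel: donations G, observed income V and reported
    self-employment income R; the taste for giving a_i is positively
    correlated with having R > 0."""
    rng = np.random.default_rng(seed)
    n_obs = n_ind * n_years
    ids = np.repeat(np.arange(n_ind), n_years)
    a = rng.normal(0.0, 0.3, n_ind)
    p_r = 1.0 / (1.0 + np.exp(-(a - 1.0)))
    has_r = rng.random(n_obs) < p_r[ids]
    v = np.exp(12.5 + rng.normal(0.0, 0.4, n_ind)[ids] + rng.normal(0.0, 0.2, n_obs))
    r = np.where(has_r, np.exp(rng.normal(11.5, 0.5, n_obs)), 0.0)
    g = np.exp(-8.0 + a[ids] + np.log(v + k_true * r) + rng.normal(0.0, 0.2, n_obs))
    return ids, g, v, r


def unit_average(x, ids):
    """Each observation's unit-level average over the period."""
    return (np.bincount(ids, weights=x) / np.bincount(ids))[ids]


def fit_fs(g, v, r, k_grid, ids=None):
    """Profile least squares over k. ids=None gives pooled Eq. (4);
    otherwise variables are transformed as in Eq. (5)."""
    y = np.log(g)
    if ids is not None:
        y = y - np.log(unit_average(g, ids))  # log G - log G-bar
        v_bar, r_bar = unit_average(v, ids), unit_average(r, ids)
    best_k, best_ssr = None, np.inf
    for k in k_grid:
        x = np.log(v + k * r)
        if ids is not None:
            x = x - np.log(v_bar + k * r_bar)  # minus log(V-bar + k R-bar)
        X = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        ssr = float(np.sum((y - X @ coef) ** 2))
        if ssr < best_ssr:
            best_k, best_ssr = float(k), ssr
    return best_k


ids, g, v, r = simulate_fs_panel()
grid = np.arange(0.8, 2.5, 0.01)
k_fe = fit_fs(g, v, r, grid, ids=ids)
k_pooled = fit_fs(g, v, r, grid)
```

Here the pooled estimate attributes the higher giving of units with self-employment income to a larger k, while the transformed version should recover a value near k_true.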

3 Administrative register panel data on donations

3.1 Donation as an indicator of true income

A main reason for turning to administrative register data in the measurement of income underreporting is that food consumption datasets are usually small. For example, the food consumption data used in Nygård et al. (2019) came from pooling the annual Norwegian Survey of Consumer Expenditure for 2003–2009 and 2012 to obtain a sufficiently large dataset. Even so, observations for only approximately 6000 households were obtained, of which around 800 are self-employment households according to the definition used. Another weakness is that the average response rate to the Survey of Consumer Expenditure is approximately 50 percent.

In contrast, the administrative donation register to which we have access contains information on charitable giving for the whole population. In our case this means that we have annual records of positive amounts for approximately 350,000 of a total of approximately 2.3 million households. It also means that corner solutions are prevalent,Footnote 11 as a majority of households do not donate.Footnote 12 As we return to below, our estimates are based on donors.

But the donation trace has its own complications. Most critically, one may question whether wage earners and the self-employed have the same “consumption of generosity” patterns at the same levels of true income (Slemrod, 2019). First, self-employment income may not be spent in the same way as income from other sources, as argued by Lyssiotou et al. (2004). For example, households may use their steady wage income on regular non-luxury goods and their self-employment income on luxuries. Second, one cannot rule out that the self-employed have stronger altruistic preferences than other people, as discussed by Teal and Carroll (1999). Third, one may argue that the demand for charitable solicitations differs across occupations. Tietz and Parker (2014) find, for the US, that the self-employed give substantially more to organizations that address local community issues. As emphasized by Glazer and Konrad (1996), charitable donations may signal wealth or integrity, motives that could be more relevant for the self-employed, as hypothesized by FS. Fourth, linking donations to tax evasion opens the way for more subtle explanations, such as donations as a “repayment to society” by the tax evader.Footnote 13 However, a key message of the present study is that fixed effects estimates, which control for any unmeasured individual or household characteristics that are constant over time, are less vulnerable to such confounders.

3.2 Data description

Register data on Norwegian donations have become available because donations of over NOK 500 (USD 61; EUR 54) are tax-deductible. They are deducted from ordinary income (the general income tax base), which means that for 2021 the government pays 22 percent of donations up to a limit of NOK 50,000 (USD 6100; EUR 5400). The tax authorities maintain a list of about 400 pre-qualified charities and religious/belief-based organizations; donations to these make the individual eligible for the deduction. Importantly, the data are not only recorded up to the cap: the charities report the full amount donated by each individual to the tax authorities. Because the information is third-party reported, these data are not affected by the measurement error associated with self-reporting. This latter phenomenon, often referred to as “endogenous itemization”, has received substantial attention in analyses based on administrative data from the US; see, for example, Clotfelter (1980).Footnote 14

The present analysis is based on a comprehensive set of register data, as the donation dataset is linked to other administrative registers, such as the Register of Income Tax Returns (Statistics Norway, 2019), through a personal ID number. This means that we have access to information on several other characteristics, such as income, wealth, age, education, etc. As the data contain information on household formation, we also obtain estimates for the household as the unit of analysis, which is the dominant empirical strategy in the underreporting literature. Importantly, given our empirical strategy, the data are converted in a straightforward manner into a panel dataset.

To restrict the analysis to individuals of prime working age, we condition on age 25–62. As we provide estimates of underreporting for both individual and household datasets, the age restriction in the household dataset is applied to the household head, defined as the person in the household with the highest income. Our empirical investigations are thus based on approximately 11.7 million observations of individuals over a period of six years, and approximately 2.3 million of these individuals donate. This corresponds to a total of approximately 7.6 million household observations, of which around 2 million donate. Table 1 presents descriptive statistics for the dataset.

FS allow for income sources to be negative, which in our case corresponds to allowing for negative self-employment income when estimating the model. Whether or not we restrict business income to be positive turns out to have no effect on the results of the present analysis. Business owners with negative self-employment income are therefore removed from the samples, for both the PW and the FS estimations.

As fixed effects estimation requires sufficient within-unit variation in the explanatory variable, individuals and households must shift between the self-employment and wage earner categories in the panel. Reassuringly, we observe around 30,000 shifts from self-employment to wage earner status or vice versa in both the individual and the household datasets.
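Counting such shifts is a simple pass over the sorted panel. The Python sketch below assumes a hypothetical long-format input of (unit_id, year, is_self_employed) tuples and counts status changes between consecutive observed years for each unit; gaps between observed years are ignored.

```python
from itertools import groupby


def count_status_shifts(panel):
    """Count changes in occupational status between consecutive observed
    years, summed over units (hypothetical helper).

    panel: iterable of (unit_id, year, is_self_employed) tuples, one per
    unit-year; units observed only once contribute no shifts.
    """
    rows = sorted(panel, key=lambda row: (row[0], row[1]))
    shifts = 0
    for _, unit_rows in groupby(rows, key=lambda row: row[0]):
        statuses = [status for _, _, status in unit_rows]
        shifts += sum(prev != cur for prev, cur in zip(statuses, statuses[1:]))
    return shifts
```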

Table 1 Descriptive statistics for the donation dataset, 2012–2017

In Table 2 we show descriptive statistics that distinguish between wage earner and self-employment households, with separate figures for donating and non-donating households. Under our main definition, a household belongs to the self-employment group if at least 25 percent of its gross income stems from self-employment. Table 2 shows that self-employment households both donate somewhat more and have higher disposable income than wage earner households. But these characteristics say little about the level of underreporting, as identification is based on comparing wage earners and the self-employed at the same income level.
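The threshold rule can be written as a small function. In this Python sketch, the input format (a list of per-adult pairs of self-employment income and gross income) and the function name are assumptions, and self-employment income is taken to be non-negative, consistent with the sample restriction described above. A single-member list gives the individual-level classification, and the threshold argument covers the 15 and 40 percent alternatives examined in Sect. 4.2.

```python
def is_self_employment_unit(members, threshold=0.25):
    """True if at least `threshold` of the unit's gross income stems from
    self-employment (hypothetical helper).

    members: list of (self_employment_income, gross_income) per adult.
    """
    se_income = sum(se for se, _ in members)
    gross_income = sum(gross for _, gross in members)
    if gross_income <= 0:
        return False
    return se_income / gross_income >= threshold
```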

As a first step, and following FS, we explore to what extent traces of income underreporting can be seen in a simple tabulation of donation patterns. In Table 3 we order individuals (and households) by the ratio of wage income to reported gross income, in ascending order. The first deciles contain taxpayers who receive little of their income as wages and salaries, and the high deciles contain those who receive most of their income as wages and salaries. Given that self-employment income is underreported, we expect to find, as FS do, that donation shares (donations as a share of gross income) decrease as the wage share increases. When gross income is underreported at low levels of wage income (high levels of self-employment income), individuals (or households) appear to be more charitably inclined than they truly are. Although the levels are lower than those reported by FS, the same pattern is seen in Table 3: donation shares decrease with the ratio of wage to gross income.
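A tabulation like Table 3 can be produced by sorting units by their wage-to-gross-income ratio, splitting them into ten equally sized groups and computing donations as a percentage of gross income within each group. The Python sketch below assumes a hypothetical list of (wage_income, gross_income, donation) tuples with positive gross income and computes each decile's share as total donations over total gross income; whether Table 3 aggregates this way or averages individual shares is an assumption here.

```python
def donation_share_by_decile(units, n_groups=10):
    """Donations as a percentage of gross income, by group of the
    wage-to-gross-income ratio (ascending). Hypothetical helper.

    units: list of (wage_income, gross_income, donation), gross_income > 0.
    """
    ordered = sorted(units, key=lambda u: u[0] / u[1])
    n = len(ordered)
    shares = []
    for g in range(n_groups):
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        gross = sum(u[1] for u in group)
        donated = sum(u[2] for u in group)
        shares.append(100.0 * donated / gross if gross else float("nan"))
    return shares
```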

Table 2 Descriptive statistics for the donation dataset, 2012–2017. Wage-earner and self-employed households
Table 3 Donation as percentage of total gross income by decile of the ratio of wage income to gross income

4 Comparison of OLS and fixed effects estimates

4.1 PW estimates

In the following we compare OLS and fixed effects estimates of income underreporting, for both the PW and the FS approaches to the measurement of income underreporting. In this section we present the PW estimates, while Sect. 4.3 presents estimates of the FS approach.

Most studies of tax evasion and underreporting use household data, but we also obtain results with the individual as the unit of analysis. In the household dataset, we focus on two-adult households, which is the common approach in the literature (including PW). Recall that a self-employment household is one in which at least 25 percent of gross income stems from self-employment.Footnote 15

Further, as already discussed, in contrast to food consumption as a trace of true income, the donation measure involves a large number of corner solutions, i.e., individuals and households who do not donate. Here, we show results for donors, i.e., we restrict our sample to donating individuals and households. Results using all observations, including non-donating households, do not deviate much from those presented in the following.

Note that the estimates of k for individuals are not directly comparable with those for households, and most estimates in the literature are for households. In Table 4 this is expressed by referring to \(k^{IND}\) for estimates obtained directly from the individual dataset, while k refers to the conventional PW estimate, the mean scaling factor for the household disposable income of the self-employed. To convert \(k^{IND}\) into k(PW), we scale the disposable income of self-employed individuals by \(k^{IND}\). We then obtain household-level k's by comparing the average disposable household income of self-employment households with and without this adjustment for individual income underreporting. The converted values of k are reported in the second row of Table 4. Although the individual estimates of k are large, for example \(k^{IND}=1.35\) for the permanent income specification, the converted estimates, which are comparable to household-level PW estimates, are substantially smaller, at 1.22 in this case. This is because not all household members in a “self-employment household” are self-employed. The converted k's are not marked by level of significance, but it follows from the z-values of the estimated k's that the converted k's are also significantly different from 1.
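The conversion from \(k^{IND}\) to a household-level k amounts to a ratio of average incomes. The Python sketch below assumes each self-employment household is given as a list of (disposable_income, is_self_employed) pairs, scales only the self-employed members' incomes by \(k^{IND}\), and divides adjusted by unadjusted total disposable income (the ratio of averages equals the ratio of totals for a fixed set of households). The function name and the toy household are hypothetical and not meant to reproduce Table 4.

```python
def household_k_from_individual(households, k_ind):
    """Household-level scaling factor implied by an individual-level k.

    households: list of self-employment households, each a list of
    (disposable_income, is_self_employed) pairs for its members.
    """
    reported = 0.0
    adjusted = 0.0
    for members in households:
        for income, is_se in members:
            reported += income
            # only self-employed members' incomes are scaled up
            adjusted += income * k_ind if is_se else income
    return adjusted / reported
```

With one self-employed and one wage-earning adult of equal income, an individual factor of 1.35 translates into a household factor of 1.175, which illustrates why converted k's are smaller than \(k^{IND}\).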

We show results for specifications using both annual income and permanent income as income measures. As already discussed, another major advantage of panel data is that measures of permanent income can be established in a straightforward way, and we therefore also report results that exploit this. Permanent income is measured as average disposable income over the period 2012–2017, with all incomes in 2017 prices. According to Kim et al. (2017), a six-year average is sufficiently long to control for transitory variations in annual income. As expected from the reasoning in Engström and Hagen (2017),Footnote 16 the permanent income specification results in smaller values of k. For the preferred permanent income specification, we obtain estimates of the PW scaling factor (k) of 1.22 (individuals) and 1.19 (households). This is somewhat higher than what Nygård et al. (2019) found for Norway when using food consumption as the trace of true income; they report estimates in the range 1.14–1.16.Footnote 17
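Permanent income as defined here is a simple average of deflated annual incomes. In the Python sketch below, the price index values are hypothetical placeholders rather than official Statistics Norway figures, and a unit missing some years is averaged over the years in which it is observed, which is an assumption.

```python
def permanent_income(income_by_year, price_index, base_year=2017):
    """Average real disposable income over the observed years, expressed
    in base-year prices (hypothetical helper).

    income_by_year: {year: nominal disposable income}
    price_index:    {year: price index level}
    """
    if not income_by_year:
        raise ValueError("no income observations")
    real = [income * price_index[base_year] / price_index[year]
            for year, income in income_by_year.items()]
    return sum(real) / len(real)
```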

Table 4 PW estimates, OLS estimates. Individual and household data, donors

Next, these estimates are compared with results for fixed effects versions of the PW technique. As already discussed, several factors likely make fixed effects estimates lower than standard OLS estimates. First, as fixed effects estimation accounts for self-selection into self-employment by individuals inclined to evade taxes, it yields smaller estimates of income underreporting. Furthermore, given that we use donations as the trace of true income, OLS estimates of income underreporting could be upward biased because the self-employment dummy in Eq. (1) is positively correlated with the fixed effects, resulting in omitted variable bias. We expect the specific characteristics of the donation behavior of the self-employed, see the discussion in Sect. 3.1, to contribute to an upward bias in the OLS estimates.

As expected, the fixed effects estimates of k are lower than the corresponding OLS estimates (see Table 5). The fixed effects estimates are 1.14 and 1.12 for individuals and households, respectively, clearly below the OLS estimates in Table 4 of 1.22 and 1.19 for individuals and households, respectively (for the preferred permanent income specification). Given the calculated z-scores (reported in both Tables 4 and 5), accounting for statistical uncertainty does not undermine this conclusion.Footnote 18 We also note that the fixed effects estimates (for charitable donations) are somewhat below the OLS estimates obtained for Norwegian data on food consumption by Nygård et al. (2019): their permanent income OLS estimate of 1.16 is larger than the estimate for two-adult households in Table 5, at 1.12.Footnote 19
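The text does not spell out how the z-values are computed. One common approach, sketched below in Python as an assumption rather than the authors' documented procedure, is the delta method: with \(k=\exp (\delta /\beta )\) (ignoring the variance correction in Eq. (2)), the standard error of \(\hat{k}\) follows from the gradient \(\partial k/\partial \delta =k/\beta\), \(\partial k/\partial \beta =-k\delta /\beta ^{2}\) and the estimated covariance matrix of \((\hat{\delta },\hat{\beta })\). The z-value below tests k = 1; the function name and inputs are hypothetical.

```python
import math


def k_with_delta_se(delta, beta, var_delta, var_beta, cov_delta_beta):
    """k = exp(delta / beta), its delta-method standard error and the
    z-value for H0: k = 1 (hypothetical helper)."""
    k = math.exp(delta / beta)
    grad_delta = k / beta                  # dk/d(delta)
    grad_beta = -k * delta / beta ** 2     # dk/d(beta)
    var_k = (grad_delta ** 2 * var_delta
             + grad_beta ** 2 * var_beta
             + 2.0 * grad_delta * grad_beta * cov_delta_beta)
    se_k = math.sqrt(var_k)
    return k, se_k, (k - 1.0) / se_k
```

Comparing an OLS and a fixed effects k from the same data would additionally require their covariance, since the two estimates are not independent.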

Table 5 PW estimates, fixed effects regressions. Individual and household data, donors

4.2 Sensitivity check with respect to the definition of self-employed

In contrast to the FS methodology, whose results we return to shortly, the results of the PW approach are most likely sensitive to the definition of self-employment. Because the allocation of observations into self-employed and wage earners is essential to the PW approach, and because fixed effects estimation relies on observations that shift occupation, we examine how the results vary with the definition of self-employment. Table 6 therefore presents results for alternative definitions. In addition to the definition used so far, where self-employment at both the individual and the household level means receiving at least 25 percent of gross income from self-employment, we report estimates of k for thresholds of 15 percent and 40 percent.
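The classification step behind Table 6 can be sketched as follows: a unit counts as self-employed in a year if self-employment income is at least a given share of gross income, and a unit counts as a shifter if its status changes at least once over the panel. The function names and the three-unit income data are hypothetical; at the household level we assume incomes are first summed over the adults before the share is computed.

```python
import numpy as np

def classify_self_employed(se_income, gross_income, threshold=0.25):
    """Status 1 if self-employment income is at least `threshold` of gross
    income. Rows are units (individuals or households), columns are years."""
    se_income = np.asarray(se_income, dtype=float)
    gross_income = np.asarray(gross_income, dtype=float)
    # Share of gross income from self-employment; 0 where gross income is not positive.
    share = np.divide(se_income, gross_income,
                      out=np.zeros_like(se_income), where=gross_income > 0)
    return (share >= threshold).astype(int)

def count_shifters(status):
    """Number of units whose occupational status changes at least once."""
    return int((np.diff(status, axis=1) != 0).any(axis=1).sum())

# Hypothetical incomes (NOK 1,000) for three units over three years.
se_income = [[50, 300, 300], [0, 0, 0], [200, 200, 450]]
gross_income = [[1000, 1000, 1000], [500, 500, 500], [1000, 1000, 1000]]

shifters = {t: count_shifters(classify_self_employed(se_income, gross_income, t))
            for t in (0.15, 0.25, 0.40)}
print(shifters)
```

The example shows how the threshold changes which units are counted as shifting, and hence which observations identify the fixed effects estimate.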

The estimate of k is relatively large for the 40 percent threshold in the individual dataset, at 1.19, but it is still below the corresponding (not reported) pooled OLS estimate of 1.22. Table 6 also reports the number of units that shift status during the observation period. Although the number of shifts is low relative to the total number of observations, the large size of the administrative register data means that we still observe many changes in occupational status.

Given how occupational status is assigned under the PW approach, an individual or a household may shift occupational category because of small increases or decreases in self-employment income. Such shifts hardly reflect any real change in self-employment status, and the results may be biased by these artefacts. To test the sensitivity of the results to marginal changes in the composition of income, we also obtain fixed effects estimates where, to qualify for an occupational shift, a unit must have increased or decreased its self-employment income by more than NOK 100,000 (USD 12,000; EUR 11,000). This additional restriction has only small effects on the estimates. For the 25 percent self-employment definition (middle column of Table 6), the estimate of k increases to 1.18 and 1.13 for individuals and households, respectively, when the restriction is imposed.
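One possible reading of the NOK 100,000 restriction is sketched below: a unit is kept only if every change in its occupational status coincides with a change in self-employment income of more than NOK 100,000 between consecutive years, while units that never change status are kept. The paper's exact implementation may differ in detail (for example, small-change shifts could instead be ignored rather than the unit dropped), so the function name and the data are hypothetical illustrations.

```python
import numpy as np

def keep_under_min_change(status, se_income, min_change=100_000):
    """Hypothetical rule: keep a unit only if every change in its occupational
    status coincides with a year-on-year change in self-employment income
    larger than `min_change` (NOK). Units that never change status are kept."""
    status = np.asarray(status)
    se_income = np.asarray(se_income, dtype=float)
    shifts = np.diff(status, axis=1) != 0
    large = np.abs(np.diff(se_income, axis=1)) > min_change
    # A unit is dropped if it has at least one shift without a large income change.
    return ~(shifts & ~large).any(axis=1)

# Hypothetical statuses and self-employment incomes (NOK) for three units.
status = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
se_income = [[50_000, 300_000, 310_000],
             [180_000, 260_000, 260_000],
             [10_000, 90_000, 20_000]]
mask = keep_under_min_change(status, se_income)
print(mask)
```

Here the second unit is dropped because its switch into self-employment coincides with a rise in self-employment income of only NOK 80,000.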

Table 6 Fixed effects regression estimates for k for different definitions of self-employment, represented by self-employment income as percentage share of gross income

4.3 FS-estimates

Next, we explore whether the FS approach shows the same pattern as the one just described for the PW approach. As discussed in Sect. 2.2, k(FS) and k(PW) are not directly comparable, and in Tables 7 and 8 we first report estimates of k(FS), indicated by the superscript FS. These estimates refer to underreporting in terms of gross income, see Eq. (4). To move from underreporting as an addition to gross income components (FS) to underreporting as an addition to disposable income (PW), we employ a tax-benefit model (Aasness et al., 2007) to calculate the increase in average post-tax income that corresponds to the increase in (gross) business income, which yields estimates of k analogous to the k of PW (no superscript).Footnote 20

As in the PW approach, the results in Table 7 demonstrate the importance of controlling for permanent income when the standard expenditure approach is used for identification. For example, the estimate of k for two-adult households falls from 1.22 to 1.19 when annual income is replaced with permanent income.Footnote 21 More importantly, we again find that the fixed effects estimates are smaller than the OLS estimates. For individuals, Tables 7 and 8 show estimates of k of 1.28 for OLS (permanent income specification) and 1.16 for fixed effects estimation. For two-adult households, the corresponding estimates are 1.19 and 1.09.

Finally, one may ask what the lower estimates mean in terms of overall revenue loss. According to Nygård et al. (2019), revenue from the personal income tax would have been approximately NOK 8 billion higher if the self-employed had reported all their (true) income. Using the main fixed effects estimate of the PW approach, 1.12, the estimated revenue loss for 2017 is approximately NOK 7.2 billion. The main FS fixed effects estimate is in the same range as that found in Nygård et al. (2019); the corresponding revenue loss is NOK 9.5 billion (in 2017).

Table 7 FS estimates, OLS estimates. Individual and household data, donors
Table 8 FS estimates, fixed effects regression results. Individual and household data, donors

5 Conclusion

Considerable attention is devoted to obtaining estimates of the hidden economy, including the extent to which the self-employed underreport their income. Since the introduction of the expenditure approach of Pissarides and Weber (1989) (PW), it has been standard to back out measures of underreporting from the excess food consumption of the self-employed compared with wage earners at the same level of reported income. But the standard approach of using food consumption as a trace of true income suffers from the fact that expenditure survey datasets are small and likely subject to non-response bias. If we can find traces of true income other than food consumption, the scope for empirical investigation therefore increases. Accordingly, several studies employ information on other consumption items, such as electricity use and spending on boats and cars, to obtain measures of underreporting.

In the present paper, as in Feldman and Slemrod (2007) (FS), we direct attention to the use of donations to charitable organizations in this type of work. Because donations are reported in income tax returns in many countries, and can therefore be derived from administrative registers, the resulting datasets are typically much larger than those for food expenditure. In the present analysis, we exploit data for approximately 350,000 Norwegian donors each year over six years (2012–2017).

These data come with additional advantages. The donation register data are linked, through a personal ID number, to information from several other administrative registers, such as the Register of Income Tax Returns. Most importantly for the present study, the data have a panel dimension, which allows fixed effects estimation of the Engel function of the expenditure approach. As OLS estimates of income underreporting likely reflect that agents self-select into self-employment in order to evade tax, panel data and fixed effects estimation help to obtain estimates without the self-selection component. In light of this reasoning, we expect fixed effects estimates of income underreporting to be smaller than the OLS estimates, reflecting that self-employment is attractive for persons inclined to evade taxes.

However, there are reasons to expect that the fixed effects also absorb other omitted variables arising in the expenditure approach. In general, wage earners and the self-employed likely differ in their consumption of the item used as the trace of true income. In the present study, where donations serve as the trace of true income, we discuss a number of reasons why the relationship between donations and true income may differ between wage earners and the self-employed. For example, the self-employed may donate more as a means of signalling (Glazer & Konrad, 1996). We argue that fixed effects estimation is preferable given such measurement problems, as OLS estimates are most likely upward biased because the self-employment dummy in the Engel curve used in the estimation is positively correlated with the fixed effects.

Previous studies have already emphasized that panel data allow the construction of a measure of permanent income (Kim et al., 2017; Engström & Hagen, 2017). Moreover, Engström and Hagen (2017) demonstrate that the degree of underreporting is substantially overestimated when permanent income is not used as the income measure. The results of the present study similarly suggest that not controlling for fixed effects leads to overestimation of underreporting. Focusing on results for households (which is common in the literature) and the permanent income specification, we find OLS estimates of the scaling factor of 1.19 (the same for the PW and FS approaches), whereas the fixed effects estimates are clearly smaller: 1.12 and 1.16 for the PW and FS techniques, respectively. We expect that the difference between OLS and fixed effects estimates is explained both by self-selection and by measurement problems due to differences in the donation behavior of the self-employed and wage earners. Whereas the latter reflects estimation bias, the former may be seen as adding to the information on the causes behind income underreporting.

Given these results, we are inclined to conclude that controlling for fixed effects is important for measuring the magnitude of the income underreporting problem. As governments seek precise information about the extent of the problem, one implication is that analysts should put more effort into gaining access to panel data when exploring the issue. In this respect, we hope and expect that more countries will follow the example of the Nordic countries and produce large-scale panel datasets. We argue that this is important also from an econometric identification perspective.