Introduction

Income pooling within the household means that the individual consumption of each member does not depend on the individual income contributions to the household budget. Therefore, a shift in the income contribution share should not alter individual consumption, given that the household budget stays constant. Income pooling within couple households not only has important implications for consumer behavior and the intra-household allocation of labor supply, but also determines the needs and the equivalence scale of the household. These issues may have consequences for the design of social benefits and for poverty policy. But the presence or absence of income pooling is also relevant for income taxation. In Germany, married couples are treated as a single tax unit with joint assessment and the offsetting of income differences between spouses. This treatment implies perfect income pooling of the spouses and grants, in a progressive tax system, a lower average tax rate for the couple than individual tax assessment (in the presence of income differences between the spouses). In consequence, marginal tax rates are also shared within the couple, which would be justified in terms of the ability-to-pay principle of taxation (“Leistungsfähigkeitsprinzip”) in the presence of income pooling. If income pooling is violated, the marginal tax rate for the second earner would be too high, while it would be too low for the first earner, which would also excessively distort the individual leisure-consumption decision of both partners.

The research question is particularly relevant for Germany, as the marginal net income gain for the second earner (predominantly the woman) from additional labor supply is very low compared to other OECD countries (Becker, 2022). But many countries have similar joint taxation elements in their income tax assessment for married couples, for example France, Poland and, up to a certain income threshold, also the United States. Other countries like Austria or Sweden switched more or less completely to an individual tax assessment, motivated by both the labor market participation of women and individual income considerations. This topic has not yet been examined from an income pooling perspective, but there is a large literature on optimal labor supply conditions for married couples (e.g., Bick & Fuchs-Schündeln, 2017).

Income pooling in the theoretical economic framework means in its simplest form that the household faces a single utility function which is maximized by its members. This so-called “unitary” model (Becker, 1991) implies that total household budget is the relevant determinant for the individual consumption in the household while the individual contribution shares to the household income do not matter. However, Browning et al. (2010) showed theoretically that local income pooling can even exist in non-cooperative household models, when both partners contribute to a public good. Therefore, it is an empirical question which type of household dominates in reality and to which extent income pooling exists.

Since expenditures are typically observed only on the household level in common household surveys, data on personally allocable consumption expenditures is rarely available. So, many studies testing the income pooling hypothesis, also the one at hand, rely on expenditures on clothing and footwear (see Section “Literature Background”). These categories are often separately available for women, men and children in the data and can therefore be attributed to persons. Although the demand for clothing and footwear can only be a proxy for individual (private) consumption, a non-zero marginal effect with respect to changes in individual’s income share of total household income would reject income pooling within the household in a model setting which also provides the demand for a composite (public) good. The limitation of the empirical results is that these expenditure categories account only for a small share of the consumption budget, on average 4.6 percent in the used data set. Endogeneity issues in this context and possible confounders that limit the explanatory power of clothing and footwear consumption are discussed in this study.

On this basis, the paper contributes new evidence on testing the income pooling hypothesis for Germany with survey data on household expenditures within the framework of a Quadratic Almost Ideal Demand System (QUAIDS) (see Banks et al., 1997). The paper adds evidence from structural demand system estimation applied to pooled data of the income and consumption survey for Germany (Einkommens- und Verbrauchsstichprobe, EVS) for the years 2008 and 2013. Endogeneity issues of the household budget and women’s income contribution share are handled in an instrumental variables setting. The analysis also tries to identify price effects despite limited price variation in the data. This structural consumer demand model setup also makes it possible to calculate justified tax differentials for the individual marginal income tax rates in terms of deviations from income pooling within the household (under income pooling, the tax differential would be zero). Under the assumption that the relative marginal deviation of the individual’s income effect from the household’s income effect on individual consumption is the same for all private goods, the estimated effect on clothing and footwear consumption could be used to calculate the tax differential. However, as these commodities represent only a small share of the consumption budget, derived tax differentials for policy recommendations should be based on broader measures of individual consumption that also involve durable goods and intertemporal consumption decisions. Despite the high relevance of this question for tax policy, this specific connection between the estimable degree of income pooling and taxation has not been addressed yet in the literature to the best of my knowledge. Additionally, heterogeneity in the effects regarding the marital status and the presence of children in the household, as well as between former East and West German federal states, is examined.

Similar to earlier results in the literature, the income pooling hypothesis is broadly rejected, which implies a relationship between the individual income contribution share and individual consumption (Bourguignon et al., 1993; Lundberg et al., 1997; Phipps & Burton, 1998; Ward-Batts, 2008). The magnitude of differences for married couples in the spouses’ income effects on individual consumption is in the range of the one found in Phipps and Burton (1998). However, I find significant differences regarding the marital status, the presence of at least one child in the household and whether the household is located in a former West or East German federal state. Married couples and couples with children are closer to income pooling than unmarried couples without children, a result that was previously only reported descriptively in a survey question study (Bonke & Uldall-Poulsen, 2007). Unmarried couples in former East German federal states are closer to income pooling than in former West German states. A negative effect of women’s income contribution on men’s clothing and footwear consumption is confirmed in all specifications, which in turn means a positive effect on women’s consumption and the composite good. While these results provide new insights into consumption behavior in couple households, the transfer of the results to total consumption must be interpreted with caution given the mentioned assumptions.

Transferred to total consumption the findings would suggest that the marginal tax rate differentials for couples with children should not be significantly different from zero, which means that joint tax assessment can be justified in terms of the ability-to-pay principle of taxation for these types of households. However, as the result is based only on clothing and footwear consumption, its application to actual tax policy is limited. Further research using broader measures of individual consumption is therefore necessary.

Literature Background

There is a class of papers that uses structural household consumption models to test the income pooling hypothesis with micro data. Bourguignon et al. (1993) tested the hypothesis within a structural model of consumption functions on French survey data and rejected it. Browning et al. (1994) reject the unitary household model with Canadian survey data in a structural framework and identify a household sharing rule of resources. They find that personal expenditures are significantly affected by the share of income a spouse contributes to total household income. Phipps and Burton (1998) test the pooling hypothesis in a demand system also with Canadian data and find mixed results for different expenditure data. Income pooling cannot be rejected, for example, for housing; on the other hand, wives’ clothing consumption increases more strongly with their personal income and wives are more likely to spend their income on childcare than husbands. Bütikofer and Gerfin (2017) find that the sharing rule is significantly influenced by the ratio of the wife’s hourly wage to the husband’s hourly wage.

The study of Lundberg et al. (1997) belongs to a class of papers that uses a policy change as natural experiment. A child allowance was transferred to wives in the UK starting in 1977. The authors find strong evidence of a shift toward greater expenditures on women’s and children’s clothing due to the reform which is not in line with the pooling hypothesis. A more recent reform of child and working tax credits in 2003 in the UK was used by Fisher (2016) to analyze the effects on spending patterns. He finds significant positive effects on expenditures related to children. Ward-Batts (2008) combines a structural model with the exogenous variation of the UK reform in 1977 and confirms the findings of Lundberg et al.

Another strand in the literature uses survey questions that are directly related to pooling in the household. Bonke and Uldall-Poulsen (2007) exploit Danish survey data and find that most couples fully or partly pool their income. They also show that the probability of income pooling depends on several household characteristics such as the duration of marriage and the existence of children in the household. Bonke and Browning (2009a) use the same data and report that two-thirds of couple households answer that they pool their resources. However, a small part of them shows inconsistencies if other answers are taken into account. Bonke and Browning (2009b) show that the most important correlate of the relative financial satisfaction of women in couples is their income share of household income.

Intra-household allocation of resources is also examined in experimental settings. Attanasio and Lechene (2002) use a welfare program designed as a field experiment in Mexico, which transferred money to mothers, to look at the outcomes of the recipient households. They find that women gained more influence in the decision-making process of the household due to the shift of resources. Beblo and Beninger (2017) use experimental data on 95 German couples and conclude that the hypothesis is rejected for more than half of the couples, also noting that couples with higher household income and higher education are more likely to pool their resources.

The link between income pooling and taxation has not yet been directly examined empirically in the literature to the best of my knowledge. Related studies include Büttner et al. (2019), who focus on how the intra-household income distribution affects the tax planning of a married couple. They find that couples tend to forgo a tax-minimizing treatment if it negatively affects the individual net income of the second earner (which is often the woman). There is also a vast, indirectly related literature that examines optimal income taxation for married couples, but with regard to labor supply and time allocation. An important result in this literature is the optimality of negative tax jointness, which means that the marginal tax rate of an individual should fall with the spouse’s income (e.g., Alesina et al., 2011; Gayle & Shephard, 2019).

Model and Empirical Strategy

An example for consistent income pooling within the household applies if one partner reduces working time for childcare, which reduces his or her income, while the other partner increases working time to compensate the income reduction. Individual consumption measured as expenditures for goods and services solely consumed by one partner should not be affected by this shift in income contribution (given that preferences for the goods do not change, which also means that childcare expenditure should not change because of the shift). This general test is embedded in a structural household demand system, controlling for the total consumption budget, prices and taste shifters. Additionally, the model is extended to allow for endogeneity of the individual income contribution share and the budget.

The Model

The structural framework for the test of the income pooling hypothesis is the Quadratic Almost Ideal Demand System (QUAIDS) (see Banks et al., 1997). The QUAIDS is often used in the literature to model consumer demand with household data and is based on price-independent generalized logarithmic (PIGLOG) preferences with Engel curves that are modeled as budget shares being a quadratic function of the log-budget.Footnote 1 It has the advantage of a flexible underlying utility function and allows imposing the restrictions of a consistent demand system like homogeneity and symmetry.

By applying PIGLOG preferences and separability, the household’s decision over budget and demand can be theoretically modeled as a two-stage budgeting process (Deaton & Muellbauer, 1980). Therefore, the household’s consumption decisions are made by setting the budget at the first stage and then choosing consumption demand at the second stage. The household’s total labor supply, which is necessary to generate the consumption budget, is not modeled explicitly in this setting. Instead, education and wages of both partners determine the total budget and the individual contributions to it at the first stage. At the second stage, consumption demand, and thus the household’s preferences for each good, is determined. This setting is econometrically modeled as an instrumental variables setting, where the second stage is the QUAIDS demand system. Endogeneity issues and the first stage are described in Subsection “Endogeneity”.

Introducing the demand system: For each \(i=1,\dots ,N\) goods and the corresponding budget shares \({w}_{i}\), the QUAIDS forms the following non-linear system of equations:

$$w_{i} = \alpha_{i} + \sum_{j} \gamma_{ij} \ln p_{j} + \beta_{i} \ln \left[ \frac{m}{a\left( p \right)} \right] + \frac{\lambda_{i}}{b\left( p \right)}\left\{ \ln \left[ \frac{m}{a\left( p \right)} \right] \right\}^{2} + u_{i} \tag{1}$$

for \(i=1,\dots ,N\) goods and \(j=1,\dots ,N\) with consumption budget \(m\), prices \({p}_{i}\) and price indices

$$\ln a\left(p\right)={\alpha }_{0}+\sum_{i} {\alpha }_{i}\ln{p}_{i}+\frac{1}{2}\sum_{i} \sum_{j} {\gamma }_{ij}\ln{p}_{i}\ln{p}_{j}$$
$$b\left(p\right)= \prod_{i} {p}_{i}^{{\beta }_{i}}$$

The model allows computing budget and (un)compensated price elasticities, as well as cross-price elasticities between different goods. Although the system is non-linear due to its price indices, it can be estimated easily using the Iterated Linear Least Squares Estimator (ILLE), which exploits the conditional linearity of the system in the parameters (Blundell & Robin, 1999).Footnote 2 Further explanatory variables that act as taste shifters for household consumption, like demographic characteristics, can be added to the equations.
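To make the structure of equation \((1)\) concrete, the following minimal sketch evaluates the deterministic part of the budget shares for given parameters. It treats the translog expression as \(\ln a(p)\), as in Banks et al. (1997). All parameter values are hypothetical and chosen only so that the adding-up, homogeneity and symmetry restrictions hold; they are not estimates from this paper.

```python
import math

def quaids_shares(ln_p, ln_m, alpha0, alpha, beta, gamma, lam):
    """Deterministic part of the QUAIDS budget shares in equation (1)."""
    n = len(ln_p)
    # Translog price index, interpreted as ln a(p)
    ln_a = alpha0 + sum(alpha[i] * ln_p[i] for i in range(n))
    ln_a += 0.5 * sum(gamma[i][j] * ln_p[i] * ln_p[j]
                      for i in range(n) for j in range(n))
    # Cobb-Douglas price aggregator b(p) = prod_i p_i^beta_i
    b = math.exp(sum(beta[i] * ln_p[i] for i in range(n)))
    ln_real = ln_m - ln_a  # ln[m / a(p)]
    return [alpha[i]
            + sum(gamma[i][j] * ln_p[j] for j in range(n))
            + beta[i] * ln_real
            + lam[i] / b * ln_real ** 2
            for i in range(n)]

# Hypothetical parameters for three goods:
# women's clothing, men's clothing, composite good
alpha = [0.03, 0.02, 0.95]        # sums to one
beta = [0.01, 0.005, -0.015]      # sums to zero
lam = [-0.002, -0.001, 0.003]     # sums to zero
gamma = [[0.02, -0.005, -0.015],
         [-0.005, 0.01, -0.005],
         [-0.015, -0.005, 0.02]]  # symmetric, rows sum to zero

shares = quaids_shares(ln_p=[0.05, 0.03, 0.02], ln_m=math.log(9000.0),
                       alpha0=8.0, alpha=alpha, beta=beta,
                       gamma=gamma, lam=lam)
```

Because the hypothetical parameters satisfy the adding-up restrictions, the three shares sum to one for any budget and price vector, which is a quick consistency check for an implementation of this kind.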

Application

The test of the income pooling hypothesis is performed with administrative household microdata containing consumption expenditures (see Section “Data and Descriptives”). The budget shares \({w}_{i}\) in the QUAIDS model are calculated by dividing the expenditures on consumption good \(i\) by the total consumption budget within each household. Expenditures are in general observed at the household level in the data and therefore cannot be assigned to the individuals in couple households. The only expenditure categories that can be assigned to individuals in the household are those for clothing and footwear. These categories are observed for adults by gender (explicitly for persons aged 14 or older) and additionally for children. Therefore, clothing and footwear consumption is the private good separately available for each partner, while all other non-durable consumption enters the model as the public good. Since clothing and footwear categories have only a small share in the consumption budget (see Section “Data and Descriptives”), this is a limitation of the analysis and of the policy implications that follow. However, this limitation has been widely accepted in the literature as other personally assignable expenditures are typically not observed in classic consumption surveys (see e.g. Bourguignon et al., 1993; Browning et al., 1994; Lundberg et al., 1997; Phipps & Burton, 1998; Ward-Batts, 2008; Bose-Duker et al., 2021). Clothing and footwear consumption is therefore only a proxy for private consumption, but under the assumption that the deviation of the individual marginal income effect from the household’s marginal income effect is the same for all private goods, it allows conclusions about the degree of income pooling.

Note additionally that the preferences for the public good can vary between the partners. The marginal income effects in the model are always a result of mixed preferences in the household. The only additional assumption for the public good is that both partners receive utility from all included goods, so that the household’s utility is a weighted sum of individual utility functions. In general, aggregation of the included goods and services to a commodity group requires the assumption that preferences for them are weakly separable or that the generalized composite commodity theorem is fulfilled (Lewbel, 1996). In each case, under income pooling, there should also be no effect of the individual income contribution on public good consumption.

Although the demand system can be modeled in as much detail as the expenditure categories in the data allow, the underlying utility function with weakly separable preferences makes it possible to aggregate the single goods to commodity groups. This attribute helps keep the estimation feasible by reducing the number of price effects in the model, given the general problem of small price variation in demand system estimation on pooled cross-sectional data. As the focus is on testing the income pooling hypothesis, there is also no major objection to restricting the demand system to three commodity groups.

I use quarterly data from two survey years, 2008 and 2013, which limits the price variation to eight points in time. The aggregation of single expenditure categories to commodity groups makes it possible to compute Stone-Lewbel prices for the groups to increase the variation in prices (Lewbel, 1989). With the assumption of constant expenditure shares within a commodity group (implying Cobb–Douglas preferences in the group), the prices of the single goods are weighted with their expenditure shares in the commodity group. Since these shares vary for every household, price variation increases with the use of Stone-Lewbel prices, which makes it possible to identify price effects despite the inclusion of quarterly and yearly time dummies in the estimation. However, the small variation over time remains a challenge for the estimation, especially if the commodity groups consist of only a few goods. As a result, the standard errors of the estimated price effects are expected to be rather high.
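The weighting idea behind household-specific group prices can be illustrated with a simplified sketch. It computes the log of a share-weighted geometric mean of item price indices for one household and one commodity group. The function name and the numbers are hypothetical, and the full construction in Lewbel (1989) may include further share-dependent and normalization terms that are omitted here.

```python
import math

def stone_lewbel_log_price(prices, expenditures):
    """Log of the share-weighted geometric mean of item prices
    for one household and one commodity group (simplified)."""
    total = sum(expenditures)
    if total <= 0:
        raise ValueError("group expenditure must be positive")
    # Within-group expenditure shares of this household
    shares = [e / total for e in expenditures]
    return sum(w * math.log(p) for w, p in zip(shares, prices))

# Two households face the same item price indices in a given quarter
# but split their group expenditure differently.
item_prices = [1.04, 1.10]  # hypothetical price indices of two items
hh_a = stone_lewbel_log_price(item_prices, [80.0, 20.0])
hh_b = stone_lewbel_log_price(item_prices, [20.0, 80.0])
```

Both households face identical item prices, yet their group prices differ because household B spends a larger share on the item with the higher price index. This is the source of the additional cross-sectional price variation described above.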

For the unambiguous assignment of the spending, the approach focuses on mixed-gender couple households with no further adults and uses the personal expenditures on clothing and footwear to test the hypothesis. To avoid a wrong assignment of clothing and footwear expenditures for older children in the household to the categories for adults, the sample is restricted to households with children aged below 14 (or without children).

The budget \(m\) therefore contains the spending on non-durable consumption including the expenditures on clothing and footwear of both partners. The share of gross income contributed by the woman to the household gross income is introduced as \(s\). Accordingly, the share of income contributed by the man to the household income is \(1-s\). Income can stem from different sources, not only labor but also transfers, pension income, business income and so on. If income is pooled and individual consumption only depends on the household budget, commodity prices and taste shifters but not on the individual income contribution, then the parameter on \(s\) should not be significantly different from zero. The variable can be added to the system of equations \((1)\) in the same way as taste shifters and other control variables \({x}_{k}\) by specifying the intercept of equation \((1)\) as:

$$\alpha_{i} = \alpha_{i,0} + \sum_{k} \alpha_{i,k} x_{k} + \varphi_{i} s \tag{2}$$

The hypothesis is clearly rejected if \({\varphi }_{women}\ne 0\) in the equation for women’s clothing and footwear. But as the adding-up restriction of demand systems is imposed in the estimation (\({\varphi }_{women}+{\varphi }_{men}+{\varphi }_{composite}=0\)), this would in principle allow \({\varphi }_{women}=0\), while \({\varphi }_{men}\) in the equation for men’s clothing and footwear and \({\varphi }_{composite}\) in the composite good equation are different from zero. Such a result would still imply that household consumption patterns depend on the income contribution share of women, which would also reject the hypothesis and therefore has to be tested. The parameters \({\varphi }_{i}\) therefore measure the deviation from perfect income pooling.
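The role of \({\varphi }_{i}\) can be made explicit with a short worked step that substitutes \((2)\) into \((1)\). This is a sketch under two assumptions that are not stated above: the translog index is read as \(\ln a(p)\), as in Banks et al. (1997), and the shifted intercepts \({\alpha }_{i}\) also enter \(a(p)\). Differentiating the budget share with respect to \(s\) then gives

$$\frac{\partial w_{i}}{\partial s} = \varphi_{i} - \left\{ \beta_{i} + \frac{2\lambda_{i}}{b\left( p \right)} \ln \left[ \frac{m}{a\left( p \right)} \right] \right\} \sum_{j} \varphi_{j} \ln p_{j}$$

At normalized prices (\(\ln {p}_{j}=0\) for all \(j\)), or if \(a(p)\) is held fixed, the second term vanishes and the marginal effect equals \({\varphi }_{i}\). Summing over all goods and using \(\sum_{i}{\beta }_{i}=0\) and \(\sum_{i}{\lambda }_{i}=0\), the second term drops out in any case, so that \(\sum_{i}\partial {w}_{i}/\partial s=\sum_{i}{\varphi }_{i}\). Since the budget shares always sum to one, this sum must be zero, which is why a zero effect in one equation does not rule out offsetting non-zero effects in the other two.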

The demographic control variables consist of dummies for the number of children in the household, quartic polynomials of age of both partners, a dummy for marriage, dummies for agglomeration level of the place of residence, time dummies (quarter, year), dummies for the federal state and a dummy for owner-occupied housing.

Endogeneity

There are some potential sources of endogeneity in the model, which are addressed in an instrumental variables (IV) approach. A classic endogeneity issue in demand systems is related to the budget \(m\), which stands in the denominator of the expenditure share on the left-hand side of the equations and depends on the consumption preferences. The common solution for this issue is to use the disposable household income and its quadratic term as instruments in a Two-Stage Least Squares (2SLS) type of estimator for a system of equations (see e.g., Blundell & Robin, 1999). The basic idea follows the augmented regression framework. In the first stage, budget \(m\) is regressed on the exogenous control variables \({x}_{k}\) and the instruments. Then, the residuals of this regression are added to every equation in the system via \((2)\) as additional control variables. Blundell and Robin (1999) show that under the assumption that the error term \({u}_{i}\) of \(\left(1\right)\) can be orthogonally decomposed into the residuals from stage one and a white noise term, the augmented regression estimator is identical to the 2SLS estimator. Since the assumption of exogenous labor supply within the household must be partly relaxed in the approach at hand, the disposable income is not an appropriate instrument. Instead, the gross wages of both partners are assumed to be exogenous (given completely inelastic labor supply of the household) and taken as instruments. The specific modeling will be discussed later.
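The equivalence between the augmented regression and 2SLS can be illustrated in the simplest possible case. The following sketch simulates a single linear equation with one endogenous regressor and one instrument and no intercept. It is a stylized illustration with hypothetical names and parameter values, not the demand system or the instruments used in this paper.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def control_function_vs_iv(n=2000, beta=0.5, seed=0):
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)        # instrument
        vi = rng.gauss(0.0, 1.0)        # first-stage error
        ei = rng.gauss(0.0, 1.0)        # white noise
        xi = 0.8 * zi + vi              # endogenous regressor
        yi = beta * xi + 0.7 * vi + ei  # error correlated with x through v
        z.append(zi)
        x.append(xi)
        y.append(yi)

    # Stage 1: regress x on z and keep the residuals
    pi_hat = dot(z, x) / dot(z, z)
    v_hat = [xi - pi_hat * zi for xi, zi in zip(x, z)]

    # Stage 2 (augmented regression): regress y on x and v_hat
    # by solving the 2x2 normal equations
    sxx, sxv, svv = dot(x, x), dot(x, v_hat), dot(v_hat, v_hat)
    sxy, svy = dot(x, y), dot(v_hat, y)
    det = sxx * svv - sxv ** 2
    b_cf = (sxy * svv - sxv * svy) / det  # coefficient on x
    d_cf = (sxx * svy - sxv * sxy) / det  # coefficient on the residual

    b_iv = dot(z, y) / dot(z, x)          # simple IV (2SLS) estimator
    b_ols = sxy / sxx                     # OLS ignoring endogeneity
    return b_cf, d_cf, b_iv, b_ols

b_cf, d_cf, b_iv, b_ols = control_function_vs_iv()
```

In this just-identified linear case the coefficient on the regressor from the augmented regression coincides with the IV estimate up to rounding error, while OLS is biased because the regressor is correlated with the error. The coefficient on the first-stage residual (`d_cf`) is the quantity on which an exogeneity test of the regressor can be based.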

The exogeneity of the commodity prices may be challenged in an analysis like the one at hand. The small but existent time variation of the consumer prices used in the model stems from the years 2008 and 2013. There was a reform of the standard rate of value-added tax in Germany in 2009, which affected many commodities including the expenditures on clothing and footwear. The rate was increased by three percentage points from 16 to 19 percent, which can be seen as an exogenous variation in prices given an elastic supply curve. Despite this, the prices can at least be seen as measured with error with regard to the individual facing them, and the construction of Stone-Lewbel prices with constant within-group shares to increase price variation may introduce endogeneity and measurement error. However, Hoderlein and Mihaleva (2008) showed that the use of Stone-Lewbel prices provides precise and economically plausible results compared to the use of standard consumer prices in demand models.

Another potential endogenous regressor can be seen in the women’s share of income contribution \(s\), as the preferences for clothing and footwear and therefore the household consumption pattern as a whole could affect the labor supply decision of the couple. For example, if women with strong preferences for clothing and footwear work more compared to their partners than women with lower preferences for these goods, the coefficient on the share of income contribution would be upwards biased and reject the hypothesis although the household members pool their income. Thus, women’s income contribution is endogenous to the labor supply allocation of both partners, which in turn can be endogenous to consumer preferences.Footnote 3 Another potential endogeneity issue stems from the matching of the couples. The preferences for clothing and footwear of partner A may influence the match with partner B, for example, because of partner B’s income. This could also distort the test on income pooling as the considered couples may systematically vary in their unobserved characteristics (see Lundberg et al., 1997).

The standard approach in household demand analysis assumes separability between consumer demand and labor supply (e.g., Banks et al., 1997). This is also a useful assumption in the literature that examines the identification of the sharing rule, that is, the shares of resources that are jointly or privately consumed in the household (e.g., Browning et al., 2013). Separability can be theoretically modeled as a two-stage budgeting process (Deaton & Muellbauer, 1980). I assume a constant labor supply of the household, but an endogenous distribution within the household. So, at the first stage, the within-household labor supply decision is made, which determines leisure, non-durable and durable consumption of the household members (and savings, which are future consumption). At the second stage, non-durable consumption is allocated to goods and services. The separability assumption allows focusing on non-durable consumption and treating the labor supply decision of the first stage independently. This assumption does not have to be relaxed in the analysis at hand to test the hypothesis. Total labor supply of couple households remains separable from consumption but the separability from the distribution of labor supply within couple households is relaxed. Therefore, it is necessary to tackle the endogeneity issues linked to the share of gross income contributed by the woman \(s\).

At the first stage, I estimate the two equations:

$$\begin{aligned} s &= \Phi^{11} X_{1} + \Theta^{11} Z_{1} + \upsilon_{1} \\ \ln (m) &= \Phi^{21} X_{1} + \Phi^{22} X_{2} + \Theta^{21} Z_{1} + \Theta^{22} Z_{2} + \upsilon_{2} \end{aligned} \tag{3}$$

where \({X}_{1}\) and \({X}_{2}\) are vector-subsets of the exogenous variables, which also enter the demand system at the second stage in \((2)\). The vectors \({Z}_{1}\) and \({Z}_{2}\) are subsets of the instrumental variables (excluded from the demand system). \(\Phi\) and \(\Theta\) are parameter vectors.

In the first equation, women’s contribution share \(s\) depends on the subset of control variables \({X}_{1}\) and the instruments \({Z}_{1}\). The instruments in \({Z}_{1}\) are dummies for the type of school graduation of the man and the woman, interaction terms between them, as well as dummies for the type of highest educational/vocational graduation of both partners and again their interactions.Footnote 4 The idea here is that education is separable from the preferences for non-durable consumption and can be left out in the demand system. However, it influences the share of income contribution ex-ante by bargaining position of the partners in the household labor supply decision and is also assumed to be correlated with the match of couples apart from preferences for consumption. The vector-subset \({X}_{1}\) contains all exogenous variables of the demand system except for the marriage dummy, the dummies for the number of children and the dummy for owner-occupied housing. These variables are denoted as vector \({X}_{2}\) and only appear in the second equation. The reason is that they are assumed to be potentially endogenous to the share of income contribution e.g. via the tax benefits of joint assessment of married couples in Germany if the share is far away from 0.5, which is also part of the research question and will be further examined in the heterogeneity analysis. Therefore, vector \({X}_{2}\) only appears in the budget equation at the first stage as the variables are in principle important for attributes that influence the household income and ultimately the consumption budget.

The instruments in \({Z}_{2}\) are man’s and woman’s gross wages in logs, which are empirically derived from the data on individual gross income and working time. A classic Heckman model (Heckman, 1979) is estimated separately for men and women to impute the wages.Footnote 5 The wages are left out of the first equation because women’s income contribution share and the wages are all derived from the information on individual gross income, which creates a dependency by construction.

The two equations are overidentified as there are many more instruments than endogenous variables and can be estimated by seemingly unrelated regressions (SUR) to obtain efficiently estimated standard errors for the F-tests of the instruments. The first stage can then be linked to the second stage, which is the demand system, in an augmented regression framework like the one used in Blundell and Robin (1999). Thus, the predicted residuals from \((3)\), \({\widehat{\upsilon }}_{1}\) and \({\widehat{\upsilon }}_{2},\) are included in \((2)\) to account for the endogeneity of \(s\) and \(m\). Tests on the exogeneity of \(s\) and \(m\) can be derived from the estimated coefficients on \({\widehat{\upsilon }}_{1}\) and \({\widehat{\upsilon }}_{2}\) in the demand system. This test on exogeneity is combined with a test for overidentifying restrictions and Shea’s partial \({R}^{2}\) to further check for the validity of the instruments.

Data and Descriptives

The model is estimated with two pooled cross-sections of data from the income and consumption survey for Germany (Einkommens- und Verbrauchsstichprobe, EVS) for the years 2008 and 2013. This administrative data set is a representative sample of households in Germany containing detailed information on income and expenditures. Each survey year features over 40,000 households. Each household is observed for one quarter, with households distributed equally over all four quarters, and reports quarterly income and expenditure information. While consumption expenditures are only reported at the household level, income information is available individually for every household member. Very rich households are not included in the data, as households with a household net income of more than 18,000 euro per quarter are excluded from the sample. However, this should not have a great impact on the average marginal effects in a consumption analysis.

As already described in the previous section, I focus on the demand analysis of non-durable consumption and, specifically, on the expenditures for clothing and footwear, which are observed for women and men separately. Expenditures on durables are manually compiled from all goods and excluded from the budget \(m\). The non-durable expenditures contain the categories food, drinks, tobacco, heating and electricity, mobility, articles of daily use, health expenditures, childcare, spending for leisure activities and other smaller items. Housing expenditures are also included and can either be actually paid rents without heating and electricity costs or imputed rents for owner-occupied houses and flats. The imputed rents are calculated by the German Federal Statistical Office (Statistisches Bundesamt) and are already included in the EVS data sets.

The survey data are supplemented with official consumer price indices provided by the German Federal Statistical Office. While most expenditure categories in the EVS data correspond to the two-digit and four-digit price indices, the categories for men’s and women’s clothing and footwear are on a more disaggregated level without an exact match in the available price data. I therefore use available ten-digit prices as proxies for these categories.Footnote 6 The monthly price indices are averaged over the quarter to fit the quarterly expenditure data. Afterwards, household-specific Stone-Lewbel prices are constructed from the price data by weighting the prices with the respective expenditure shares for every commodity group to increase price variation (see previous section for details). There are eight points in time that create price variation (quarterly data for two years). Additional regional price variation comes only from the prices for housing, which are available separately by federal state.
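The two price-construction steps can be sketched as follows. This is only an illustration under stated assumptions: it uses one common formulation of Stone-Lewbel prices, \(\ln P = \sum_j w_j \ln(p_j / w_j)\), and omits the group-specific scaling constant, so it may differ in detail from the construction described in the previous section. The function names and the example numbers are hypothetical.

```python
import math

def quarterly_index(monthly):
    """Average the monthly price index values of one quarter."""
    return sum(monthly) / len(monthly)

def stone_lewbel_log_price(sub_prices, sub_shares):
    """Household-specific log price of one commodity group.

    sub_prices : price indices of the sub-goods in the group
    sub_shares : the household's within-group expenditure shares (sum to 1)

    Computes sum_j w_j * ln(p_j / w_j), treating 0 * ln(0) as 0 and
    leaving out the group-level scaling constant (an assumption).
    """
    total = 0.0
    for p, w in zip(sub_prices, sub_shares):
        if w > 0:
            total += w * math.log(p / w)
    return total

# Two hypothetical households facing identical sub-good prices
prices = [1.2, 0.9]
household_a = stone_lewbel_log_price(prices, [0.5, 0.5])
household_b = stone_lewbel_log_price(prices, [0.8, 0.2])
```

Because the within-group shares differ across households, the two households obtain different group prices even though they face the same official indices, which is the source of the additional price variation.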

The basic sample of the analysis is restricted to mixed-gender couple households (who declare in the data to be a couple) with exactly two adult persons and optionally children below the age of 14. An additional criterion is that at least one partner earns income from employment. This restriction excludes households that rely completely on transfer income. The reason is that these households could be systematically different in their preferences and consumption decision-making from households with at least one employed partner. I end up with 29,461 households, 15,367 from the year 2008 and 14,094 from the year 2013.

The existence of zero expenditures in households could potentially be seen as a data issue. In the crucial equations for clothing and footwear expenditures, the share of zeros is 6.9 percent for women and 15.4 percent for men, while the composite good has no zero expenditures. A large share of zeros in the dependent variable because of a corner solution (zero consumption) can result in biased estimates, which can be addressed with a censored regression model. But in this case, the existence of zero expenditures in clothing and footwear is rather a problem of errors in variables due to infrequency of purchase. Everybody purchases clothing at some point, but some individuals are not observed purchasing during the survey quarter. So, if the error term in the IV approach of the demand system is uncorrelated with the explanatory variables, this measurement error in the dependent variable would yield imprecise but still consistent estimates of the parameters.Footnote 7

Table 1 shows the descriptives of the sample. The total budget of non-durable consumption expenditures is about 2608 euro per month at the mean, which is 73.6 percent of total spending on consumption in the sample. The average gross income of the couple households is 4627 euro per month, of which the women’s contribution share is 34.2 percent.

Table 1 Sample descriptives

A women’s contribution share of zero is found in about ten percent of the households. In principle, this should not be a problem in the first stage of the 2SLS approach if there is sufficient correlation between \(s\) and the instruments. However, since households with a women’s income contribution share of zero may differ systematically in their consumption preferences, the model is also estimated on the subsample in which both partners work. The comparison of the estimated coefficients serves as a robustness check and can be found in Appendix E.Footnote 8

Results

The QUAIDS model is first estimated with the ILLE, ignoring endogeneity issues regarding the women’s income contribution share \(s\) and the budget \(m\). The results are then compared to those of the 2SLS implementation in the augmented regression setting. Parameters of interest are, besides the one for \(s\), the price and budget elasticities of the demand system, which can be derived from the estimated parameters.

Results for the QUAIDS Model (Neglecting Endogeneity)

Table 2 shows the result for the demand system by using the ILLE ignoring endogeneity issues and imposing homogeneity in prices. While budget effects are highly significant in all three equations, the price effects are only significant in the equations for women’s clothing and footwear and for the composite commodity group. Importantly, the coefficients \({\varphi }_{i}\) for women’s income contribution share are significant in all equations with a positive sign for women’s clothing and footwear and a negative sign for both other commodity groups. The system-wide joint test of the coefficients is also highly significant with a chi-squared statistic of 211 (with two degrees of freedom because one equation must be dropped). This result implies a rejection of the income pooling hypothesis because a higher income contribution share of the woman means a higher consumption of women’s clothing and footwear and a lower consumption of men’s for a given household budget.

Table 2 Estimation results for the demand system neglecting endogeneity

To gauge the magnitude of the effect, it can be evaluated at the mean of the expenditure shares. A switch from zero income contribution to being the sole income earner would increase the expenditures on women’s clothing and footwear by 25 percent or nearly 20 euro per month. Simultaneously, consumption of men’s clothing and footwear drops by about 19 percent and that of the composite good by 0.4 percent. So, the substitution happens mostly between the two private goods in this model.
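The arithmetic behind such an evaluation can be sketched in a few lines. In a budget-share equation that is linear in \(s\), moving \(s\) from zero to one at a fixed budget changes the share by \({\varphi }_{i}\), so the relative change in expenditure is \({\varphi }_{i}\) divided by the mean share and the absolute change is \({\varphi }_{i}\) times the budget. The coefficient and the mean share below are hypothetical placeholders chosen only to give magnitudes of the reported order; they are not taken from Table 2. The budget is the sample mean from Table 1.

```python
def effect_of_full_shift(phi, mean_share, budget):
    """Effect of moving the contribution share s from 0 to 1 at a fixed budget.

    phi        : coefficient on s in the budget-share equation
    mean_share : mean budget share of the good
    budget     : total non-durable budget in euro per month
    Returns (relative change in expenditure, change in euro per month).
    """
    return phi / mean_share, phi * budget

# Hypothetical coefficient and mean share; budget of 2608 euro from Table 1
relative_change, euro_change = effect_of_full_shift(0.0075, 0.03, 2608.0)
# relative_change is about 0.25, euro_change is about 19.56
```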

Other control variables are left out in Table 2 but are included in all equations.Footnote 9 Importantly, the controls for the presence of children show consistent signs in that more children in the household reduce the private consumption of both partners. An indication of differences in preferences between married and unmarried couples is the significant coefficient of the dummy for marriage. In this model, private consumption is lower for married couples, but further heterogeneity is evaluated in Subsection “Heterogeneous Effects for Married Couples and the Presence of Children”.

Price and budget elasticities of the demand system can also be derived from the estimated parameters of Table 2.Footnote 10 They are presented in Table 10 in the Appendix.

Results for the QUAIDS Model with Endogeneity

The model can be augmented by allowing for endogeneity of the women’s income contribution share and the expenditure budget. Following the approach presented in Section “Model and Empirical Strategy”, the two endogenous variables are regressed in a first-stage SUR model on the instruments. The predicted residuals from the first stage are subsequently inserted in the QUAIDS model.

The estimation results from the first stage are presented in Table 3. Since there are in total 48 dummies and interaction terms of the instruments school graduation and highest educational/vocational graduation of both partners, the table is shortened by leaving out the results for the interaction terms.Footnote 11 Most instrumental dummies are clearly significant, although an interpretation is not meaningful without the interaction effects. The wages, which only appear in the budget equation, are also strongly significant. The F-tests in both equations on joint significance of the instruments do not indicate a weak instruments problem. Additionally, Shea’s partial \({R}^{2}\) (Shea, 1997) for the women’s income contribution share \(s\) is about 0.086, suggesting a sufficiently high correlation with the instruments.

Table 3 Estimation results for the simultaneous equation model (first stage)

The test of the income pooling hypothesis in the demand system with endogenous regressors still yields a significant rejection, although the effect of women’s income contribution share on women’s clothing and footwear consumption is smaller compared to the first model (Table 4). The direct effect from a shift in the contribution share from zero to one is now 17.4 percent more consumption compared to 25 percent in the first model but still significant at the 5 percent level. The coefficients are also still jointly significant at the 1 percent level in the system of equations. While the significant effect on the consumption of the composite good vanishes, the negative one on men’s clothing and footwear becomes even more negative, indicating a strong rival relationship between the two private goods.

Table 4 Estimation results for the demand system with endogenous budget and endogenous women’s income contribution share

However, the coefficients \({\varphi }_{i}\) do not differ substantially from those of the first model. Accordingly, the exogeneity of women’s income contribution share, which is tested by the joint significance of the included residuals, is rejected only at the 10 percent level with a p-value of 0.06. The test on exogeneity of the budget gives a somewhat different picture: exogeneity is strongly rejected.

The corresponding elasticities are presented in Table 5, where the rows refer to the dependent variables (quantities of demand) and the columns to the exogenous variables (the budget and the prices). The budget elasticities are highly significant in all equations, indicating that the demand for clothing and footwear is budget elastic. They also differ significantly from those of the model without endogeneity (see Table 10 in the Appendix). In contrast to the parameters, the own-price elasticities have to be tested against the null of −1, which would indicate an estimated own-price effect of zero and Cobb–Douglas preferences. The compensated own-price elasticities are consistently negative for all commodity groups with a high own-price elasticity for women’s footwear and clothing of −2.0. Substitution relationships are found between all goods, whereas significant symmetric effects are only confirmed between women’s clothing and footwear and the composite good.Footnote 12
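Why −1 is the relevant null can be seen from the standard QUAIDS elasticity expressions, sketched below. The sketch assumes the usual textbook formulas, which may differ in notation from those in the footnotes of this paper: `mu_i` denotes the derivative of the budget share with respect to the log budget (as in Eq. (4)) and `mu_ij` the derivative of the share of good \(i\) with respect to the log price of good \(j\). All function names and numbers are hypothetical.

```python
def budget_elasticity(mu_i, w_i):
    """Budget elasticity: e_i = 1 + mu_i / w_i."""
    return 1.0 + mu_i / w_i

def uncompensated_price_elasticity(mu_ij, w_i, own):
    """Uncompensated elasticity: e_ij = mu_ij / w_i - delta_ij."""
    return mu_ij / w_i - (1.0 if own else 0.0)

def compensated_price_elasticity(e_ij, e_i, w_j):
    """Slutsky relation: e_ij^c = e_ij + e_i * w_j."""
    return e_ij + e_i * w_j

# A zero estimated own-price effect gives an own-price elasticity of -1
e_own = uncompensated_price_elasticity(0.0, 0.04, own=True)

# A good with a positive share derivative is budget elastic (e_i > 1)
e_budget = budget_elasticity(0.02, 0.04)
e_own_comp = compensated_price_elasticity(e_own, e_budget, 0.04)
```

With a small budget share, as for clothing and footwear, the compensated and uncompensated own-price elasticities are close to each other, because the income effect \(e_i w_j\) is small.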

Table 5 Estimated elasticities for the demand system with endogeneity

Heterogeneous Effects for Married Couples and the Presence of Children

While the income pooling hypothesis is rejected by estimating one marginal effect for all couple households, there are still important questions left open regarding the heterogeneity in the household context. To deal with these questions, the model is extended with interaction effects for marital status and the presence of children in the household. Additionally, differences in the effects between the former East and West German federal states are explored. The underlying model is the augmented regression, which accounts for potential endogeneity and allows in principle for adding interaction terms without estimating a different first-stage equation.Footnote 13 An important limitation concerns the interpretation of the sub-group results. The differences must be interpreted descriptively rather than causally, as the endogeneity of fertility choices and marital status is not addressed. So, if a couple has a child, its pooling behavior does not necessarily change by the estimated coefficients. In general, different coefficients between the groups only signal different behavior, but the groups could also have differed before the event of a birth.Footnote 14

Four different interaction models are estimated with different specifications and presented in Table 6. The first model incorporates an interaction term for married couples allowing women’s income contribution to have varying effects for married and unmarried couples. The effects on both “private” goods are significantly higher for unmarried couples. This result is confirmed with the system-wide chi-squared test which has a much higher statistic for unmarried couples (main coefficient), although the hypothesis remains rejected for married ones (main coefficient and interaction effect combined). Married couples are thus nearer to the theoretical construct of pooling.

Table 6 Heterogeneous effects for married couples and the presence of children

Interestingly, there are substantial differences between the former East and West German federal states. The effect on women’s clothing and footwear is smaller for unmarried couples in East Germany compared to West Germany, while the effect on men’s clothing and footwear is much larger. Consequently, there is more substitution between the composite good and men’s private good. The test signals a weaker rejection in the East. However, marriage reduces the effects in both regions, resulting in a low chi-squared statistic.

An effect similar to the one found for married couples appears in the model with an interaction term for the presence of at least one child in the household (below 14 years old). There is no effect left on women’s clothing and footwear consumption but a large effect on men’s. Without children, the substitution happens almost exclusively between the private goods. But contrary to the model with a term for marriage, the expenditure shares of the private goods are both significantly lower in the presence of at least one child, which indicates a large preference shift toward the composite commodity group. This is plausible because the expenditures on goods for children are contained in this group. However, it could be the case that all purely privately consumed goods for the adults are equally devalued in the presence of children, which means that the test retains explanatory power, indicating substitution between men’s consumption and the composite good dependent on women’s income contribution. The chi-squared statistic has a similarly low value as in the models with an interaction term for marriage.

The combination of marriage and the presence of at least one child confirms these results. Interestingly, the constellation that is closest to perfect income pooling according to the chi-squared test is an unmarried couple with at least one child. However, this result is mainly driven by higher standard errors, as the differences between this case and the combination with marriage are not significant.

The Link to the Taxation of Couple Households

The estimated parameters from the structural demand system on income pooling can be used to evaluate a justified differential in the marginal income tax rates between the partners. If income pooling is violated, marginal tax rates should not be equal in terms of the ability-to-pay principle of taxation (“Leistungsfähigkeitsprinzip”). The marginal tax rate for the second earner would be too high, which would also distort the individual leisure-consumption decision. This follows from the fact that the second earner does not benefit from an income increase of the first earner to the extent that the household’s utility function would suggest.

Assuming that the parameters \({\beta }_{i}\) and \({\lambda }_{i}\) measure the household’s “true” income effects on consumption of the private goods and the public good under perfect income pooling, the parameters \({\varphi }_{i}\) measure the deviations from pooling. Under perfect income pooling, equitable income taxation with joint assessment yields a shared marginal tax rate. In this case, individual as well as composite good consumption does not depend on who contributed to the household income but only on the aggregate. If budget \(m=({y}_{W}+{y}_{M})(1-t)\), where \({y}_{W}\) is the woman’s gross income, \({y}_{M}\) is the man’s gross income and \(t\) is the average tax rate, then the marginal effect of one more euro of the woman’s income on expenditure for good \(i\) in the QUAIDS isFootnote 15:

$$\frac{{\partial p_{i} q_{i} }}{{\partial y_{W} }} = \mu_{i} + w_{i}\, {\text{with}}\,\mu_{i} = \beta_{i} + \frac{{2\lambda_{i} }}{b\left( p \right)}\ln \left( {\frac{m}{a\left( p \right)}} \right)$$
(4)

If income pooling does not hold, then expression (4) applied to the woman’s private good becomes:

$$\frac{{\partial p_{W} q_{W} }}{{\partial y_{W} }} = \mu_{W} + w_{W} + \varphi_{W} \frac{{y_{M} }}{{y_{W} + y_{M} }}\,{\text{where}} \frac{{y_{M} }}{{y_{W} + y_{M} }} = s_{M}$$
(5)

Note that \({s}_{M}\) is men’s share of gross income and coefficient \({\varphi }_{W}\) is here the marginal effect of the woman’s income share in the demand equation for women’s private consumption. The marginal effect of one more euro of man’s income on the expenditure of woman’s private good would just be:

$$\frac{{\partial p_{W} q_{W} }}{{\partial y_{M} }} = \mu_{W} + w_{W} - \varphi_{W} \frac{{y_{W} }}{{y_{W} + y_{M} }}\,{\text{where}} \frac{{y_{W} }}{{y_{W} + y_{M} }} = s_{W} = 1 - s_{M}$$
(6)

So, parameter \(\varphi\) represents the wedge in the marginal propensity to consume between the partners compared to the unitary model with perfect income pooling. The justifiable relative differential in the marginal income tax rate can then be calculated in the two private-good equations by dividing the estimated values \({\widehat{\varphi }}_{i}\) with \(i\in \{W,M\}\) by the respective marginal income effects given the income pooling assumption, which can be derived from Eq. (4). Given \({\widehat{\varphi }}_{W}\ne -{\widehat{\varphi }}_{M}\), there is also a distortion of the contribution to the composite (public) good compared to the situation with income pooling, which could also be targeted instead of the distortion of the private goods. This distortion is expected to be reduced with the respective tax differentials concerning the private goods but does not necessarily have to disappear. However, firstly, since the composite good is not a pure public good, this measure is less precise in detecting a violation of income pooling. Secondly, the assumed policy goal in this context is only an equitable taxation in terms of the ability-to-pay principle, thus further violations of income pooling are not policy relevant.Footnote 16
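Equations (4) to (6) and the relative differential can be traced numerically with a short sketch. The function names and all input values are hypothetical and serve only to illustrate the mechanics; the estimated differentials themselves are not reproduced here.

```python
def mu(beta, lam, b_p, log_real_budget):
    """mu_i from Eq. (4): beta_i + 2 * lambda_i / b(p) * ln(m / a(p))."""
    return beta + 2.0 * lam / b_p * log_real_budget

def marginal_effects_womens_good(mu_w, w_w, phi_w, y_w, y_m):
    """Marginal effects of one more euro on the woman's private good.

    Returns the effect under pooling (Eq. 4), the effect of the woman's
    income (Eq. 5) and the effect of the man's income (Eq. 6).
    """
    s_m = y_m / (y_w + y_m)          # man's share of gross income
    s_w = 1.0 - s_m                  # woman's share of gross income
    pooled = mu_w + w_w              # Eq. (4)
    from_her = pooled + phi_w * s_m  # Eq. (5)
    from_him = pooled - phi_w * s_w  # Eq. (6)
    return pooled, from_her, from_him

def relative_differential(phi, pooled_effect):
    """phi divided by the marginal income effect under pooling."""
    return phi / pooled_effect

# Hypothetical values: mu_W = 0.01, w_W = 0.03, phi_W = 0.008
pooled, from_her, from_him = marginal_effects_womens_good(
    0.01, 0.03, 0.008, y_w=1500.0, y_m=3000.0
)
wedge = from_her - from_him          # equals phi_W for any income split
differential = relative_differential(0.008, pooled)
```

The difference between Eq. (5) and Eq. (6) is \({\varphi }_{W}({s}_{M}+{s}_{W})={\varphi }_{W}\) regardless of the income split, which is why \(\varphi\) can be read directly as the wedge between the partners.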

A positive and significant \({\widehat{\varphi }}_{W}\) would call for a lower marginal tax rate for the woman compared to the shared tax rate because of the on average lower individual gross income of women. This would yield a reallocation of the leisure-consumption decisions of both partners within the household (total labor supply remains unchanged by assumption).Footnote 17 If this adjustment of the individual marginal tax rates also results in a lower average tax for the second earner and a higher one for the first earner, it would imply an external redistributional effect that also has consequences for the intra-household allocation of resources.

The corresponding tax differentials for the estimated coefficients of the analysis can be found in Appendix F. In line with the heterogeneous effects found in the previous section, the findings would suggest that the marginal tax rate differentials for couples with children should not be significantly different from zero, which means that joint tax assessment can be justified in terms of the ability-to-pay principle of taxation for these types of households. However, as clothing and footwear consumption represents only a small share of total private consumption, it serves only as a proxy for the latter. Therefore, policy recommendations based on the estimated parameters in this analysis have limitations. Further research using broader measures of private consumption that also involve intertemporal consumption decisions is necessary.

Conclusion

The validity of the income pooling hypothesis has important implications for social and tax policy as well as for inequality research. In this paper, I provide a test of the income pooling hypothesis using administrative cross-sectional survey data on German couple households. I use information on expenditures and individual incomes to test the hypothesis in a structural consumer demand system. While most expenditures are only observed at the household level in the survey, expenditures on clothing and footwear are separately available for women and men and can be taken as proxies for individual consumption within the couple household. The limitation is that these expenditure categories account only for a small share of the consumption budget, on average 4.6 percent in the used data set.

According to the hypothesis, household consumption decisions should only depend on the household budget, prices and taste shifters. The individual income contribution share should therefore have no effect on consumption patterns, which can be tested within the framework of a Quadratic Almost Ideal Demand System (QUAIDS). I expand the model by controlling for endogeneity of the expenditure budget and the individual income contribution shares in an instrumental variables approach. Additionally, heterogeneous effects are evaluated according to household attributes.

Although the hypothesis is broadly rejected, which implies a relationship between the individual income contribution share and individual consumption, there are significant differences regarding the marital status, the presence of at least one child in the household and whether the household is located in a former West or East German federal state. Married couples and couples with children are closer to satisfying the hypothesis than unmarried couples without children. Unmarried couples in former East German federal states are closer to income pooling than those in former West German states. A negative effect of women’s income contribution on men’s clothing and footwear consumption is confirmed in all specifications, which in turn means positive effects on women’s consumption and the composite good.

The approach in principle allows the calculation of tax differentials for the individual marginal income tax rates that account for the distortion of private good consumption. The findings would suggest that the marginal tax rate differentials for couples with children should not be significantly different from zero, which means that joint tax assessment can be justified in terms of the ability-to-pay principle of taxation for these types of households. However, as the result is based only on clothing and footwear consumption, its application to actual tax policy is limited. Further research using broader measures of individual consumption is therefore necessary.