Introduction

The relationship between family size and food consumption has been studied in the literature using cross-sectional data. Food consumption is a particularly relevant object of analysis because it is commonly used to measure the living standards of families (poverty and inequality comparisons); consequently, the relationship is important from a policy perspective.

The theoretical foundations of the relationship between family size and food consumption were initially provided by Barten (1964), whose model was subsequently employed by Deaton and Paxson (1998). The Barten (1964) model assumes that families consume both public and private goods and that the most plausible source of economies of scale is the presence of public goods that can be shared within the family (e.g., housing or fuel). In the presence of public goods, a couple family is better-off than a single family with the same per capita income or expenditure, because the resources made available by sharing family public goods can be used to acquire a greater quantity of both private and public goods. When there are family scale economies associated with public goods, larger families should have comparatively higher per capita private consumption, such as food, in comparison to smaller families. However, this theoretical prediction has not been confirmed by the empirical literature. In fact, much of the evidence is persistent in confirming the opposite; i.e., food expenditure per capita falls as the number of family members rises (the Deaton and Paxson puzzle).

Several authors have explored the relationship between family size and per capita food consumption using cross-section data. Gan and Vernon (2003) postulated the possibility that food is more public than other private goods (e.g., returns to scale in food consumption, particularly in food preparation). In this sense, the food share increases with family size relative to more public goods and decreases relative to more private goods. However, these authors provide no evidence of greater economies of scale in food than in clothing or transportation. Crossley and Lu (2018) develop a slight modification of Barten’s model, adding home production of food and two types of food that differ in their preparation time. The authors illustrate the case in which economies of scale in food preparation are plausible, but the effect is difficult to identify due to data limitations. Other unsuccessful explanations of the paradox between the theory and the evidence are bulk discounts (Abdulai, 2003; Gibson & Kim, 2018; Griffith et al., 2009), errors in food expenditure data correlated with family size (Brzozowski et al., 2016; Gibson, 2002; Gibson & Kim, 2007), differences in food wastage across families of different sizes, the effects of the price elasticity of food, and the effects of the quality of food (Deaton & Paxson, 1998).

Given that prior evidence from cross-sectional data is far from reconciled with the theory, we exploit a rich panel dataset on consumption (a family can be observed for up to 27 consecutive quarters) to analyze the relationship between family size and food consumption. We base our work on the Barten model and estimate the Quadratic Almost Ideal Demand System (QUAIDS) to derive own-price and income elasticities. The attractiveness of our Spanish panel data comes from the availability of quarterly consumption data, which allows us to explore two additional methodological channels to reconcile theory and evidence that have not previously been studied in depth.

First, we observe that controlling for unobserved family heterogeneity (time-invariant differences in tastes for food across families) does not yield evidence consistent with the theory. Second, we employ our long time series of family data to estimate a Quadratic Almost Ideal Demand System model (QUAIDS, Banks et al., 1997). The income and price elasticities allow us to test the underlying assumption of the economies of scale conditions derived by Deaton and Paxson (1998, 2003); that is to say, the extent to which the own-price elasticity of food is less than the income elasticity. We test this first by assuming the existence of two goods, i.e., food and housing, and then in the context of several commodities. The estimation of the QUAIDS yields elasticities that agree with the theory for only 585 observations, which are characterized by per capita income in the two highest centiles of the distribution.

The remainder of the paper is organized as follows. Sect. “Theoretical Model of Family Size and Consumption” describes Barten’s model. Sect. “Data and Baseline Estimations” describes the panel data and shows baseline estimations. Sect. “Explorations of the Relationship Between Family Size and Food Consumption Using Panel Data” presents possible explanations of the paradox by using our panel data, i.e., applying single-equation fixed effects estimators to the Engel curves and estimating complete demand systems. Finally, Sect. “Conclusions” presents our main conclusions.

Theoretical Model of Family Size and Consumption

Barten’s model (Barten, 1964), used by Deaton and Paxson (1998), assumes that families consume both public and private goods, and that economies of scale are generated by the presence of public goods that can be shared within the family. According to this model, a family with n adult members allocates its total expenditure x across two goods, a private (e.g., food f) and a public good (e.g., housing h), optimizing the following problem:

$$\underset{{q}_{f}{q}_{h}}{\text{max}}nv\left(\frac{{q}_{f}}{{\phi }_{f}\left(n\right)},\frac{{q}_{h}}{{\phi }_{h}\left(n\right)}\right)$$
$$s.t. { p}_{f}\left(\frac{{q}_{f}}{n}\right)+{ p}_{h}\left(\frac{{q}_{h}}{n}\right)=\frac{x}{n}$$
(1)

where \({\phi }_{f}\left(n\right)\) and \({\phi }_{h}\left(n\right)\) are the scaling functions showing some commodity-specific scale-economies so that effective family size for the consumption of each good is not n but rather \({\phi }_{i}\left(n\right)\), i = f, h. For pure private (f) and pure public (h) goods, the utility function is \(nv\left(\frac{{q}_{f}}{n},{q}_{h}\right)\). Solving (1), the per capita food demand function is

$$\frac{{q}_{f}}{n}=\frac{{\phi }_{f}\left(n\right)}{n}{g}_{f}\left(\frac{x}{n},\frac{{p}_{f}{\phi }_{f}\left(n\right)}{n},\frac{{p}_{h}{\phi }_{h}\left(n\right)}{n}\right)$$
(2)

Taking logs and differentiating with respect to ln n we get the condition for per capita food consumption to increase with family size, holding constant the per capita expenditure:

$${\Upsilon }^{*}=\frac{\partial \text{ln}\left({p}_{f}{q}_{f}/n\right)}{\partial \text{ln}n}={\sigma }_{h}\left({e}_{fx}+{e}_{ff}\right)-{\sigma }_{f}\left(1+{e}_{ff}\right)$$
(3)

In Eq. (3), \({e}_{ff}\) and \({e}_{fx}\) are the own-price and income elasticities of food, and \({\sigma }_{i}=1-\frac{\partial \text{ln}{\phi }_{i}\left(n\right)}{\partial \text{ln}n}\), for i = f, h, is the commodity-specific (technological) economy of scale measure. A pure private good has \({\sigma }_{i}=0\) and a pure public good has \({\sigma }_{i}=1\). \({\Upsilon }^{*}\) is the elasticity of per capita food consumption with respect to family size, and there are economies of scale when the expression in (3) is greater than zero:

$${\Upsilon }^{*}={\sigma }_{h}\left({e}_{fx}+{e}_{ff}\right)-{\sigma }_{f}\left(1+{e}_{ff}\right)>0$$
(4)

At constant per capita expenditure, per capita food consumption increases with family size when (1) food has limited substitutes, i.e., \({e}_{ff}\) is small in absolute value and lower than \({e}_{fx}\); and (2) food has significantly fewer economies of scale than housing, i.e., \({\sigma }_{f}/{\sigma }_{h}\) is small. In other words, in the basket of goods of a family, there are private commodities like food with low own- and cross-price elasticities for which the income effect dominates, and thus the per capita consumption of such a good should increase with family size.
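
As an illustration of condition (4), the following minimal sketch evaluates the expression in Eq. (3) for given elasticities and scale parameters. The function name and the numerical values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
def barten_scale_elasticity(e_fx, e_ff, sigma_f, sigma_h):
    """Right-hand side of Eq. (3): sigma_h*(e_fx + e_ff) - sigma_f*(1 + e_ff)."""
    return sigma_h * (e_fx + e_ff) - sigma_f * (1.0 + e_ff)


# Hypothetical inputs: food is a pure private good (sigma_f = 0),
# housing is a pure public good (sigma_h = 1).
upsilon_low_price_response = barten_scale_elasticity(
    e_fx=0.6, e_ff=-0.3, sigma_f=0.0, sigma_h=1.0
)
upsilon_high_price_response = barten_scale_elasticity(
    e_fx=0.6, e_ff=-0.8, sigma_f=0.0, sigma_h=1.0
)

# Condition (4) holds when the value is positive.
print(upsilon_low_price_response > 0)   # income effect dominates
print(upsilon_high_price_response > 0)  # price response dominates
```

With \({\sigma }_{f}=0\) and \({\sigma }_{h}=1\), the sign depends only on whether the income elasticity exceeds the absolute value of the own-price elasticity.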

Even though food has few substitutes, cross-good interrelationships could hide some effect that would resolve the puzzle between the theory and the evidence. Consequently, Horowitz (2002) and Deaton and Paxson (2003) reformulate the previous expression for the case in which there are more than two commodities, with substitution and complementarity effects among them and between food and the other goods, in the context of demand systems:

$$\bar{\sigma }\left( {e_{{fx}} + e_{{ff}} } \right) - \sigma _{f} \left( {1 + e_{{ff}} } \right) < \sum\limits_{{k \ne f}} {\bar{e}_{{fk}} } \left( {\sigma _{k} - \bar{\sigma }} \right)$$
(5)

where \({\overline{e} }_{fk}\) is the compensated elasticity of the demand for food with respect to the price of good k, and \(\bar{\sigma }\) is the budget share weighted average of the economies of scale parameters for goods in the system, except food.
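
The two sides of expression (5) can be evaluated with a short sketch such as the one below. It assumes that \(\bar{\sigma }\) is computed as the budget-share weighted average of \({\sigma }_{k}\) over the non-food goods, with shares renormalized to sum to one; the function name and all numerical inputs are hypothetical.

```python
def condition_five_sides(e_fx, e_ff, sigma_f, sigma_others,
                         shares_others, comp_cross_elasts):
    """Return (lhs, rhs, sigma_bar) for expression (5).

    sigma_others, shares_others and comp_cross_elasts are aligned lists
    over the non-food goods k: scale parameters, budget shares, and
    compensated elasticities of food demand w.r.t. the price of good k.
    """
    total_share = sum(shares_others)
    # Budget-share weighted average of scale parameters, food excluded.
    sigma_bar = sum(w * s for w, s in zip(shares_others, sigma_others)) / total_share
    lhs = sigma_bar * (e_fx + e_ff) - sigma_f * (1.0 + e_ff)
    rhs = sum(e * (s - sigma_bar) for e, s in zip(comp_cross_elasts, sigma_others))
    return lhs, rhs, sigma_bar


# Hypothetical five non-food goods with equal budget shares.
lhs, rhs, sigma_bar = condition_five_sides(
    e_fx=0.6, e_ff=-0.4, sigma_f=0.0,
    sigma_others=[1.0, 0.75, 1.0, 0.75, 0.75],
    shares_others=[0.2, 0.2, 0.2, 0.2, 0.2],
    comp_cross_elasts=[0.05, 0.02, 0.10, 0.03, 0.04],
)
print(lhs, rhs, sigma_bar)
```

When every non-food good has the same scale parameter, the right-hand side is zero and the expression collapses to the two-good form.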

Data and Baseline Estimations

Data

We use the Spanish Permanent Consumption Survey (EPC), from the second quarter of 1977 to the fourth quarter of 1983, carried out by the Spanish National Statistics Institute (INE). Although the dataset is old, it is very useful in the current context because of its long time dimension: a family can be observed for up to 27 consecutive quarters. The survey contains expenditure information for 130 items and an important range of socio-demographic variables. To the best of our knowledge, this is the longest panel on family consumption expenditures covering a wide range of commodity groups. Almost two-thirds of the families remain in the survey for at least 3 years (12 quarters), which allows us not only to estimate models that control for unobserved heterogeneity, but also to follow families experiencing changes in their structure (e.g., family size).

We select families with fewer than 7 missing quarters on food expenditure, using the largest sample period for each family. We omit families with no food expenditure information. We restrict the sample to families observed for at least 12 quarters, without missing values for the relevant variables, so that we have changes in family size for some families. This sample selection leaves us with an unbalanced panel (Tables 1 and 2) of 1,452 families and a total of 28,926 observations.

Table 1 Number of families per period remaining in the sample
Table 2 Number of families per quarter

Baseline Estimations

We estimate Engel curves using both non-parametric and parametric approaches. For the non-parametric approach, we closely follow the empirical strategy of Deaton and Paxson (1998) and fit Engel curves for families of different sizes and composition, estimating the following expression:

$$\int E\left({w}_{f}|i,z\right)g\left(z\right)dz$$
(6)

where \({w}_{f}\) is the share of food consumption, i is an index describing the composition of the family, z is the log of per capita expenditure, and g(z) is a nonparametric kernel estimate of density, so (6) adjusts food share averages for each type of family, weighted by density of per capita expenditure.
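
A minimal sketch of how an expression like (6) could be computed is given below. It assumes a Gaussian kernel, a Nadaraya-Watson estimator for \(E\left({w}_{f}|i,z\right)\), a fixed bandwidth, and trapezoidal integration over a truncated grid (renormalized by the integral of the density over that grid). These choices, the function names, and the synthetic data are assumptions made for illustration, not the procedure used for the figures in this paper.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_regression(z, w, grid, bandwidth):
    """Nadaraya-Watson estimate of E(w | z) at each grid point."""
    weights = gaussian_kernel((grid[:, None] - z[None, :]) / bandwidth)
    return (weights @ w) / weights.sum(axis=1)


def kernel_density(z, grid, bandwidth):
    """Kernel density estimate g(z) at each grid point."""
    weights = gaussian_kernel((grid[:, None] - z[None, :]) / bandwidth)
    return weights.mean(axis=1) / bandwidth


def trapezoid(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def density_weighted_share(z_type, w_type, z_all, grid, bandwidth):
    """Approximate the integral of E(w | type, z) g(z) dz over the grid."""
    m = kernel_regression(z_type, w_type, grid, bandwidth)
    g = kernel_density(z_all, grid, bandwidth)
    return trapezoid(m * g, grid) / trapezoid(g, grid)


# Synthetic data: two family types with the same distribution of log per
# capita expenditure z, where type 1 has a higher food share by construction.
rng = np.random.default_rng(0)
z1 = rng.normal(5.0, 0.5, 500)
z2 = rng.normal(5.0, 0.5, 500)
w1 = 0.95 - 0.10 * z1 + rng.normal(0.0, 0.02, 500)
w2 = 0.90 - 0.10 * z2 + rng.normal(0.0, 0.02, 500)
z_all = np.concatenate([z1, z2])
grid = np.linspace(np.quantile(z_all, 0.05), np.quantile(z_all, 0.95), 50)

share_1 = density_weighted_share(z1, w1, z_all, grid, bandwidth=0.2)
share_2 = density_weighted_share(z2, w2, z_all, grid, bandwidth=0.2)
print(share_1 > share_2)
```

Because both types are weighted by the same density g(z), the comparison of the two averages holds per capita expenditure constant across family types.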

Figure 1 shows kernel estimates of food shares for 1-adult and 2-adult families with no children. Observations are weighted by the density of per capita expenditure. We include confidence intervals (defined at the 95% level) to see whether the differences between the two kernel estimates are statistically significant. We observe that at most values of total per capita expenditure, the average food share is larger in smaller families. Thus, when we analyze the relationship between family size and food shares by comparing 1-adult and 2-adult families at constant per capita expenditure, we see that the food share (and therefore the per capita expenditure on food) declines with family size, which confirms the Deaton-Paxson puzzle. Figure 2 shows similar patterns, although we now consider more types of families. When we consider childless families with 1 to 4 adults, we observe that as the number of adults increases, at constant per capita expenditure, the average food share decreases, especially between 1-adult, 2-adult, and 3-adult families. This also confirms the Deaton-Paxson puzzle.

Fig. 1 Non-parametric Engel curves, 1-adult and 2-adult families with no children. Data come from the Spanish Permanent Survey of Consumption (EPC), years 1977 to 1983. Food shares are defined as the expenditure on food out of total expenditure. Observations are weighted by the density of per capita expenditure.

Fig. 2 Non-parametric Engel curves by number of adults, families with no children. Data come from the Spanish Permanent Survey of Consumption (EPC), years 1977 to 1983. Food shares are defined as the expenditure on food out of total expenditure. Observations are weighted by the density of per capita expenditure.

Table 3 provides average food shares (and standard errors) for different types of families. In the first set of families, we compare families with no children, and as the number of adults increases, the food share decreases. Similar patterns emerge when we compare one adult-one child families versus two adults-two child families, as the average of food share decreases from 0.50 to 0.47. When we compare one adult-two child families versus two adults-four child families, the average food share decreases from 0.59 to 0.51, respectively. This points to larger families having lower per capita food expenditure, indicating that, at constant per capita expenditure, the food share, and, therefore, the per capita expenditure on food, declines with family size, thus confirming the Deaton-Paxson puzzle.

Table 3 Average food share for each type of family

We also apply a parametric approach for the estimation of Engel curves. To that end, we follow the specification proposed by Deaton and Paxson (1998), as follows:

$${w}_{if}=\beta \text{ln}\frac{{x}_{i}}{{n}_{i}}+\gamma \text{ln}{n}_{i}+\sum_{k=1}^{K-1}{\eta }_{ik}\frac{{n}_{ik}}{{n}_{i}}+\zeta {V}_{i}+{\pi }_{t}+{\varepsilon }_{i}$$
(7)

where \({w}_{if}\) is the food share of total consumption by family i, \(\frac{{x}_{i}}{{n}_{i}}\) is per capita expenditure with \({n}_{i}\) being the size of family i, and \(\frac{{n}_{ik}}{{n}_{i}}\) are the ratios to family size of the number of males and females of different ages (ratio of male children aged 0–5, 6–11 and 12–17, ratio of female children aged 0–5, 6–11 and 12–17, the ratio of males aged 18–64 and the ratio of females aged 18–64). \(V\) is a vector of socio-demographic controls, including dummy variables for whether the family head (FH) works in the agricultural sector, is a blue-collar worker, has primary education, has tertiary education, owns the home, and lives in a rural area (see Table 4 for summary statistics of the socio-demographic characteristics of the households). We also include aggregate effects common to all families through annual and quarterly dummies.

Table 4 Summary statistics of explanatory variables

We estimate Eq. (7) using Ordinary Least Squares (OLS) and Instrumental Variables (IV), since consumption (of food and in total) could be measured with error and thus be correlated with unobservable characteristics in (7). We follow Gibson (2002) and use proxies for income as instruments for total expenditure. Specifically, the instruments are the average number of school years of all adults in the family, the age of the head of the family, a dummy variable that indicates whether the family has a second home, and the per capita expenditure of the previous year.
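
The logic of instrumenting total expenditure can be sketched on synthetic data. The example below is a simplified version of Eq. (7) with no demographic ratios or controls, classical measurement error in log per capita expenditure, and a single hypothetical instrument (`school`); all variable names and parameter values are assumptions for illustration and do not reproduce the estimates in Table 5.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 5000
beta_true, gamma_true = -0.15, -0.06  # hypothetical coefficients

# Synthetic regressors: log family size and an income proxy used as instrument.
ln_n = np.log(rng.integers(1, 7, n_obs))
school = rng.normal(0.0, 1.0, n_obs)

# True log per capita expenditure depends on the instrument; the observed
# value is contaminated by measurement error.
ln_xpc_true = 5.0 + 0.5 * school + rng.normal(0.0, 0.5, n_obs)
ln_xpc_obs = ln_xpc_true + rng.normal(0.0, 0.5, n_obs)

w_food = (1.2 + beta_true * ln_xpc_true + gamma_true * ln_n
          + rng.normal(0.0, 0.02, n_obs))

const = np.ones(n_obs)
X = np.column_stack([const, ln_xpc_obs, ln_n])  # regressors (mismeasured)
Z = np.column_stack([const, school, ln_n])      # instruments

# OLS: attenuated coefficient on log per capita expenditure.
coef_ols = np.linalg.lstsq(X, w_food, rcond=None)[0]

# Just-identified IV estimator: (Z'X)^{-1} Z'y.
coef_iv = np.linalg.solve(Z.T @ X, Z.T @ w_food)

print("OLS beta, gamma:", coef_ols[1], coef_ols[2])
print("IV  beta, gamma:", coef_iv[1], coef_iv[2])
```

In this just-identified setting the IV estimator coincides with two-stage least squares; with several instruments, as in the specification above, an over-identified estimator would be needed instead.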

Columns 1 and 2 of Table 5 show the results of estimating Eq. (7) with both OLS and IV for the whole sample of families. Family size appears to exert a negative and statistically significant effect on the food budget share, holding per capita expenditure constant. An increase of 1 per cent in the logarithm of family size decreases the food budget share by 6 per cent. IV estimates show the same pattern as OLS estimates. Over- and under-identification tests indicate the adequacy of the instruments. Thus, these results point to the Deaton-Paxson puzzle. We also estimate Eq. (7) for a sample of families with more than one adult (Columns 3 and 4 of Table 5) and for a sample of families where the head is employed (trying to control for separability between consumption of food and labor supply), and we observe very similar results: the puzzle remains.

Table 5 Estimates of the food Engel curve

Explorations of the Relationship Between Family Size and Food Consumption Using Panel Data

In this section, we explore the panel structure of our data to analyze the relationship between family size and food consumption. Panel data allows us to consider the issue of the unobserved heterogeneity of families and, additionally, to estimate demand systems that allow us to derive income and own-price elasticities for testing Eq. (4).

The Role of Unobserved Heterogeneity

Deaton and Paxson (1998) acknowledge that larger families may have different tastes for food. If tastes for food differ in larger families, this could explain the puzzle. However, none of the prior studies have been able to consider differences in tastes for food across families, given that those studies use a cross-section of families and thus cannot control for the unobserved heterogeneity of families. Given that we have a panel of families, we can apply panel data models to parametrically estimate the Engel curve. To that end, the previous parametric specification of the Engel equation (Eq. 7) is extended to allow for time-invariant unobserved family heterogeneity.

We examine whether, once unobserved family heterogeneity is taken into account, food expenditure per capita falls as family size rises when per capita expenditure is held constant. To that end, we use the long panel of families (T = 27) and estimate the following augmented Engel curve:

$${w}_{iht}={\alpha }_{i}+{\beta }_{i}\text{ln}\frac{{x}_{ht}}{{n}_{ht}}+{\gamma }_{i}\text{ln}{n}_{ht}+\sum_{k=1}^{K-1}{\eta }_{htk}\frac{{n}_{htk}}{{n}_{ht}}+\zeta {V}_{ht}+{\pi }_{t}+{\vartheta }_{h}+{\varepsilon }_{iht}$$
(8)

where i is the commodity, h is the family and t the period. \({\vartheta }_{h}\) is the individual family unobserved component. In the case of food shares, we estimate (8) for total food, and for food at home (ingredients) and food away from home.
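
To illustrate why the family-specific component \({\vartheta }_{h}\) matters, the sketch below compares pooled OLS with a within (fixed-effects) estimator on synthetic data. The within transformation is used here as a simple stand-in and is not the GMM panel estimator reported in Table 6. Log family size is drawn as a continuous variable for simplicity, and all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fam, n_per = 200, 10
fam = np.repeat(np.arange(n_fam), n_per)  # family identifier per observation
beta_true, gamma_true = -0.15, -0.05      # hypothetical coefficients

# Family effect (taste for food), correlated with family size by construction.
theta = rng.normal(0.0, 0.05, n_fam)
ln_n = 1.0 + 4.0 * theta[fam] + rng.normal(0.0, 0.3, fam.size)
ln_xpc = rng.normal(5.0, 0.5, fam.size)
w_food = (1.2 + beta_true * ln_xpc + gamma_true * ln_n
          + theta[fam] + rng.normal(0.0, 0.02, fam.size))


def demean_by_group(v, groups):
    """Subtract the group (family) mean from each observation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


# Pooled OLS ignores theta_h, so the family-size coefficient is biased.
X_pooled = np.column_stack([np.ones(fam.size), ln_xpc, ln_n])
coef_pooled = np.linalg.lstsq(X_pooled, w_food, rcond=None)[0]

# Within estimator: demeaning by family removes the time-invariant theta_h.
X_within = np.column_stack([demean_by_group(ln_xpc, fam),
                            demean_by_group(ln_n, fam)])
coef_within = np.linalg.lstsq(X_within, demean_by_group(w_food, fam),
                              rcond=None)[0]

print("pooled gamma:", coef_pooled[2])
print("within gamma:", coef_within[1])
```

Identification of the family-size coefficient in the within estimator relies on families whose size changes over the observation window.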

Table 6 shows the results of estimating (8) for the full sample of families. For all food, the coefficient on family size remains negative but is not statistically significant, and the same applies to food at home. For food away from home, the coefficient is positive and statistically significant. Thus, although the results change for all food and for food at home, they do not help to explain the puzzle, as the theoretical prediction implies that the coefficients should be positive and statistically significant. This conclusion is clearer when we consider the sample of families with more than one adult. The GMM panel data estimates show that the coefficient on family size is negative and statistically significant, and the same applies to food at home. For food away from home, the coefficient is positive and statistically significant. The latter results are in line with those shown in Table 5, indicating that accounting for unobserved family heterogeneity (time-invariant differences in tastes for food) in the estimation of the Engel equation does not help to resolve the paradox between theory and evidence.

Table 6 Coefficients of the log(family size) on budget shares, panel data GMM estimation

The Estimation of Own-Price and Income Elasticities with a Demand System

According to Barten’s model, the condition for per capita food consumption to increase with family size, holding per capita expenditure constant, is \({\sigma }_{h}\left({e}_{fx}+{e}_{ff}\right)-{\sigma }_{f}\left(1+{e}_{ff}\right)>0\), where \({e}_{ff}\) and \({e}_{fx}\) are the own-price and income elasticities of food, and \({\sigma }_{i}=1-\frac{\partial \text{ln}{\phi }_{i}\left(n\right)}{\partial \text{ln}n}\), for i = f, h, is the commodity-specific (technological) economy of scale measure. Given that a pure private good has \({\sigma }_{i}=0\) and a pure public good has \({\sigma }_{i}=1\), if food (or ingredients) has low own- and cross-price elasticities, we should expect the income effect to dominate, and thus the per capita consumption of this good should increase with family size.

Having a panel of families observed over a long period allows us to estimate demand systems, since prices vary enough to identify their effects. We propose to estimate a system that is flexible in terms of income and price responses: the QUAIDS (Quadratic Almost Ideal Demand System; see Banks et al., 1997), from which we derive price and income elasticities.

The QUAIDS system is often used in the literature to model consumer demand with family data and is based on price-independent generalized logarithmic (PIGLOG) preferences, with Engel curves that are modelled as budget shares being a quadratic function of the log–budget. It has the advantage of a flexible underlying utility function and allows imposing the restrictions of a consistent demand system, like homogeneity and symmetry. For each i = 1,2, …, N goods and the corresponding budget shares \({w}_{i}\), QUAIDS forms the following non-linear system of equations:

$${w}_{iht}={\alpha }_{ih}+\sum_{j}{\gamma }_{ij}ln{p}_{jt}+{\beta }_{i}ln\left[\frac{{x}_{ht}}{a({p}_{t})}\right]+\frac{{\lambda }_{i}}{b({p}_{t})}{\left\{ln\left[\frac{{x}_{ht}}{a({p}_{t})}\right]\right\}}^{2}+{v}_{it}$$
(9)

for i = 1, 2, …, N goods and j = 1, …, N with consumption budget \({x}_{ht}\) for family h and period t and prices \({p}_{it}\). Price indices are:

$$ln\,a\left({p}_{t}\right)={\alpha }_{0}+\sum {\alpha }_{ih}ln{p}_{it}+\frac{1}{2}\sum \sum {\gamma }_{ij}ln{p}_{it}ln{p}_{jt}$$
(10)
$$b\left({p}_{t}\right)=\prod {{p}_{it}}^{{\beta }_{i}}$$
(11)

Further explanatory variables that account for taste shifts in family consumption, such as demographic characteristics, are added to \({\alpha }_{ih}\), so we include a linear specification \({\alpha }_{ih}={\alpha }_{ih}\left({V}_{ht}\right)\), where \({V}_{ht}\) contains the same variables included in (7). We decompose \({v}_{it}\) into a fixed effect and a mixed error.

To calculate income and price elasticities, we differentiate (9) with respect to \(ln{x}_{h}\) and \(ln{p}_{j}\) (we omit subindex t in the expressions for the elasticities) to obtain:

$${\mu }_{ih}=\frac{\partial {w}_{ih}}{\partial ln{x}_{h}}={\beta }_{i}+\frac{2{\lambda }_{i}}{b(p)}\left\{ln\left[\frac{{x}_{ht}}{a(p)}\right]\right\}$$
$${\mu }_{ijh}=\frac{\partial {w}_{ih}}{\partial ln{p}_{j}}={\gamma }_{ij}-{\mu }_{ih}\left({\alpha }_{jh}+\sum_{k}{\gamma }_{jk}ln{p}_{k}\right)-\frac{{\lambda }_{i}{\beta }_{j}}{b(p)}{\left\{ln\left[\frac{{x}_{ht}}{a(p)}\right]\right\}}^{2}$$

Then, income elasticities are given by \({e}_{iih}=\frac{{\mu }_{ih}}{{w}_{ih}}+1\) and uncompensated price elasticities by \({e}_{ijh}^{U}=\frac{{\mu }_{ijh}}{{w}_{ih}}-{\delta }_{ij}\), with \({\delta }_{ij}=1\) if i = j and \({\delta }_{ij}=0\) if i ≠ j. Using the Slutsky conditions, we express the compensated price elasticities as \({e}_{ijh}^{C}={e}_{ijh}^{U}+{e}_{iih}{w}_{jh}\). With the elasticity estimates at hand, we can test whether (4) holds, assuming hypothetical values for the commodity-specific (technological) economy of scale measure for the six commodities of our demand system (food, alcohol and tobacco, clothing, housing, services, and other goods). Since both price and income elasticities are family-specific, when the condition is satisfied we can identify the sample of families for which it holds.
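
The elasticity formulas can be sketched numerically as follows. The example assumes the standard QUAIDS form in which the translog aggregator enters as ln a(p) and b(p) is the Cobb-Douglas aggregator, and it uses a hypothetical three-good system whose parameters satisfy adding-up, homogeneity, and symmetry; none of the values are estimates from this paper.

```python
import numpy as np


def quaids_elasticities(alpha, beta, gamma, lam, alpha0, ln_p, ln_x):
    """Budget shares and elasticities of a QUAIDS system at one data point."""
    n_goods = alpha.size
    ln_a = alpha0 + alpha @ ln_p + 0.5 * ln_p @ gamma @ ln_p  # translog index
    b = np.exp(beta @ ln_p)                                   # prod p_i^beta_i
    ln_real = ln_x - ln_a                                     # ln[x / a(p)]

    w = alpha + gamma @ ln_p + beta * ln_real + (lam / b) * ln_real**2

    mu_x = beta + (2.0 * lam / b) * ln_real          # dw_i / dln x
    dlna_dlnp = alpha + gamma @ ln_p                 # alpha_j + sum_k gamma_jk ln p_k
    mu_p = (gamma - np.outer(mu_x, dlna_dlnp)
            - np.outer(lam, beta) / b * ln_real**2)  # dw_i / dln p_j

    e_income = mu_x / w + 1.0
    e_uncomp = mu_p / w[:, None] - np.eye(n_goods)
    e_comp = e_uncomp + np.outer(e_income, w)        # Slutsky: e_ij^U + e_i w_j
    return w, e_income, e_uncomp, e_comp


# Hypothetical parameters: alpha sums to 1, beta and lam sum to 0,
# gamma is symmetric with zero row sums.
alpha = np.array([0.5, 0.3, 0.2])
beta = np.array([-0.10, 0.04, 0.06])
lam = np.array([0.010, -0.004, -0.006])
gamma = np.array([[0.05, -0.03, -0.02],
                  [-0.03, 0.04, -0.01],
                  [-0.02, -0.01, 0.03]])

w, e_income, e_uncomp, e_comp = quaids_elasticities(
    alpha, beta, gamma, lam, alpha0=0.0,
    ln_p=np.array([0.10, -0.05, 0.20]), ln_x=1.0,
)
print("shares:", w)
print("income elasticities:", e_income)
print("own-price (uncompensated):", np.diag(e_uncomp))
```

Under these restrictions the shares sum to one, the share-weighted income elasticities sum to one, and each row of uncompensated price elasticities plus the income elasticity sums to zero, which provides a quick consistency check on any implementation.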

First, we use average values of the income and own-price elasticities of food (Table 7) to establish whether the condition holds. When we only consider the income and price elasticities of food and we assume \({\sigma }_{f}=0\) and \({\sigma }_{h}=1\), the difference \({\sigma }_{h}\left({e}_{fx}+{e}_{ff}\right)-{\sigma }_{f}\left(1+{e}_{ff}\right)\) is not statistically different from zero for our estimated figures; that is to say, the compensated own-price elasticity in our sample of families is, on average, not statistically different in absolute value from the income elasticity.

Table 7 Estimated demand elasticities, QUAIDS model

We then calculate the condition using the compensated cross-price elasticities for all the goods of the system, setting \({\sigma }_{k}\) for food, alcohol and tobacco, clothing, housing, services and other goods to 0, 1, 0.75, 1, 0.75 and 0.75, respectively. We find that condition (4) holds in only 585 observations using the parameters estimated with the QUAIDS model. These families are characterized by having per capita income in the two highest centiles of the distribution, showing no differences in family size. Those households have a higher per capita expenditure, indicating a general negative relationship between per capita food expenditure and family size for rich families. Those rich households are also characterized by a higher proportion of household heads with tertiary education, and are less likely to reside in rural areas.

Conclusions

We explore the relationship between family size and food consumption by using a Spanish panel dataset containing rich information about consumption. Deaton and Paxson (1998) used cross-sectional data to examine expenditure data from a range of developed (US, UK, France) and developing countries (Thailand, Pakistan, South Africa) and found that, since there are few substitutes for food, the price elasticity of food is low, and the income effect dominates.

We particularly take advantage of panel data to estimate a flexible demand system, QUAIDS, from which we derive the price and income elasticities that allow us to test the theoretical condition derived from the existence of economies of scale. Results indicate that this condition, which expresses a positive relationship between per capita food expenditure and family size, holds in only 585 observations, with these families being characterized by per capita income in the two highest centiles of the distribution. The economic behavior of families depends on the specific economic situation of the country and, consequently, our results should be interpreted in the context of the economic crisis of the 1970s and 1980s, a consequence of the very high oil prices that affected Spain and the rest of the world.