1 Introduction

We consider the empirical analysis of consumption decisions by groups of individuals who have to reach a consensus on spending a joint budget. A notable example of such a group is the household, whose members pool their income and must then agree on the allocation of this aggregate income. Samuelson (1956) initiated this consensual approach to modeling household consumption behavior by assuming a (household-level) social welfare function that is maximized under a household budget constraint (see also Donni 2007a; Lundberg and Pollak 2007 for recent surveys of the literature following this approach).

This paper deals with individual consumption in a multi-person group and is framed in terms of the collective consumption model introduced by Chiappori (1988). This model assumes a collective choice process and explicitly recognizes the multi-person nature of this process, with each individual decision maker (group member) characterized by her or his own utility function, which represents her or his rational preferences. It only assumes that the observed group consumption is the Pareto efficient outcome of a bargaining process. The model then defines collectively rational behavior as maximizing a weighted sum of the group members’ utility functions, with the weights representing the bargaining power of the individual group members. Interestingly, these bargaining weights may vary depending on the prices, income levels and other exogenous variables characterizing the choice situations. The collective model is increasingly used in empirical studies of household behavior (see, for example, Vermeulen 2002; Donni 2007b for surveys).

We make an empirical as well as a methodological contribution. The empirical contribution is that we apply a revealed preference methodology in combination with experimental data for assessing the empirical goodness-of-fit of the collective model. More specifically, we will focus on revealed preference conditions in the tradition of Afriat (1967), Diewert (1973), and Varian (1982). These conditions enable checking consistency of a given data set with a particular specification of the collective consumption model. In the spirit of Varian (1982), we will refer to this checking procedure as ‘testing’ data consistency with collective rationality. Apart from goodness-of-fit, we will also consider the discriminatory power of alternative specifications of the collective consumption model. Indeed, a fair comparison of different behavioral models must complement a goodness-of-fit analysis with a power analysis: favorable goodness-of-fit results, indicating few violations of the behavioral restrictions, have little meaning if the behavioral implications have low power, i.e. optimizing behavior can hardly be rejected.

Our use of revealed preference tests and experimental data distinguishes our empirical study from existing studies. Revealed preference tests are entirely nonparametric, which means that they do not require a prior parametric structure for the consumption model (e.g. for the individual utility functions). This contrasts with other studies, which are typically parametric in nature. Strictly speaking, these studies simultaneously test the consumption model under study and a (non-verifiable) parametric structure that is imposed on the model. Nonparametric tools do not require such a priori assumptions. In addition, the laboratory nature of experiments effectively avoids the often controversial preference homogeneity assumptions (excluding e.g. changing preferences) and the data measurement problems that are associated with using ‘real life’ data. In fact, it has been argued that revealed preference testing tools are especially useful within an experimental context; a particularly convincing case is provided by Sippel (1997), who focused on individual rationality. Next, and specific to our own study, the experimental set-up allows for obtaining information on consumption quantities for the individual group members; such information is typically not available in ‘real life’ data sets. For example, household data sets usually only contain consumption quantities at the level of the aggregate household, and do not reveal the individual members’ consumption quantities. In this respect, we also refer to Sect. 4, which discusses the relevance of this study for analyzing household consumption data.

As for our methodological contribution, we present testing tools for collective rationality under alternative assumptions regarding variation of bargaining weights across different choice situations. This extends earlier work of Cherchye et al. (2007, 2011), who developed revealed preference conditions for the collective consumption model that do not include such assumptions. As such, these conditions allow for a huge variation of the bargaining weights across periods. For example, the model allows the full bargaining power to shift from one group member to another between any two consecutive periods. Because the realistic nature of such power shifts may be questioned, we propose a revealed preference methodology for testing the collective consumption model under restricted variation of bargaining weights. As we will indicate, adding such weight restrictions implies revealed preference tests with higher discriminatory power, which is particularly convenient when focusing on the empirical performance of the collective model.

At this point, it is worth distinguishing our approach from that of Cherchye et al. (2009). These authors suggest revealed preference conditions for the collective consumption model that impose prior assumptions regarding the so-called sharing rule (or within-group income distribution) underlying observed household behavior. This sharing rule is often interpreted as an indicator of bargaining power (see, for example, Browning et al. 2006). As such, the approach presented here can be conceived as complementary to the one proposed by Cherchye, De Rock and Vermeulen.

One final remark applies to our following empirical analysis. The revealed preference tests on which we focus are not traditional statistical tests, which are characterized by standard errors and so allow for statistical inference. Our tests are ‘sharp’ tests: they check whether or not the data pass the revealed preference conditions exactly. If the data do not pass the conditions, then the model under study is rejected. As a result, we cannot use the usual statistical methods for comparing the empirical validity of different specifications of the collective model. Instead, we will follow a recent proposal of Beatty and Crawford (2011) that compares different behavioral models on the basis of a so-called ‘predictive success’ measure, which is specifically tailored to the type of revealed preference tests that we consider here. As we will explain in Sect. 4, this measure of predictive success simultaneously accounts for the goodness-of-fit and discriminatory power of a particular model specification.

The rest of the paper unfolds as follows. Section 2 introduces the revealed preference tests for collective rationality. Section 3 presents the experimental design. Section 4 presents the results of our empirical analysis, and subsequently discusses the usefulness of this study for analyzing household consumption data. Section 5 summarizes and concludes.

2 Rational consumption decisions: revealed preference conditions

In this section we present the rationality conditions for consumption behavior that will be used in our empirical study. To set the stage, we first introduce the unitary rationality condition, which we will also use for evaluating the individual choices in our experimental analysis. Next, we consider two versions of collective rationality, i.e. with and without restrictions on the variation of the bargaining weights across choice observations.

2.1 Unitary rationality

Our empirical analysis starts from a finite set of T observed choices consisting of N-vectors of quantities \({\mathbf{q}_{t}\in \mathbb{R}_{+}^{N}}\) and prices \({\mathbf{p}_{t}\in \mathbb{R} _{++}^{N}. }\) Let

$$ S^{un}=\left\{ \left( {\mathbf{p}}_{t};{\mathbf{q}}_{t}\right) ,t=1,\ldots,T\right\} $$

represent the corresponding set of observations that is used in the analysis of unitary rationality.

Given the set \(S^{un}\), unitary rationality means that observed behavior can be rationalized in terms of a single decision maker maximizing a single (‘unitary’) utility function \(U\). Throughout, we will assume utility functions are continuous, concave, monotonically increasing and non-satiated. Formally, we get the following condition for unitary rationality:

Definition 1

(unitary rationalization, UR) Let \(S^{un}=\left\{ \left( \mathbf{p}_{t};\mathbf{q}_{t}\right) ;t=1,\ldots,T\right\} \) be a set of observations. A utility function \(U\) provides a unitary rationalization (UR) of \(S^{un}\) if and only if, for each observation \(t=1,\ldots,T\), \(U\left(\mathbf{q}_{t}\right)\) equals

$$ \max_{{\mathbf{z}}\in {\mathbb{R}}_{+}^{N}}U\left( {\mathbf{z}}\right) \, \hbox{s.t.}\, {\mathbf{p}}_{t}^{\prime } {\mathbf{z}}\leq {\mathbf{p}}_{t}^{\prime }{\mathbf{q}}_{t}, \tag{1} $$

with \(\mathbf{z}\) representing affordable consumption quantities for the prices \(\mathbf{p}_{t}\) and budget \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}. \)

Varian (1982), based on Afriat (1967), has demonstrated that such a data rationalizing utility function exists if and only if a solution exists for the so-called Afriat inequalities. This is contained in the next result:

Proposition 1

Let \(S^{un}=\left\{ \left( \mathbf{p}_{t};\mathbf{q}_{t}\right) ;t=1,\dots,T\right\} \) be a set of observations. The following statements are equivalent:

  (i) There exists a utility function \(U\) that provides a UR of \(S^{un}\);

  (ii) For all \(s,t\in \{1,\ldots,T\}, \) there exist numbers \({U_{t}, \lambda_{t}\in \mathbb{R}_{++}}\) that meet the Afriat inequalities

    $$ U_{s}-U_{t}\leq \lambda_{t}{\mathbf{p}}_{t}^{\prime }\left( {\mathbf{q}}_{s}- {\mathbf{q}}_{t}\right). \tag{2} $$

In this result, the equivalence between statements (i) and (ii) means that there exists a rationalizing utility function \(U\) if and only if the set \(S^{un}\) satisfies a number of inequalities defined in the unknowns \(U_{t}\) and \(\lambda_{t}\). These inequalities are commonly referred to as the ‘Afriat inequalities’ corresponding to the set \(S^{un}\). Intuitively, the Afriat inequalities allow us to obtain an explicit construction of the utility levels and the marginal utility of income associated with each observation \(t\): they define a utility level \(U_{t}\) and a marginal utility of income \(\lambda_{t}\) (associated with the observed income \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}\)) for each observed \(\mathbf{q}_{t}.\) We remark that the Afriat inequalities in (2) are linear in these unknowns. Thus, we can use standard linear programming techniques to verify whether there exists a unitary rationalization for \(S^{un}\).
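The role of the numbers in statement (ii) can be illustrated with a minimal sketch that checks whether a given candidate solution meets the Afriat inequalities (2). It only verifies a candidate; it does not perform the linear programming search itself. The function name `satisfies_afriat`, the tolerance and the two-observation data set are assumptions made for illustration and are not taken from the experiment.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product, e.g. the expenditure p'q."""
    return sum(a * b for a, b in zip(x, y))


def satisfies_afriat(U, lam, prices, quantities, tol: float = 1e-9) -> bool:
    """True if (U, lam) meets U_s - U_t <= lam_t * p_t'(q_s - q_t) for all s, t."""
    if any(u <= 0 for u in U) or any(l <= 0 for l in lam):
        return False  # the proposition requires strictly positive numbers
    T = len(prices)
    for t in range(T):
        own_cost = dot(prices[t], quantities[t])
        for s in range(T):
            rhs = lam[t] * (dot(prices[t], quantities[s]) - own_cost)
            if U[s] - U[t] > rhs + tol:
                return False
    return True


# Hypothetical data: two observations, two goods (illustration only).
prices = [(1.0, 2.0), (2.0, 1.0)]
quantities = [(2.0, 1.0), (1.0, 2.0)]

valid = satisfies_afriat([1.0, 1.0], [1.0, 1.0], prices, quantities)
invalid = satisfies_afriat([1.0, 3.0], [1.0, 1.0], prices, quantities)
print(valid, invalid)
```

In this hypothetical data set each bundle costs more than the budget at the other observation's prices, so equal utility levels with unit marginal utilities of income form a valid solution, whereas raising the second utility level to 3 violates the inequality for \(t=1, s=2\).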

In what follows, we will consider an extended version of the rationality condition in Proposition 1. The extended condition accounts for optimization errors in the following way: it requires ‘nearly’ optimizing behavior rather than ‘exactly’ optimizing behavior (see Afriat 1973; Varian 1990 on the usefulness of considering such nearly optimizing behavior in empirical revealed preference analysis). Formally, the extended version of the rationality condition uses \(e\in [0,1], \) and replaces (2) by

$$ U_{s}-U_{t}\leq \lambda_{t}{\mathbf{p}}_{t}^{\prime }\left( {\mathbf{q}}_{s}-e {\mathbf{q}}_{t}\right). \tag{3} $$

Clearly, e = 1 makes (3) coincide with (2). In our following empirical exercise, we allow for small optimization errors by additionally considering e = 0.975 and e = 0.95. In general, lower values for e imply weaker unitary rationality conditions. We remark that using (3) instead of (2) preserves the linear nature of the restrictions.
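As a hedged illustration of how the value of \(e\) affects the test outcome, the sketch below uses a combinatorial check instead of linear programming: by Afriat's theorem (Varian 1982), the inequalities have a solution if and only if the data satisfy the Generalized Axiom of Revealed Preference (GARP), and the version with \(e\) corresponds to a GARP check in which own expenditures are scaled by \(e\). This is a swapped-in technique rather than the procedure used in the paper; the function name and the data are assumptions made for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product, e.g. the expenditure p'q."""
    return sum(a * b for a, b in zip(x, y))


def satisfies_garp(prices, quantities, e: float = 1.0) -> bool:
    """GARP check at efficiency level e (e = 1 is the exact test)."""
    T = len(prices)
    # cost[t][s] = p_t'q_s: cost of bundle s at the prices of observation t
    cost = [[dot(prices[t], quantities[s]) for s in range(T)] for t in range(T)]
    # Direct revealed preference: q_t R q_s if e * p_t'q_t >= p_t'q_s
    R = [[e * cost[t][t] >= cost[t][s] for s in range(T)] for t in range(T)]
    # Transitive closure (Warshall's algorithm)
    for k in range(T):
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    # Violation: q_t R q_s while q_t is strictly cheaper than e * p_s'q_s at p_s
    for t in range(T):
        for s in range(T):
            if R[t][s] and e * cost[s][s] > cost[s][t]:
                return False
    return True


# Hypothetical data: each bundle costs 100 at its own prices, 98 at the other's.
prices = [(1.0, 2.0), (2.0, 1.0)]
quantities = [(32.0, 34.0), (34.0, 32.0)]

for e in (1.0, 0.975, 0.95):
    print(e, satisfies_garp(prices, quantities, e))
```

In this hypothetical data set each bundle could have been bought with 2% less income at the other observation's prices. The exact test (e = 1) therefore rejects, while the tests with e = 0.975 and e = 0.95 tolerate the deviation.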

2.2 Collective rationality without bargaining weight restrictions

Our empirical investigation of the collective consumption model will focus on two-member groups or ‘dyads’. As above, we assume an observed set of T dyad choices, consisting of quantities \(\mathbf{q}_{t}\) and prices \(\mathbf{p}_{t}. \) Now we also use information on the observed within-dyad allocation of the quantities \(\mathbf{q}_{t}; \) our experimental set-up allows us to obtain this information (see Sect. 3). Specifically, for each observed bundle \(\mathbf{q}_{t}, \) we know the individually consumed quantities \({\mathbf{q}_{t}^{1},\mathbf{q}_{t}^{2} \in \mathbb{R}_{+}^{N}}\) such that

$$ {\mathbf{q}}_{t}={\mathbf{q}}_{t}^{1}+{\mathbf{q}}_{t}^{2}. \tag{4} $$

Thus, our analysis of collective rationality will use the set of observations

$$ S^{co}=\left\{ \left( {\mathbf{p}}_{t};{\mathbf{q}}_{t},{\mathbf{q}}_{t}^{1}, {\mathbf{q}}_{t}^{2}\right) ,t=1,\ldots,T\right\}. $$

The collective model assumes that the preferences of the two group members, which are defined in terms of the privately consumed quantities, can be represented by utility functions \(U^{1}\) and \(U^{2}\). The collective consumption model (only) assumes Pareto efficient within-group allocations. Formally, we obtain the following condition for collective rationality.

Definition 2

(collective rationalization, CR) Let \(S^{co}=\{\left(\mathbf{p}_{t};\mathbf{q}_{t},\mathbf{q}_{t}^{1}, \mathbf{q}_{t}^{2}\right) ,t=1,\ldots,T\}\) be a set of observations. A pair of utility functions \(U^{1}\) and \(U^{2}\) provides a collective rationalization (CR) of \(S^{co}\) if and only if for each observation \(t=1,\ldots,T\) there exists a Pareto weight \({\mu _{t}\in \mathbb{R}_{++}}\) such that \(U^{1}\left( \mathbf{q}_{t}^{1}\right) +\mu_{t}U^{2}\left( \mathbf{q}_{t}^{2}\right)\) equals

$$ \max_{\left( {\mathbf{z}}^{1},\, {\mathbf{z}}^{2}\right) \in \left({\mathbb{R}}_{+}^{N}\right)^{2}}U^{1}\left( {\mathbf{z}}^{1}\right) +\mu_{t}U^{2}\left({\mathbf{z}}^{2}\right) \, \hbox{s.t.}\, {\mathbf{p}}_{t}^{\prime }\left({\mathbf{z}} ^{1}+{\mathbf{z}}^{2}\right) \leq {\mathbf{p}}_{t}^{\prime }{\mathbf{q}}_{t}, \tag{5} $$

with \(\mathbf{z}^{1}\) and \(\mathbf{z}^{2}\) representing affordable consumption quantities for the prices \(\mathbf{p}_{t}\) and budget \(\mathbf{p}_{t}^{\prime}\mathbf{q}_{t}.\)

Thus, the collective consumption model generalizes the unitary model by describing group behavior as maximizing a weighted sum of the individual member utilities. The Pareto weight \(\mu_{t}\) then represents the relative ‘bargaining power’ of member 2 (vis-à-vis member 1) in observation/situation \(t\). Importantly, this bargaining power may vary depending on the specific observation at hand.

The following result provides a revealed preference characterization of collectively rational behavior.

Proposition 2

Let \(S^{co}=\left\{ \left(\mathbf{p}_{t};\mathbf{q}_{t}, \mathbf{q}_{t}^{1},\mathbf{q}_{t}^{2}\right) ,t=1,\ldots,T\right\} \) be a set of observations. The following statements are equivalent:

  (i) There exists a pair of utility functions \(U^{1}\) and \(U^{2}\) that provide a CR of \(S^{co}\);

  (ii) For all \(s,t\in \{1,\ldots,T\}, m=1,2\) there exist numbers \({U_{t}^{m}, \lambda_{t}^{m}\in \mathbb{R}_{++}}\) that meet the collective Afriat inequalities

    $$ U_{s}^{m}-U_{t}^{m}\leq \lambda_{t}^{m}{\mathbf{p}}_{t}^{\prime }\left( {\mathbf{q}}_{s}^{m}-{\mathbf{q}}_{t}^{m}\right). \tag{6} $$

The interpretation of statement (ii) in this result is similar to that of statement (ii) in Proposition 1. Just like for individual rationality, collective rationality requires finding a solution for Afriat inequalities. In this case, we get a set of inequalities for each individual member, defined in terms of the given personalized quantities and prices. As before, they allow for an explicit construction of (here member-specific) utilities (\(U_{t}^{m}\)) and marginal utilities of income (\(\lambda_{t}^{m}\)). In the collective case, the marginal utilities of income pertain to the income shares of the individual group members associated with the observed intragroup allocation (i.e. \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}^{m}\) for each member \(m\)).

Similar to before, our following experimental analysis will focus on an extended version of the inequalities (6) that accounts for optimization error, i.e.

$$ U_{s}^{m}-U_{t}^{m}\leq \lambda_{t}^{m}{\mathbf{p}}_{t}^{\prime }\left( {\mathbf{q}}_{s}^{m}-e{\mathbf{q}}_{t}^{m}\right) \tag{7} $$

with e = 1, 0.975 and 0.95. The resulting inequalities can again be verified through linear programming techniques.
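Because Proposition 2 imposes no restriction that links the inequalities of the two members, the condition can be checked member by member on the personalized quantities. The sketch below does this with a GARP-type check, which by Afriat's theorem (Varian 1982) is equivalent to feasibility of the member-specific inequalities; it is a swapped-in alternative to linear programming, not the authors' implementation. Function names and data are assumptions made for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product, e.g. the expenditure p'q."""
    return sum(a * b for a, b in zip(x, y))


def member_passes(prices, member_q, e: float = 1.0) -> bool:
    """GARP-type check at efficiency level e on one member's quantities."""
    T = len(prices)
    cost = [[dot(prices[t], member_q[s]) for s in range(T)] for t in range(T)]
    R = [[e * cost[t][t] >= cost[t][s] for s in range(T)] for t in range(T)]
    for k in range(T):  # transitive closure (Warshall's algorithm)
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    return not any(
        R[t][s] and e * cost[s][s] > cost[s][t]
        for t in range(T)
        for s in range(T)
    )


def satisfies_cr(prices, q1, q2, e: float = 1.0) -> bool:
    """Collective rationality without weight restrictions: both members pass."""
    return member_passes(prices, q1, e) and member_passes(prices, q2, e)


# Hypothetical dyad data: two observations, two goods (illustration only).
prices = [(1.0, 2.0), (2.0, 1.0)]
q2 = [(2.0, 1.0), (1.0, 2.0)]           # member 2
q1_consistent = [(4.0, 1.0), (1.0, 4.0)]  # member 1 buys more of the cheap good
q1_violating = [(1.0, 4.0), (4.0, 1.0)]   # member 1 buys more of the dear good

print(satisfies_cr(prices, q1_consistent, q2))
print(satisfies_cr(prices, q1_violating, q2))
```

With the first allocation both members pass. With the second allocation member 1's choices are inconsistent on their own, so the dyad fails the collective condition regardless of member 2's behavior.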

Two final remarks are in order. Firstly, the above collective consumption model is the so-called egoistic model, in which each member \(m\)’s utility function only depends on her or his own private consumption \(\mathbf{q}^{m}. \) This means that we do not allow (i) for consumption externalities (i.e. \(U^{m}\) does not depend on \(\mathbf{q}^{l}, l\neq m\)) or (ii) for public consumption within the group. Browning and Chiappori (1998) and Chiappori and Ekeland (2006, 2009) introduced a general collective consumption model that does account for publicly consumed quantities and privately consumed quantities associated with externalities; Cherchye et al. (2007, 2011) developed the revealed preference condition for this general model. We see more refined experiments that specifically focus on consumption externalities and publicly consumed quantities as an interesting avenue for future research. However, we also believe that a simple experiment that focuses on the egoistic model constitutes a useful first step towards considering such more refined settings.

At this point, it is worth indicating that this ‘egoistic’ specification of the collective model actually encompasses a wider class of member-specific utilities, which model altruism in a specific way: it also includes so-called caring preferences of the individual group members, which depend not only on the member’s own (egoistic) utility but also on the other member’s utility. Chiappori (1992) argues that the empirical implications of caring preferences are indistinguishable from those of egoistic preferences. As such, while we will not indicate this explicitly in the following discussion, our empirical conclusions for the egoistic model directly carry over to the (more general) caring model.

A second remark pertains to our maintained assumption that individual utility functions are concave (representing convex preferences). In this respect, Cherchye et al. (2010a) have recently shown that this concavity assumption is actually testable for the egoistic model that we consider here. They established a testable revealed preference condition for the model that applies under general (possibly non-convex) preferences. If we apply this condition to the experimental data presented in Sect. 4, then we conclude that this last condition cannot be rejected for our dyads under study. In other words, the egoistic model cannot be rejected if we allow for non-concave utility functions. Or conversely, any rejection of the egoistic model that is reported further on may also be attributed to non-convex individual preferences rather than a rejection of the egoistic model per se.

In this study, we maintain the assumption of concave utilities because it allows us to represent Pareto efficient group behavior as maximizing a weighted sum of individual utilities, with the Pareto weights \(\mu_{t}\) indicating the relative bargaining power of the individual group members (see Definition 2). This representation no longer holds if we drop the concavity assumption. As a matter of fact, this weighted sum representation forms the very basis for the methodological contribution that we present next, which develops a revealed preference characterization of collectively rational behavior under restricted Pareto weights \(\mu_{t}\).

2.3 Collective rationality under bargaining weight restrictions

The collective rationality condition discussed above does not impose any restriction on the bargaining weights in different choice situations/observations. If this condition cannot be rejected for a given set of observations, a natural next question asks whether the data also pass a more restricted condition that does impose restrictions on the bargaining weights variation. In this section, we propose a methodology that allows for testing such a condition.

Formally, we consider the following condition of collective rationality under bargaining weight restrictions.

Definition 3

(restricted collective rationalization, \(a\)-CR) Let \(S^{co}=\{(\mathbf{p}_{t}; \mathbf{q}_{t}, \mathbf{q}_{t}^{1}, \mathbf{q}_{t}^{2}), t= 1,\ldots,T\}\) be a set of observations and consider \({a \in \mathbb{R}_{+}.}\) A pair of utility functions \(U^{1}\) and \(U^{2}\) provides an \(a\)-restricted collective rationalization (\(a\)-CR) of \(S^{co}\) if and only if there exists a constant \({C\in \mathbb{R}_{+}}\), common to all observations, such that for each observation \(t=1,\ldots,T\) there exists a weight \({\mu_{t}\in \mathbb{R}_{++}}\) for which \(U^{1}\left(\mathbf{q}_{t}^{1}\right) +\mu_{t}U^{2}\left( \mathbf{q}_{t}^{2}\right) \) equals

$$ \begin{aligned} &\max_{\left( {\mathbf{z}}^{1},{\mathbf{z}}^{2}\right) \in \left({\mathbb{R}}_{+}^{N}\right) ^{2}}U^{1}\left( {\mathbf{z}}^{1}\right) +\mu_{t}U^{2}\left( {\mathbf{z}}^{2}\right) \, \hbox{s.t.}\, {\mathbf{p}}_{t}^{\prime}\left( {\mathbf{z}} ^{1}+{\mathbf{z}}^{2}\right) \leq {\mathbf{p}}_{t}^{\prime }{\mathbf{q}}_{t}\\ &\quad\hbox {and }\frac{C}{(1+a)}\leq \mu_{t}\leq C(1+a), \end{aligned} \tag{8} $$

with \(\mathbf{z}^{1}\) and \(\mathbf{z}^{2}\) representing affordable consumption quantities for the prices \(\mathbf{p}_{t}\) and budget \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}.\)

The only difference with Definition 2 is the additional restriction (8) on the bargaining weights \(\mu_{t}\). In the limiting case when \(a = 0\), Definition 3 constrains the bargaining weight to be a constant number \(C\) for all observations. For other choices of \(a\), the definition allows for some variation of \(\mu_{t}\), but only in a prespecified (small) range. Finally, taking \(a\) arbitrarily large makes Definitions 2 and 3 coincide. As such, Definition 3 allows us to consider a whole spectrum of collective consumption models.
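To make the admissible range in restriction (8) concrete, consider the following illustrative values, chosen here for exposition with the constant set to \(C=1\) (they are not taken from the empirical analysis):

$$ a=0:\ \mu_{t}=1;\qquad a=0.1:\ \frac{1}{1.1}\approx 0.909\leq \mu_{t}\leq 1.1;\qquad a=1:\ 0.5\leq \mu_{t}\leq 2. $$

Thus, for \(a=0.1\) the relative weight of member 2 may deviate from \(C\) by at most a factor 1.1 in either direction, whereas \(a=1\) already allows this weight to halve or double.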

The following proposition presents a revealed preference characterization of collective rationality under bargaining weight restrictions.

Proposition 3

Let \(S^{co}=\left\{\left(\mathbf{p}_{t};\mathbf{q}_{t}, \mathbf{q}_{t}^{1},\mathbf{q}_{t}^{2}\right) ,t=1,\ldots,T\right\}\) be a set of observations and consider \({a\in \mathbb{R}_{+}. }\) The following statements are equivalent:

  (i) There exists a pair of utility functions \(U^{1}\) and \(U^{2}\) that provide an \(a\)-CR of \(S^{co}\);

  (ii) For all \(s,t\in \{1,\ldots,T\}, m=1,2\) there exist numbers \({U_{t}^{m}, \lambda_{t}^{m}\in \mathbb{R}_{++}}\) that meet the collective Afriat inequalities

    $$ U_{s}^{m}-U_{t}^{m}\leq \lambda_{t}^{m}{\mathbf{p}}_{t}^{\prime }\left( {\mathbf{q}}_{s}^{m}-{\mathbf{q}}_{t}^{m}\right) \tag{9} $$
    $$ \hbox {and }\frac{1}{(1+a)}\leq \frac{\lambda_{t}^{1}}{\lambda_{t}^{2}}\leq (1+a). \tag{10} $$

The interpretation of the condition in Proposition 3 is directly analogous to the one of Proposition 2. The only difference is the additional restriction (10), which corresponds to the weight restriction (8) in Definition 3. We note that the restriction (10) can be rewritten in linear form as

$$ \begin{aligned} \lambda_{t}^{2}&\leq (1+a)\lambda_{t}^{1}\hbox { and}\\ \lambda_{t}^{1}&\leq (1+a)\lambda_{t}^{2}. \end{aligned} \tag{11} $$

Thus, we can again use linear programming techniques to verify the \(a\)-CR condition in Proposition 3.
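The structure of this condition can be illustrated with a minimal sketch that verifies a candidate pair of marginal utility vectors rather than searching over all of them. It first checks the linear weight restrictions and then uses the fact that, once the \(\lambda_{t}^{m}\) are fixed, the Afriat inequalities are difference constraints in the \(U_{t}^{m}\), which have a solution if and only if the associated weighted graph has no negative cycle. The function names, the tolerance and the data are assumptions made for illustration. A negative answer only rules out the candidate at hand; the full test requires the linear programming search.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product, e.g. the expenditure p'q."""
    return sum(a * b for a, b in zip(x, y))


def has_negative_cycle(w, tol: float = 1e-9) -> bool:
    """Floyd-Warshall check for a negative cycle in the weight matrix w."""
    T = len(w)
    d = [row[:] for row in w]
    for k in range(T):
        for i in range(T):
            for j in range(T):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return any(d[i][i] < -tol for i in range(T))


def candidate_supports_a_cr(lam1, lam2, prices, q1, q2, a, tol=1e-9) -> bool:
    """Check a candidate (lam1, lam2) against the a-CR condition."""
    T = len(prices)
    for t in range(T):
        if lam1[t] <= 0 or lam2[t] <= 0:
            return False
        # Linear form of the weight restriction
        if lam2[t] > (1 + a) * lam1[t] + tol or lam1[t] > (1 + a) * lam2[t] + tol:
            return False
    for lam, q in ((lam1, q1), (lam2, q2)):
        # U_s - U_t <= w[t][s], with w[t][s] = lam_t * p_t'(q_s - q_t)
        w = [
            [lam[t] * (dot(prices[t], q[s]) - dot(prices[t], q[t])) for s in range(T)]
            for t in range(T)
        ]
        if has_negative_cycle(w, tol):
            return False  # no utility numbers exist for this member
    return True


# Hypothetical dyad data: each member consumes a single good (illustration only).
prices = [(3.0, 1.0), (1.0, 3.0)]
q1 = [(3.0, 0.0), (1.0, 0.0)]
q2 = [(0.0, 1.0), (0.0, 3.0)]

print(candidate_supports_a_cr([1.0, 3.0], [3.0, 1.0], prices, q1, q2, a=2.0))
print(candidate_supports_a_cr([1.0, 3.0], [3.0, 1.0], prices, q1, q2, a=0.0))
print(candidate_supports_a_cr([1.0, 1.0], [1.0, 1.0], prices, q1, q2, a=0.0))
```

In this hypothetical example the first candidate is admissible for \(a=2\) but violates the weight restriction for \(a=0\), while the equal-weight candidate satisfies the restriction for \(a=0\) but admits no utility numbers for member 1.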

In our following analysis, the unitary model will serve as a benchmark when evaluating the goodness-of-fit of various collective models. In this respect, it is important to point out that the \(a\)-CR condition in Proposition 3 (for any value of \(a\)) is not nested with the unitary rationality condition in Proposition 1: consistency with the first condition does not imply consistency with the second condition, or vice versa. In our opinion, this nonnestedness makes it all the more interesting to compare the empirical performance of the unitary model and the collective model (for alternative values of \(a\)). We will carry out such a comparison in our empirical analysis.

Example 1 illustrates this nonnestedness result by considering the \(a\)-CR condition for \(a = 0\). We believe this case is particularly interesting because it may appear to some that the collective rationalization condition for \(a = 0\) is empirically equivalent to the unitary rationalization condition. More precisely, one might reason that a constant bargaining weight allows us to aggregate the individual preferences into a single utility function and, thus, to obtain a unitary rationalization. Example 1 shows that this reasoning is incorrect. Specifically, it ignores the observed intragroup allocation that is given by the personalized quantities; this intragroup allocation implies additional collective restrictions which are not captured by the unitary rationalization condition.

We emphasize that observing the intragroup allocation (i.e. the set \(S^{co}\) contains the personalized quantities \(\mathbf{q}_{t}^{m}\)) crucially drives this nonnestedness result. Specifically, the result breaks down if the intragroup allocation is not observed (i.e. the set of observations is \(S^{un}\) rather than \(S^{co}\)). In this case, the collective rationality condition would require that there exists at least one specification of the personalized quantities \(\mathbf{q}_{t}^{1}\) and \(\mathbf{q}_{t}^{2}\) (with \(\mathbf{q}_{t}^{1}+\mathbf{q}_{t}^{2}=\mathbf{q}_{t}\)) such that the correspondingly defined set \(\{\left(\mathbf{p}_{t}; \mathbf{q}_{t},\mathbf{q}_{t}^{1},\mathbf{q}_{t}^{2}\right) , t=1,\ldots,T\}\) satisfies statement (ii) in Proposition 3. One can verify that, for \(a = 0\), this collective rationality condition is exactly equivalent to the unitary rationality condition.

Example 1

Suppose a first situation with 2 observations and 2 goods. The set \(S^{un}\) includes the following aggregate quantities and prices

$$ \begin{aligned} {\mathbf{q}}_{1} &=\left( 2,1\right) ^{\prime },{\mathbf{p}}_{1}=\left( 2,1\right) ^{\prime }, \\ {\mathbf{q}}_{2} &=\left( 1,2\right) ^{\prime },{\mathbf{p}}_{2}=\left( 1,2\right) ^{\prime }. \end{aligned} $$

The set \(S^{co}\) additionally contains the personalized quantities

$$ \begin{aligned} {\mathbf{q}}_{1}^{1} &=\left( 2,0\right) ^{\prime }\hbox { and }{\mathbf{q}}_{1}^{2}=\left( 0,1\right) ^{\prime }; \\ {\mathbf{q}}_{2}^{1} &=\left( 1,0\right) ^{\prime }\hbox { and }{\mathbf{q}}_{2}^{2}=\left( 0,2\right) ^{\prime }. \end{aligned} $$

Using linear programming it is now easy to verify that the set \(S^{co}\) meets the 0-CR condition in Proposition 3 while the set \(S^{un}\) does not meet the UR condition in Proposition 1. Next, suppose a second situation with again 2 goods and 2 observations. In this case, the set \(S^{un}\) includes

$$ \begin{aligned} {\mathbf{q}}_{1} &=\left( 2,1\right) ^{\prime },{\mathbf{p}}_{1}=\left( 1,2\right) ^{\prime },\\ {\mathbf{q}}_{2} &=\left( 1,2\right) ^{\prime },{\mathbf{p}}_{2}=\left( 2,1\right) ^{\prime }; \end{aligned} $$

while the set \(S^{co}\) additionally contains

$$ \begin{aligned} {\mathbf{q}}_{1}^{1} &=\left( 0,1\right) ^{\prime }\hbox { and }{\mathbf{q}}_{1}^{2}=\left( 2,0\right) ^{\prime }; \\ {\mathbf{q}}_{2}^{1} &=\left( 1,0\right) ^{\prime }\hbox { and }{\mathbf{q}}_{2}^{2}=\left( 0,2\right) ^{\prime }. \end{aligned} $$

Then one can verify that the set \(S^{un}\) satisfies the UR condition, but the set \(S^{co}\) does not meet the 0-CR condition.

As a side result, Example 1 also shows that the within-group income distribution may vary even if the distribution of the bargaining power remains constant. Specifically, as we indicated above, the first data set is consistent with a constant bargaining power of the individual group members in the two choice observations. However, the relative income share of the two members varies over the two observations. Similarly, we could construct a data set characterized by a constant income distribution but a varying bargaining power. This shows that, in general, a constant income distribution does not imply a constant bargaining power, and vice versa.
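The variation of the relative income shares in the first data set of Example 1 can be computed directly from the prices and personalized quantities, with member \(m\)’s income given by \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}^{m}\). The following minimal sketch does so; the function name `income_shares` is an assumption made for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product, e.g. the expenditure p'q."""
    return sum(a * b for a, b in zip(x, y))


def income_shares(prices, q1, q2):
    """Relative income share of member 1 in each observation."""
    shares = []
    for p, x1, x2 in zip(prices, q1, q2):
        y1 = dot(p, x1)  # member 1's income share p_t'q_t^1
        y2 = dot(p, x2)  # member 2's income share p_t'q_t^2
        shares.append(y1 / (y1 + y2))
    return shares


# First data set of Example 1
prices = [(2.0, 1.0), (1.0, 2.0)]
q1 = [(2.0, 0.0), (1.0, 0.0)]
q2 = [(0.0, 1.0), (0.0, 2.0)]

shares = income_shares(prices, q1, q2)
print(shares)  # [0.8, 0.2]
```

Member 1 thus accounts for 80% of the group budget in the first observation and for 20% in the second.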

3 Experimental design

Participants in our experiment were 102 undergraduate students (53 women). Ages ranged from 18 to 25 years (mean value = 21.02; standard deviation = 1.72). As we wanted to analyze collective choice behavior, both men and women were asked to sign up for an experimental session together with either a male or a female friend or a romantic partner. This led to a sample containing four types of dyads, namely, male dyads or two male friends (12 in total), female dyads or two female friends (14 in total), mixed dyads or one male and one female friend who were not romantically involved (13 in total), and romantic dyads or one man and one woman who were in a romantic relationship together (12 in total). Given the small sample sizes for the different dyad types, we will only report test results for the full sample in what follows. Test results for specific dyad types are available from the authors upon request.

Participants were scheduled to come to the laboratory in groups of eight (i.e., four dyads). Each participant was assigned a seat in a partially enclosed cubicle, and worked individually for the main part of the session. Dyads were asked to engage in one experimental task together. Participants were rewarded with money and with a commodity bundle for their cooperation. Each dyad received money and a commodity bundle with a combined value of € 20.

Our experiment is similar in design to the one of Harbaugh et al. (2001), who used a revealed preference methodology to investigate individual rationality for young children. Upon entering the laboratory, participants were given the opportunity to taste small quantities of red wine, orange juice, and M&Ms (i.e. a type of chocolate candy). They were truthfully told that they would be making consumption decisions with respect to these three commodities later on, and that we wanted them to familiarize themselves with the commodities. Participants were then presented with 9 choice problems, i.e. T = 9. Each choice consisted of the three commodities red wine, orange juice, and M&Ms, i.e. N = 3. We selected these goods following previous studies that used revealed preference methodology for analyzing individual rationality (see, for example, Sippel 1997; Harbaugh et al. 2001).

Each choice problem was characterized by a different price regime; the prices of the three commodities are shown in Table 1. The participants were asked to indicate how much they wanted to obtain from each commodity, given that the total budget they could allocate to the three commodities was € 10. In order to avoid indivisibility issues, we created very small units (i.e. a unit of red wine is 1 centiliter, a unit of orange juice is 3 centiliters, and a unit of M&Ms is 5 grams); in addition, participants had the option to select non-integer quantities. We note that the price-income regimes in our experiment imply a high power of our rationality tests (i.e. a high probability of detecting irrational behavior), essentially because there is no variation of income (€ 10) but a lot of variation of prices. For example, Blundell et al. (2003) apply a similar idea in their ‘maximum power sequential path’ procedure for maximizing the power of their nonparametric tests. We will return to the power of our tests in the next section.

Table 1 Experimental design—prices for the 9 choice problems

Participants were asked to make the 9 allocation decisions twice: once individually and once together with their friend. The order in which both sets of decisions were to be made was counterbalanced: one half of the dyads first made the decisions individually and only afterwards collectively, whereas the other half of the dyads first made the decisions collectively and only afterwards individually; changing the order in this way did not yield significantly different results in terms of (individual or collective) rationality. Table 2 presents summary information on the budget shares corresponding with the individuals’ and the dyads’ choices under the 9 price regimes; this expenditure information also allows for reconstructing the corresponding (mean) quantities that have been chosen under the different price regimes in Table 1.

Table 2 Experimental results—budget shares for the 9 choice problems

In case of collective decision-making, participants were asked to indicate for each of the three commodities which percentage of their demand was intended for each individual. This provides the personalized quantity information that we use for the collective rationality tests discussed above.

The decision problems participants were faced with were supposed to mimic real life difficulties that both individual consumers and groups often encounter when having to pick their optimal commodity bundles out of the available budget sets. To enhance the external validity of our study, participants were told that, when all experimental sessions were over (maximum two weeks after they participated), they would actually receive one of the commodity bundles they had put together. They were also told that they would be informed through E-mail about where and when to pick it up. For practical reasons, we picked this bundle randomly from the set of decisions that participants had made collectively (and we thus ignored the individually chosen bundles), although they were not informed of this beforehand. The knowledge that each choice ostensibly had the same chance of actually being implemented was supposed to give economic significance to otherwise merely hypothetical decisions, thus providing participants with an incentive for making choices that truly represented their preferences.

As making the allocation decisions required a considerable amount of calculation (multiplying prices and demand for each commodity and adding up to check whether the budget is exhausted), participants were encouraged to use a calculator to check their decisions. Participants could also spend as much time as they liked on their decisions and were free to compare, reconsider, and correct previous choices. As such we maximally avoid dynamic and intertemporal issues (such as learning, punishment strategies, etc.), which falls in line with the static nature of the collective model under consideration. When the participants felt that the decisions they had made represented their actual preferences, the experimenter provided them with the instructions for the next task. In “Appendix A”, we include the instructions that were handed out to the participants.
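The calculation that participants had to carry out can be sketched as follows. This is a minimal illustration only: the function names, the prices and the bundle are hypothetical and are not taken from Table 1 or from the experimental instructions.

```python
def expenditure(prices, quantities):
    """Total spending: price times quantity, summed over the commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def exhausts_budget(prices, quantities, budget=10.0, tol=1e-6):
    """True if the bundle costs exactly the budget (up to a small tolerance)."""
    return abs(expenditure(prices, quantities) - budget) <= tol


# Hypothetical prices per unit and a hypothetical bundle
# (order: red wine, orange juice, M&Ms); not the values of Table 1.
prices = [0.5, 0.25, 0.2]
bundle = [8.0, 12.0, 15.0]

print(expenditure(prices, bundle))
print(exhausts_budget(prices, bundle))
```

The tolerance is an assumption of the sketch; it only guards against floating-point rounding when non-integer quantities are entered.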

Before considering our actual test results, it is interesting to have a closer look at how dyad members shared the income that was given to them. Table 3 provides results for the 9 choice problems in our experiment. To define the income shares, we make use of the personalized quantity information that has been reported; using the notation of Sect. 2, each member m’s income share is defined as \(\mathbf{p}_{t}^{\prime }\mathbf{q}_{t}^{m}\). Table 3 gives summary information on income share differences in relative terms, i.e. absolute differences between individual income shares divided by the total dyad income (€ 10). It reports mean values, as well as corresponding standard deviations and maximum values. In addition, for each choice problem, the column ‘equal split’ provides the percentage of dyads that apply equal sharing.
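The income share measure can be illustrated with a short sketch. It assumes that the personalized quantities are obtained by applying the reported percentages to the dyad's bundle; all function names and numbers are hypothetical and serve only to show the arithmetic.

```python
def split_bundle(group_quantities, pct_member1):
    """Personalized quantities from the dyad bundle and member 1's percentages."""
    q1 = [g * pct / 100.0 for g, pct in zip(group_quantities, pct_member1)]
    q2 = [g - a for g, a in zip(group_quantities, q1)]
    return q1, q2


def spending(prices, quantities):
    """A member's income share in the sense of p_t' q_t^m."""
    return sum(p * q for p, q in zip(prices, quantities))


def relative_share_difference(prices, q1, q2, budget=10.0):
    """Absolute difference between income shares, divided by dyad income."""
    return abs(spending(prices, q1) - spending(prices, q2)) / budget


# Hypothetical choice problem (not taken from Tables 1 to 3).
prices = [0.5, 0.25, 0.2]
group_bundle = [8.0, 12.0, 15.0]
q1, q2 = split_bundle(group_bundle, pct_member1=[75, 50, 40])

diff = relative_share_difference(prices, q1, q2)
print(q1, q2, diff)
print("equal split" if diff < 1e-9 else "unequal split")
```

In this hypothetical case member 1 spends 5.7 and member 2 spends 4.3, so the relative difference is 0.14 of the dyad income.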

Table 3 Experimental results—income sharing for the 9 choice problems

We find that dyads opt for equal sharing in about 30–40% of all choices. This means that there is an unequal distribution of resources in about two thirds of all choice situations. We believe this is quite substantial, especially when taking into account the laboratory nature of our experiment. In fact, in some cases the income inequality is very large: in one instance, the (maximum) difference between members’ income shares amounts to no less than 80% of total income. Furthermore, the mean values of relative income differences are rather pronounced and there is considerable variation across choice observations (see the standard deviations). All this makes us believe that this is a suitable data set for demonstrating the practical usefulness of the revealed preference methodology introduced above.

As a concluding remark, we indicate that the original experiment also included an additional group of dyads that were confronted with almost identical decision problems (i.e. they had to state their demand for red wine, orange juice, and M&Ms, given the same relative price variations as presented in Table 1 and a budget of € 10), but with the additional option of receiving in cash any amount of the budget they wanted to in each decision situation; the price of this additional ‘cash’ commodity equals 1 for all choice problems. The test results for this ‘4-commodities’ group of dyads turned out to be qualitatively similar to the results for the group that did not have the cash option. Therefore, to keep our discussion compact, we will not incorporate the test results for the 4-commodities group in the following section. These results are available upon simple request.

In fact, the cash option for the 4-commodities group also served as a validity check of our experimental design. Specifically, if the dyads found the experiment not appealing (because of the commodities and/or their prices), then the cashed amount would equal their total budget in each observation; this would guarantee that they received the money with certainty. Because participants effectively spent a large share of their total budget on the three original commodities (rather than keeping it as cash), we conclude that our design does pass this validity test. Footnote 12

4 Test results

We will first report on the rationality of the individual choices. Indeed, individually rational behavior is a prerequisite for collectively rational behavior. As a result, a first test for rational collective behavior is whether the unitary model (which assumes a single decision maker) adequately describes the observed individual choice behavior. Subsequently, we will consider the empirical goodness-of-fit of the unitary model for describing the collective choices. This will set the benchmark for the collective consumption model, which will be considered in Sect. 4.3.

We will consider the goodness-of-fit as well as the discriminatory power of all models under evaluation. We believe both aspects are important when comparing the empirical performance of alternative consumption models. In particular, we address the adequacy of different models by following the proposal of Beatty and Crawford (2011). These authors propose a ‘predictive success’ measure to evaluate the overall empirical performance of specific behavioral models in the context of revealed preference analysis.

More specifically, Beatty and Crawford’s measure of predictive success simultaneously accounts for the two empirical performance aspects mentioned above. Goodness-of-fit (‘fit’) is measured as the percentage of study subjects (individuals or groups) passing the rationality test for the model under study. Next, discriminatory power (‘power’) is measured as the probability that the test detects behavior that is irrational; to compute this power, we will model irrational behavior as random behavior (see below). Fit and power values lie between 0 and 1. Predictive success is then defined as

$$ \hbox{predictive success} = \hbox{power} - (1 - \hbox{fit}). $$

The value of this measure always lies between −1 and 1. A value close to 1 indicates a model with approximately 100% power and 100% fit, i.e. the best possible scenario. This means that (almost) all data pass the rationality test, even though the test effectively detects (almost) any deviating (i.e. random or irrational) behavior. By contrast, a value close to −1 implies a model with approximately 0% power and 0% fit, i.e. the worst possible scenario. In this case, the test effectively allows for (almost) any observed behavior and yet the data fail to pass. Finally, a value of 0 corresponds to a model with a rejection rate for the observed behavior (= 1 − fit) that exactly equals the expected rejection rate if behavior were random (= power). Essentially, this means that the rationality test does not allow for distinguishing observed behavior from random behavior. We believe this 0 value provides a useful benchmark value. A ‘good’ model must at least have a predictive success rate that is positive. In general, a higher predictive success rate reveals a better empirical performance of a given model.
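The measure and its three reference values can be reproduced with a few lines of code. The function name and the input values below are illustrative assumptions, not results from our tables.

```python
def predictive_success(fit, power):
    """Predictive success = power - (1 - fit), with fit and power in [0, 1]."""
    if not (0.0 <= fit <= 1.0 and 0.0 <= power <= 1.0):
        raise ValueError("fit and power must lie between 0 and 1")
    return power - (1.0 - fit)


# Reference cases discussed in the text.
print(predictive_success(fit=1.0, power=1.0))    # best case: 1
print(predictive_success(fit=0.0, power=0.0))    # worst case: -1
print(predictive_success(fit=0.75, power=0.25))  # rejection rate equals power: 0
```

In the third case the observed rejection rate (1 − fit = 0.25) equals the rejection rate under random behavior, so the measure is 0.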

At this point, it is worth indicating that the predictive success measure defined above actually assigns an equal weight to discriminatory power and goodness-of-fit. This equal weighting may seem arbitrary to some. Interestingly, however, Beatty and Crawford (2011) show that this weighting scheme has an axiomatic characterization. Footnote 13 We believe this provides a convincing theoretical foundation for our focus on the (equally weighted) predictive success measure as it was originally presented by Beatty and Crawford.

4.1 Individual rationality test

To set the stage, we discuss the test results for individual rationality, based on the rationality condition in Proposition 1. As explained before, we account for optimization error by focusing on the extended version of the Afriat inequalities in (3). In these inequalities, lower values of e account for greater optimization error. In our application, we consider the models with e = 1 (‘100% optimization’), 0.975 (‘97.5% optimization’) and 0.95 (‘95% optimization’).
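To illustrate the role of e, the sketch below checks the generalized axiom of revealed preference at a given efficiency level, in the spirit of Varian (1982). It is a combinatorial stand-in that we assume for illustration; it does not reproduce the inequalities in (3). The function names and the two-observation data set are hypothetical.

```python
def dot(p, q):
    """Inner product of a price vector and a quantity vector."""
    return sum(a * b for a, b in zip(p, q))


def satisfies_garp(prices, bundles, e=1.0):
    """Check GARP at efficiency level e (e = 1 means exact optimization)."""
    T = len(bundles)
    # direct[t][s]: bundle t is directly revealed preferred to bundle s,
    # i.e. bundle s was affordable at a fraction e of the spending in t.
    direct = [[e * dot(prices[t], bundles[t]) >= dot(prices[t], bundles[s])
               for s in range(T)] for t in range(T)]
    # Transitive closure (Warshall's algorithm).
    rel = [row[:] for row in direct]
    for k in range(T):
        for t in range(T):
            for s in range(T):
                if rel[t][k] and rel[k][s]:
                    rel[t][s] = True
    # Violation: t is revealed preferred to s, while s is strictly
    # directly revealed preferred to t at efficiency level e.
    for t in range(T):
        for s in range(T):
            if rel[t][s] and e * dot(prices[s], bundles[s]) > dot(prices[s], bundles[t]):
                return False
    return True


# Hypothetical data: two observations, two goods, spending of 10 in each.
prices = [(1.0, 2.0), (2.0, 1.0)]
bundles = [(3.2, 3.4), (3.4, 3.2)]

print(satisfies_garp(prices, bundles, e=1.0))    # fails exact optimization
print(satisfies_garp(prices, bundles, e=0.975))  # passes with small error
```

In this example each bundle costs 9.8 at the other observation's prices, so the violation disappears once 2.5% of expenditure is allowed as optimization error. This mirrors the statement that lower values of e make the test easier to pass.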

Table 4 presents the results. Let us first consider the goodness-of-fit values for the three models that we consider. We find that all models provide an adequate description of observed behavior. Not surprisingly, lower values for e imply higher pass rates. But even the 100% optimization model does rationalize the behavior of slightly more than 90% of the individuals under consideration. The fit of the other models amounts to approximately 100%. These results are consistent with those of other, similar experiments on individual rationality (e.g. Harbaugh et al. 2001; Andreoni and Miller 2002).

Table 4 Individual rationality—pass rate, power and predictive success

Let us then consider power. As indicated above, we measure power as the probability of detecting random behavior. We model random behavior using the bootstrap method for panel data as described by Andreoni and Miller (2002) and applied by Harbaugh et al. (2001) within a similar experimental context. Footnote 14 The method essentially mimics random behavior for each price regime (or budget) by drawing randomly from the observed set of choices under that price regime (i.e. 102 choices under 9 different price regimes). This gives information on the expected distribution of violations under random choice, while incorporating information on the participants’ actual choices. All bootstrap results reported in this paper are based on Monte Carlo-type simulations that include 50,000 iterations.
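The resampling idea can be sketched as follows. This is a simplified illustration under our own assumptions: draws are independent across price regimes and with replacement, the rationality test is passed in as a function, and the names `bootstrap_power` and `passes_test` as well as the toy data are hypothetical.

```python
import random


def bootstrap_power(observed, passes_test, iterations=1000, seed=0):
    """Share of synthetic subjects that fail the rationality test.

    observed[i][t] is the bundle chosen by subject i under price regime t.
    passes_test takes one list of bundles (one per regime) and returns a bool.
    """
    rng = random.Random(seed)
    n_regimes = len(observed[0])
    rejections = 0
    for _ in range(iterations):
        # For each price regime, draw the choice of a randomly picked subject.
        synthetic = [rng.choice(observed)[t] for t in range(n_regimes)]
        if not passes_test(synthetic):
            rejections += 1
    return rejections / iterations


# Hypothetical data: 3 subjects, 2 price regimes, 2 goods.
observed = [
    [(6.0, 2.0), (2.0, 6.0)],
    [(3.2, 3.4), (3.4, 3.2)],
    [(5.0, 2.5), (2.5, 5.0)],
]

# Placeholder test for illustration only: every synthetic subject passes.
print(bootstrap_power(observed, passes_test=lambda bundles: True))
```

In an actual application, `passes_test` would be the rationality test of the model under study evaluated at the prices of the experiment, and the number of iterations would be set much higher (50,000 in our case).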

Table 4 shows that random behavior leads to rejections of the 100% optimization model in 82% of the cases. This confirms that our experimental design leads to a powerful test of the model. Power decreases rather drastically when e decreases. For example, the 95% optimization model rejects rationality of random behavior in only about 60% of the cases.

Finally, Table 4 gives the predictive success rates for the three individual rationality models that we consider. All models have a predictive success rate that is substantially above 0, which is comforting. Next, we find that the 100% optimization model substantially outperforms the 97.5 and 95% optimization models. The explanation is that discriminatory power is substantially higher for the former model than for the latter two models, even though the goodness-of-fit results show an (albeit much less pronounced) opposite pattern.

Our overall conclusion is that the individual rationality model cannot be rejected for our specific choice setting. The ‘best’ model is the 100% optimization model, which has a predictive success rate of about 75%. Next, on the basis of the results in Table 4, we can reasonably assume that participants in our experiment satisfy individual rationality. This provides a useful motivation for our following exercises, which investigate rationality of the same individuals in collective choice settings.
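As a rough consistency check on the rounded figures quoted in the text (the exact entries are those of Table 4), the definition of predictive success can be inverted to obtain the implied fit of the 100% optimization model:

$$ \hbox{fit} = 1 - (\hbox{power} - \hbox{predictive success}) \approx 1 - (0.82 - 0.75) = 0.93, $$

which agrees with the pass rate of slightly more than 90% reported above.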

4.2 Unitary rationality test for dyads

We first study rationality of collective choices in terms of the unitary model. Like before, we consider the condition in Proposition 1, and account for optimization error by focusing on the Afriat inequalities in (3); we again use e = 1 (‘100% optimization’), 0.975 (‘97.5% optimization’) and 0.95 (‘95% optimization’). We compute the power of the different models by using a similar bootstrap method as before. The only difference is that, for each price regime, we now mimic random behavior by drawing randomly from the observed 51 collective choices.

Table 5 reports our findings. Overall, these results are closely similar to the ones in Table 4. First, goodness-of-fit is rather high for the three model specifications. Next, power is substantial for the 100% optimization model, but decreases considerably for the 97.5 and 95% optimization models. All this yields that the 100% optimization model has the best predictive success rate and, thus, can be regarded as the ‘best’ unitary model for describing the observed collective behavior.

Table 5 Unitary rationality—pass rate, power and predictive success

We conclude that the unitary model does a good job in describing observed dyad behavior in our experiment. For example, the predictive success rate of the 100% optimization model is about 72%. Next, we compare the empirical performance of this model with the one of the collective model with varying bargaining weight restrictions.

4.3 Collective rationality test for dyads

Let us then consider the empirical performance of the collective model. As an introductory note, we recall that our results for the individual choices do not reject the hypothesis of individually rational behavior (see our discussion of Table 4). This suggests that we can safely maintain individual rationality as an assumption in our following tests of collective rationality. Following this interpretation, the collective rationality test results can thus be seen as checking the validity of the Pareto efficiency hypothesis for the group (in casu dyad) decision process. As discussed above, Pareto efficiency is effectively the distinguishing hypothesis of the collective consumption model.

We test the collective rationality condition in Proposition 3 for the observed dyad choices; and we study varying restrictions for the bargaining weights by considering multiple values for a situated between 0 and arbitrarily large (or free; see Table 6). We recall that a = 0 corresponds to the collective model with constant bargaining weights; while a = free implies no weight restriction at all (i.e. the conditions in Propositions 2 and 3 coincide). Generally, lower values for a correspond to stronger weight restrictions (see Definition 3). Next, we again account for optimization error by using the Afriat inequalities in (7) with e = 1 (‘100% optimization’), 0.975 (‘97.5% optimization’) and 0.95 (‘95% optimization’). Finally, power is measured in the same way as for the unitary model.

Table 6 Collective rationality—pass rate, power and predictive success

Table 6 presents our results. We first consider goodness-of-fit. The model’s goodness-of-fit increases substantially when allowing for some optimization error. For all values of a, the 97.5% optimization model significantly outperforms the 100% optimization model: the former model has a fit above 90% (and often close to 100%) for any specification of a, while the fit of the latter model varies between 62 and 85% (depending on a). Next, we find that the 95% optimization model (only) slightly improves upon the 97.5% optimization model, which is to be expected given that the latter already has a fit above 90%. Lastly, we observe that goodness-of-fit improves for higher a. This is not unexpected as higher values for a imply less stringent empirical restrictions.

As for power, we find that all models perform reasonably well. As expected, power decreases as e decreases. However, this power decrease as a function of e is generally less pronounced than for the unitary model (compare with Table 5). We also observe that power decreases when a increases, which is again not surprising. Interestingly, for all values of e the discriminatory power of the collective model (for any value of a) substantially exceeds the power for the unitary model. This is due to the experimental design that allows us to observe the personalized quantities. In our opinion, this suggests the empirical usefulness of our methodology to assess several specifications of the collective consumption model by means of experimental data. Such an assessment effectively allows us to meaningfully analyze the empirical validity of the different models.

Next, a most notable observation is that many specifications of the collective model (often substantially) outperform the ‘best’ unitary model in terms of predictive success. A lot of specifications have a predictive success of no less than 85%. Generally, our results suggest that the ‘best’ specifications of the collective model allow for a limited variation of the bargaining weights (i.e. low value for a) in combination with a small optimization error (i.e. e = 0.975). These specifications systematically combine a high fit value (about 95%) with a high power value (about 90%). In our view, these favorable results for specifications with restricted (but not constant) bargaining weights are particularly appealing. Indeed, as we discussed in the introductory section, such specifications may effectively provide a more realistic description of group consumption behavior than the model specification that does not include such restrictions.

As a final exercise, we have investigated the possible impact of some specific dyad features on our collective rationality results. Specifically, we conducted an analysis of variance (ANOVA) that included the following information for the participants in our experiment: duration of the period (months) that friends/partners know each other, different indicators of the degree of friendship, and alternative variables capturing the way in which friends/partners interact with each other. However, we did not detect any significant effects. A possible explanation may be the rather small size of our sample.

4.4 Towards household consumption data

Next, we examine the relevance of our findings for the analysis of household consumption behavior. Our study provides a useful complement to existing studies that assess the empirical validity of the collective model by using a parametric specification and/or ‘real life’ data. Deviating from these studies, we address the same question by using nonparametric (revealed preference) analytical tools in combination with experimental data. As such, our tests are ‘pure’ in that they avoid (1) a (non-verifiable) parametric structure that is imposed on the model, and (2) preference homogeneity assumptions and data measurement problems that are specific to ‘real life’ data.

We recall from our discussion in Sect. 2 that our analysis has focused on the egoistic model, in which each member’s utility function only depends on the own private consumption (excluding consumption externalities and public consumption). The reason is that empirical applications of the collective model based on real life data mostly assume this model. In this respect, we note that our first experimental test involved an unsophisticated consumption setting, with a very limited number of commodities and a low budget. Interestingly, our findings suggest that the egoistic model effectively constitutes a useful tool for describing collective choice behavior in such a simple setting: we obtain reasonably high predictive success rates for collective rationality models that allow for a limited variation of the bargaining weights. This can be seen as a minimal validity check for using the egoistic model in more sophisticated (real life) settings. Of course, the nature of our experiment only allows us to draw suggestive conclusions in this respect.

One final remark applies to using the newly proposed bargaining weight restrictions in the context of household data. It follows from our discussion in Sect. 2 that the practical application of such restrictions (through linear programming) actually requires that the personalized consumption quantities (\(\mathbf{q}_{t}^{m}\) for each member m) are observed. Footnote 15 This may be problematic in a household context: we argued in Sect. 1 that household data sets usually only contain information on the aggregate household consumption and not on the individual consumption. In this respect, however, we must add that data sets with personalized quantity information are increasingly available in the literature (see, for example, Bonke and Browning 2006; Browning and Gørtz 2006; Cherchye et al. 2010b). For such data sets our methodology is directly applicable, which may thus yield a vigorous revealed preference analysis of household consumption behavior in terms of the collective model.

5 Summary and conclusion

We have provided a first revealed preference test of the collective consumption model on the basis of experimental data. By using revealed preference methodology and experimental data, we avoid the usual problems associated with parametric tests (e.g. non-verifiable parametric structure) and the use of ‘real life’ data sets (e.g. preference heterogeneity). We have proposed a testing methodology that allows us to restrict the variation of the bargaining weights across choice observations in the revealed preference analysis.

Our empirical analysis has focused on the goodness-of-fit as well as on the discriminatory power of the unitary model and alternative specifications of the collective model (with varying bargaining weight restrictions). Adopting a recent proposal of Beatty and Crawford (2011), we have used a ‘predictive success’ measure to evaluate the overall empirical performance of a specific behavioral model. This measure also enabled us to compare this empirical performance for alternative model specifications.

Our results indicate that including bargaining weight restrictions may (often substantially) increase the discriminatory power of the revealed preference tests. In our opinion, this is a most interesting observation, as low power is a frequently cited concern for revealed preference analysis. Bargaining weight restrictions imply additional structure for the decision process that can easily be motivated on a priori grounds. Indeed, as indicated in the Introduction, it often seems reasonable to consider very large bargaining power shifts as unrealistic. As such, we believe our methodology to restrict bargaining weight variation provides a natural way to increase the power of the revealed preference analysis of collective consumption models.

At a more specific level, our results suggest that the choices made in our experiment are best described by a collective model that allows for a limited variation of the bargaining weights. This model specification has higher predictive success than the specification with constant bargaining weights and the specification with unlimited variation of the bargaining weights. In our opinion, this is a useful finding as the model with restricted (but not constant) bargaining weights may effectively be considered as a realistic model of group consumption behavior. Interestingly, this collective model also outperforms the unitary model in terms of predictive success. In particular, the model is characterized by substantially more discriminatory power than the unitary model, while the difference in terms of goodness-of-fit is much less pronounced.

At a more general level, our analysis provides further empirical support for considering non-unitary models to describe the behavior of multi-person groups (such as households). In particular, our results motivate the use of group consumption models that explicitly recognize the individual preferences within the group. Next, our study demonstrates the usefulness of experimental analysis of group consumption behavior. In particular, such an analysis easily allows for obtaining information on consumption quantities for the individual group members, which enhances the power of the analysis.

As for follow-up research, we believe future work may fruitfully focus on the experimental analysis of group consumption processes in more complicated consumption settings. In this first study, we considered a rather simple choice setting with only two group members, three commodities and a very low joint budget. It would be interesting to contrast our findings here with test results for more complicated settings that involve more group members, many goods and/or a larger group budget. In doing so, one can also conceive settings that involve publicly consumed goods. For example, one can then investigate whether group decisions on these public goods are consistent with the ‘cooperative’ Pareto efficiency concept or rather the ‘noncooperative’ Nash equilibrium concept. Footnote 16 In a similar vein, future experimental studies may focus on consumption externalities associated with privately consumed quantities.

In this respect, one may also analyze alternative (non-consensual) group consumption models that account for independent individual decision making (see, for example, Grossbard-Shechtman 1984, 2003; Grossbard 2010). This would allow one to assess (and compare) the empirical performance of different models proposed in the literature on ‘New Home Economics’. We believe that our study illustrates the potential of using revealed preference methodology in combination with experimental data for addressing this type of questions, which can provide additional insight into the appropriate modeling of the consumption behavior of multi-person groups.