We use data collected by GfK, a Germany-based market research firm with an affiliate in Sweden. GfK has assembled a consumer scan panel that follows the grocery shopping choices of 3000 households across Sweden. The data were collected with an electronic scanner and web-based diary entries. We use observations on each household's shopping trips from January 2007 to January 2010. Not all participating households buy coffee, so the dataset that we use consists of an unbalanced panel of 2782 households.
The participating households were chosen as a representative sample of the Swedish population, but were sampled using non-probabilistic methods typical for this type of market research data (Footnote 4). We observe household characteristics such as the age and level of education of the reference shopper and household annual income. Panel A in Table 1 compares the household characteristics of the sample with national averages in Sweden. There are only small differences compared to the national averages. The in-sample average annual income is 371,970 SEK, which is higher than the national average of 350,300 SEK during the period. The average reference shopper is slightly older than the average adult in the population: 50.6 versus 48.9 years (the national figure is the average age of those 18 years and older, since the reference persons in the panel were all 18 or older). The share of households with a university education is lower in the sample than the national average: 33% versus 36%. The average size of the sampled household is 2.28, compared to the national average of 1.97.
Table 1 Summary statistics—households and their purchasing behavior
The households in our dataset appear to have diligently reported their retail coffee purchases, as seen in panel B of Table 1. On average, households purchased coffee in retail stores on 7.1 occasions per year, and the average annual household expenditure on retail coffee was 325 Swedish crowns (approximately 36 euros at July 2008 exchange rates). In 2008, average coffee consumption in Sweden was 9.4 kg/capita/year (Footnote 5). Of this, roughly 60% was bought through retail channels for household consumption. The remaining 40% was consumed at work or in restaurants and cafes. Around 12% of the total consumption was instant coffee, which is almost exclusively sold retail. This means that, if our sample were representative and fully diligent in reporting all purchases, we would expect them to consume approximately 4.5 kg/capita/year. Our sample of households purchased an average of 3.9 kg/capita/year, which is close to the expected level of consumption. As with any Homescan data, some degree of underreporting is expected. Einav et al. (2010) compared the recorded purchasing behavior of US households in the Homescan data administered by AC Nielsen with the purchasing behavior reported by stores. Overall, the authors found evidence that households are diligent and that Homescan data are a valuable source of information (Footnote 6).
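The expected consumption figure can be checked with a short calculation. The sketch below is one reading that reproduces the 4.5 kg figure, under the assumption (ours, not stated explicitly in the text) that all instant coffee is sold retail and is subtracted because instant coffee is excluded from the sample. The variable names are illustrative.

```python
# Figures quoted in the text (Sweden, 2008).
total_kg_per_capita = 9.4   # total coffee consumption, kg/capita/year
retail_share = 0.60         # share bought through retail channels
instant_share = 0.12        # instant coffee as a share of total consumption

# Assumption: all instant coffee is sold retail and none of it is in the sample.
retail_kg = total_kg_per_capita * retail_share
instant_kg = total_kg_per_capita * instant_share
expected_kg = retail_kg - instant_kg

observed_kg = 3.9  # average purchased by panel households, kg/capita/year
coverage = observed_kg / expected_kg

print(f"expected retail, non-instant consumption: {expected_kg:.2f} kg/capita/year")
print(f"share of expected consumption recorded:   {coverage:.0%}")
```

Under this reading, the panel records roughly 86% of the expected volume, which is consistent with a modest degree of underreporting.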
GfK questionnaires are completed by households when they join the consumer panel and then again every January, and cover a range of issues related to household shopping preferences. There are 35 questions in the questionnaire, and many questions have multiple alternative responses. One subset of questions relates to household choices of different types of products. We made use of one question regarding organic-labeled products. The question was: “When I buy groceries I try, to the extent feasible, to buy organic products”. The respondent can tick one of six boxes: box 1 indicates “Totally Disagree”, box 5 indicates “Totally Agree”, and box 6 indicates “Don’t Know”.
On the product side, the data were matched to European Article Numbers (EANs), providing a description of each coffee product bought by the household, including the package size, the brand name, and whether it was labeled organic or Fairtrade, as well as other product characteristics. Table 2 summarizes the characteristics of the coffee products in our sample. Around 7% of the available choices were organic. We use data on purchases of all ground and bean coffee. Instant coffee was excluded.
Table 2 Summary statistics—coffee product characteristics, stores and choice sets
The data on households and the data on product varieties are linked via a database of market transactions. These market transactions describe the price and quantity purchased for each variety of coffee on a particular shopping trip date at a particular store by a particular household. There are 11 grocery store chains, each with varying store formats, which we group into four different classes: large supermarket, supermarket, discount store, and other. The combined dataset, therefore, includes household statistics, coffee product descriptions and a record of market transactions.
Choice sets
The dependent variable in our estimation of the demand system is the choice made by the consumer. For each shopping trip by each household, we construct a set of coffee products from which the consumer chooses, i.e., the choice set. The choice variable is discrete and binary: it is equal to 1 when a household purchases a particular variety and equal to 0 otherwise.
Homescan data provide observations on choices actually made by the consumer, but not on choices that are not made. Hence, we cannot directly observe the coffee varieties among which the household can choose on a given shopping trip. However, the data are detailed enough for us to make use of observed coffee purchases by other households. We therefore construct the choice set for each shopping trip using the purchases of other households from the same chain and store format (11 chains and 4 store formats give 44 combinations in all) for a given type of municipality (4 types) within a three-month window. For example, a shopper buying a coffee product in a large store belonging to the “ICA” chain in Stockholm in early June of 2008 faces a choice set of 30 coffee products. We observe only the choice made by this particular shopper. We identify the other 29 coffee products that are part of this choice set from the choices of other shoppers buying coffee at large stores belonging to the “ICA” chain, in Sweden’s largest cities, between mid-April and mid-July of 2008. A manual comparison with the assortment in some selected stores indicated that our generated choice sets give a generally accurate representation of the assortment.
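The construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the purchase records, the labels "Chain B", "Brand X" and "Brand Y", and the symmetric 45-day half-window (chosen to match the mid-April to mid-July example) are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical purchase records:
# (household, product, chain, store_format, municipality_type, date)
purchases = [
    ("h1", "Gevalia", "ICA", "large supermarket", "large city", date(2008, 6, 2)),
    ("h2", "Lavazza", "ICA", "large supermarket", "large city", date(2008, 5, 10)),
    ("h3", "Euro Shopper", "ICA", "large supermarket", "large city", date(2008, 7, 1)),
    ("h4", "Brand X", "ICA", "large supermarket", "large city", date(2008, 1, 15)),
    ("h5", "Brand Y", "Chain B", "supermarket", "large city", date(2008, 6, 3)),
]


def choice_set(trip, purchases, half_window=timedelta(days=45)):
    """Products bought by any household in the same chain, store format and
    municipality type within the time window centred on the trip date."""
    _, _, chain, fmt, muni, when = trip
    return sorted({
        product
        for _, product, c, f, m, d in purchases
        if (c, f, m) == (chain, fmt, muni) and abs(d - when) <= half_window
    })


# Choice set faced by household h1 on its early-June trip.
print(choice_set(purchases[0], purchases))
```

In this toy example the January purchase of "Brand X" falls outside the window and the "Chain B" purchase belongs to a different store cell, so neither enters the choice set of household h1.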
There is limited variety in brand-organic combinations facing households when they shop. For example, at the 10th percentile of observations in the sample there was one organic coffee in the choice set. At the median, a household faced 2.5 organic coffee varieties in their choice set. This suggests that while organic coffee is widely available in Sweden, only a few organic coffees make their way into household choice sets.
A total of 43,252 shopping trips were observed in our data. However, the construction of choice sets expanded the size of the dataset to a total of 1,260,081 observations. The descriptive statistics in Table 2 are therefore based on the full, expanded sample.
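The expansion from trips to observations can be illustrated with a short sketch. It assumes one purchased product per trip and uses hypothetical field names; each trip becomes one row per alternative in its choice set, with the binary choice variable equal to 1 for the purchased product and 0 otherwise.

```python
def expand_trip(trip_id, chosen, choice_set):
    """Return one row per alternative in the choice set of a shopping trip."""
    return [
        {"trip": trip_id, "product": p, "choice": int(p == chosen)}
        for p in choice_set
    ]


# Hypothetical trip with three alternatives, of which Gevalia was bought.
rows = expand_trip(1, "Gevalia", ["Euro Shopper", "Gevalia", "Lavazza"])
for row in rows:
    print(row)

# Implied average number of alternatives per trip, from the totals in the text.
trips, expanded = 43_252, 1_260_081
avg_alternatives = expanded / trips
print(round(avg_alternatives, 1))
```

The totals reported above imply roughly 29 alternatives per shopping trip on average, in line with the 30-product example given for the large ICA store.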
We observed the actual price of the coffee product when it was purchased. However, estimating the demand system means we also need to infer the price of products at a store that were not purchased. To do this, we used a hedonic regression to generate prices for all of the products in the choice set (i.e., the price of the products that were not purchased by the household). The hedonic regression was run on the 42,143 observations on price. We regressed price per 100 g of coffee on brand fixed effects (35 in all), store fixed effects (by chain and store format: 44 in all), coffee country of origin fixed effects (5 in all), bean roast (3 types in all), monthly fixed effects (34 months), municipal-type fixed effects (4 in all), package size fixed effects (4 types), package type (5 types), and a fixed effect for decaffeinated coffee. The adjusted R² of this regression is 0.51 and the F-statistic for the joint significance of all variables is 320.15. Table 3 summarizes the results of this hedonic regression.
Table 3 Estimating the hedonic price for coffee, 2007–2009
The organic label has a positive and statistically significant coefficient of 0.791 SEK per 100 g of coffee. There is substantial variation in the estimated value associated with each brand. The omitted (reference) brand in the hedonic regression is Gevalia. The estimated coefficients for each brand therefore give an indication of the value of the brand relative to Gevalia. Lavazza, which is profiled as a high-quality luxury Italian coffee, has the highest brand coefficient at 5.2 SEK per 100 g. In contrast, Euro Shopper, which is a discount brand, has the lowest coefficient at −2.127 SEK per 100 g. The brand coefficients are in line with expectations.
The hedonic regression is used to predict the hedonic price. The mean hedonic price for the entire sample (Table 2) is around 52 Swedish crowns per kg, with considerable dispersion between the highest and lowest prices.
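The imputation step can be sketched with a deliberately small example. The prices below are made-up round numbers, not estimates from Table 3, and the specification keeps only brand dummies (with Gevalia as the reference brand) and an organic dummy, whereas the regression described above includes many more fixed effects. The function names `design_row` and `predict_price` are hypothetical.

```python
import numpy as np

# Hypothetical observed purchases: (brand, organic flag, price in SEK per 100 g).
observed = [
    ("Gevalia", 0, 5.0),
    ("Gevalia", 1, 5.8),
    ("Lavazza", 0, 10.2),
    ("Lavazza", 1, 11.0),
    ("Euro Shopper", 0, 2.9),
]

# Gevalia is the omitted reference brand, so it gets no dummy.
brands = ["Lavazza", "Euro Shopper"]


def design_row(brand, organic):
    """Intercept, brand dummies relative to Gevalia, and an organic dummy."""
    return [1.0] + [float(brand == b) for b in brands] + [float(organic)]


X = np.array([design_row(b, o) for b, o, _ in observed])
y = np.array([p for _, _, p in observed])

# Ordinary least squares fit of price on the dummy variables.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict_price(brand, organic):
    """Hedonic price for any brand/organic combination, purchased or not."""
    return float(np.array(design_row(brand, organic)) @ coef)


# Organic Euro Shopper is never observed being bought in the toy data,
# but the fitted regression still assigns it a price.
print(round(predict_price("Euro Shopper", 1), 2))
```

The point of the sketch is that a product never purchased on a given trip still receives a predicted price from the fitted coefficients, which is what allows every alternative in the choice set to carry a price.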