The residential sector accounts for more than 20% of global energy use (International Energy Agency, 2019). Governments and international institutions view improvements in energy efficiency as one of the key tools to reduce the energy intensity of the residential sector. While more energy-efficient products have a higher upfront cost, their lower consumption has the potential to make them better investments over their lifetime. However, the literature has documented the existence of an energy efficiency gap (Jaffe & Stavins, 1994), whereby agents’ inability to recognize such trade-offs leads to underinvestment in more energy-efficient technologies.

Although the magnitude of the energy-efficiency gap has been questioned (Allcott and Greenstone, 2012), energy efficiency remains a key policy objective. In an effort to improve awareness of energy efficiency, various information tools are in place (Markandya et al., 2015; Ramos et al., 2015; Solà et al., 2021). Among the most well-known and widely adopted are energy efficiency labels (Collaborative Labeling and Appliance Standards Program, 2005; Energy Efficient Strategies, 2014). Examples include the U.S. “EnergyGuide”, the EU “Energy Label”, Australia’s star “Energy Rating” and the “EnergyStar” logo. The motivation for energy labels rests on the assumption that making energy information more readily available to consumers facilitates comparisons among different products, as well as between purchase price and operating costs, ultimately leading to better energy investment decisions.

Energy efficiency labels generally provide consumption estimates in physical units (kWh/annum) based on average energy use. However, it has been shown that people are likely to make mistakes when translating physical consumption into expenditures and savings (Allcott, 2011a, 2013; Brounen et al., 2013; Davis and Metcalf, 2016; de Ayala and Solà, 2022; Heinzle, 2012; Sammer and Wüstenhagen, 2006). Several studies have assessed the efficacy of energy labels and whether reframing energy information improves their effectiveness, with mixed results (Andor et al., 2020; Blasch et al., 2019; Boyano et al., 2020; Boyano and Moons, 2020; Carroll et al., 2021; Davis and Metcalf, 2016; Heinzle, 2012; Heinzle and Wüstenhagen, 2012; Houde, 2018; Jain et al., 2018; Newell and Siikamäki, 2014; Shen and Saijo, 2009). Few papers investigate the differential impact across appliances (Denny, 2022; Jain et al., 2018; Shen and Saijo, 2009; Solà et al., 2021, 2023; Stadelmann and Schubert, 2018), or the heterogeneity in consumers’ responses to energy labels (Houde, 2018; Jain et al., 2018, 2021). None tests the same experimental treatment across countries. This highlights the need for large-scale evaluations of the impact of energy efficiency labelling on consumer choices (Allcott and Greenstone, 2012).

This paper tries to fill this gap by using an online randomised discrete choice experiment (RDCE), which adopts the same methodology in four countries, to investigate whether different ways of framing energy efficiency/consumption affect consumers’ preferences for energy efficiency and whether this effect is the same everywhere. Specifically, we ask respondents from Canada, Ireland, the United Kingdom and the United States to express their preferences for tumble dryers which vary over a number of attributes.

Information on energy efficiency/consumption is reported in three forms. As our benchmark, we use the letter scale of the EU Energy Label (for Ireland and the United Kingdom) and the EnergyStar logo (for Canada and the United States), with products being assigned to an energy class, or being given the EnergyStar, based on their physical energy consumption (kWh/annum). In a first manipulation, we convert the physical value into its monetary equivalent (the 10-year energy cost), based on average usage and national electricity prices. In a second manipulation, we derive individual-specific energy consumption according to self-reported use patterns. In this case too, we express it in monetary terms over a 10-year time span. To the best of our knowledge, this is the first paper to test the provision of long-term monetary energy consumption information using the same experiment in a multi-country setting.

One of the core motivations behind efficiency labels is that inducing consumers to purchase more energy-efficient products will make them better off, irrespective of the external impact on the environment (Allcott & Knittel, 2019). Therefore, providing energy information in a clear and accessible way is of fundamental importance. We reframe energy consumption as the long-term cost of electricity because it should offer a more meaningful representation of energy information to individuals than physical energy consumption.

Previous studies have investigated the effect of providing monetary energy information in a variety of contexts: from TV sets in Germany (Heinzle, 2012); to vehicles in the United States (Allcott & Knittel, 2019), Norway (Brazil et al., 2019) and in a multi-country setting (Codagnone et al., 2016); to refrigerators in India (Jain et al., 2021), Switzerland (Blasch et al., 2019), Germany (Andor et al., 2020) and Greece (Skourtos et al., 2021); to the residential market in Ireland (Carroll et al., 2020, 2021). Some papers consider a range of different appliances (Denny, 2022; Solà et al., 2021, 2023; Stadelmann & Schubert, 2018).

This paper focuses on tumble dryers as they are one of the highest energy-consuming household appliances. Tumble dryer consumption depends solely on actual usage and derives from just one “fuel”, namely electricity. Also, tumble dryers present the broadest range of ratings on the market, with models carrying a ‘C-rating’ still available for purchase at the time of the experiment (2018). This is not the case for other appliances, where the lowest available rating is often A+. The combination of these factors makes tumble dryers the appliance for which an effect of monetary energy information is most likely to be observed. On top of that, none of the countries in the study has monetary labels for tumble dryers. The European Energy Label is a color-coded letter scale based on physical consumption, and while the United States and Canada provide annual energy cost labels for several appliances, this is not the case for tumble dryers. Therefore, our treatments present new information in all contexts.

Table 1 Studies investigating the provision of monetary energy information

Considering the literature specifically on tumble dryers, Kallbekken et al. (2013) show that lifetime electricity costs reduce the average energy consumption of purchased products at retail stores in Norway only if coupled with staff training. The Department of Energy and Climate Change (2014) does not observe any effect in the United Kingdom. Carroll et al. (2016) and Denny (2022) find no significant improvement from providing five-year and 10-year energy expenditures in Ireland, respectively. Stadelmann and Schubert (2018) show no positive effect of lifetime electricity costs in Switzerland. Solà et al. (2023) find that in Spain monetary information has a positive effect only when provided both by sales staff and on energy labels, and only for A+++ appliances. Table 1 summarizes the methodologies and results of previous studies investigating the provision of monetary energy information. The findings of these studies are mixed and it is unclear whether this is attributable to different core products, methodologies or countries. By adopting a common framework which considers the same product and the same treatments in four countries, we are able to overcome this ambiguity.

Our paper builds on two main strands of literature. First, it is related to the literature on energy efficiency information (Allcott, 2011b, 2013; Allcott & Rogers, 2014; Allcott and Sweeney, 2016; Allcott & Taubinsky, 2015; Ayres et al., 2013; Brounen et al., 2013); and, more specifically, to that focusing on energy labels and their effectiveness (Andor et al., 2020; Blasch et al., 2019; Carroll et al., 2016; Denny, 2022; Heinzle & Wüstenhagen, 2012; Houde, 2018; Newell & Siikamäki, 2014; Sammer & Wüstenhagen, 2006; Shen & Saijo, 2009; Solà et al., 2021; Stadelmann & Schubert, 2018).

Second, our work draws from the literature on the framing of information and its impact on intertemporal choices (Kahneman & Tversky, 1984; Loewenstein, 1988; Loewenstein & Prelec, 1992; Loewenstein & Thaler, 1989; Tversky & Kahneman, 1981). Over the years, research on information framing has been applied to several contexts, including health (Block & Keller, 1995; Meyers-Levy & Maheswaran, 2004; Rothman & Salovey, 1997; Rothman et al., 1993), tax compliance (Hasseldine & Hite, 2003; Holler et al., 2009), and environmental behaviour (Loroz, 2007; Ropert Homar & Knežević, 2021; Van de Velde et al., 2010). In the context of energy efficiency, studies have investigated the effect of providing physical versus monetary energy information (Anderson and Claxton, 2014; Andor et al., 2020; Jain et al., 2021; McNeill & Wilkie, 1979), short-term versus long-term cost forecasts (Carroll et al., 2021; Heinzle, 2012), generic versus state-specific energy prices (Davis and Metcalf, 2016), and personalised information (Allcott and Knittel, 2019). Our paper contributes to the current debate by helping to shed light on the reasons behind the mixed effects evidenced by previous studies.

The remainder of the paper is organized as follows. Section 1 introduces the discrete choice theory and our experimental design. Section 2 describes the data and investigates the differences between the four countries in our sample. Section 3 presents the results of the analysis. Section 4 discusses some limitations and avenues for future research; and Section 5 concludes.

Methodology

DCE Overview

Discrete choice experiments (DCE) have gained popularity as a tool to elicit agents’ preferences for goods and services, since they help overcome some of the limitations presented by revealed preferences (RP) data. DCEs are a stated preferences (SP) method, usually involving surveys in which respondents are presented with repeated choice situations (called choice sets) comprising the comparison between two or more alternatives that vary over several attributes.

This type of experiment facilitates the measurement of non-use values, as well as the utility attached to individual attributes, which can be difficult to retrieve from revealed preferences data that often suffers from collinearity between attributes (Adamowicz et al., 1994; Carroll et al., 2021). In addition, it gives the experimenter a greater degree of control and flexibility than RP methods, coupled with the possibility to accommodate for the randomization between various treatments. The main drawback, as for any SP method, is represented by the hypothetical nature of the task. In most cases, the decisions people make do not have any real-world consequence (e.g. they do not actually purchase the product they selected among the array of alternatives), which introduces the possibility of hypothetical bias.

DCEs can be used to evaluate willingness-to-pay, to assess non-monetary valuation, to provide insights on consumers’ preferences, and to test the effectiveness of new policies. They were developed in the marketing literature (Louviere and Woodworth, 1983). Over the years, they have been applied to a number of other fields, including health (see Ryan et al., 2008, for a review of the literature), transport economics (Greene & Hensher, 2003; Hensher & Louviere, 1983), or environmental economics (Adamowicz et al., 1994; Aravena et al., 2014; Hanley et al., 1998). In the energy economics literature, DCEs have been used to study preferences for power generation (Rivers & Jaccard, 2005) and fuel mix (Komarek et al., 2011); to investigate willingness to pay (WTP) for energy efficiency improvements (Banfi et al., 2008; Carroll et al., 2016) and financial instruments for their adoption (Revelt & Train, 1998); and to evaluate the effectiveness of energy efficiency information and labels (Carroll et al., 2021; Davis & Metcalf, 2016; Heinzle, 2012; Heinzle & Wüstenhagen, 2012; Newell & Siikamäki, 2014; Sammer & Wüstenhagen, 2006; Shen & Saijo, 2009).

Empirical Strategy

DCEs are based on Lancaster’s characteristics theory of demand (Lancaster, 1966), according to which agents derive utility not from the good or service per se but from its characteristics (Lancsar & Louviere, 2008). Their empirical analysis follows random utility theory (McFadden, 1974), which posits that the utility consumer i derives from choosing good j can be decomposed into an explainable component (\(V_{ij}\)) and a random component (\(\varepsilon _{ij}\)):

$$\begin{aligned} U_{ij} = V_{ij} + \varepsilon _{ij}. \end{aligned}$$
(1)

The explainable or systematic component can then be expressed as a function of the good’s attributes (or at least some of them, \(X_{ij}\)) and the consumer’s individual characteristics (\(Z_{i}\)):

$$\begin{aligned} V_{ij} = X_{ij}' \beta + Z_{i}'\gamma , \end{aligned}$$
(2)

where \(\beta \) and \(\gamma \) are vectors of marginal utilities coefficients to be estimated.

While utility is not directly observed (it remains a latent quantity), we can assume that consumers choose the alternative that gives them the greatest utility out of all the available options. Therefore, the probability that agent i chooses alternative k is:

$$\begin{aligned} \begin{aligned} P(Y_{i} = k)&= P(U_{ik}> U_{ij}) \\&= P(V_{ik} + \varepsilon _{ik}> V_{ij} + \varepsilon _{ij}) \\&= P(V_{ik} - V_{ij} > \varepsilon _{ij} -\varepsilon _{ik}), \forall j \ne k. \end{aligned} \end{aligned}$$
(3)

For this to be estimable, a joint probability distribution for \(\varepsilon _{ij}\) needs to be specified. Typically, the error component is assumed to be independently and identically distributed as an extreme value type 1 random variable, thus resulting in a conditional logit form for the choice probabilities:

$$\begin{aligned} P(Y_{i} = k) = \frac{e^{\mu V_{ik}}}{\sum _{j=1}^{J} e^{\mu V_{ij}}} = \frac{e^{\mu (X_{ik}' \beta + Z_{i}'\gamma )}}{\sum _{j=1}^{J} e^{\mu (X_{ij}' \beta + Z_{i}'\gamma )}}, \end{aligned}$$
(4)

where \(\mu \) is a scale parameter inversely related to the variance of the error distribution. It cannot be separately identified and is conventionally set to unity (footnote 1) (Lancsar and Louviere, 2008).
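As a minimal illustration of Eq. 4, the Python sketch below computes conditional logit choice probabilities for a single choice set. The utility values are hypothetical and chosen only for illustration; they are not estimates from this study, and \(\mu \) is set to one as in the text.

```python
import math

def conditional_logit_probs(v, mu=1.0):
    """Choice probabilities of Eq. 4 for a list of systematic utilities v."""
    scaled = [mu * x for x in v]
    # Subtracting the maximum avoids overflow and leaves the ratios unchanged.
    m = max(scaled)
    expv = [math.exp(s - m) for s in scaled]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical utilities for dryer A, dryer B and the opt-out alternative.
probs = conditional_logit_probs([0.8, 0.3, 0.0])
```

The probabilities sum to one and are ordered in the same way as the systematic utilities, so the alternative with the highest \(V_{ij}\) is the most likely, but not the certain, choice.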

The standard conditional logit, however, presents some limitations. The assumption of the error term being independent and identically distributed (iid) implies that independence of irrelevant alternatives (IIA) is a key feature of the model. In addition, the preference parameters (\(\beta \)s) are assumed to be the same for all agents. Over the years, different models have been adopted to overcome these limitations. We decide to use a mixed multinomial logit model (or mixed logit for simplicity) for our analysis in light of its flexibility. As McFadden and Train (2000) demonstrate, any random utility model can be approximated with a mixed logit model.

The mixed logit model relaxes IIA and allows for heterogeneity of attribute coefficients across individuals (while keeping them constant for the same individual). In addition, it is efficient with repeated choices and therefore can accommodate the panel structure of the data thanks to its flexible substitution patterns which allow for within subject correlation (Lancsar & Louviere, 2008; Revelt & Train, 1998). The individual parameters are obtained by including an individual-specific stochastic component (\(\delta _{i}\)):

$$\begin{aligned} \beta _{i} = \overline{\beta } + \delta _{i}, \end{aligned}$$
(5)

where \(\overline{\beta }\) is the population mean (Lancsar and Louviere, 2008). Unlike the standard conditional logit model, the mixed logit does not have a closed-form expression for the choice probabilities, so it is estimated through maximum simulated likelihood.
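The idea behind simulation can be sketched as follows for a single respondent and a single attribute with a random coefficient. This is an illustrative sketch only: the normal distribution for \(\delta _{i}\), the function names and the toy data are assumptions made here, not the specification or software used in the paper.

```python
import math
import random

def logit_prob(chosen, utilities):
    """Conditional logit probability of the chosen alternative."""
    m = max(utilities)
    expv = [math.exp(u - m) for u in utilities]
    return expv[chosen] / sum(expv)

def simulated_panel_prob(choices, x, beta_mean, beta_sd, n_draws=500, seed=0):
    """Simulated probability of one respondent's whole sequence of choices.

    choices[t] is the index of the alternative chosen in choice set t and
    x[t][j] is the attribute value of alternative j in choice set t.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Eq. 5: individual coefficient = population mean + random deviation
        # (assumed normal here for illustration).
        beta_i = beta_mean + beta_sd * rng.gauss(0.0, 1.0)
        prob_seq = 1.0
        for t, chosen in enumerate(choices):
            utilities = [beta_i * value for value in x[t]]
            # The same beta_i is used in every choice set of the respondent.
            prob_seq *= logit_prob(chosen, utilities)
        total += prob_seq
    return total / n_draws

# Toy data: two choice sets with two alternatives each (hypothetical values).
x_toy = [[1.0, 0.0], [0.0, 1.0]]
p_sim = simulated_panel_prob([0, 1], x_toy, beta_mean=1.0, beta_sd=0.5)
```

Holding the draw fixed across a respondent's choice sets is what induces within-subject correlation. With `beta_sd=0` the function collapses to the product of standard conditional logit probabilities.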

Experiment Design

The DCE experimental design was carried out in JMP using the software’s Bayesian procedures, which allow for assumptions regarding the direction and variance of utility for each attribute. In particular, with JMP, we assume a utility range of one (split evenly across attribute levels) and a variance of 0.25. There were no dominant alternatives. Such a design enables us to assume, for example, that price is negatively correlated with utility whereas the number of stars in consumer rating is positively correlated.

The final design contained 32 pairs of choices — called choice sets (CS) — which were split across four blocks, leading to eight choices per respondent. Respondents were randomly assigned into one of the four blocks. Each choice set consisted of two tumble dryers and an opt-out alternative. Including an “opt-out" or “neither" alternative is desirable in contexts where respondents are presented with hypothetical pairs, since its absence would force them to choose between potentially unappealing options, a choice that might not be made in a real world scenario (Lancsar and Louviere, 2008).

Respondents were randomly assigned to one of three groups which differed in the way in which energy information is displayed — namely control with the customary energy information, treatment 1 with generic energy expenditures, and treatment 2 with personalised energy expenditures. Figure 1 reports the structure of the DCE, highlighting the points of randomization.

Fig. 1 Structure of the experiment
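The two points of randomization can be sketched in a few lines of Python. This sketch assumes that the block and the information group are drawn independently and with equal probability; the text only states that both assignments were random, and the names and seed below are hypothetical.

```python
import random

BLOCKS = [1, 2, 3, 4]  # four blocks of eight choice sets each
GROUPS = ["control", "treatment_1", "treatment_2"]

def assign_respondent(rng):
    """Draw a block and an information group independently and uniformly."""
    return {"block": rng.choice(BLOCKS), "group": rng.choice(GROUPS)}

rng = random.Random(2018)  # arbitrary seed, for reproducibility only
assignments = [assign_respondent(rng) for _ in range(12)]
```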

The tumble dryers presented in each choice set vary over five attributes, which were chosen on the basis of previous research on household electric appliances (Carroll et al., 2016; Heinzle, 2012; Heinzle and Wüstenhagen, 2012; Sammer and Wüstenhagen, 2006; Shen and Saijo, 2009), through focus groups and in consultation with salespersons at retail stores (footnote 2). The selected attributes are:

  (i) Price. Price is based on the range of models available on electrical retailers’ websites in each country at the time of the experimental design.

  (ii) Brand. Brand is characterized as either “established” or “new”. An established brand is one with more than five years of activity that has developed a solid relationship with its customers. A new brand is one which has been operating for less than five years and has not yet developed a solid relationship with its customers. Such a categorization was chosen to facilitate comparisons, as the survey was conducted in four different countries with different leading brands and attitudes.

  (iii) Capacity. Capacity is measured in kilograms (kg) for the Irish and British versions and cubic feet (cu ft) for the Canadian and US ones.

  (iv) Customer rating. Customer rating takes the form of a typical star rating. On electrical retailer websites there are almost no products rated below 3 stars. Therefore we use the range 3-5 stars in the experiment.

  (v) Energy efficiency. Energy efficiency is based on physical energy consumption (kWh/annum), also consistent with typical products available on electrical retailer websites.

At the beginning of the DCE, all attributes were presented and described to respondents with the aid of images. A summary of the attributes and their levels in each country is reported in Table 2. The way these attributes and levels were introduced to respondents is displayed in Figs. 2-9 in Appendix A.

Table 2 Attributes and levels by country and treatment groups

Participants were randomly assigned to one of three groups that differed in the way in which the energy efficiency attribute is presented. In the control group, energy efficiency is presented in the form of the typical energy label customary in the respective country: that is, the letter scale of the EU Energy Label for Ireland and the United Kingdom, and the EnergyStar logo for Canada and the United States. Tumble dryers were assigned a letter from C to A+++ or the EnergyStar logo based on their physical energy consumption (kWh/a) as shown in Table 2.

Treatment 1 frames energy efficiency as the 10-year energy cost according to the formula:

$$\begin{aligned} Energy \ cost = kWh/a \times \textit{national electricity price} \times 10 \ \textit{years}, \end{aligned}$$
(6)

where the physical energy consumption is based on an average of 160 cycles per year (footnote 3), and the electricity prices per kWh are €0.17 in Ireland, £0.15 in the United Kingdom, CAN$ 0.1465 in Canada and $ 0.1312 in the United States (all include VAT). In treatment 2, we still present energy efficiency as the 10-year energy cost; however, this is now based on individual-specific self-reported usage:

$$\begin{aligned} \begin{aligned} Energy \ cost =&\frac{kWh/a}{160} \times \textit{individual-specific weekly use} \\&\times 52 \ \textit{weeks} \times \textit{national electricity price} \times 10 \ \textit{years}. \end{aligned} \end{aligned}$$
(7)
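The two formulas can be written as a short Python sketch. The prices and the 160-cycle average are taken from the text; the function names and the example dryer consuming 300 kWh/annum with five uses per week are hypothetical and serve only to illustrate Eqs. 6 and 7.

```python
# Electricity prices per kWh as reported in the text (local currency).
PRICES = {
    "Ireland": 0.17,
    "United Kingdom": 0.15,
    "Canada": 0.1465,
    "United States": 0.1312,
}
CYCLES_PER_YEAR = 160  # average usage underlying the kWh/annum figure
YEARS = 10

def generic_cost(kwh_per_year, country):
    """Eq. 6: 10-year energy cost at average usage."""
    return kwh_per_year * PRICES[country] * YEARS

def personalised_cost(kwh_per_year, weekly_use, country):
    """Eq. 7: 10-year energy cost at self-reported weekly usage."""
    kwh_per_cycle = kwh_per_year / CYCLES_PER_YEAR
    return kwh_per_cycle * weekly_use * 52 * PRICES[country] * YEARS

# Hypothetical dryer and respondent, for illustration only.
generic_example = generic_cost(300, "Ireland")
personal_example = personalised_cost(300, 5, "Ireland")
```

The two measures coincide when a respondent reports 160/52 (about 3.1) uses per week. Heavier users see a higher personalised cost than the generic one, and lighter users a lower one.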

Figures 7-9 in Appendix A provide examples of the descriptions of the energy efficiency attribute given to participants in each of the three groups, and Fig. 10 provides an example of the choice sets.

Estimation Strategy

As mentioned in Section 1.2, the mixed logit model distinguishes between parameters that are constant for all respondents (non-random parameters), and parameters that vary by respondent (random parameters). Therefore, the \(X_{ij}\) vector consists of both attributes with a constant impact on utility (\(N_{j}\)), and attributes whose impact varies by individual (\(R_{ij}\)).

We keep price and consumer rating as constant for all individuals, since it is reasonable to assume that everyone prefers products with a lower price and a higher star rating. On the other hand, capacity, brand and energy efficiency are allowed to vary by respondent, since it is possible that different individuals have different preferences over these attributes. We relax this categorization in the robustness checks reported in Appendix C.

In all specifications we define energy efficiency as a dichotomous variable, “high efficiency versus low efficiency” (\(EE_{ij}\)), based on the underlying level of physical energy consumption used in the experimental design (see the notes of Table 2). More specifically, the variable takes the value 1 for the three least efficient classes (which correspond to the higher consumption levels), and value 2 for the three most efficient classes (corresponding to lower consumption levels) (footnote 4). The effect of reframing energy efficiency in monetary terms is captured by an interaction between the energy efficiency variable and treatment dummies.

We estimate the following model in each country:

$$\begin{aligned} U_{ij} = \alpha _{j} + N_{j}' \beta _{N} + R_{ij}' \beta _{Ri}+ \beta _{EET1i} (EE_{ij} \times T1_{i}) + \beta _{EET2i} (EE_{ij} \times T2_{i}) + Z_{i}' \gamma + \varepsilon _{ij}, \end{aligned}$$
(8)

where \(\alpha _{j}\) is an opt-out alternative-specific constant; \(N_{j}\) is the vector of non-random parameters and \(\beta _{N}\) a vector of their coefficients; \(R_{ij}\) is the vector of random parameters (including energy efficiency) and \(\beta _{Ri}\) a vector of their individual-specific coefficients; \(T1_{i}\) and \(T2_{i}\) are dummy variables for treatment 1 (the generic 10-year cost of electricity) and treatment 2 (the personalised 10-year cost of electricity); and \(Z_{i}\) is the vector of individual characteristics.

As mentioned above, the treatment effects are captured by an interaction between energy efficiency and the treatment dummies. Hence, the coefficient of energy efficiency alone gives an indication of the baseline value of this attribute for individuals’ utility, and the interaction terms represent the incremental effect generated by our treatments. The coefficients of the interaction terms (\(\beta _{EET1i}\) and \(\beta _{EET2i}\)) are also assumed to be individual-specific.
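To make the interpretation of the interaction terms concrete, the sketch below computes, for one individual, the utility gain from moving from the low to the high efficiency category under each group, using only the energy efficiency terms of Eq. 8. The coefficient values are hypothetical and are not estimates from this study.

```python
def ee_utility_gain(beta_ee, beta_t1, beta_t2, group):
    """Utility gain from raising EE from 1 (low) to 2 (high) in a given group."""
    t1 = 1 if group == "treatment_1" else 0
    t2 = 1 if group == "treatment_2" else 0

    def ee_part(ee):
        # Energy efficiency terms of Eq. 8 for a single individual.
        return beta_ee * ee + beta_t1 * ee * t1 + beta_t2 * ee * t2

    return ee_part(2) - ee_part(1)

# Hypothetical coefficients, for illustration only.
gains = {
    group: ee_utility_gain(0.9, -0.1, 0.2, group)
    for group in ("control", "treatment_1", "treatment_2")
}
```

In the control group the gain equals the baseline coefficient, while in each treatment group it equals the baseline coefficient plus the corresponding interaction coefficient.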

The models are estimated through maximum simulated likelihood using 1000 Halton draws. Standard errors are clustered at the individual level.
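For readers unfamiliar with Halton draws, the following sketch generates a one-dimensional Halton sequence and maps it to standard normal draws. The choice of base 2, the absence of discarded initial points and the normal transformation are assumptions made for illustration; the text specifies only the number of draws.

```python
from statistics import NormalDist

def halton(index, base):
    """Return the index-th point (index >= 1) of the Halton sequence in a base."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        # Reflect the base-b digits of the index around the decimal point.
        result += fraction * (i % base)
        i //= base
        fraction /= base
    return result

def standard_normal_draws(n, base=2):
    """Map the first n Halton points to standard normal draws."""
    inverse_cdf = NormalDist().inv_cdf
    return [inverse_cdf(halton(i, base)) for i in range(1, n + 1)]

draws = standard_normal_draws(1000)
```

Because Halton points cover the unit interval more evenly than pseudo-random numbers, they typically reduce simulation noise for a given number of draws. With several random parameters, a different prime base is normally used for each dimension.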

Data

The DCE was embedded in a survey distributed in November 2018 by the market research company ResearchNow in all four countries. The target population consists of individuals who own and use a tumble dryer in their everyday life. Therefore, at the beginning of the survey, we screen out participants who do not have a tumble dryer in their home, or who never use it. The survey included demographic quotas based on National Census information to ensure a representative sample in each country.

The initial sample consisted of a total of 2,676 individual observations. However, we exclude respondents who did not provide any demographic information, who did not complete all 8 choice sets in the DCE, or who gave an extreme answer to the question: “Approximately how many times a week do you use your tumble dryer?” (footnote 5). This leaves 634 valid respondents in the Canadian sample (214 in the control group, 205 in treatment 1 and 215 in treatment 2); 581 in Ireland (198 in the control group, 189 in treatment 1 and 194 in treatment 2); 655 in the United Kingdom (220 in the control group, 218 in treatment 1 and 217 in treatment 2); and 657 in the United States (208 in the control group, 228 in treatment 1 and 221 in treatment 2).

As a first step, we test whether there are significant differences between the four countries in our sample, or if it is possible to pool them together in our analysis. For this reason, we conduct likelihood-ratio Chow tests to verify if it is possible to pool Ireland and the United Kingdom in a European group, Canada and the United States in an American group, as well as all countries together.

Table 3 Likelihood-ratio test for pooled and country groups data
Table 4 Descriptive statistics and pairwise comparisons by country and treatment groups

Table 3 reports the results of the tests. As we can see, for all the combinations considered, it is possible to reject the null hypothesis that pooling the countries together is the same as treating them individually. Therefore, in the remainder of the analysis we will run separate models for each country. A possible alternative would be to run the model on the pooled sample and include interactions with a country variable. Such an approach has not been chosen because it would require adding two triple interactions between energy efficiency, country and treatment dummies to Eq. 8, which would complicate the interpretation of the treatment effects. In addition, the model should also include all the two-way interactions as well as the individual variables.
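The logic of the likelihood-ratio pooling test can be sketched as follows. The log-likelihood values and the number of parameters below are hypothetical placeholders, not the figures in Table 3, and the sketch assumes that every separate model has the same parameters as the pooled one.

```python
def lr_pooling_statistic(ll_pooled, ll_separate):
    """LR statistic for H0: a single pooled model fits as well as separate ones."""
    return 2.0 * (sum(ll_separate) - ll_pooled)

def lr_degrees_of_freedom(n_params, n_groups):
    """Extra parameters used by the separate models relative to the pooled one."""
    return n_params * (n_groups - 1)

# Hypothetical log-likelihoods for a pooled model and two country models.
stat = lr_pooling_statistic(-1000.0, [-480.0, -490.0])
dof = lr_degrees_of_freedom(12, 2)
```

Under the null hypothesis the statistic is asymptotically chi-squared with `dof` degrees of freedom, so a large value relative to that distribution leads to rejecting pooling.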

As mentioned in Section 1.3, participants were randomly assigned to the control group or one of the two treatments. We check whether, in the various countries, there are differences between the three groups in terms of their demographics and other relevant individual characteristics. Levene’s tests for homogeneity of variances, reported in Appendix B.2, show no significant differences in most cases. This is also largely confirmed by the pairwise t-tests reported in Table 4, which do not reveal major differences in the averages between the control and treatment groups. ‘Female’, ‘Degree’ and ‘Working’ are dichotomous variables taking value 1 if the respondent is female, has a university degree or is working, respectively, and 0 otherwise. ‘Age’, ‘Marital status’, ‘Environmental Concern’ and ‘Income’ are based on Likert scale questions, with values as per the options reported in Appendix B.1. ‘Patience’ and ‘Risk’ take values from 1 to 10, with higher values corresponding to greater patience and higher risk acceptance. ‘Tumble dryer usage’ is the self-reported weekly usage. The most notable differences are a greater proportion of participants who hold a degree in the personalised energy cost treatment in Canada, and in the control group in Ireland. It is worth noting that most of the differences are small relative to the scale of the corresponding variable. Overall, these results suggest that the three experimental groups in each country do not present fundamental differences and are, therefore, comparable.

The discussion of how the sample characteristics match with the corresponding national values is reported in Appendix B.3. Overall, the sample is broadly representative of the national population in each country. However, it should be noted that we do not have information on the population of “typical tumble dryer owners.”

Results

Main Results

Table 5 presents the results of mixed logit regressions for the four countries separately. These are considered over the whole sample for each country, with the inclusion of interaction variables to account for treatment groups one and two as shown in Eq. 8. In Appendix C.1 we report separate models for the control group, the generic cost information treatment and the personalised cost information treatment: results are qualitatively identical. All regressions control for income, gender, living area, whether the individual holds a degree, environmental concern, impatience, risk attitude and tumble dryer usage. These coefficients are not displayed in Table 5 for ease of presentation. However, later in the paper we investigate if the effect of our manipulations differs by individual characteristics.

Although the magnitude of the coefficients does not have an immediate interpretation, their sign indicates the direction of the effect on utility. As can be seen, the attributes have the expected effect on utility. For example, price is negative, signifying that respondents prefer cheaper products, while star rating and capacity are positive, meaning that people would rather purchase a tumble dryer with better reviews and one that can accommodate more clothes. Brand takes value 1 for an established brand and 2 for a new one, hence the negative sign of the coefficients reflects the fact that respondents prefer products of established brands. Most of the random parameters’ standard deviations are significantly different from zero, suggesting the presence of substantial preference heterogeneity.

Energy efficiency presents positive and significant coefficients for all countries: more efficient models have a positive impact on utility. However, the interaction terms are insignificant in most cases, which means that presenting energy efficiency information in monetary terms (treatment 1) does not have any relevant effect on people’s choices, nor does personalising this information (treatment 2) produce any appreciable difference. There are, however, two exceptions. In the Canadian sample we find negative and statistically significant coefficients for the two interaction terms. This suggests that displaying energy efficiency information in monetary terms, rather than the simple EnergyStar logo, reduces the utility of more energy-efficient products. Conversely, for the UK, we detect a small positive effect (significant at the 10% level) of personalised energy cost information.

Table 5 Mixed logit models
Table 6 Mixed logit models - Willingness-to-pay estimates

It is important to note that the negative effect of our treatments in the Canadian sample can still be consistent with monetary information helping respondents making better investment decisions. The attribute values generated in JMP yield a composition of the generic energy information treatment in which the more efficient option has the highest total lifetime cost — given by the sum of the purchasing price and the 10-years cost of electricity — in 14 out to 32 choice sets (43.75% of the times). Respondents in the Canadian sample where shown the block with the greater frequency of choice sets presenting this characteristic more often than other countries. In addition, also in the personalised energy information treatment, Canada has the highest percentage of choice sets with the more efficient option having the highest lifetime cost. However, said percentage is still below 50% and not considerably different from that of the other countries: it is 42.6% in Canada, 41.9% in the United Kingdom, 39.9% in the United States and 39.5% in Ireland. Some extra considerations are needed. First, the experimental design excludes the possibility of strictly dominated options. Hence, for example, it makes perfect sense that the more efficient tumble dryer presents a higher total lifetime cost if it has a substantially greater capacity, since that would imply a higher purchasing price as well as operating costs. Second, the choice sets did not report the total lifetime cost, only price and energy cost information separately. It is therefore not possible to know whether respondents carried out such a calculation when taking their decisions. In light of this, we cannot confidently say if the negative effect of monetary energy information in the Canadian sample is the result of people correctly choosing the option with the lower lifetime cost or if it is attributable to other factors. 
The analysis presented later in this section and in Appendix D tries to shed light on some of these potential factors.
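The lifetime cost comparison described above can be illustrated with a minimal sketch. It assumes that the 10-year cost of electricity is simply ten times an annual cost with no discounting, and that each choice set is reduced to one more efficient and one standard alternative. The `Dryer` class and all prices are hypothetical and are not taken from the experimental design.

```python
from dataclasses import dataclass


@dataclass
class Dryer:
    """One alternative in a choice set (all values are hypothetical)."""
    price: float               # purchase price
    annual_energy_cost: float  # electricity cost per year


def lifetime_cost(dryer: Dryer, years: int = 10) -> float:
    """Purchase price plus undiscounted electricity cost over `years` (assumption)."""
    return dryer.price + years * dryer.annual_energy_cost


def efficient_option_costs_more(efficient: Dryer, standard: Dryer) -> bool:
    """True if the more efficient alternative has the higher lifetime cost."""
    return lifetime_cost(efficient) > lifetime_cost(standard)


# Hypothetical choice set: a pricier, more efficient dryer versus a cheaper one.
efficient = Dryer(price=900.0, annual_energy_cost=40.0)
standard = Dryer(price=450.0, annual_energy_cost=70.0)
flag = efficient_option_costs_more(efficient, standard)

# Share reported in the text: 14 of the 32 choice sets.
share = 14 / 32
print(flag, f"{share:.2%}")
```

In this illustrative case the lower running cost of the more efficient model does not offset its higher price over ten years, which is the situation respondents faced in a minority of choice sets.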

Table 6 presents respondents’ willingness-to-pay for the various attributes. The willingness-to-pay for attribute a is obtained as the ratio between the attribute’s coefficient and the price coefficient, \(WTP_{a} = -\frac{\beta _{a}}{\beta _{price}}\). As can be seen, energy efficiency is the attribute with the highest WTP in all countries. When energy information is presented as the EU Energy Label’s letter scale, Irish participants are willing to pay €334 more for a tumble dryer from the three most efficient classes than for one from the three least efficient classes, and British participants £176 more. Similarly, respondents are willing to pay $288 more in the United States and Can$516 more in Canada for a product with the EnergyStar certification.
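As a minimal sketch of the ratio defined above, the snippet below computes a WTP from two coefficients. The coefficient values are hypothetical and chosen only for illustration; they are not the estimates reported in Table 5.

```python
def willingness_to_pay(beta_attribute: float, beta_price: float) -> float:
    """WTP_a = -beta_a / beta_price, as defined in the text."""
    if beta_price == 0:
        raise ValueError("price coefficient must be non-zero")
    return -beta_attribute / beta_price


# Hypothetical coefficients, chosen only to illustrate the ratio.
beta_price = -0.004     # utility change per currency unit of price
beta_efficiency = 1.2   # utility of the more efficient class

wtp_efficiency = willingness_to_pay(beta_efficiency, beta_price)
print(round(wtp_efficiency, 2))
```

Because the price coefficient is negative, the leading minus sign makes the WTP of a utility-increasing attribute positive.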

Consistent with the results in Table 5, our manipulations of the way in which energy efficiency information is displayed have limited impact on WTP. Once again, we find a negative effect of both treatments in the Canadian sample, where the WTP for energy efficiency decreases by roughly Can$118 when generic energy costs are provided, and by Can$126 with personalised energy costs. Conversely, in the United Kingdom, personalised energy information based on self-reported use patterns increases consumers’ WTP for energy efficiency by more than £81 with respect to the baseline level under the current letter scale framing (see Footnote 6), making consumers willing to pay almost £258 more for a tumble dryer in the three most efficient classes.

Table 7 presents the average energy costs savings (ECS) yielded by the more energy efficient tumble dryers. Comparing these values with the WTP for energy efficiency reported in Table 6, it appears that personalised monetary energy information brings WTP more in line with the expected cost savings. In the Canadian sample, respondents in the control version overvalue energy efficiency (\(\text {WTP}_{C} = \text {Can}\$ 516\) vs \(\text {ECS}_{C} = \text {Can}\$ 236\)). The reduction of WTP observed with personalised energy costs brings the WTP closer to the ECS (\(\text {WTP}_{T2} = 516 - 126 = \text {Can}\$ 390\) vs \(\text {ECS}_{T2} = \text {Can}\$ 270\)). In the UK, respondents in the control undervalue energy efficiency (\(\text {WTP}_{C} = \text {\pounds } 176\) vs \(\text {ECS}_{C} = \text {\pounds } 246\)). Personalised energy costs increase WTP, again bringing it closer to ECS (\(\text {WTP}_{T2} = 176 + 81 = \text {\pounds } 257\) vs \(\text {ECS}_{T2} = \text {\pounds } 271\)). In the Irish sample it is possible to detect a similar pattern (overvaluation of energy efficiency in the control, with the negative impact of monetary energy information bringing WTP closer to ECS) despite the absence of a significant effect of either treatment. By contrast, in the United States, personalised energy costs seem to induce an undervaluation of energy efficiency, although in this case too the effect is not statistically significant. The effect produced by generic monetary energy information goes in the right direction, increasing WTP when it is too low and reducing it when it is too high, but is less pronounced. Overall, this suggests that monetary energy information helps individuals in their energy investment decisions by making their WTP for energy efficiency align with the expected energy costs savings. The effect is stronger for personalised information, although differences remain across countries.
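The comparison for Canada and the United Kingdom can be restated as a short calculation. The sketch below uses only the rounded figures quoted in the paragraph above; the dictionary layout and the definition of the gap as WTP minus ECS are illustrative choices, not part of the original analysis.

```python
# Rounded figures quoted in the text (Tables 6 and 7), in local currency units.
figures = {
    "Canada": {"wtp_control": 516, "wtp_change_t2": -126,
               "ecs_control": 236, "ecs_t2": 270},
    "United Kingdom": {"wtp_control": 176, "wtp_change_t2": 81,
                       "ecs_control": 246, "ecs_t2": 271},
}


def valuation_gaps(f):
    """Return (control gap, personalised-treatment gap), each as WTP minus ECS."""
    gap_control = f["wtp_control"] - f["ecs_control"]
    wtp_t2 = f["wtp_control"] + f["wtp_change_t2"]
    gap_t2 = wtp_t2 - f["ecs_t2"]
    return gap_control, gap_t2


gaps = {country: valuation_gaps(f) for country, f in figures.items()}
for country, (gap_control, gap_t2) in gaps.items():
    print(f"{country}: control gap {gap_control:+d}, personalised gap {gap_t2:+d}")
```

A positive gap indicates overvaluation and a negative gap undervaluation; in both countries the absolute gap is smaller under personalised energy costs than in the control.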

Table 7 Average energy costs savings

In a series of robustness checks we relax the definition of random and non-random parameters in two ways. First, we allow all attributes except price, as well as the opt-out alternative-specific constant, to be individual-specific, hence estimating an error component model. Second, we adopt the opposite approach and restrict all coefficients to be constant for all respondents, which yields the classic conditional logit model. The results, reported in Appendices C.2 and C.3, are substantially in line with the mixed logit estimations presented in Tables 5 and 6.

Heterogeneity

The overall absence of a positive effect of providing personalised energy information, although contrary to our prior beliefs, is not unprecedented. In the automobile sector, Allcott and Knittel (2019) find a limited impact of personalised fuel costs on individuals’ purchasing decisions, which tended to disappear a few months after the intervention.

Hence, we investigate whether the effect of reframing energy information differs for various subgroups based on personal attitudes and demographics. One hypothesis is that a limited average usage of the tumble dryer might make energy costs less salient compared to the somewhat shrouded information contained in the current labels. Second, in light of the evidence suggesting that people are typically not very good at translating physical consumption into energy expenditures, one could expect that the provision of more explicit information might benefit mainly those with lower levels of education. A third hypothesis is that people concerned about the environment will tend to choose the most efficient product irrespective of the way in which energy information is framed, while those less concerned will pay more attention to the monetary aspects of energy consumption. Finally, income-constrained individuals can benefit more from energy information reported in monetary terms if energy bills are a considerable proportion of their expenditures.

With this in mind, we split the samples on the basis of self-reported weekly tumble dryer usage, educational attainment, environmental concern and income. For tumble dryer usage, we define low usage as values smaller than or equal to the median of the respective country, mid-high usage as values between the median and the 90th percentile, and very-high usage as the top 10% of values in each country (see Footnote 7). For education, we distinguish between respondents with and without a degree. For environmental concern, we split the sample into participants who say they are concerned or extremely concerned about the environment, and those who are slightly concerned, not concerned or do not know. Lastly, we separate people who state that they live comfortably or very comfortably on their current income from those who do not live comfortably or are coping on their current income.
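The usage split can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ procedure: the 90th percentile is taken from Python’s `statistics.quantiles` with its default method, values exactly on a threshold are assigned to the lower category, and the sample values are hypothetical.

```python
import statistics


def usage_thresholds(weekly_uses):
    """Return (median, 90th percentile) of one country's weekly usage values."""
    median = statistics.median(weekly_uses)
    # quantiles(..., n=10) returns nine decile cut points; the last is the 90th percentile.
    p90 = statistics.quantiles(weekly_uses, n=10)[-1]
    return median, p90


def usage_category(value, median, p90):
    """Classify one respondent; the handling of boundary values is an assumption."""
    if value <= median:
        return "low"
    if value <= p90:
        return "mid-high"
    return "very-high"


# Hypothetical weekly usage counts for one country.
sample = [1, 2, 2, 3, 3, 4, 4, 5, 6, 10]
median, p90 = usage_thresholds(sample)
categories = [usage_category(v, median, p90) for v in sample]
print(list(zip(sample, categories)))
```

The thresholds are computed separately for each country, so the same weekly usage can fall into different categories in different samples.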

We defer the results of these estimations and the corresponding WTP to Tables 18–25 in Appendix D. Here we discuss their implications.

For all countries, and in particular for Canada, there are considerably more people in the low usage category. In fact, it is for this subgroup that personalised energy costs lead to a significant decrease in consumers’ utility in the Canadian sample. On the other hand, for respondents in the top 10% of the respective usage distribution, personalised information has positive coefficients in all countries, with the effect being significant for the UK. Both of these findings seem to confirm that the results in Tables 5 and 6 could be due, at least in part, to a limited average usage of the tumble dryer in our sample. In addition, as noted above, it is possible that for the low-usage subgroup the less efficient option is cheaper in terms of total lifetime costs.

Conversely, the hypothesis that providing more accurate and personalised energy information should benefit mostly those with lower levels of education is not substantiated. Although the coefficients of the two treatments become positive (but insignificant) for respondents without a degree in the Canadian sample, this effect does not apply to the other countries. If anything, we observe outcomes that run somewhat contrary to this belief. The positive effect of the personalised energy costs treatment in the United Kingdom comes from the subgroup of respondents who hold a degree, while generic energy costs generate a negative effect for Irish participants without a degree.

In each country, participants who state they are concerned about the environment have a higher WTP for energy efficiency than those who say they are not. In the subgroup of less concerned respondents, providing energy costs has a general positive impact on consumers’ utility, which represents a statistically significant improvement with respect to the EU Energy Label’s letter scale in the Irish and British samples. Presumably, these households care more about the consequences that energy use has on their wallet than on the environment, so reframing energy information in monetary terms improves their WTP for energy efficiency since it allows them to realise how much less they would spend with a more efficient model. On the other hand, monetary information has a generally negative effect for individuals concerned about the environment, which is statistically significant in the Canadian sample. People who care about the environment would buy the more efficient model to reduce their environmental impact, and this impression might be conveyed more strongly by a graphical representation like the EnergyStar logo or the colour-coded letter scale scheme of the EU Energy Label than by energy expenditures, especially if usage is low. Hence, providing monetary information seems to crowd out the motivation of those who would buy a more energy efficient tumble dryer for environmental reasons.

Finally, we do not detect a clear impact of individuals’ income on the effectiveness of our treatments. Personalised energy cost information does increase utility for income-constrained people in the United Kingdom, but this effect does not carry over to the other countries, apart from a positive but insignificant coefficient in the Irish sample. In addition, the negative effect in the Canadian sample affects both more and less wealthy individuals.

Limitations and Future Work

This paper uses a particular stated preference method (a DCE) to evaluate the effectiveness of alternative framings of energy efficiency information on consumers’ WTP for energy efficiency. As mentioned in Section 1, SP methods can suffer from hypothetical bias. The empirical investigation of hypothetical bias in choice experiments is still limited (Haghani et al., 2021), and the size and direction of the bias depend on the study design and context (Fifer et al., 2014). In economic applications, hypothetical bias typically leads to overestimation of WTP (Buckell et al., 2020). It is up to researchers to incorporate in the experimental design appropriate techniques to limit hypothetical bias (see Hensher, 2010, and Loomis, 2011, for a discussion). Our experiment employs several of these techniques. First, since evidence shows that familiarity with the product and context alleviates hypothetical bias (Schläpfer & Fischhoff, 2012), we restrict our sample to participants who own and use a tumble dryer. Second, at the beginning of the experiment, participants were presented with a “cheap talk” statement (Cummings & Taylor, 1999) inviting them to express their true opinions and preferences. Finally, all choice sets included an opt-out alternative (Ladenburg & Olsen, 2014). Unfortunately, our design does not support incentive compatibility, since participants had to be remunerated irrespective of their choices.

It has been found that empirical economics papers are typically underpowered (Jaffe & Stavins, 2017). Having low statistical power increases the chances that the analysis does not detect a significant effect even if a relationship actually exists. The presence of interaction terms further exacerbates this problem. One way to increase the statistical power is to expand the sample size (Christensen & Miguel, 2018). This study considers only tumble dryer users who are also representative of the respective national population. Meeting these requirements entails higher recruitment costs, so the sample sizes were dictated by the available budget. While the sample size for each country version in our study is almost twice as large as that used in other papers performing a similar analysis (Carroll et al., 2021), we cannot exclude that the absence of significant treatment effects in certain versions is due to a lack of statistical power. Hence, future research should try to employ bigger samples.
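The link between sample size and power can be illustrated with a deliberately simplified sketch. It assumes a two-sided z-test for a difference in means between two equally sized groups, which is not the mixed logit interaction test used in this paper; the effect size, standard deviation and sample sizes are hypothetical, and `power_two_sample` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect / se                 # standardised true effect
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical effect size and spread, in arbitrary units.
for n in (250, 500, 1000, 2000):
    print(n, round(power_two_sample(effect=0.1, sd=1.0, n_per_group=n), 3))
```

Under these assumptions power rises with the number of respondents per group, which is why small treatment effects may go undetected in samples constrained by budget.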

Certain aspects of the experimental design present some limitations. Given the complexity of designing the experiment across different jurisdictions with different labelling contexts, currencies and energy prices, the experimental design was generated based on the generic letter label. The design was then converted for each country and each treatment. This generated some choice sets where the more energy efficient appliance has the highest total lifetime costs. The distribution of these choice sets into the blocks was uneven, with block 1 having four, block 2 having three, block 3 having five, and block 4 having two. In addition, the number of respondents assigned to each block is not the same in all countries and experimental versions. The combination of these aspects could make the results less reliable. Future studies should aim to avoid these limitations, testing the composition of choice sets in each treatment and ensuring there are no asymmetries between the various experimental versions.

Another aspect worth highlighting is that the study was conducted before the COVID-19 pandemic, the war in Ukraine, the energy crisis and the period of high inflation. At that time, electricity prices were lower, so a rational consumer may indeed have been better off buying a less efficient model. The latest estimates of the price of one kWh of electricity are €0.38 in Ireland (Eurostat, 2023), £0.34 in the United Kingdom (Energy Guide UK, 2023), Can$0.165 in Canada (GlobalPetrolPrices.com, 2023) and $0.16 in the United States (U.S. Energy Information Administration, 2023). Further research is needed to assess the effectiveness of monetary energy information under high electricity prices.

While tumble dryers present the broadest range of energy efficiency ratings, this is in good part explained by the absence of the heat pump technology in less efficient models. Our experimental design, however, did not include the heat pump technology among the product attributes. Future DCEs investigating tumble dryers could include a heat pump attribute to evaluate if and how it impacts the results.

Finally, although stated preference methods are an invaluable tool for investigating consumers’ behaviour in a variety of contexts and for assessing the effectiveness of new policies, thanks to their flexibility and ease of implementation, some studies have found differences in effects between online and field trials (Allcott and Taubinsky, 2015). Future research should therefore consider the value of coupling survey data with field experiments and revealed preference data.

Conclusion

It has been asserted that the current kWh information reported on energy labels might not be sufficient to help consumers make well-informed energy efficiency investments. The literature has documented that individuals often struggle to interpret energy information when provided in physical units. Reframing energy information in monetary terms could allow them to make better and more informed purchasing decisions. Prior studies have investigated the effect of providing monetary energy information in several contexts. Outcomes have been mixed, and it is not clear whether this is to be attributed to the use of different core products, the employment of different methodologies, or the fact that they were conducted in different countries. This paper represents the first attempt to clarify that ambiguity by examining the impact of lifetime energy expenditures employing the same experiment in a multi-country setting.

Our findings show that monetary information has different impacts in different countries. In Ireland and the United States we fail to detect any significant effect of providing lifetime energy expenditures. In Canada, both generic and personalised monetary information reduce the willingness-to-pay for energy efficiency with respect to the EnergyStar logo. In the United Kingdom, individual-specific energy costs have a small positive impact on people’s preferences for energy efficiency. We also observe that monetary energy information seems to bring individuals’ WTP for energy efficiency more in line with the expected energy costs savings. Disentangling the effect based on demographic and socioeconomic characteristics highlights that the negative effect comes primarily from individuals who make less frequent use of the tumble dryer, and that monetary information seems to crowd out the motivation of respondents who would buy a more efficient model for environmental reasons.

Framing information in monetary terms is often regarded as a promising option to favour the uptake of more efficient appliances. The results of this paper, while signalling the potential usefulness of monetary labels in helping consumers make more informed energy investment decisions, show that heterogeneity exists in consumers’ response, both across countries and across demographic groups. This suggests that there is no silver bullet for improving the communication of energy efficiency information. Hence, to design effective interventions to reframe energy efficiency information, policymakers should carefully evaluate the characteristics of the context in which these are to be implemented.

At present, there is not enough information on the effectiveness of different measures in different contexts, and more evidence is required. Moreover, there are often considerable differences between regions of the same country as well, for example in terms of energy prices (Davis & Metcalf, 2016), which are important to consider and evaluate. Unfortunately, our sample is not large enough to accommodate a regional within-country approach. Future research with larger samples could conduct such an analysis.

Our work paves the way for new research to examine additional products. Examples already exist as stand-alone analyses for refrigerators (Andor et al., 2020; Jain et al., 2021; Kallbekken et al., 2013), TV sets (Heinzle, 2012), washing machines (Department of Energy and Climate Change, 2014), cars (Allcott & Knittel, 2019) and the housing market (Carroll et al., 2021). In addition, labelling is only one means of promoting investments in energy efficiency; others include, but are not limited to, direct regulation, tax reductions and financial incentives. Therefore, future efforts should be devoted to developing structured, large-scale studies to understand which intervention is most effective in various contexts.