1 Introduction

Since the 1980s, and more so after the Rio Earth Summit in 1992, ‘eco-labels’ have become popular policy measures aimed at encouraging consumers to adopt environmentally friendly consumption (Horne 2009, p. 179), a pressing need in the context of increased commitment to tackle climate change. Such labels are designed to offset the information asymmetry between manufacturers/providers and consumers in various domains, from domestic energy supply (Momsen and Stoerk 2014), to motor vehicles (Teisl et al. 2008), wine products (Delmas and Grant 2014), and food (Vlaeminck et al. 2014).

However, the use of labels has been criticised on the grounds that they are based on the assumption that consumers and firms behave irrationally; on the absence of evidence of an energy-efficiency gap (Gayer and Viscusi 2013, p. 249); and on the claim that labels may lead consumers to over-value energy consumption in the purchase of goods (Sahoo and Sawe 2015). As a result, policy instruments such as ‘eco-labels’ must be based on robust evidence.

The need for evaluation is reinforced by the debate on libertarian paternalism (Rebonato 2014; Thaler and Sunstein 2003), since labels are typically discussed as tools for nudging consumers. The ‘nudge’ strategy (Thaler and Sunstein 2009; Sunstein 2013) is a recent trend in evidence-based policy making that draws upon behavioural insights in the design of public policy interventions (Codagnone et al. 2014a; Sousa Lourenço et al. 2016; van Bavel et al. 2013, 2015).

In the EU, labelling of passenger vehicles was introduced as a measure to tackle climate change by Directive 1999/94/EC. The formal evaluation of the car labelling Directive started in 2015, and was seen as an opportunity to rethink labelling in the context of both better knowledge of policy evaluation and behaviourally informed consumer policy.

This article reports findings from a laboratory experiment and a multi-country online experiment on the effects of eco-labels for cars and promotional material in orienting consumers’ purchasing choices. The experiments, both randomized control trials, were undertaken in the framework of a study for the European Commission supporting the revision of the existing European ‘car labelling’ Directive (Codagnone et al. 2013). The study was designed to contribute evidence on the effectiveness of eco-labels from an experimental behavioural perspective.

In the lab, the main task was to select a car among a set of options (discrete choice task), with the cars shown with a label on the side, and to answer a set of questions afterwards. We used the choice as a revealed preference in terms of fuel efficiency and CO2 emissions, and we checked if, in answering the questions, the participants correctly processed the information shown on the label.

In the online experiment, the main task was to declare the willingness to pay (WTP) for a car through a multiple price list (MPL), and to answer a set of questions afterwards. In this case as well, the car was shown with a treatment on the side (a label or customized promotional material). We used the declared price as the elicited willingness to pay, and we checked whether, in answering the questions, the participants correctly processed the information shown in the treatment.

In designing the variants of car labels and other materials, we faced the challenge of combining legislative requirements prescribing what information must be provided in a label with the theoretical and empirical evidence on the relevant heuristics and biases that could be used to nudge consumers in the desired direction. Thus, the study was an exercise in ‘realistic nudges’: testing interventions that were both compatible with existing legislation and stood a good chance of not being resisted by the stakeholders, especially manufacturers.

The main results were as follows. Labels directing attention to fuel economy or running costs were more easily understood by consumers, and affected choices through a form of mental accounting of fuel consumption. The same result was found for the elicited willingness to pay. In particular, large and expensive cars tended to be given less value when fuel economy was made salient. Information on CO2 emissions and green issues in general had little impact unless it was linked to future fuel savings.

This article is organized as follows. Section 2 presents the state of the research on behavioural science applied to ecological behaviour, together with the relevant regulatory framework; Sect. 3 details the experimental methods and materials; Sect. 4 reports the results; and Sect. 5 provides a discussion of the findings, the limitations of the study and the policy implications. Supplementary Online Materials (SOM hereafter) include the technical details and related information.

2 Review of the literature and of the regulatory framework

2.1 Eco labels

Eco-labels for white goods, energy provision, food, etc. as a signalling method to encourage consumers toward sustainable consumption have been studied for decades. Several reviews and discussion essays are available (Anderson and Claxton 1982; Bougherara et al. 2005; de Boer 2003; Dyer and Maronick 1988; Galarraga 2002; Horne 2009; McNeill and Wilkie 1979; Pedersen and Neergaard 2006). There is an equally extensive literature in social-psychology and marketing focusing on factors explaining the adoption and acceptance of eco-labels and how they affect consumers’ tastes and preferences (e.g. Bamberg 2003; Clark et al. 2003; Gadenne et al. 2011; Moon et al. 2002; Rubik et al. 2007; Teisl and Roe 2005; Thøgersen 2000, 2002, 2005; Thøgersen et al. 2012; Thøgersen and Noblet 2012).

The literature includes studies on how the effectiveness of eco-labels depends on the social-psychological characteristics and value orientations of consumers, and on the modality of information provision. Other studies take a more holistic approach (cf. Gadenne et al. 2011 for labels in general; Teisl et al. 2008 for car labels), considering consumers’ characteristics and values, the design of labels, as well as situational and objective factors, for example economic and market conditions, existence of regulation, taxation and subsidies.

In the social psychology literature, Thøgersen and colleagues highlight the importance of constructs such as ‘environmental involvement’ but also of the credibility of labels and their relevance to choice (Thøgersen 2000, 2002, 2005; Thøgersen et al. 2012; Thøgersen and Noblet 2012). The credibility of labels can be influenced by consumers’ prior beliefs (Teisl 2003). Other important social-psychological constructs include an individual’s level of ‘environmental concern’ (Bamberg 2003), as well as environmental beliefs and norms (Gadenne et al. 2011, p. 7687). Consumers’ perceived effectiveness of their own behaviour and their confidence in the behaviour of others appear to be positively associated with increased impact of labels as sources of information (Berger and Corbin 1992; Bougherara et al. 2005).Footnote 1 The evidence on the association between socio-demographic characteristics (i.e., education, age, gender, and income) and trust in labels, eco-attitudes and behaviour is mixed and inconclusive (Blend and Van Ravenswaay 1999; Clark et al. 2003; Loureiro et al. 2001; Moon et al. 2002). The effectiveness of labels increases when consumers can rank competing products by key attributes (Teisl and Roe 2005; Teisl et al. 2005). Comparative labels are also considered a potentially effective way of rendering complex numerical information into simple categorical scales (Harrington 2004; Peters et al. 2007, 2009).

2.2 Eco labels for cars

Before purchase and use, green products or services are ‘credence goods’, whose features cannot be appraised objectively (Delmas and Grant 2014, p. 9). Motor vehicles can be evaluated using objective technical attributes (e.g., engine size) and experiential attributes. However, their eco-friendliness is a credence attribute (Teisl et al. 2008, pp. 143–144) that is not easily verifiable either ex ante or ex post. When choosing between two products or services (e.g. renewable versus traditional energy service provision) for which the pairwise ranking of all relevant attributes is not consistent, consumers ignore some dimensions and subjectively reconstruct the dominance of one of the two options. In so doing they violate the independence of irrelevant alternatives axiom of expected utility theory (Momsen and Stoerk 2014, p. 378). As Sunstein (2013, p. 63) explains, when buying a car or a refrigerator some features become ‘shrouded attributes’, with potential implications for regulatory policy. Starting from such premises, in 2010 the Office of Information and Regulatory Affairs (OIRA, see brief account in Sunstein 2013, pp. 84–89) began collaborating with other US governmental agencies (i.e., EPA, DOT, NHTSA), leading to the enactment of various measures (including labels) concerning the fuel economy of vehicles (EPA and DOT 2011a, b).

Teisl et al. (2008) observe that compared to the general literature on eco-labels focusing on white goods and food, the study of car-labelling is less developed (Choo and Mokhtarian 2004; Kurani and Turrentine 2002; Lane et al. 2012; Lane and Potter 2007; Noblet et al. 2006; Teisl et al. 2005, 2008; Teisl 2003).

At the time of designing the current study, no reports on experiments on car eco-labels could be found. One of the few published studies took a holistic approach with survey data from a sample of registered vehicle owners in the United States (Noblet et al. 2006; Teisl et al. 2008). Teisl et al. (2008) reported that well-designed labels affect individuals’ perceptions of the eco-friendliness of products and general awareness of environmental problems. They concluded that consumers’ perceptions will shift gradually and that labels have a role in this longer term process of social change.

The literature points to three main issues characterising car purchasing (Codagnone et al. 2013;Footnote 2 Lane et al. 2012; Lane and Potter 2007; Noblet et al. 2006; Teisl et al. 2008).

  1. Eco-friendly attributes play a secondary role and are dominated by other attributes such as price, performance and safety.

  2. Car purchasing is a two-stage process; in the first stage the class of car is determined. Then in stage 2 attributes including eco-friendliness and fuel economy come into play to select a particular model in the preferred segment.Footnote 3

  3. Surveys indicate that fuel economy is considered more important than CO2 emissions and other environmental attributes (Codagnone et al. 2013; Lane et al. 2012).

2.3 The contribution of environmental behavioural economics

Two recent reviews discuss the potential contribution of ‘behavioural environmental economics’ in the design of interventions influencing consumers towards sustainable consumption (Croson and Treich 2014; Lavrijssen 2014). A major focus in this literature is on ‘social norm’ nudges (“look what your neighbours are doing”)Footnote 4 and/or ‘default’ nudges in household utility consumption (energy and water) and waste recycling (Allcott 2011; Allcott and Mullainathan 2010; Allcott and Rogers 2014; Bernedo et al. 2014; Carlsson and Johansson-Stenman 2012; Graffeo et al. 2015; Kaenzig et al. 2013; Momsen and Stoerk 2014; Sunstein et al. 2014; Sunstein and Reisch 2013). A single study tested other ‘nudges’ (priming, framing, mental accounting, decoy) in addition to social norms and defaults, finding that only the latter two were effective (Momsen and Stoerk 2014).

On eco-labels the experimental evidence is limited to two German studies (Heinzle and Wüstenhagen 2012; Kallbekken et al. 2013) and one in Denmark (Ölander and Thøgersen 2014) that focused not on cars but on energy labels for domestic appliances. No experiments have been conducted on car labels after the adoption of the European car labelling Directive. Only three studies are more or less relevant to the topic (Achtnicht 2012; Hilton et al. 2014; Galarraga et al. 2014). Achtnicht (2012) did not test car labels but rather elicited the willingness to pay (WTP) among German consumers for cars with reduced CO2 emissions. The study found that WTP decreased among those who reported lower price ranges for their next car purchase and that differences in WTP by age, gender, and educational level were ‘of weak statistical significance or insignificant’. Hilton et al. (2014) tested the effectiveness of bonus-malus taxes in encouraging consumers to use less polluting means of transport and concluded that such interventions are effective due to price and a social normative effect. On willingness to pay, Galarraga et al. (2014) used a dataset with official and retail prices for cars in Spain, matched to the eco-classification label. They found that vehicles falling in categories A and B were sold at a 3–5.9 % higher price compared to cars with similar characteristics but lower energy-efficiency labels.

A debate in environmental behavioural economics is whether eco-labels qualify as ‘nudges’. Since labels supply information, it is germane to ask whether and how they differ from traditional information campaigns (Kosters and Van der Heijden 2015, p. 279). Ölander and Thøgersen (2014) argue that eco-labels provide information at the point of sale but only change the choice architecture for consumers if and when they become familiar and are considered credible. Building on Peters and colleagues’ discussion of labels that perform well in summarising complex numerical information (Peters et al. 2007, 2009), Johnson et al. (2012) consider ‘good labels’ (as opposed to bad ones) as an instrument of attribute parsimony that reduces information overload. By changing the decision architecture such labels qualify as nudges. Further support for considering ‘eco-labels’ as nudges is the evidence that their design affects consumers’ perceptions (Heinzle and Wüstenhagen 2012; Teisl et al. 2008), an issue to which we will return in the discussion of our findings.

3 Materials and methods

3.1 Framework of the study

The data for this study come from a multi-country project commissioned by the EC in the formal process of monitoring and evaluation of the ‘car labelling’ Directive (EC 1999).

The Directive is part of the efforts to meet the target of reducing economy-wide greenhouse gas (GHG) emissions by 40 % by 2030 compared to 1990 levels (EC 2016). An ‘energy label’ was introduced in 1992 (EC 1992), revised in 2010 (EC 2010), and then extended to passenger cars with the ‘car labelling’ Directive (EC 1999; for more details see Sect. 2 SOM). This Directive specifies that labels showing CO2 emissions must be displayed near the car at the point of sale (Art. 6 and Annex IV) and that all promotional materials (including different forms of advertising from written texts to videos) must include fuel consumption and specific CO2 emissions data of the car model to which they refer. These prescriptions leave considerable discretion to Member States (MSs) for the implementation of the Directive. The Directive, together with the regulations (EC 2009, 2014, 2015), forms the cornerstone of EU policy to reduce CO2 emissions from passenger cars, which in the EU-28 are the largest source of energy consumption and CO2 emissions among all labelled products.

The formal evaluation of the car labelling directives started in 2015 (see Sect. 2 SOM).

For the purpose of this study, promotional material is a graphic artefact (not written text only, but not a video) that may appear in different media (point of sale leaflets, advertising in magazines and newspapers, online advertising, etc., except television) promoting a particular car. This choice, in addition to being realistic with regard to current practices in the market, also offers the opportunity to compare the effects of official labels that fulfil the stricter legislative requirements with other forms of information provision (i.e., promotional material) that are less constrained.

Evaluation reports commissioned by the European Parliament and the Commission or undertaken by consumers’ associations (e.g. Branningan et al. 2011) agree that fragmentation in the implementation of car labels and promotional material across the Member States is generating confusion among consumers. Among the ten countries covered in the current study (Belgium, France, Germany, Italy, Netherlands, Poland, Romania, Spain, Sweden, and UK), only in the UK are running costs reported together with CO2 emissions. Four countries (Italy, Poland, Romania, and Sweden) have not implemented a graphic label with a classification system; of the remaining six, three (Belgium, France, UK) adopt an absolute classification system and three (Germany, Netherlands, Spain) use a relative scale (see Table 1 in Sect. 2 of SOM).

Table 1 Factorial design of treatments

Our study includes two experiments—a laboratory experiment and an online survey designed as Randomized Control Trials (RCTs)—with the aim of testing cognitive processing and behavioural choices in a discrete choice task (in the laboratory) and elicited willingness to pay (in the survey) with the inclusion of environmental information in labels and promotional material.

The design of the treatments in our study resembles the process of consultation between OIRA, other agencies, and the public described by Sunstein (2013, pp. 84–89). Combining the perspectives and the suggestions of different stakeholders, we arrived at an agreement on the key questions for the EU policy makers (e.g. emission scale, running costs etc.).

The optimal design of treatments would have required a test of all possible combinations of the different pieces of information that can be included in labels or promotional materials. However, budgetary and time constraints made such a full factorial design impossible. In the companion report (Codagnone et al. 2013) we discuss the main effect of specific informational elements on a number of outcome variables (e.g. we report the main effect of running costs, of vertical emission scale, etc.). In the present paper, we treat labels and formats of promotional materials as holistic treatments: i.e., we test the label as such and we do not aim to isolate the effect of a specific piece of information (e.g., running cost per month). In order to maximise ecological validity and to implement a sophisticated dynamic randomisation, a database of 478 cars containing all relevant attributes (image, price, running costs, taxes, emissions, etc.) was constructed for all ten countries (see Sect. 5 SOM).

3.2 Laboratory experiment

The lab experiment was computer based and conducted on a convenience sample. Each participant had to perform a discrete choice task, selecting the preferred car among a set of options, and answer a set of questions afterwards. The cars were randomly selected from a car database, from the car class preferred by the respondent. Each car in the discrete choice task was presented with a label alongside, which showed the technical and environmental features of the car. The format of the label was the tested treatment, with the control condition corresponding to the existing label in the UK (between-subjects design). The rankings of the selected cars in terms of fuel consumption and CO2 emissions are the behavioural variables; the correct processing of the information shown in the label, according to the answers to the post-treatment questions, is captured by the cognitive variables.

In more detail, the experiment was conducted in the London School of Economics and Political Science (LSE) Behavioural Lab with 403 participants in November 2012. To recruit the participants, the Lab sent emails advertising the experiment to a list of contacts. These contacts were persons who had offered their availability for this kind of activity, and included current and former students and other personnel of the University. The content of the email was very general and did not explain the treatments or the aim of the research. The London School of Economics Research Ethics Committee provided the ethical authorization.

Participants gave their informed consent to participate and received a fixed participation fee. The experiment was programmed by one of the authors. Completion of the experiment required on average 18.13 min, with standard deviation 6.88 min.

The car label currently used in the UK and twelve other variants were designed as experimental treatments. The current official label was the control condition. It should be noted, however, that the UK label, compared to those implemented in other countries, contained additional information (i.e. running cost in the form of miles per gallon and the cost of Vehicle Excise Duty). From the evidence in Lane et al. (2012), it emerged that more than 50 % of UK consumers were familiar with the graphic label. Hence, from an ecological validity perspective it was deemed inappropriate to use plain text as the control.

The labels comprised alternative pieces of information as follows (examples of all labels tested in the lab are available at SOM: Sect. 3.3):

  1. Graphical layout of the CO2 emissions classification system (scale presented in a vertical versus a horizontal format);

  2. Type of emissions classification, absolute (comparing with all other cars), relative (comparing with cars in the same class) or combined (mixing the two);

  3. Running costs, expressed per mile, per month or per 5 years;

  4. Additional information on lost saving on fuel compared to the best vehicle in class;

  5. Additional information on CO2 taxation.

In Sect. 2 of the SOM we explain the background of absolute and relative classification for emissions. In the experiment we used a simplified version of CO2 scales. In the absolute classification, the class of CO2 emissions was calculated and depicted on the scale A–G along with the level of emissions (less than 100 g CO2/km: A; between 100 and 200: B; etc.). In the relative system, the class was determined through a comparison with cars in the same class. Car classes or segments are: Economic Sport Utility Vehicles, Executive Cars, Expensive Sport Utility Vehicles, Large Family Cars, Large Multi-Purpose Vehicles, Micro-cars, Roadsters, Small Family Cars, Small Multi-Purpose Vehicles, and Superminis. Finally, the combined classification reported both types of information.

As graphical layouts and classification schemes vary across the member states, it was important from a policy perspective to test their relative effectiveness. Candidate explanations for differential effectiveness include information on running costs that may activate ‘mental accounting’ on fuel economy but run the risk of the so-called ‘miles per gallon’ (mpg) illusion (Larrick and Soll 2008). Mental accounting is the process through which people code, categorize and evaluate different economic events and their associated outcomes (Thaler 1985, 2004). The mpg illusion exists because using mpg as a measure of fuel efficiency leads people to a systematic misperception. Another explanation is that information on ‘lost saving’ is a form of ‘framing’ leading to the ‘loss aversion’ identified in Prospect Theory (Kahneman and Tversky 1979; Tversky and Kahneman 1991). Information about taxation is another example of the activation of ‘mental accounting’.

It was decided to test a minimum of 12 treatments combining the various attributes. Table 1 provides an outline of the visual stimuli. The logic of the design was to present full labels, i.e. including more than one attribute (e.g. graphical layout and running cost per mile), but allocating those attributes across labels in order to recover the main effect of individual pieces of information. The labels were mock-ups: they were dynamically adjusted in the course of the experiment to report the specific characteristics of the cars appearing in the discrete choice task. An example of these treatments is found in the SOM: Sect. 3.3.

The experimental flow is reported in Table 2. Subjects were first asked questions on socio-demographic characteristics and preferred class of cars. Ascertaining the preferred class of car to purchase was necessary as existing evidence points to a two-step process in car purchase decisions—first selecting the class of car and then selecting the model within the preferred class.Footnote 5

Table 2 Experimental flow

Participants then performed the discrete choice task: three cars were randomly selected from the class revealed as preferred and subjects were asked to choose one of them as a purchase. These cars were shown with labels alongside; the format of the label (one among the 13 in Table 1) was randomly assigned between subjects and was the same for all the cars, but with the appropriate details for each car. At random, some of the subjects were shown four cars, with the fourth taken from a non-preferred class to control for the robustness of the two-step purchasing procedure. This fourth car was picked at random from the next closest segment (following the ordered list of segments presented above).
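The randomisation described above can be sketched as follows. This is only an illustrative reading, not the code used in the study: the function names, the probability `p_fourth` of showing a fourth car, and the interpretation of ‘next closest segment’ as an adjacent entry in the ordered list of segments are all assumptions.

```python
import random

# Ordered list of segments as given in the text.
SEGMENTS = [
    "Economic Sport Utility Vehicles",
    "Executive Cars",
    "Expensive Sport Utility Vehicles",
    "Large Family Cars",
    "Large Multi-Purpose Vehicles",
    "Micro-cars",
    "Roadsters",
    "Small Family Cars",
    "Small Multi-Purpose Vehicles",
    "Superminis",
]
N_LABEL_FORMATS = 13  # control label plus twelve variants (Table 1)


def neighbouring_segment(segment, rng):
    """Pick an adjacent segment in the ordered list (assumed meaning of 'next closest')."""
    i = SEGMENTS.index(segment)
    candidates = [SEGMENTS[j] for j in (i - 1, i + 1) if 0 <= j < len(SEGMENTS)]
    return rng.choice(candidates)


def draw_choice_set(cars_by_segment, preferred, rng, p_fourth=0.5):
    """Return (label_format, cars) for one participant.

    p_fourth is a hypothetical parameter: the text only says that some
    subjects, at random, saw a fourth car.
    """
    # One label format per participant (between-subjects assignment).
    label_format = rng.randrange(1, N_LABEL_FORMATS + 1)
    # Three distinct cars from the preferred class.
    cars = rng.sample(cars_by_segment[preferred], 3)
    # Optionally add a fourth car from a neighbouring, non-preferred class.
    if rng.random() < p_fourth:
        other = neighbouring_segment(preferred, rng)
        cars.append(rng.choice(cars_by_segment[other]))
    return label_format, cars
```

Passing a seeded `random.Random` instance as `rng` makes the allocation reproducible, which is convenient when checking the balance of treatments after the fact.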

A screenshot of the experimental task is shown in the SOM (Sect. 3.4).

Following the discrete choice task subjects were asked a number of questions about the noticeability and cognitive processing of the information. In this analysis, we focus on the following two questions, related to emissions and running costs:

  1. How do you think the car you selected scores in terms of CO2 emissions compared to the other options available?

  2. How fuel efficient do you think is the car you selected compared to the other options available?

Responses were recorded on a scale from one to ten (low score equals less environmentally friendly). The full questionnaire is reported in the SOM, Sect. 3.2.Footnote 6

As dependent measures, we recorded both behavioural and cognitive processing variables. A behavioural variable captures an action by the respondent; a cognitive variable records a cognitive process (in this case related to fluid intelligence) by the respondent.

The behavioural variables were:

  1. A score on emissions: This is the ranking in terms of CO2 emissions of the selected car with respect to all the cars in the database. The score goes from one to ten, on a continuous scale. The higher the score, the greener the choice;

  2. A score on fuel efficiency: This is the ranking in terms of fuel efficiency of the selected car with respect to all the cars in the database. The score goes from one to ten, on a continuous scale. The higher the score, the larger the saving.
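A minimal sketch of how such a score could be computed is given below. It assumes, since the text does not spell this out, that the rank of the selected car among all cars in the database is rescaled linearly to the 1 to 10 range; the function name `rank_score` and the example values are hypothetical.

```python
def rank_score(value, all_values, lower_is_better=True):
    """Map the rank of `value` among `all_values` onto a continuous 1-10 scale.

    Assumption: linear rescaling, with 10 for the best car in the database
    and 1 for the worst. Use lower_is_better=True for CO2 emissions or fuel
    consumption, False for measures such as miles per gallon.
    """
    n = len(all_values)
    if n < 2:
        raise ValueError("need at least two cars to rank")
    # Count the cars that perform strictly worse than the selected one.
    if lower_is_better:
        worse = sum(v > value for v in all_values)
    else:
        worse = sum(v < value for v in all_values)
    return 1.0 + 9.0 * worse / (n - 1)


# Hypothetical CO2 emissions (g/km) for a four-car database.
emissions = [90, 120, 150, 200]
print(rank_score(120, emissions))  # 7.0
```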

The cognitive variables were:

  1. Fuel efficiency: we compare the choice options available to the respondent in terms of fuel efficiency and we rank the chosen car in terms of fuel efficiency, from one to n (n = 3, 4 is the number of cars shown), increasing in the level of efficiency (objective score). We normalize this ranking from one to ten. We then take the response to the question “How fuel efficient do you think is the car you selected compared to the other options available?”, which is given on a 1–10 scale (subjective score). If the difference between the objective score and the subjective score is lower than 1/n, our cognitive processing dummy is equal to one; otherwise, it is equal to zero.

  2. Environmental friendliness: we compare the choice options available to the respondent in terms of CO2 emissions and we rank the chosen car in terms of CO2 emissions, from one to n (n = 3, 4 is the number of cars shown), increasing in the level of greenness (objective score). We normalize this ranking from one to ten. We then take the response to the question “How do you think the car you selected scores in terms of CO2 emissions compared to the other options available?”, which is given on a 1–10 scale (subjective score). If the difference between the objective score and the subjective score is lower than 1/n, our cognitive processing dummy is equal to one; otherwise, it is equal to zero.

Notice that our cognitive variables allow for a margin of error: the respondent can be wrong up to one position in the objective score.
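The construction of the cognitive dummy can be illustrated with the sketch below. The text does not state on which scale the 1/n threshold is applied; the sketch assumes that both the objective and the subjective score are expressed as fractions of the scale range before being compared, which is only one possible reading. The function name and the example values are hypothetical.

```python
def cognitive_dummy(chosen_value, shown_values, subjective, higher_is_better=True):
    """Return 1 if the subjective 1-10 rating is close to the objective rank, else 0.

    Assumption: both scores are rescaled to the unit interval and the
    tolerance 1/n is applied to their absolute difference.
    """
    n = len(shown_values)
    if n < 2:
        raise ValueError("need at least two cars in the choice set")
    if not 1 <= subjective <= 10:
        raise ValueError("subjective score must lie between 1 and 10")
    # Objective rank of the chosen car: 1 = worst option shown, n = best.
    if higher_is_better:
        rank = 1 + sum(v < chosen_value for v in shown_values)
    else:
        rank = 1 + sum(v > chosen_value for v in shown_values)
    objective_unit = (rank - 1) / (n - 1)
    subjective_unit = (subjective - 1) / 9
    return int(abs(objective_unit - subjective_unit) < 1 / n)


# Hypothetical choice set: fuel efficiency in mpg for three cars.
print(cognitive_dummy(60, [40, 50, 60], subjective=9))  # 1
print(cognitive_dummy(40, [40, 50, 60], subjective=9))  # 0
```

For CO2 emissions, where a lower value is greener, the same function can be called with `higher_is_better=False`.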

3.3 Online experiment

The online experiment was computer based and conducted on a representative sample. Each participant performed a price elicitation task, declaring their willingness to pay, and answered some questions. The car shown was selected randomly from the car class preferred by the participant, with a label (or promotional material) alongside. The format of the label (or promotional material) was the tested treatment, with a plain label (or promotional material) as the control condition. Randomization occurred between subjects. The dependent variables are the declared price (relative to the market price) and the cognitive processing of the information related to running costs, according to the answer to the post-experimental question.

The online experiment was performed in ten countries, with 8211 participants. The programming was performed by one of the authors and then it was administered to an online panel. The access to the panel was given by a market research company that administers surveys and questionnaires. Data were gathered in February 2013. The participants in the online panel were contacted by email and were asked to give their informed consent by clicking on an “Accept” or a “Reject” button. The London School of Economics Research Ethics Committee provided the ethical authorization. On average, the experiment took 14.51 min, with standard deviation 7.99 min and participants received a fixed fee.

The sample is representative of the online population aged 18–65 in each country, with quotas by country, gender and age group (three groups). Sampling was random, with a sampling error of 1.12 % for the overall data and 3.54 % for country-specific data. The countries included are (participants in parentheses): Belgium (815), France (803), Germany (810), Italy (804), Netherlands (807), Poland (824), Romania (819), Spain (887), Sweden (828), and United Kingdom (814).
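The reported sampling errors appear consistent with the usual worst-case margin of error for a proportion, z·sqrt(p(1 − p)/n), with p = 0.5 and z = 2 (roughly the 95.5 % confidence level), evaluated at nominal sample sizes of 8000 overall and 800 per country. Both the formula and the nominal sizes are inferred from the figures and are not stated in the text; the short check below is offered only as an illustration.

```python
import math


def sampling_error(n, z=2.0, p=0.5):
    """Worst-case margin of error for a proportion under simple random sampling.

    z = 2 and p = 0.5 are assumptions chosen because they reproduce the
    figures reported in the text.
    """
    return z * math.sqrt(p * (1 - p) / n)


print(f"{sampling_error(8000):.2%}")  # 1.12% (nominal overall sample)
print(f"{sampling_error(800):.2%}")   # 3.54% (nominal country sample)
print(f"{sampling_error(8211):.2%}")  # 1.10% (realised overall sample)
```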

The participants answered a series of pre-treatment questions and were allocated to the main task, before concluding with a post-experimental questionnaire. The original design of the experiment included a number of subtasks that were motivated by specific issues in the labelling directives (e.g. a comprehension question regarding the CO2 classification systems). These tasks are not reported in this article (see the discussion in Codagnone et al. 2013). The results featured in this paper concern the evaluation of labels and promotional materials.

The experimental flow is shown in Table 3. Participants answered some preliminary questions and then moved on to the main task. The experimental task elicited willingness to pay through a multiple price list (MPL) format. The subject was shown a car randomly selected from the database; the car was accompanied by a label or promotional material alongside, with the specific features of the car reported in the treatment (dynamic mock-up). The treatments were allocated randomly between subjects. The participants declared the maximum price they would pay by clicking one of the options in a grid of prices. The grid was adjusted around the market price of the car in the country. The grid included 12 options with a 6 % interval (of the market price) at each tick; prices were then approximated to meaningful values.
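A sketch of how such a price grid could be generated is shown below. The text fixes the number of options (12) and the step (6 % of the market price), but not the exact position of the grid or the approximation rule; the symmetric placement (`first_offset = -0.33`) and the rounding to the nearest 100 currency units are therefore assumptions, and the function names are hypothetical.

```python
def price_grid(market_price, n_options=12, step=0.06, first_offset=-0.33, round_to=100):
    """Build a multiple price list around the market price of a car.

    Assumptions: the grid is symmetric around the market price (from -33 %
    to +33 % in steps of 6 %) and prices are rounded to the nearest
    `round_to` currency units.
    """
    prices = []
    for k in range(n_options):
        raw = market_price * (1 + first_offset + k * step)
        prices.append(round(raw / round_to) * round_to)
    return prices


def relative_wtp(declared_price, market_price):
    """Declared price expressed relative to the market price."""
    return declared_price / market_price


grid = price_grid(20000)
print(grid[0], grid[-1])  # 13400 26600
```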

Table 3 Experimental flow

The car purchase was simulated, because performance related payments were not feasible. The opportunity cost of participation was covered by the participation fee mentioned earlier. Although this is not standard in economic experiments, it is often used in online studies and in behavioural science (e.g. Bogliacino et al. 2015).

A total of 2485 participants were asked to perform an MPL associated with labels (with a conventional engine, electric or hybrid car), while 2398 were asked to perform an MPL associated with promotional materials (with a conventional engine, electric or hybrid car).

For the labels the information variables were:

  1. Running costs in two versions (cost per mile/km and cost per 5 years);

  2. Lost saving on fuel with respect to the best performer in the class;

  3. Fuel economy: litres per km or miles per gallon depending on the country, and battery life in the case of an electric car.

The labels built by combining these pieces of information are reported below (Table 4).

Table 4 Labels for the online experiment

As there is no standard car label in use across the EU, the control condition was a standardised format produced to ensure comparability across the sample. Respondents were randomly allocated to treatments regardless of their country of origin.

Suggestions from the literature on good label design (Peters et al. 2007, 2009) and on the need to avoid using different metrics and classification systems (Fasolo et al. 2010) informed the design of the promotional materials. We also followed the dictum ‘if it is hard to read then it is hard to do’ (Song and Schwarz 2008).

The promotional material comprised the following informational elements:

  1. Format: This dimension refers to the form of reporting information about CO2 emissions. With respect to the control condition, two variations were tested: using only a Graphic Element (GE) or using both a GE and a textual illustration of the CO2 emissions class;

  2. Additional elements: a small text indicating running costs (RC small); a larger running cost element (RC salience) with a footnote indicating the unit of measurement for running costs at the very bottom of the promotional material;

  3. A weblink that, once clicked, opens a pop-up showing a label.

The promotional materials built by combining these pieces of information are reported below (Table 5).

Table 5 Promotional material factorial design of treatments

After the MPL, every subject answered a number of questions. As in the case of the laboratory experiment, these questions concerned the noticeability of the specific piece of information we aimed to test. We will focus on the following question, asked to all participants:

“How do you think the car you selected scores in terms of running costs with respect to the other cars in the market?”

The response was elicited on a ten-point scale, increasing in the greenness of the choice.

The dependent variables included a behavioural variable and a cognitive variable.

The behavioural variable was the elicited WTP. This was calculated as the ratio of the chosen option and the market price of the car in the country.

The cognitive variable is a dummy measuring correct processing of the information. We first compute the objective score of the car in terms of running costs, with respect to the entire car database, and we normalize it from one to ten. We then consider the subjective score, given by the answer to the question: “How do you think the car you selected scores in terms of running costs with respect to the other cars in the market?” If the difference between the objective and the subjective score is lower than 3.3, the dummy is equal to one; it is zero otherwise.
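The two dependent variables can be sketched in a few lines. This is an illustration, not the code used in the study: the min-max rescaling to the one-to-ten range, the orientation of the score (lower running costs give a higher score), and the use of the absolute difference are assumptions, and all function names are hypothetical.

```python
def wtp_ratio(chosen_price, market_price):
    """Behavioural variable: chosen MPL option relative to the market price."""
    return chosen_price / market_price


def objective_score(running_cost, all_running_costs):
    """Rescale running costs to a 1-10 score over the whole car database.

    Assumption: min-max rescaling, with cheaper-to-run cars scoring higher.
    """
    lowest, highest = min(all_running_costs), max(all_running_costs)
    return 1 + 9 * (highest - running_cost) / (highest - lowest)


def correct_processing(objective, subjective, tolerance=3.3):
    """Cognitive dummy: 1 if the two scores differ by less than the tolerance.

    Assumption: the difference is taken in absolute value.
    """
    return 1 if abs(objective - subjective) < tolerance else 0


database = [1000, 2000, 3000]          # hypothetical running costs
score = objective_score(1000, database)
print(score, correct_processing(score, 7), correct_processing(score, 6))
```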

3.4 The analysis performed

Following the standard Rubin causal model (Rubin 1974), we can define the outcome for the untreated individual i in reduced form as:

$$y_{i}^{0} = \alpha_{0} + u_{i} ,$$
(1)

where the last term is the unobservable component, while for the individual i receiving treatment \(j = 1, 2, \ldots, n\):

$$y_{i}^{j} = \alpha_{0} + \beta_{j} + u_{i}$$
(2)

As a result, for the generic participant i we can define \(d_{i}^{j}\) as the dummy equal to one if treatment j is applied to i and zero otherwise. We can write:

$$y_{i} = \left( 1 - \sum_{j} d_{i}^{j} \right) y_{i}^{0} + \sum_{j} d_{i}^{j} y_{i}^{j} = \alpha_{0} + \sum_{j} d_{i}^{j} \beta_{j} + u_{i} .$$
(3)

The betas can then be interpreted as average treatment effects. OLS is unbiased if

$$E\left[ {u_{i} |d_{i}^{1} , \ldots , d_{i}^{n} } \right] = 0.$$
(4)

Assumption (4) holds under randomization. When the outcome is a non-linear function of the treatments, we simply apply an appropriate functional form (e.g. logit or ordered logit), but identification still rests on the same basic assumption. We report Huber-White heteroskedasticity-robust standard errors.
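To make the estimator concrete, the sketch below computes the betas of Eq. (3) and their robust standard errors for the linear case. It relies on the fact that, when the only regressors are a constant and mutually exclusive treatment dummies, OLS coefficients equal differences in group means and the HC0 (White) variance of each group mean is the sum of squared residuals divided by the squared group size. The function name and the toy data are hypothetical, and the HC0 variant is an assumption, since the text does not say which small-sample correction was applied.

```python
from math import sqrt
from statistics import fmean


def ate_with_robust_se(outcomes, treatments, control=0):
    """Return {treatment: (beta_hat, robust_se)} relative to the control group.

    Equivalent to OLS of y on a constant and treatment dummies with HC0
    standard errors (an assumed variant of Huber-White).
    """
    groups = {}
    for y, d in zip(outcomes, treatments):
        groups.setdefault(d, []).append(y)

    def mean_and_variance(ys):
        m = fmean(ys)
        # HC0 variance of a group mean: squared residuals over n squared.
        v = sum((y - m) ** 2 for y in ys) / len(ys) ** 2
        return m, v

    mean0, var0 = mean_and_variance(groups[control])
    effects = {}
    for d, ys in groups.items():
        if d == control:
            continue
        mean_d, var_d = mean_and_variance(ys)
        effects[d] = (mean_d - mean0, sqrt(var_d + var0))
    return effects


# Toy data: three control subjects (0) and three treated subjects (1).
print(ate_with_robust_se([1, 2, 3, 3, 4, 5], [0, 0, 0, 1, 1, 1]))
```

For the logit and ordered logit specifications the same treatment dummies enter a non-linear model, so this shortcut through group means applies only to the linear case.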

4 Results

4.1 Laboratory experiment

In SOM, Sect. 3.1 we report basic descriptive statistics of the sample and the histograms of the response variables.

Since the lab is based on a convenience sample, it is worth discussing some of the prevalent features: 55.56 % of the participants are female, 63.95 % are less than 22 years old, 57.53 % have tertiary education, 15.80 % have never bought a car, and 5.19 % are married.

Socio-demographic characteristics are balanced across treatments, implying that randomization met its objective. We perform a set of ANOVAs on the socio-demographic characteristics, using treatments as independent variables. The null hypothesis of balanced characteristics is never rejected: gender (F = 0.91; p = 0.53); having bought a car (F = 1.03; p = 0.42); age group (F = 1.39; p = 0.16); education (F = 1.31; p = 0.21); marriage status (F = 0.82; p = 0.63).
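The balance check rests on the one-way ANOVA F statistic, which compares variation between treatment groups with variation within them. The sketch below computes only the statistic (the p-value requires the F distribution, which is not in the Python standard library); the function name and the toy data are hypothetical.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy example: a covariate observed in three treatment groups.
print(one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A small F, as in the values reported above, indicates that the covariate varies little across treatments relative to its variation within them.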

Table 6 shows the main results from the discrete choice experiment in the laboratory. Logistic regression is used for the cognitive variables and ordered logit regression for the behavioural variables, both with robust standard errors (the outcome variables were described in the previous section). Note that the cognitive variables are dummies for correct processing of the information (with an error margin), while the behavioural variables are “scores”: the greener the choice (in terms of emissions or fuel economy), the higher the score. The coefficients represent the average treatment effect in comparison to the control condition (the existing label in the UK).

Table 6 Laboratory experiment: average treatment effect of the labels

We start with columns (1) and (2), where each coefficient captures the marginal effect on the log odds. In both cases, label 3 (vertical layout, relative classification system, running costs per mile, no info on taxation, but inclusion of the lost savings nudge) significantly increases the likelihood of answering satisfactorily (i.e. within the accepted margin). We can calculate the predicted probability of correct processing for label j (j = 1, 2, …, 12) as

$$p_{j} = \frac{1}{{1 + { \exp }( - \alpha_{0} - \beta_{j} )}}$$
(5)

where \(\alpha_{0}\) is the constant defined in (3).

When moving from the control condition to label 3, the probability of correctly processing the information on emissions increases from 69 to 95 %, while for fuel economy it increases from 63 to 88 %. In terms of fuel economy, label 2 (horizontal layout, relative classification system, monthly running costs, info on taxation, and lost savings nudge) also has a positive and significant effect, raising the likelihood of satisfactory processing to 84 %.
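Equation (5) can be evaluated directly. The sketch below does not use the estimated coefficients of Table 6; instead, purely for illustration, it backs out the constant and the label 3 coefficient implied by the two probabilities quoted for emissions (69 and 95 %). The function names are hypothetical.

```python
from math import exp, log


def logit(p):
    """Log odds of a probability."""
    return log(p / (1 - p))


def predicted_probability(alpha0, beta_j=0.0):
    """Eq. (5): probability of correct processing under label j."""
    return 1 / (1 + exp(-alpha0 - beta_j))


# Implied values, recovered from the reported probabilities (not from Table 6).
alpha0 = logit(0.69)            # control condition
beta3 = logit(0.95) - alpha0    # shift in log odds attributed to label 3
print(round(beta3, 2))          # 2.14
print(predicted_probability(alpha0), predicted_probability(alpha0, beta3))
```

A shift of roughly 2.14 in the log odds is thus what moves the probability from 69 to 95 % in this illustration.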

Both labels have a relative classification system, but we suspect that the presence of CO2 taxation and information on fuel costs (lost savings on fuel and running costs) drives the effects. Although we cannot isolate individual effects, the evidence is consistent with the effectiveness of the loss framing.

We now move to the behavioural variables. Results are reported in columns (3) and (4). First, nudging on emissions does not lead to greener choices per se, but it may be effective in activating mental accounting on fuel economy. This explains the different results across the two variables. Secondly, a number of labels are at least weakly effective (p < 0.10) in this case: classification system and layout are diverse across treatments and we suspect that they did not drive the results, while the running costs information per 5 years may be an important component in explaining the effectiveness. Thirdly, it is important to stress that label 3 is not effective on this dimension.

These results are important as they confirm the higher effectiveness of fuel economy and running costs as compared to emissions information in capturing consumers’ attention and in influencing choices (Lane et al. 2012). Finally, we note that of all the labels implemented in Europe, the UK label is the only one where running costs and tax information are included. As such, the variation between control condition and the treatments was relatively limited, implying that our estimated effects are likely to be at the lower bound of the real causal impact. The online experiment allows for further clarification of this issue, as the control condition is simpler and more different from the treatments.

In order to summarize the assessment of the labels, we build an overall score: we normalize each measure separately, subtracting the average value and dividing by the standard deviation. In this way, each measure is centred on zero with unit standard deviation. We then merge the four measures with the corresponding dummies for treatments: the final database now has 1612 observations because each participant is considered four times, once per measure.
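The construction of the overall score can be sketched as follows. This is an illustration with toy data: the choice of the population standard deviation (`pstdev`) rather than the sample one is an assumption, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m, s = fmean(values), pstdev(values)  # pstdev is an assumed choice
    return [(v - m) / s for v in values]


def stack_measures(measures, treatments):
    """Stack the standardized measures: one row per participant and measure."""
    stacked = []
    for name, values in measures.items():
        for treatment, z in zip(treatments, standardize(values)):
            stacked.append((treatment, name, z))
    return stacked


def mean_score_by_treatment(stacked):
    """Average normalized score for each treatment."""
    by_treatment = {}
    for treatment, _, z in stacked:
        by_treatment.setdefault(treatment, []).append(z)
    return {t: fmean(zs) for t, zs in by_treatment.items()}


# Toy data: two measures observed on four participants.
measures = {"cognitive": [0, 0, 1, 1], "behavioural": [1, 2, 3, 4]}
treatments = ["control", "control", "label", "label"]
stacked = stack_measures(measures, treatments)
print(len(stacked), mean_score_by_treatment(stacked))
```

Each participant contributes one row per measure, which is why the stacked dataset is a multiple of the number of participants.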

The average value of the score per treatment, together with the 95 % confidence interval, is plotted in Fig. 1. The control condition is the 13th label, as explained in Table 1. As can be seen, in a global assessment labels one to four, where fuel economy is shown, and label twelve are the best performers.

Fig. 1 Normalized Cognitive-Behavioural Score of labels in the Lab. Merge of the normalized cognitive processing of emissions, normalized cognitive processing of fuel economy, normalized score of the choice in terms of emissions and normalized score of the choice in terms of fuel economy (1612 observations). Normalization is performed subtracting the mean and dividing by the standard deviation of each indicator. Explanation of the labels is in Table 1 (label 13 is control condition). Columns refer to the mean by condition, capped spikes to the 95 % confidence intervals

4.2 Online experiment

Moving to the online experiment, descriptive statistics are less interesting in this case, since the sample is representative. Nevertheless, the full set of socio-demographic statistics is in the SOM, Sect. 4.1; we also add that 10.77 % of the sample report never having bought a car.

Socio-demographic characteristics are balanced across treatments, as they should be given randomization. We perform a set of ANOVAs on the socio-demographic characteristics, using treatments as independent variables. The null hypothesis of balanced characteristics is never rejected: gender (F = 0.95; p = 0.51); having bought a car (F = 1.38; p = 0.13); age group (F = 0.42; p = 0.98); education (F = 0.88; p = 0.59); marriage status (F = 0.82; p = 0.67); country (F = 0.70; p = 0.80).

Table 7 reports the results of the online experiment on the effects of both labels and promotional materials on Willingness to Pay (WTP). We treat as omitted category (control condition) both the control label and the control promotional material. The reason is that the distribution of the responses is not different in the two cases. We used a Mann–Whitney–Wilcoxon rank sum test, where the null hypothesis is that the two samples (control label and control promotional material) come from the same distribution; the hypothesis is not rejected (z = 0.689, p = 0.490).
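For readers unfamiliar with the rank sum test, a minimal sketch of the z statistic follows. It uses the normal approximation with the standard correction for ties and no continuity correction; these choices, the function name, and the toy data are assumptions, since the text does not report how the statistic was computed.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) z statistic and two-sided p-value."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and all receive the average rank.
        average_rank = (i + 1 + j) / 2
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_x - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Toy data: two small samples of WTP ratios (hypothetical values).
print(rank_sum_test([0.82, 0.94, 1.00], [0.88, 1.06, 1.12]))
```

A z statistic close to zero, as in the result reported above, gives no grounds for treating the two control conditions as coming from different distributions.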

Table 7 Online experiment: average treatment effect of the labels and promotional materials on WTP

There are two results that we want to highlight. First, only two variants of the promotional material have a significant influence on WTP, while none of the label variants do.

Second, we have at least two versions of the promotional material, both with weblinks and one with running costs, showing a significant effect. Yet the impact on WTP is negative, an apparently counterintuitive finding. To explore this further, we look at the heterogeneity of the impact according to the class of car (“large and polluting” versus “small” and, hence, the intended available budget). Our “large class” dummy is equal to one for cars belonging to the following classes, whose emissions are on average higher: Economic Sport Utility Vehicles, Executive Cars, Expensive Sport Utility Vehicles, Large Family Cars, and Large Multi-Purpose Vehicles. This helps to compare our findings with those of the discrete choice experiment performed by Achtnicht (2012), who detects a shift in demand towards greener cars once consumers are exposed to labels (Table 8).

Table 8 Disentangling the effect of label and promotional material

We find that the significant negative effect on WTP applies only to large and polluting cars. When the lower environmental friendliness of these cars is made salient in the promotional material, it lowers consumers’ willingness to pay for them. We conjecture that this is consistent with what we observed in the lab: information provision affects mental accounting through the fuel economy information and, as a result, the effect in this case weighs against cars with high engine capacity and fuel consumption.

Over time as pro-environmental values become more prevalent, larger cars may increasingly be seen as problematic. In the experiments the information is a reminder that such cars are an environmental hazard, making people think again about their choices. The label becomes both a personal environmental nudge (values affecting choices) and a social normative nudge (what will others think of me?).

The final set of results concerns cognitive processing. Table 9 shows that a large number of labels and promotional materials have a positive and significant effect.

Table 9 Online experiment: average treatment effect of the labels and promotional materials on cognitive processing

As for the laboratory experiment, we report the size of the marginal effect. Labels 2 and 3 raise the likelihood of correct ranking from 58 to 64 %; P.M. 2, 4 and 7 to 68 %; P.M. 5 and 12 to 67 %; and finally P.M. 9 and 11 to 72 %.

The labels tested in the online experiments were somewhat simpler than those tested in the laboratory experiment in the UK. The promotional materials were also simpler than the labels. This suggests that simple messages increase the effectiveness of the treatments by inducing better cognitive processing by the consumers.

As we did in the lab, we summarize the findings with a simple graph, which includes the normalized cognitive-behavioural score. As before, we first normalize each measure (cognitive processing of the running costs and willingness to pay) separately, by subtracting the mean and dividing by the standard deviation; then, we merge the dataset, reaching 9761 observations. In Fig. 2 below, we plot the mean by condition together with the 95 % confidence interval.

Fig. 2 Normalized Cognitive-behavioural Score: Online Experiment. Merge of the normalized cognitive processing of running costs and normalized willingness to pay (9761 observations). Normalization is performed subtracting the mean and dividing by the standard deviation of each indicator. Control includes both Label 6 (see Table 4) and Promotional Material 13 (see Table 5). Columns refer to the mean by condition, capped spikes to the 95 % confidence intervals

As can be seen, promotional material variants are the best performers when we provide a global assessment.

5 Discussion and policy implications

This paper presents the results of two experiments evaluating the effect of eco-information, presented in labels and promotional materials, on cognitive processing and car purchase choices. While significant effects in the laboratory and online experiments are identified, these are not systematic across conditions. In both experiments, fuel economy and running costs were better understood and affected subjects’ behaviour when environmental friendliness was coupled with fuel economy (what we define as incentive driven environmental friendliness). Simpler information in promotional materials led to better cognitive processing than information dense labels. No effects were found for information on CO2 emissions. These findings are consistent with the existing evidence on car labels (Lane et al. 2012) and with some ex ante hypotheses on mental accounting and loss-framed nudges. The higher effectiveness of the simpler promotional materials is in line with expectations concerning labels summarising complex numerical information (Peters et al. 2007, 2009).

This study has a number of limitations. First, given constraints from the regulatory context (‘realistic nudging’), the experimental labels could not be tested in formats that were radically different from control conditions. For example, we could not test very simple messages or emotionally charged ones. This may have reduced the impact of the treatments. Second, the MPL was not incentivised due to logistical and budgetary constraints, although we do not believe that this substantially biased the results (Camerer and Hogarth 1999).

While recognising these limitations, we stress that regulatory realism is also a strength of the study. It is the first test of car eco-labels and promotional materials with randomised control trials in ten European countries. It has collected new empirical evidence and raised a number of conceptual, theoretical, and policy implications that warrant further discussion.

One important policy issue pertains to the use of labels for behaviourally informed policies. The original and more restrictive definition provided by Thaler and Sunstein (2003, 2009) specifies that a nudge helps individuals make better decisions for themselves (internalities), follows a non-regulatory approach, and leverages system 1 against itself (i.e., default options functioning by the mere existence of the status quo bias) rather than triggering rational cognition. Using these criteria, the car labels we tested do not qualify as nudges. These labels aim at externalities, are embedded in regulation, and aim to activate rational cognition. Recently, Sunstein (2015, p. 511) has relaxed the definition and argued that, to qualify as a nudge, it is sufficient that an intervention does not impose significant material incentives, so a Global Positioning System (GPS) or a warning are nudges but subsidies or taxes are not. On this issue, some further theoretical reflection is needed.

In the case of eco-labels, we suggest following a more innovative approach to policy, based on insights from behavioural economics. Consumers in most countries and for most types of products and services self-report green purchasing intentions that are translated into actions only in a minority of cases; the literature that documents this phenomenon calls it the ‘attitude-action gap’ (Anable 2006; Anable et al. 2009; Gadenne et al. 2011; Kaenzig et al. 2013; Kollmuss and Agyeman 2002; Lane et al. 2012; Lane and Potter 2007; Momsen and Stoerk 2014, p. 376; Noblet et al. 2006; Teisl et al. 2008; Vlaeminck et al. 2014, p. 180). Weber (2013) explains this gap by identifying the underlying heuristics and biases. In particular, Weber (2013, p. 382) argues that one of the main causes is that we have insufficient visceral reactions to environmental issues. In addition to some of the more conventional nudges (mental accounting, loss framing, social comparison, and regrets), Weber (2013, pp. 391–392) concludes that consumers should be nudged toward more environmentally sustainable habits through visual fear appeals and/or concretization of future events. This would correspond to some emotionally charged label, as in the tobacco case (Bogliacino et al. 2015; Codagnone et al. 2014b).

Alternatively, one may think along the lines of Codagnone et al. (2014a), who proposed a typology of nudges where the two dimensions are ‘automatic versus reflective mode’ of reaction and ‘hot versus cold affect’. Car purchasing and the corresponding eco-label represent a situation of reflective mode and cold affect, where neither automatic defaults nor elicitation of visceral reaction tend to be activated. At present, mental accounting and loss framing appear to be the most promising nudges. However, over time, if pro-environmental values become more widespread, social normative nudges are likely to become more effective.