1 Introduction

Life cycle assessment (LCA) is often used to evaluate the potential impacts caused by a product or a system and to compare similar products in order to identify the best option in terms of eco-efficiency (Bjørn et al. 2015). Given the plurality of environmental aspects covered by LCA, and the possible trade-offs between impact categories, a direct comparison between two products that encompasses all environmental indicators can only be performed after the so-called normalization and weighting steps, which make it possible to aggregate the different environmental dimensions into a single score. The typical approach adopted in LCA to perform this aggregation is a weighted sum, as described in Eq. 1.

$$S(p)=\sum_{i=1}^{n}w_i\times\frac{I_i(p)}{R_i}$$
(1)

Here, the characterized results I_i for each impact category i caused by product p (each with its own unit of measure) are converted into the relative contributions of the analyzed product to a reference system (Sleeswijk et al. 2008) by dividing them by a normalization reference R_i calculated for impact category i. The normalized results are then aggregated into a single score S after applying to each a weight w_i (a numerical factor denoting its relative importance). Normalization and weighting are optional steps of the interpretation phase of an LCA study according to the ISO standards (2006a, b), mostly due to the potential biases introduced by choosing a certain normalization reference, and the value choices that need to be made to assign weights to impact categories (Heijungs et al. 2007; Pizzol et al. 2017). Nevertheless, they are often applied in practice, and were recommended by the UNEP/SETAC Life Cycle Initiative (UNEP-SETAC 2021) as a way to support the interpretation of the results of life cycle impact assessment (LCIA), by providing information on the magnitude of impacts and facilitating their communication to stakeholders as well as supporting decision-making (Pizzol et al. 2017).
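As an illustration of Eq. 1, the weighted sum can be sketched in a few lines of Python. The category names and all numerical values below are hypothetical and serve only to show the arithmetic; they are not taken from any of the normalization or weighting sets discussed in this article.

```python
def single_score(impacts, references, weights):
    """Eq. 1: sum over impact categories of w_i * I_i(p) / R_i."""
    return sum(weights[c] * impacts[c] / references[c] for c in impacts)


# Hypothetical characterized results I_i(p), each in its own unit
impacts = {"climate_change": 500.0, "water_use": 200.0}
# Hypothetical normalization references R_i, in the same units as I_i(p)
references = {"climate_change": 10000.0, "water_use": 8000.0}
# Hypothetical weights w_i (dimensionless)
weights = {"climate_change": 0.6, "water_use": 0.4}

score = single_score(impacts, references, weights)
print(score)
```

Because each I_i(p) is divided by an R_i expressed in the same unit, the normalized terms are dimensionless and can be added across impact categories.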

A common approach to normalization is to divide the characterized results of a product system by the characterized total emissions and extractions linked to the production or the consumption taking place within a political or geographical boundary (broadly defined in this article as a region, which could correspond to a country or to an area including several countries) over a certain period of time, complemented with estimations of missing elementary flows (Cucurachi et al. 2014, 2017). This approach, referred to as external normalization (Norris 2001), is most commonly applied in LCA by adopting a production-based (either regional or, to a lesser extent, global) approach (Pizzol et al. 2017). Alternatively, external normalization references can be developed with a consumption-based approach, considering all the elementary flows related to the activities linked to the apparent consumption of a region, i.e., including those related to imports and excluding those related to exports. Examples of consumption-based normalization references are those developed for the Netherlands (Breedveld et al. 1999) and for Finland (Dahlbo et al. 2013). It should be noted that, when adopting a global scale, production-based and consumption-based approaches should in principle lead to similar results, as global production and consumption coincide. A barrier to the development of regional consumption-based normalization references is the large amount of data required to include the environmental flows related to imports in a consistent way, especially taking into account the different technological development and related efficiencies of manufacturing activities taking place outside the physical boundaries of the reference system (Laurent and Hauschild 2015). To overcome this barrier, environmentally extended input/output (EIO) analysis is often applied. However, due to the lack of sectoral emission data for substances contributing to toxic impacts and to the heterogeneity of industrial sectors in input/output (IO) statistics, EIO is often complemented with process-based data sets (Laurent and Hauschild 2015). In contrast, the development of consumption-based normalization references by means of fully process-based LCA has not been explored so far in the literature. A third type of external normalization, which presents some similarities with a consumption-based approach but does not require the use of statistical data to model the final demand of a region, was suggested by Hélias and Servien (2021). Here, normalization references are calculated as the geometric mean of all processes included in a life cycle inventory database, to be used in the normalization step of process-based LCAs developed using the same database. The benefit of this approach is that, similarly to a consumption-based approach, it ensures consistency between the modelling of the system under study and the normalization references, overcoming potential biases.

The external normalization approach presents a number of drawbacks, mostly related to the difficulty of compiling reliable external normalization reference datasets, the introduction of potential biases (as the comparative results can be dominated by the normalization step, irrespective of differences in inventory or weighting schemes), and the issues of compensability and inverse proportionality (Cucurachi et al. 2017; Prado et al. 2017). The risk of bias in external normalization approaches was demonstrated by Prado et al. (2019), who showed how the wide variety in the scales of the normalized values made the aggregated results insensitive to the weighting set applied (as weights generally have lower variability). As the authors demonstrated with a practical example, when this is the case, only a few impact categories can dominate the aggregated score calculated with a weighted sum, so that the ranking of options after aggregation depends mostly on how the options perform in those environmental domains (similar conclusions were reached by Myllyviita et al. 2014 and by Muhl et al. 2021). This is closely connected to another limitation of the application of a weighted sum with external normalization, referred to in the literature as compensability and described as the “possibility of offsetting a disadvantage on some criteria by a sufficiently large advantage on another criterion” (Munda 2005). In other words, due to the linearity of the aggregation through a weighted sum, full compensation between impact categories is allowed, making it possible for a single good performance in one impact category to compensate for multiple poor performances (Prado et al. 2019; Pollesch and Dale 2015; Rowley et al. 2012). Lastly, inverse proportionality is the unavoidable effect that the normalized value decreases as the normalization value increases. This effect seems to contradict one of the reasons for performing normalization in the first place, which is to enable a contextual understanding of the relative magnitude of the different impacts calculated for the product under study (White and Carty 2010).
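The dominance and compensability effects described above can be made concrete with a small numerical sketch. The two products, their normalized results, and the two weighting sets below are entirely hypothetical; the example only assumes that one normalized category is two orders of magnitude larger than the others.

```python
def weighted_sum(normalized, weights):
    """Aggregate normalized results into a single score (lower is better)."""
    return sum(w * x for w, x in zip(weights, normalized))


# Hypothetical normalized results for three impact categories.
# Product A performs better in categories 2 and 3, product B in category 1.
product_a = [5.0, 0.02, 0.01]
product_b = [4.0, 0.05, 0.04]

# Two hypothetical weighting sets: equal weights, and weights that
# strongly favour categories 2 and 3.
weight_sets = {"equal": [1 / 3, 1 / 3, 1 / 3], "skewed": [0.1, 0.45, 0.45]}

preferred = {}
for name, weights in weight_sets.items():
    score_a = weighted_sum(product_a, weights)
    score_b = weighted_sum(product_b, weights)
    preferred[name] = "A" if score_a < score_b else "B"

print(preferred)
```

In this sketch product B is preferred under both weighting sets, even though it performs worse in two of the three categories: its advantage in the dominant category compensates for the other two, and changing the weights does not alter the ranking.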

As a potential way to overcome these obstacles, scientists have proposed an alternative approach to normalization which makes use of target impact levels instead of existing impact levels (Bjørn and Hauschild 2015). The main benefits of this approach are that it makes bias more transparent, it overcomes inverse proportionality, and it enables the identification of hotspots in relation to global challenges and goals (Cucurachi et al. 2017) by shifting away from the traditional eco-efficiency perspective of LCA and moving towards an absolute assessment of sustainability (Hauschild 2015). Common challenges concern (i) the identification of an acceptable level of environmental impact based on, e.g., the global carrying capacity of Earth (the so-called planetary boundaries, PBs) and (ii) the translation of the existing PBs into metrics that are compatible with those of the LCIA impacts. A first step in this direction was presented by Sala et al. (2020), who developed a set of LCIA-based PBs compatible with the 16 impact categories adopted in the environmental footprint (EF) method (EC 2021). Finally, another alternative to external normalization suggested by some authors (e.g., Prado et al. 2017) is internal normalization. This approach is based on the selection of a baseline scenario among the ones considered by the study, against which the characterized results obtained for each impact category for all the products considered are compared (Laurent and Hauschild 2015). The use of internal normalization makes it possible to overcome the issue of compensability, by adopting aggregation approaches that perform partial compensation (e.g., the outranking approach suggested by Prado et al. 2019) or no compensation (such as those suggested by Rowley et al. 2012). The main limitations of internal normalization are that it is context-dependent, which limits its applicability in LCA, and that it cannot be used with generic weighting for aggregation (Norris 2001). For these reasons, this normalization approach is outside the scope of this paper.

This study aims to contribute to the ongoing debate on normalization approaches by investigating the influence of the choice of a normalization reference on the interpretation of the results of LCA studies, and the relationship between normalization and weighting in determining aggregated scores. A similar experiment was conducted by Prado et al. (2019). The novelty of this work lies in the application of a broader range of external normalization references (one global, one at EU level developed with a production approach, two at EU level developed with a consumption approach) and in the inclusion of a normalization reference based on planetary boundaries. To this end, the developed normalization references were applied in the normalization step of the LCIA of more than 140 products, and aggregated scores were calculated using two alternative weighting sets. As a result, the relative ranking of impact categories was compared for each normalization and weighting option. Implications of choosing one normalization set over the others are discussed, and current limitations and future research needs are illustrated.

2 Methods

This section illustrates the normalization sets used and the approach adopted to assess the influence of normalization references over the interpretation of the results of LCA studies.

The LCIA method used to characterize environmental elementary flows in this work is the EF method version 3.1 (EC 2021; Andreasi Bassi et al. 2023), the European Commission’s reference method for the impact assessment of the environmental performance of products and organizations (CEC 2013; EC 2021). This method considers 16 impact categories: climate change, acidification, ozone depletion, eutrophication (terrestrial, marine, and freshwater), photochemical ozone formation, particulate matter, ionizing radiation, ecotoxicity, human toxicity (cancer and non-cancer), land use, water use, and resource use (minerals and metals, and fossil).

2.1 Defining normalization references

Five different normalization references were evaluated in this study (Table 1): four of them can be classified as external references and one as absolute reference. Of the external references, two have a production scope and two a consumption scope.

Table 1 Description of the normalization sets used in this work

The consumption-based references are both developed at EU level: one is based on process-based LCA (EU-C-p) and one on input/output (EU-C-i/o). The EU-C-p is taken from the consumption footprint indicator (Sala et al. 2019; Sala and Sanyé Mengual 2022; Sanyé Mengual and Sala 2023). This normalization reference was obtained by quantifying the environmental burden of EU final consumption considering five areas of consumption: food, mobility, housing, household goods, and appliances. To this end, process-based LCAs of representative products selected to meet food, mobility, housing, and other consumers’ needs were developed to assess their environmental impacts over their full life cycle (i.e., from raw materials extraction to production, distribution, use, and end-of-life). The environmental impacts of the representative products were then multiplied by consumption statistics to assess the impacts of EU consumption and of the average EU citizen in 2010 (Sala and Castellani 2019; Sala et al. 2019; Sala and Sanyé Mengual 2022). The LCI model of the consumption footprint indicator employs ecoinvent v3.6 for background inventory data (Wernet et al. 2016) and is characterized using the characterization factors (CFs) of version 3.1 of the EF method (Andreasi Bassi et al. 2023). Data were extracted from the Consumption Footprint Platform (EC-JRC 2023).
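The bottom-up logic behind a process-based consumption reference (per-unit impacts of representative products multiplied by consumption statistics, then divided by population) can be sketched as follows. This is a simplified illustration, not the consumption footprint model itself: the product names, impact values, consumed quantities, and population are hypothetical.

```python
def consumption_footprint(product_impacts, consumption):
    """Sum per-unit impacts of representative products times consumed amounts."""
    total = {}
    for product, per_unit in product_impacts.items():
        for category, value in per_unit.items():
            total[category] = total.get(category, 0.0) + value * consumption[product]
    return total


def per_capita(total, population):
    """Turn regional totals into per capita normalization references."""
    return {category: value / population for category, value in total.items()}


# Hypothetical per-unit characterized impacts of two representative products
product_impacts = {
    "bread_kg": {"climate_change": 1.0, "water_use": 0.5},
    "car_km": {"climate_change": 0.2, "water_use": 0.01},
}
# Hypothetical yearly consumption in the region (same units as above)
consumption = {"bread_kg": 100.0, "car_km": 1000.0}

total = consumption_footprint(product_impacts, consumption)
reference = per_capita(total, population=10)
print(reference)
```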

The EU-C-i/o was derived by adopting a top-down approach that employed environmentally extended multi-regional input–output tables (i.e., EXIOBASE 3). Input–output analysis makes it possible to allocate the emissions and resource extraction of the production stages to the final consumption of goods and services, through the application of the Leontief inverse equation (Leontief 1970). The inventory of resources and emissions related to household consumption extracted from EXIOBASE 3 for the year 2011 uses a different classification of elementary flows compared to the one adopted by the EF method. Therefore, a mapping was performed to link each elementary flow to the corresponding CF from the EF method, in order to calculate the potential impacts these flows induce on the environment, as presented in Beylot et al. (2020). The resulting impacts, calculated with the EF2.0 method, are presented in Castellani et al. (2019) and Sala et al. (2019). Only 14 out of the 16 EF impact categories were considered (i.e., all except ionizing radiation and ozone depletion), as ionizing radiation and ozone-depleting substances are missing in the environmental extension of EXIOBASE 3 (Beylot et al. 2020). For this study, this calculation was repeated using the CFs of version 3.1 of the EF method.
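The role of the Leontief inverse can be illustrated with a toy two-sector economy. The sketch below solves (I − A) x = y for the total output x required to satisfy a final demand y, and multiplies x by direct emission intensities. The coefficient matrix, intensities, and final demand are hypothetical and unrelated to EXIOBASE 3; the small linear solver is included only to keep the example free of external dependencies.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def footprint(coefficients, intensities, final_demand):
    """Emissions embodied in final demand: f . (I - A)^-1 . y."""
    n = len(coefficients)
    i_minus_a = [
        [(1.0 if i == j else 0.0) - coefficients[i][j] for j in range(n)]
        for i in range(n)
    ]
    total_output = solve(i_minus_a, final_demand)
    return sum(f * x for f, x in zip(intensities, total_output))


# Hypothetical technical coefficients: A[i][j] is the input from sector i
# needed per unit of output of sector j.
A = [[0.5, 0.0], [0.25, 0.5]]
f = [1.0, 2.0]    # hypothetical direct emissions per unit of sector output
y = [10.0, 0.0]   # hypothetical final demand, only for the first sector

embodied = footprint(A, f, y)
print(embodied)
```

In this toy case the emissions embodied in the final demand (40) are four times the direct emissions of producing the demanded quantity alone (10), because the upstream output of both sectors is attributed to final consumption.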

The production-based references concern the EU-28 and the world. These normalization references were calculated from inventories reporting the overall resources extracted and emissions released into the environment (air, water, soil) for a given territory. In both cases, the reference year was 2010. Data are mainly collected from official statistical sources, although some of the flows (i.e., resource extraction or emission) have been modelled due to the lack of available statistics. Based on these characteristics, these two normalization references are named EU-P-s and GLO-P-s. The inventory reporting the emissions and extractions within the EU underlying the calculation of the EU-P-s normalization reference is presented in Sanyé Mengual et al. (2022), according to its latest update presented in the Consumption Footprint Platform (EC-JRC 2023).

The compilation of the global inventory of resources and emissions underlying the GLO-P-s reference can be found in Crenna et al. (2019). Since this publication, the global inventory has been updated and refined, and the resulting normalization factors were recalculated in version 3.1 of the EF method. All the updates of the global inventory and the normalization factors thus obtained are presented in Andreasi Bassi et al. (2023).

The last normalization reference set considered in this study is based on an absolute threshold described by PBs. As it is developed at global scale, it is named GLO-PB. PBs are a measure of Earth’s ecological limits and carrying capacity, introduced by Rockström et al. (2009) and updated by Steffen et al. (2015). The combination of the PBs framework with LCA was proposed by Bjørn et al. (2015) and Ryberg et al. (2018) to perform an absolute sustainability assessment. This assessment makes it possible to compare the environmental impacts of consumption with absolute thresholds and, in this way, to identify which path should be followed and prioritized to remain within these boundaries. To this end, LCIA-based PBs were derived in Sala et al. (2020) by adapting the PBs framework to the LCIA indicators and metrics of the EF method. The GLO-PB normalization reference set used in this work is taken directly from Sala et al. (2020) for all impact categories except land use. For land use, the value suggested in Sala et al. (2020), measured in terms of soil erosion, was converted into the metric used by the EF method (Pt), presented in De Laurentiis et al. (2019). The reader is referred to the SI for more details. In Sect. 3.1 the resulting normalization sets are presented and compared.

2.2 Assessing the influence of normalization on the ranking of impact categories

The benefit of performing normalization and weighting is that it makes it possible to resolve trade-offs between impact categories and to identify the dominant impact categories for the product considered. This can be done by adding together the normalized and weighted results, as presented in Eq. 1, and calculating the contribution of each normalized impact to the total aggregated impact. Based on this approach, it is possible to analyze how impact categories are ranked in terms of their contribution to the aggregated impact for the product considered, in order to identify hotspot impact categories.

As mentioned in Sect. 1, the normalization step of LCIA has been criticized for the potential biases that can be introduced by the choice of a certain normalization reference over another, as this step alone can heavily influence the outcome of a comparative analysis between two products, irrespective of the differences at inventory level (Cucurachi et al. 2017). To investigate this critical aspect of normalization, a five-step approach was adopted to understand the influence of using the different normalization sets presented in Sect. 2.1 on the resulting ranking of impact categories:

  1. Characterization: the environmental impacts of the 144 representative products that compose the “Baskets of Products” presented in Sala et al. (2019) were calculated with the CFs of the EF3.1 method;

  2. Normalization: the derived impacts were divided by the per capita normalization references calculated for the 5 normalization sets considered;

  3. Weighting: for each product and each normalization set, the normalized impacts were added together to obtain a single score calculated with two alternative weighting sets;

  4. Impact category contribution: the relative contribution of each impact category to the total aggregated impact was calculated;

  5. Statistical analyses were performed to understand the role of normalization and weighting in defining the relative contribution of impact categories to the aggregated scores.

For the weighting step, two different weighting approaches were evaluated to assess the influence of weighting on the final results. The first is a unitary weighting system, where impact categories are assigned a weight equal to one when performing the aggregation into a single score. The second is the EF weighting set, presented in Table 2 (Sala et al. 2018).

Table 2 EF weighting factors (Sala et al. 2018)

In this way, it was possible to assess: (i) which impact categories are most commonly responsible for the largest share of the aggregated impact, (ii) how the choice of a normalization set influences the ranking of impact categories, (iii) how the choice of a weighting set influences the ranking of impact categories. The results of this analysis are presented in Sect. 3.2.
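The normalization, weighting, and contribution steps described in this section can be sketched as below for a single product. The sketch is not the code used in the study: the impact values, the two normalization sets, and the non-unitary weights are hypothetical (in particular, the weights are not the EF weighting factors of Table 2).

```python
def contributions(impacts, references, weights):
    """Share of each impact category in the normalized and weighted score."""
    weighted = {c: weights[c] * impacts[c] / references[c] for c in impacts}
    total = sum(weighted.values())
    return {c: value / total for c, value in weighted.items()}


def ranking(shares):
    """Impact categories ordered from largest to smallest contribution."""
    return sorted(shares, key=shares.get, reverse=True)


# Hypothetical characterized impacts of one product
impacts = {"climate_change": 100.0, "ecotoxicity": 50.0, "water_use": 20.0}

# Two hypothetical per capita normalization sets; set_b has a much lower
# reference for ecotoxicity, e.g. because of lower inventory coverage.
set_a = {"climate_change": 1000.0, "ecotoxicity": 1000.0, "water_use": 100.0}
set_b = {"climate_change": 1000.0, "ecotoxicity": 10.0, "water_use": 100.0}

unit_weights = {c: 1.0 for c in impacts}
other_weights = {"climate_change": 0.5, "ecotoxicity": 0.1, "water_use": 0.4}

rank_a_unit = ranking(contributions(impacts, set_a, unit_weights))
rank_b_unit = ranking(contributions(impacts, set_b, unit_weights))
rank_b_other = ranking(contributions(impacts, set_b, other_weights))
print(rank_a_unit, rank_b_unit, rank_b_other)
```

With these hypothetical numbers, switching from set_a to set_b moves ecotoxicity from last to first place, and it remains first under the alternative weights: a low normalization reference can outweigh the effect of the weighting set.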

3 Results and discussion

In this section, the five normalization sets are compared and the differences among them are investigated and discussed (Sect. 3.1). It is important to highlight that, in the case of the four external normalization references, the impact assessment method used is the same, so the differences are solely linked to the different approaches taken in building each inventory. A practical application of the normalization sets is then presented, to highlight the influence of normalization and weighting on the relative importance of impact categories (Sect. 3.2). To conclude, limitations of the suggested normalization sets are discussed and future research needs are identified (Sect. 3.3).

3.1 Comparing normalization references

Table 3 presents an overview of the per capita normalization factors derived with the five approaches presented in Sect. 2.1. For each impact category, the highest value across the five sets is highlighted (this will result in the lowest normalized result once the normalization is applied). For 8 impact categories (namely ozone depletion, human toxicity cancer, human toxicity non-cancer, ionizing radiation, acidification, terrestrial and marine eutrophication, and water use) the highest value is the one obtained by adopting an absolute approach (GLO-PB). This means that, for those environmental domains, the per capita impacts obtained either at EU level (with a territorial perspective or a consumption perspective) or at global level do not exceed the Earth’s carrying capacity. This is in line with the findings of Sala et al. (2020) for all impact categories except those that were updated between the EF2.0 and the EF3.1 (for which these two sets of results are not comparable). As for the remaining impact categories, the EU-C-i/o set (based on input/output) and the EU-C-p set (obtained with process-based LCA) each presented the highest value in three cases (respectively, particulate matter, photochemical ozone formation, and resource use minerals and metals for EU-C-i/o, and climate change, ecotoxicity, and resource use fossil for EU-C-p), while the GLO-P-s set presented the highest value for two impact categories (freshwater eutrophication and land use).

Table 3 Overview of the normalization sets considered in this study. All values are per capita. Shaded cells report the highest value across the 5 sets for each impact category. Acronyms for the normalization sets are presented in Table 1
Fig. 1 a Comparison between a production-based (EU-P-s) and a consumption-based (EU-C-p) approach in deriving normalization references at the EU level, b comparison between territorial approaches developed at different geographical scales: per capita EU impacts (EU-P-s) versus per capita global impacts (GLO-P-s), c comparison between consumption-based normalization references for the EU obtained with input/output (EU-C-i/o) and process-based (EU-C-p) approaches. Acronyms for the normalization sets are presented in Table 1. Acronyms for the EF impact categories are presented in Table 3

A direct comparison between the consumption-based and production-based approaches is provided by comparing the results obtained at EU level with the EU-P-s approach (production-based) and the EU-C-p approach (consumption-based) (Fig. 1a). For 10 impact categories, the consumption approach yields larger impacts. This is to be expected as the EU is a net importer of goods; therefore, the impacts driven by EU consumption caused overseas outweigh the impacts caused in the EU for the production of exported goods. This is the case for the two impact categories related to resource use (fossil, and minerals and metals), where the lower values found for the domestic footprint compared to the consumption footprint reflect the limited resources extracted in Europe (as pointed out in Sala et al. 2019). In three cases (land use, ionizing radiation, and ozone depletion), the values obtained for the EU-C-p were lower than half of the ones obtained for the EU-P-s. For land use, the lower values found with the EU-C-p might be due to an underestimation of the land occupation flows in the inventories of the process-based LCAs developed. In fact, it is known that the EU is a net importer of virtual land (Kastner et al. 2014; O’Brien et al. 2015; Tramberend et al. 2019) and therefore the per capita land use derived with a consumption-based approach should be higher than the one derived through a territorial approach. For the remaining two impact categories, the difference might instead be explained considering that the EU-P-s covers the whole economy while the EU-C-p only considers household consumption.

The production-based normalization sets were developed for the EU and the world. A comparison between the per capita normalization factors obtained with the two approaches is provided in Fig. 1b. Global per capita impacts were higher than EU territorial per capita impacts for 14 out of 16 impact categories (all except particulate matter and climate change). Therefore, for these 14 impact categories the contribution of EU territorial impacts to global impacts is lower than the EU share of world population in the year considered (equal to 7%), as illustrated in Fig. S7 of the SI. This could support the theory that the EU is outsourcing part of its impacts through trade (Corrado et al. 2020). This trend is instead reversed in the case of climate change and particulate matter. A potential explanation is that some of the main drivers of impact for these two impact categories are related to activities performed within the EU borders, such as transport and household energy use, which therefore result in the domestic accounting of emissions.
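The link between per capita values and the population share follows from a one-line rearrangement. In the notation below, which is introduced here only for illustration, T_i denotes the total characterized impact for category i in a territory and P the corresponding population; since all quantities are positive, the inequality can be rearranged directly.

```latex
$$\frac{T_i^{\mathrm{EU}}}{P^{\mathrm{EU}}} < \frac{T_i^{\mathrm{GLO}}}{P^{\mathrm{GLO}}}
\quad\Longleftrightarrow\quad
\frac{T_i^{\mathrm{EU}}}{T_i^{\mathrm{GLO}}} < \frac{P^{\mathrm{EU}}}{P^{\mathrm{GLO}}} \approx 0.07$$
```

In other words, a per capita EU value below the per capita global value is equivalent to an EU share of global impacts below the EU share of world population.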

Figure 1c illustrates a comparison between the two consumption-based normalization references presented in the study (i.e., the EU-C-i/o and the EU-C-p). EU-C-i/o presents higher impacts than EU-C-p for 10 impact categories out of 14. This is to be expected, as the modelling of the former includes more sectors compared to the 5 areas of consumption considered by the modelling of the latter. The same comparison (obtained by characterizing impacts with the EF2.0 method) is presented in Castellani et al. (2019); however, for some impact categories the results presented here are different due to the update of the EF version used and updates in the models underlying the consumption footprint indicator. The most striking difference concerns human toxicity-cancer, which in the present work was higher with the EU-C-i/o approach than with the EU-C-p, while the opposite was found in Castellani et al. (2019). The lower coverage of substances of the EXIOBASE 3 inventory compared to the inventory underlying the EU-C-p set (1402 in EU-C-p versus 78 in EU-C-i/o) can explain the lower value obtained for ecotoxicity in the i/o approach (Castellani et al. 2019). In fact, the top contributing substances to the EU-C-p for this impact category (emissions to water of chlorpyrifos and of chlorine) are not captured in EXIOBASE 3 (where the main contributing substances are emissions to air of ammonia and NMVOC). The difference between the values obtained for water use can be explained considering that the EU-C-p includes country-specific CFs to assess water impacts, which are instead derived using global average CFs in the case of the EU-C-i/o (Castellani et al. 2019). Finally, the large difference for the indicator “mineral resource depletion” could be explained considering the different level of detail of the inventory flows in the two approaches. In the EU-C-i/o, the largest contributor to the impact is an aggregated flow (namely, “other industrial minerals”): this limits the potential to apply the most appropriate characterization factors, adding a certain degree of uncertainty to the results (Castellani et al. 2019). For further comparisons between the four external normalization sets, the reader is referred to the SI of this article, which provides logarithmic-scale scatter plots for all possible combinations of normalization sets.

3.2 The relevance of normalization and weighting in impact assessment and interpretation

The application of the 5 normalization sets to the 144 representative products composing the “Baskets of Products” (Sala et al. 2019), and the aggregation of the resulting normalized impacts with two alternative weighting sets (the first using unitary weights and the second using the EF weights), resulted in two dashboards, showing the contribution of each impact category to the aggregated scores obtained for each product, available in SI. The distribution of the contributions obtained across the 144 products with each normalization set is shown in Figs. 2 (unitary weights) and 3 (EF weights).

Fig. 2 Distribution of the relative contribution of each impact category to the aggregated score across the 144 products for the 5 normalization sets (aggregation with unitary weights). Acronyms for the normalization sets are presented in Table 1. Acronyms for the EF impact categories are presented in Table 3. To maximize readability the scale of the y-axis varies across the charts

In the first weighting approach, a uniform distribution of the relative contributions across impact categories characterizes the EU-C-p set, and to a certain extent the GLO-P-s and the EU-P-s set, although for the GLO-P-s set the impact category fossil resource depletion shows a higher relative contribution compared to others, while for the EU-P-s set this is the case for fossil resource depletion, mineral resource depletion and ecotoxicity. A clear dominance of one or more impact categories can be seen in the GLO-PB set (climate change, particulate matter, and ecotoxicity) and in the EU-C-i/o set (ecotoxicity). This is to be expected, as, in the case of the GLO-PB set the normalization factors for these three impact categories are significantly lower (up to two orders of magnitude) than the normalization factors obtained with the EU-C-p approach (underpinned by the same inventories of the products tested). The relevance of climate change and particulate matter is consistent with the findings presented in Sala et al. (2020), who highlighted that these two ecosystem thresholds have been transgressed by current environmental impacts of consumption and production patterns. Similar considerations can be made for the EU-C-i/o set, where the normalization reference for ecotoxicity is two orders of magnitude lower than the one obtained with the EU-C-p set (Table 3), due to the lower coverage of elementary flows of EXIOBASE 3 compared to the other inventory sources, as discussed in Sect. 3.1.

When applying the EF weighting set, climate change (associated with the highest weight) gains relevance across all normalization sets, presenting in three sets (GLO-P-s, GLO-PB, and EU-C-p) the highest median value of the relative contributions of impact categories and in the remaining two the second highest (Fig. 3). Water use, fossil resource depletion and mineral resource depletion gain relevance, while a decrease in the relevance of impacts due to human toxicity and ecotoxicity is evident.

Fig. 3 Distribution of the relative contribution of each impact category to the aggregated score across the 144 products for the 5 normalization sets (aggregation with EF weights). Acronyms for the normalization sets are presented in Table 1. Acronyms for the EF impact categories are presented in Table 3. To maximize readability the scale of the y-axis varies across the charts

To perform a more in-depth analysis, 5 products were selected (i.e., one for each area of consumption). For each of them, an extract of the two dashboards is presented in Table 4 (unitary weights) and Table 5 (EF weights).

Table 4 Contribution of impact categories to the aggregated score (calculated with unitary weights) for 5 selected products

Regardless of the weighting set used, ecotoxicity is the largest contributor for all five products selected when the normalization is calculated with the EU-C-i/o set. For all the remaining sets, when using equal weights, the main driver of impact varies across both products and normalization sets, with some recurrent patterns, e.g., mineral resource depletion being the driver of impacts for the TV screen with the remaining four normalization sets (in line with the findings of Sala et al. 2019) and ecotoxicity for the shampoo with three normalization sets out of five (Table 4). The deviation reported with the EU-C-i/o set for the TV screen can be expected, as the normalization reference of the EU-C-i/o set for mineral resource depletion is significantly higher than in the four other sets (Table 3), while the high contribution obtained with the EU-P-s set (70% of the aggregated score) can be explained considering the limited extraction of resources in the EU (as discussed in Sect. 3.1).

To further investigate the role of normalization in defining the relative contribution of the impact categories to the aggregated score, the correlation between the relative contributions obtained with each pair of normalization sets is reported for the five products considered (Table 6). For the products TV screen and shampoo, strong positive correlations between different sets can be seen, with some exceptions: for TV screen the EU-C-i/o set is not correlated with the other sets, and for shampoo the EU-C-p set is either not correlated or weakly positively correlated with the other sets. The product tomato shows a strong or moderately strong positive correlation between the different sets, with the exception of the EU-C-p set (not correlated or negatively correlated with the remaining sets). Instead, the products car and house present either weak positive or no correlation between different sets (negative only for the combination EU-C-p and EU-C-i/o), with the exception of the two production-based approaches (GLO-P-s and EU-P-s, which have a strong positive correlation). The higher the correlation coefficients, the lower the influence of the choice of normalization set on the relative ranking of impact categories in the aggregated score. Table 7 presents the complementary analysis, showing the correlation between the relative contributions obtained with each pair of products for the five normalization sets. Strong positive correlations can be seen between the different products in the case of the EU-C-i/o normalization (confirming the bias identified for this set in Sect. 3.1) and the GLO-PB normalization. In the latter case these findings are not surprising, due to the dominating role of climate change and particulate matter across the five products (Table 4). In the remaining three sets, the different products are in most cases not correlated. This is particularly true for the EU-C-p set, which is the set that influences the ranking of impact categories the least.
Based on the comparison between Tables 6 and 7 it is possible to argue that, with the exception of the EU-C-i/o set, the ranking of impact categories is more influenced by the inventories of the different products than by the choice of normalization reference, even though this choice plays a significant role and can lead to large differences in rankings observed for the same product (Table 4).
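The type of comparison reported in Tables 6 and 7 can be sketched as follows. This is a minimal illustration, assuming that each normalization set yields a vector of relative contributions in the same order of impact categories. The set names and shares are invented and do not reproduce the values in the tables.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical relative contributions of four impact categories for one
# product, one list per (hypothetical) normalization set.
shares = {
    "set_1": [0.50, 0.30, 0.15, 0.05],
    "set_2": [0.45, 0.35, 0.12, 0.08],
    "set_3": [0.05, 0.10, 0.15, 0.70],
}

# One coefficient per pair of normalization sets, as in a correlation table.
corr = {(a, b): pearson(shares[a], shares[b]) for a, b in combinations(shares, 2)}
for pair, r in corr.items():
    print(pair, round(r, 2))
```

A coefficient close to 1 for a pair of sets means that the two sets rank the impact categories similarly for that product. A coefficient near zero or below means that the choice between them changes the ranking.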

Table 5 Contribution of impact categories to the aggregated score (calculated with EF weights) for 5 selected products
Table 6 Pearson’s correlation between the relative contribution of impact categories to the aggregated score (calculated with unitary weights) obtained with the five normalization sets. One correlation table is provided for each of the five selected products. Acronyms for the normalization sets are presented in Table 1
Table 7 Pearson’s correlation between the relative contribution of impact categories to the aggregated score (calculated with unitary weights) obtained for five selected products. One correlation table is provided for each normalization set. Acronyms for the normalization sets are presented in Table 1

Table 8 shows the correlation coefficients calculated between the relative contributions of impact categories obtained with the two weighting sets for each combination of normalization set and product (e.g., the correlation between the contributions obtained after applying unitary weights and EF weights for the product tomato normalized with the GLO-P-s set). Results obtained with the two weighting approaches are positively correlated in all cases, and correlation coefficients are generally higher than those reported between normalization sets (Table 6), confirming the dominant role of normalization over weighting (suggested also by a visual comparison between Tables 4 and 5), in line with previous literature (e.g., Prado et al. 2019). However, in 12 cases out of 25 the correlation is lower than 0.75, illustrating that the normalized results are not insensitive to the weighting set applied.

Table 8 Pearson’s correlation between the relative contribution of impact categories to the aggregated score calculated with unitary weights and calculated with EF weights. One correlation value is provided for each combination of normalization reference and product. Acronyms for the normalization sets are presented in Table 1

3.3 Limitations of the normalization step

The analysis presented in this article illustrates how the choice of normalization approach can strongly influence the interpretation of the results of an LCA study and should therefore be made carefully. It is for this reason that the ISO standards suggest using several normalization sets, reflecting different reference systems, to evaluate the sensitivity of the final results to the normalization step (ISO 2006b). The following sections analyze the main limitations in the development of normalization references resulting from the analysis presented in this work (Sect. 3.3.1) and discuss implications for the application of normalization references to derive aggregated scores (Sect. 3.3.2).

3.3.1 Limitations in defining normalization references

One of the main limitations of external normalization references is the presence of data gaps in the inventories, which results in biased normalization, as identified by Heijungs et al. (2007). The authors used this term to identify situations in which the coverage of elementary flows differs between the inventory developed for the normalization set and the one developed for the product system, causing the resulting normalized value to be either too low or too high. In the current study, this is the case for the ecotoxicity impact category when the EU-C-i/o normalization set is applied: due to the low coverage of toxic substances in the underlying inventory (based on EIO tables) compared to the coverage in the inventory models developed for the 144 products tested, the normalization reference is too low and therefore yields normalized ecotoxicity impacts that are too high. Further drawbacks of the EU-C-i/o normalization set are the high level of aggregation of industrial sectors in IO statistics, which makes the use of average toxic emissions meaningless (Laurent and Hauschild 2015), and the highly aggregated inventories, which limit the possibility of applying the most appropriate characterization factors, as illustrated with the example of mineral resource depletion in Sect. 3.1. For these reasons, the EU-C-i/o normalization set is not deemed appropriate to be used in its current form for normalizing process-based LCA results.

In general terms, to reduce as much as possible the risk of bias when developing external normalization references, it is key to ensure that normalization sets are built on comprehensive inventories. To identify potential bias, Heijungs et al. (2007) suggest implementing bias detection methods. One of these is to compare the flow contributions of the system under study for a certain impact category with those of the normalization reference, to verify that the flows contributing most to the system under study are also accounted for in the inventory underlying the normalization reference. An example of the magnitude of error caused by using incomplete inventories was provided by Kim et al. (2013). In their work, the authors complemented an existing inventory used to calculate normalization factors for the US (developed by Bare et al. (2006)) and obtained significant variations for four impact categories (including human toxicity cancer, human toxicity non-cancer, and ecotoxicity, which increased by 4, 3, and 2 orders of magnitude, respectively). Such comparisons are of utmost relevance in studies using different impact assessment methods, which can have a different coverage and correspondence between the elementary flows included in the normalization references (of each impact assessment method) and the inventory (of the product or system modelled) (Sanyé-Mengual et al. 2022).
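The flow-contribution check described above can be sketched in a few lines. This is a simplified illustration and not the procedure of Heijungs et al. (2007). The function name, the 5% share threshold, the flow names, and the numbers are all assumptions introduced for the example.

```python
def uncovered_top_flows(product_flows, reference_flows, share_threshold=0.05):
    """List the flows that contribute at least `share_threshold` of the
    product's impact in one category but are absent from the inventory
    underlying the normalization reference."""
    total = sum(product_flows.values())
    return sorted(
        flow
        for flow, impact in product_flows.items()
        if impact / total >= share_threshold and flow not in reference_flows
    )

# Hypothetical characterized ecotoxicity contributions per elementary flow
# of the product system (arbitrary units).
product_flows = {
    "copper_to_water": 60.0,
    "zinc_to_soil": 30.0,
    "chlorpyrifos_to_soil": 9.0,
    "nickel_to_air": 1.0,
}

# Hypothetical set of elementary flows covered by the normalization inventory.
reference_flows = {"copper_to_water", "nickel_to_air"}

missing = uncovered_top_flows(product_flows, reference_flows)
print(missing)
```

A non-empty result signals that the normalization reference may be too low for this category, so the normalized result may be overestimated.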

The choice of data sources and modelling approaches adopted when building inventories of regional or global emissions is crucial, as it might cause considerable variations in the resulting estimates. For instance, in the case of global normalization factors for ionizing radiation, estimations can be made by considering official statistical data (e.g., UNSCEAR 2016a, b) or by upscaling life cycle inventories of the most relevant processes based on production statistics (for instance based on LCI databases such as Ecoinvent (Wernet et al. 2016)). Choosing the most robust and comprehensive reference is therefore of paramount importance to avoid, as much as possible, underestimation or overestimation of the resulting normalization factor.

Furthermore, when building global inventories, official statistical data is often missing for several substances at global scale, and therefore extrapolations need to be performed to derive global emission values from information available for a limited number of regions (Laurent and Hauschild 2015; Crenna et al. 2019), adding uncertainty to the results. Extrapolations, both in space and time, should be conducted in a way that limits the introduction of bias and the validity of the underlying assumptions should be checked systematically. Other sources of uncertainty of external normalization references, discussed in detail in Benini and Sala (2016), are as follows: (i) the classification of statistical data as elementary flows, (ii) the characterization of substances, (iii) the specification of emission compartments, (iv) the spatial differentiation of CFs, and (v) the uncertainty associated with the impact assessment models. The authors called for improvements in the spatial resolution of inventories to allow the use of the most appropriate CFs specific for emission source typology and geographical location to reduce uncertainties.

Consumption-based external normalization references have different shortcomings. For example, the main limitation of the consumption-based set developed in this article using process-based LCA is that, being built on the LCA of a number of products considered representative of five main areas of consumption, its comprehensiveness depends on three elements: the coverage of products consumed in the EU, the representativeness of the products selected, and the upscaling approach adopted to derive the total impacts of EU consumption from the impacts calculated for the representative products. As the implementation of this normalization set is limited to the EU, its applicability is limited to studies with the same geographical scope. Furthermore, to correctly capture the impacts of imported goods, the share of each product imported should be modelled considering the efficiency level and production technologies of the countries from which the EU imports such products. This aspect is captured only to a certain extent in the consumption footprint indicator on which the EU-C-p set is built (e.g., different electricity mixes are used for imported goods, but not different feed mixes); however, efforts are currently being undertaken to update the consumption footprint indicator taking this aspect into account. One of the main benefits of this type of approach is that it allows the same data source to be used for the system under study and the normalization reference, ensuring consistency between the two.

The last normalization set considered in this study, based on the concept of PBs, is also affected by uncertainty, as discussed in detail in Sala et al. (2020). This is mainly related to the assumptions made to translate the PBs (expressed in terms of limits associated with ecological processes) into LCIA metrics, and to difficulties in upscaling local environmental pressures to the global level of PBs (Bjørn and Hauschild 2015; Springmann et al. 2018). Another limitation of this normalization approach is that several of the impacts considered are intrinsically context-specific and hence more relevant at a local scale (e.g., soil erosion or water scarcity), limiting the meaning of using equal per capita allocations of the total allowable resource use or emission, irrespective of the location where a certain activity takes place (Hoff et al. 2014). Finally, in the case of the land use indicator, as the PB value expressed in the metric used by the EF method is not known, this was derived from the global impact (i.e., the value in the GLO-P-s normalization set) by making an assumption on the level to which the carrying capacity is currently exceeded. As a consequence, for this impact category, the GLO-PB normalization can be interpreted as an external normalization weighted by applying a carrying capacity-based distance-to-target weighting factor (Hélias and Servien 2021), and unlike the remaining impact categories, it is not independent from the global normalization set.
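The interpretation given for land use can be made explicit with a short derivation. As an assumption for illustration only, let $\varepsilon$ denote the assumed ratio between the current global land use impact and the carrying capacity (a symbol not used elsewhere in this article), so that the PB-based reference is the global reference divided by $\varepsilon$. Using the notation of Eq. 1 for land use (LU):

$$R_{LU}^{\mathrm{GLO\text{-}PB}}=\frac{R_{LU}^{\mathrm{GLO\text{-}P\text{-}s}}}{\varepsilon}\quad\Rightarrow\quad\frac{I_{LU}(p)}{R_{LU}^{\mathrm{GLO\text{-}PB}}}=\varepsilon\times\frac{I_{LU}(p)}{R_{LU}^{\mathrm{GLO\text{-}P\text{-}s}}}$$

Under this assumption, the PB-normalized land use result equals the externally normalized result multiplied by $\varepsilon$, which then plays the role of a distance-to-target weighting factor.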

3.3.2 Implications for the use of normalization references

The choice of normalization reference should be made considering the compatibility of the available normalization references with the goal and scope of the study. This includes considerations on the correspondence between the geographical scale of the normalization reference and of the study. For instance, in this article the inventory models of the 144 products tested were developed considering average products consumed in the EU, which are often characterized by a global supply chain. For this reason, the two global normalization references (GLO-P-s and GLO-PB) and the two consumption-based EU normalization references, developed considering the consumption of traded products (EU-C-p and EU-C-i/o) are deemed more appropriate than the production-based EU normalization reference (EU-P-s), from this perspective. In general terms, as discussed in this and previous works (Huijbregts et al. 2003; Laurent and Hauschild 2015; Pizzol et al. 2017), when adopting external normalization, in order to ensure consistency between the normalization reference and the study, the use of global normalization references is preferable to regional ones, due to the fact that most supply chains are now stretched over the globe.

Furthermore, it is important to be aware of the potential bias that can be introduced by performing external normalization, due to the dominant role of the normalization step combined with the issue of compensability. As the results of this exercise demonstrated, the choice of normalization reference significantly affects the relative weight of impact categories in the aggregated score, and in some cases a few impact categories dominate the calculation of the aggregated score. This is a challenge for the use of this type of aggregation to perform comparative analysis, as (i) the comparative analysis between two products is affected by the choice of normalization set and (ii) when a few impact categories dominate, the comparison between two products will depend on how they perform in those specific environmental domains, compensating potential diverging trends in several other environmental domains (Prado et al. 2019).
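The compensability issue in point (ii) can be illustrated with a small numerical sketch of the weighted sum in Eq. 1. The category labels, normalization references, and product results below are invented for illustration and do not correspond to any product or set analyzed in this study.

```python
def aggregated_score(impacts, references, weights):
    """Weighted sum of normalized impacts (Eq. 1)."""
    return sum(weights[c] * impacts[c] / references[c] for c in impacts)

# Hypothetical normalization references: the low reference for cat_1
# makes this category dominate the aggregated score.
references = {"cat_1": 1.0, "cat_2": 100.0, "cat_3": 100.0, "cat_4": 100.0}
weights = {c: 1.0 for c in references}

# Hypothetical characterized results: product A has half the impact of
# product B in three categories but is 20% worse in the dominating one.
product_a = {"cat_1": 1.2, "cat_2": 5.0, "cat_3": 5.0, "cat_4": 5.0}
product_b = {"cat_1": 1.0, "cat_2": 10.0, "cat_3": 10.0, "cat_4": 10.0}

score_a = aggregated_score(product_a, references, weights)
score_b = aggregated_score(product_b, references, weights)
categories_where_a_is_better = [c for c in references if product_a[c] < product_b[c]]

print(round(score_a, 3), round(score_b, 3), categories_where_a_is_better)
```

In this invented case, product B obtains the lower aggregated score although product A performs better in three of the four categories, because the difference in the dominating category outweighs the others.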

Notwithstanding the limitations discussed for the PBs normalization set, of the five sets presented, this is the only one that cannot be affected by data coverage issues (as its normalization references are not based on inventories of current emissions and resources used, which can suffer from data gaps) and that by definition overcomes the issue of inverse proportionality. As its application also relies on the use of a weighted sum, aggregated scores calculated with the GLO-PB set are not exempt from the risk of bias discussed above; nevertheless, in this case the impact categories dominating the relative ranking of options are those for which more urgent action is needed. For these reasons, this approach shows a high potential to support policy-making by providing absolute sustainability thresholds for defining policy targets. In other words, it enables the prioritization of policies targeting hotspot impact categories, identified by considering the distance to a desired target (Sala et al. 2020).

4 Conclusions

This study investigated the use of different normalization approaches in LCA by testing five alternative normalization sets. Of the normalization approaches considered, four were external normalization sets: two obtained with a production perspective, applied at global and at EU level, and two with a consumption perspective, derived at EU level by means of environmentally extended input–output tables and by using process-based LCA. The last normalization set was based on the concept of planetary boundaries, providing a measure of target impact levels. The proposed normalization sets were compared with one another and differences among them were discussed. The influence of using different normalization approaches on the interpretation of the results of LCA studies was explored by comparing the normalized and weighted life cycle impact assessment results of more than 140 products, after the application of the five normalization sets and of two alternative weighting sets, one making use of unitary weights and the other based on the EF weighting set. The relative ranking of impact categories, in terms of their contribution to the final aggregated score obtained in each case, was compared and analyzed. The comparison was complemented by a discussion of the effect that the limitations and assumptions of each approach might have on the normalized results.

The findings of this study highlighted the dominating role of the normalization step, and to a lesser extent of the weighting set, over the relative importance of each impact category in the final aggregated score of an LCA study, in line with previous research. This emphasizes the need to choose the most suitable normalization set according to the goal and scope of the study by ensuring (i) the geographical consistency between the normalization reference and the system under study and (ii) that the chosen normalization set is based on an inventory of emissions and resources that is as comprehensive as possible, in order to avoid the risk of biased normalization. A number of additional findings were drawn from the analysis: the dominance of ecotoxicity in the results obtained with the external normalization set based on input–output tables was interpreted as a sign of biased normalization, making this normalization set not suitable for the proposed application. Similarly, a potential bias was identified in the application of the normalization set developed with a production-based approach at EU level, due to the fact that several of the products tested have global supply chains, discouraging the use of this set. Instead, when applying the PB set, the dominance of impacts on climate change and particulate matter indicated the need to prioritize policy action targeting these indicators.

The limitations of the normalization approaches presented were also discussed. In the production-based sets the main limitations are related to the coverage of data and, in the global set, to the assumptions needed to perform extrapolations to fill data gaps. In the process-based consumption approach they are mostly associated with the limited coverage of activities, while in the input–output consumption approach they refer to its lower granularity, which limits the possibility to properly evaluate the impact of each sector (i.e., by applying the most appropriate characterization factors), and to the limited coverage of elementary flows. Both production and consumption-based external normalization approaches are affected by the issue of inverse proportionality. Limitations of the PB-based set are mostly associated with the assumptions made to translate the PBs (expressed in terms of limits associated with ecological processes) into LCIA metrics, and with difficulties in upscaling local environmental pressures to the global level of PBs. Furthermore, the dominant role of normalization over the aggregated score and the linearity of the aggregation approach, which allows for full compensability, are limitations of these approaches, affecting the use of normalization and weighting in comparative studies. The PB-based normalization approach can be seen as a way to overcome these issues, as it makes bias more transparent and solves the issue of inverse proportionality.

This analysis calls for extending ongoing efforts in a number of crucial directions. Firstly, to further develop robust and comprehensive inventories of emissions and resources at regional and/or global level. Secondly, to further explore consumption-based normalization references by increasing the level of detail in top-down approaches for macro-scale applications and by complementing the list of products in the bottom-up approach to better capture and represent missing areas of consumption. Thirdly, to ensure the best possible correspondence between the coverage and classification of elementary flows in the inventories used to develop the normalization sets and in the impact assessment method adopted in assessing the products or systems for which a normalization step is applied. Lastly, to further advance the coupling of PBs and LCA in order to move towards decision support based on absolute sustainability thresholds.