1 Introduction

Benjamin Franklin’s aphorism that “in this world nothing can be said to be certain, except death and taxes” ranks along with Albert Einstein’s apocryphal expression that “everything is relative” as famous statements that apply to almost anything. Here, we will argue that they also apply to life cycle assessment (LCA). But we will do more: we will connect the two statements, and connect the uncertainty in LCA to the relative nature of LCA. A central issue in this storyline is the place of correlation: correlated LCA results have been recognized to disturb the interpretation of comparative LCA.

Our story develops around the two themes, uncertainty and relativity. We will first briefly introduce both, and then concentrate on their combination, because most LCA studies are comparative by nature and they invariably involve uncertainties. The two themes have been mostly studied in isolation, but new problems appear whenever we combine them.

The aim of this discussion article is to provide an overview of the ramifications of this combination. It will point out some of the issues in a casual, not too formal way, without overloading it with literature references. It is to a large extent based on a synthesis of a number of recent publications by the authors, in particular Henriksson et al. (2014, 2015a, 2015b), Groen and Heijungs (2017), Mendoza Beltrán et al. (2016, 2018) and Heijungs et al. (2017).

2 Basic concepts

In this section, we study the two main ideas, uncertainty and relativity, in relation to LCA.

2.1 Uncertainty

The topic of uncertainty in LCA was brought up almost as soon as the idea of LCA itself popped up. This is not surprising, because LCA deals with a lot of data and involves many choices. Various consensus procedures and standardization attempts have resulted in a slightly smaller spectrum of options for data and choices, but it is well known that different LCA studies on the same topic within the same geographical, temporal and technological context can still produce markedly different results, due to a difference in, for instance:

  • functional unit;

  • unit process data (including sampling design, measurement errors, temporal variability, variation in space, assumptions with respect to end-of-life infrastructure, etc.);

  • system boundaries;

  • allocation rules;

  • characterisation methods;

  • characterisation factors (including variation in space, extrapolations from laboratory to field conditions, etc.);

  • normalization principles;

  • weighting factors;

  • calculation principles.

To account for such differences, LCA guidebooks usually recommend reporting the choices made, and, if time allows, to do a few extra calculations, using for instance another allocation principle, system boundary or choice for one or two critical parameters, such as the product’s life time.

To an increasing extent, differences in data are processed with a probabilistic approach, in which the input data are considered to have a stochastic component that propagates into a stochastic LCA result. Monte Carlo simulation is a widely used technique for the propagation of such input uncertainties, although some authors prefer other methods.

While the incorporation of variations due to data uncertainties and methodological choices is from a scientific point of view unavoidable, from a practical and policy point of view it has definite drawbacks. We just want to know if there will be rain tomorrow or not, but science often only tells us that there will be a 30% chance of rain. Dealing with uncertain information is obviously a challenge for any decision-maker, and this applies in particular when the stakes are high.

2.2 Relativity

In the vast majority of cases, LCA answers relative questions. Is product A better than product B? Is a redesigned version of this product better than the currently available version? Is it better to outsource the electricity production or to generate it on-site? Purchase decisions, investment decisions, ecolabels: it is all done on the basis of comparisons. While we recognize that there are perhaps a few situations where LCAs are done on a stand-alone basis, without comparison, this discussion article will further build on the typical situation of comparative assessments, in one form or another.

Conceptually, it may be important to further differentiate between comparing two systems and comparing more than two systems. In many contexts, a comparison of two systems is easier than a comparison of several systems. Just think about most sport matches, ranging from football to chess and from boxing to hockey, where two teams or players compete for victory. Whenever there are more competing teams or players, we need to set up more complicated systems to determine a winner or a ranking. This differentiation between a simple comparison and multiple comparisons is also present in scientific procedures, for instance in statistical analysis, where we have an independent-samples t test for the case of two options and an ANOVA with an F test for the case of more than two options. In our discussion, we will take the general point of view of comparing several systems. Occasionally, we will study the simpler case of comparing only two systems.

2.3 Uncertainty in a relative perspective

Combining the two points made, we are now invited to study how comparisons are to be done in the case of uncertain information. Here, an important complication enters the scene.

Consider the following case: we have information about the price of two cars, but the price information is not entirely accurate. Car 1 costs about 45,000$, but it might be a few thousand more or less. Let us symbolize this as 45,000 ± 2,000$. Car 2 costs 50,000 ± 2,000$. It seems clear: the first car is likely to be cheaper than the second car. But now suppose I live in Europe and wish to decide on the basis of prices in euros. I do not know the precise exchange rate between dollars and euros, but typically 1 dollar is approximately 1 euro, although sometimes it is 20 cents less and sometimes 20 cents more. Let us write this as 100$ = 100 ± 20€. A straightforward calculation now tells me that car 1 costs approximately 45,000 ± 11,000€ and car 2 approximately 50,000 ± 12,000€, so there is a tremendous region of overlap, and a naive suggestion would be that there is no significant difference between the two cars in terms of their price in euros. Figure 1 illustrates the case.

Fig. 1

The price of car 1 and car 2 in $ (left) and in € (right) given an uncertainty in price and exchange rate. Choosing the cheapest car in $ is straightforward, but there does not seem to be a significant difference between the car prices in €

This is of course a weird situation, because the uncertainty of the exchange rate should apply equally to the two cars. If I want to buy a car tomorrow, I will face tomorrow’s exchange rate, and in a comparative sense car 1 will be cheaper than car 2, for sure.

This example is supposed to make clear that in a comparative analysis with uncertainties, there may be shared uncertainties as well as uncertainties that are specific to each option. In order to still see the signal, and not drown it in noise from uncertainties piled on top of uncertainties, we need calculation procedures that can distinguish such nuances; a minimal numerical illustration follows below. It is unclear to what extent currently used software allows for such more sophisticated analyses, and, where it does, to what extent LCA practitioners indeed employ it.
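To make the shared-uncertainty mechanism concrete, here is a minimal Monte Carlo sketch of the car example in Python (the use of NumPy, the normal distributions and the sample size are our own illustrative choices, not a prescription). It contrasts dependent sampling, where one exchange rate per iteration applies to both cars, with naive independent sampling, where each car gets its own exchange rate:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Prices in dollars: uncertain, and specific to each car
price1 = rng.normal(45_000, 2_000, n)
price2 = rng.normal(50_000, 2_000, n)

# Exchange rate (euro per dollar): uncertain, but SHARED by both cars
rate = rng.normal(1.0, 0.2, n)

# Dependent sampling: one rate per iteration, applied to both cars
p_dep = np.mean(price1 * rate < price2 * rate)

# Naive independent sampling: a fresh rate for each car
rate_b = rng.normal(1.0, 0.2, n)
p_indep = np.mean(price1 * rate < price2 * rate_b)

print(f"P(car 1 cheaper), dependent sampling:   {p_dep:.2f}")   # ~0.96, as in dollars
print(f"P(car 1 cheaper), independent sampling: {p_indep:.2f}") # ~0.64, signal diluted
```

The dependent version reproduces the dollar-based conclusion; the independent version manufactures the spurious overlap of Fig. 1.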

3 Correlations

The issue presented has been recognized within LCA, often under the term correlation or correlated uncertainty. Correlation, however, is a much wider concept, and an unqualified application of this term is likely to mislead the audience. We will therefore discuss the issue of correlations in the context of LCA in more detail.

3.1 General reflections

Correlation has to do with dependence; one thing depends on another thing, and the other way around. The terms imply a “betweenness”: we can only speak of a correlation “between” two or more things, or a dependence of one or more things on another thing or things. In the present case, the ultimate variable is the product’s score, which can be a single number (e.g., weighted index or carbon footprint), or a set of numbers (e.g., a normalized environmental profile), for each product alternative. These numbers are the result of calculations which involve input data and choices. Some of these data and choices are common to all products compared. For instance, we usually take the same GWP list for products 1 and 2; we will not use GWP-20 years from the 2007 list for product 1 and GWP-100 years from the 2013 list for product 2. And we usually include or exclude capital goods for both products alike. But there are also numbers and choices that usually differ per product alternative. For instance, if we compare electricity produced from fossil fuel to electricity produced from biomass, only the second product requires data on carbon sequestration. All the numbers and choices that play a role in the LCA procedure can interact to create correlations or dependencies in different ways.

Correlated uncertainties are a ubiquitous phenomenon in comparative LCA with uncertainties due to choices and data; so ubiquitous, in fact, that a generic treatment is impossible. In the remainder of this article, we will focus on the issue of correlated uncertainty.

3.2 Correlated inputs and correlated outputs

A basic distinction in modelling is between inputs and outputs. These two words can, certainly in the context of LCA, be misleading. LCA traditionally discerns inputs, such as materials and resources, from outputs, such as waste and emissions. A more general modelling theory, however, discerns inputs and outputs in a mathematical sense. This can be aptly described by

$$ y=f(x) $$

where the input x is transformed into a model output y, by means of some model, symbolized through f. In case we have more than one input and output, say n inputs x1, x2, …, xn and m outputs y1, y2, …, ym, we can write this as

$$ \left\{\begin{array}{ccc}{y}_1& =& {f}_1\left({x}_1,{x}_2,\dots, {x}_n\right)\\ {}{y}_2& =& {f}_2\left({x}_1,{x}_2,\dots, {x}_n\right)\\ {}\dots & =& \dots \\ {}{y}_m& =& {f}_m\left({x}_1,{x}_2,\dots, {x}_n\right)\end{array}\right. $$

By recognizing multiple outputs, we should make it very clear that this embraces two kinds of multiplicity:

  • one product with several LCA results, for instance a score for global warming (y1), one for acidification (y2) and one for smog (y3);

  • several products with one LCA result, for instance a carbon footprint for product A (y1), for product B (y2) and for product C (y3);

  • the combination of the two aspects above.

We will now move from the situation of deterministic inputs to stochastic inputs. We follow the usual convention in probability theory of writing stochastic variables as capital letters and their realizations as lowercase letters. Therefore, instead of x1, etc. we will write X1, etc. For example, if in a deterministic model the first input variable has a value of 5, we would write x1 = 5. If on the other hand a probability distribution has been specified for this first variable, say, a normal distribution with mean 5 and standard deviation 1, we can write X1~N(5, 1). More generally, we write

$$ {X}_1\sim N\left({\mu}_1,{\sigma}_1^2\right) $$

when X1 is normally distributed with mean μ1 and variance \( {\sigma}_1^2 \) (or equivalently, standard deviation σ1).

The functions f1, etc. will remain as they were. We will not assume stochastic models, only stochastic model inputs. If needed, uncertainty of the model itself due to choice uncertainty may be introduced through one of the inputs. For instance, if we have a choice between mass allocation, energy allocation and economic allocation, each of which has equal probability, we may specify this as one of the input parameters with a discrete uniform distribution between 1 and 3, that is, X2~Udiscrete(1, 3):

$$ {x}_2=\left\{\begin{array}{cc}1& \mathrm{mass}\ \mathrm{allocation}\\ {}2& \mathrm{energy}\ \mathrm{allocation}\\ {}3& \mathrm{economic}\ \mathrm{allocation}\end{array}\right. $$

See also Mendoza Beltrán et al. (2016).
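As a minimal sketch of this idea (the per-method footprint values below are invented purely for illustration, not taken from any source), such a discrete choice parameter can be sampled just like any other input:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical carbon footprints (kg CO2-eq) of one product under each allocation rule
footprint = {1: 2.4,   # mass allocation
             2: 3.1,   # energy allocation
             3: 2.7}   # economic allocation

# X2 ~ U_discrete(1, 3): each allocation principle is equally probable
x2 = rng.integers(1, 4, size=10_000)          # upper bound is exclusive

y = np.array([footprint[v] for v in x2])
print(f"mean = {y.mean():.2f}, sd = {y.std(ddof=1):.2f}")  # choice uncertainty as spread
```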

As a result of the stochastic inputs, the deterministic outputs y will become stochastic as well; the symbol Y will be used to refer to them. So the previous system of model equations now becomes

$$ \left\{\begin{array}{ccc}{Y}_1& =& {f}_1\left({X}_1,{X}_2,\dots, {X}_n\right)\\ {}{Y}_2& =& {f}_2\left({X}_1,{X}_2,\dots, {X}_n\right)\\ {}\dots & =& \dots \\ {}{Y}_m& =& {f}_m\left({X}_1,{X}_2,\dots, {X}_n\right)\end{array}\right. $$

Which correlations can now be present? Recognizing that correlations always imply two items, we can now discern:

  • correlations between a pair of input variables, say X1 and X2;

  • correlations between a pair of output variables, say Y1 and Y2;

  • correlations between an input and an output variable, say X1 and Y1.

We will discuss each of these cases below.

3.3 Correlations between a pair of input variables

The first case we consider is of correlated model inputs, where, it should be recalled, inputs have a broader meaning than usual, comprising all data that are inserted into the calculation, including emission factors, characterisation factors and allocation choices. The existence of correlations between such input data is a realistic case in LCA. Unit process data that refer to the same process will often be correlated in some way, due to the laws of physics, chemistry and biology. A less-efficient engine needs more fuel and will emit more exhaust gases. A more efficient cattle-breeding farm will consume less feed and produce less waste. Similar relationships will exist at other places in the LCA. If we decide to choose allocation principle 1 (mass-based) for one process, we will probably choose it for another process as well. The same applies to impact assessment choices (like the time horizon of GWP) and uncertainties in characterisation factors (the half-life of a toxic substance will affect both human toxicity and ecotoxicity). Summing up, there may exist correlations between many of the input variables X1, …, Xn.

In general, expressing a multivariate probability distribution is much more cumbersome than for the univariate case. An important exception is the multivariate normal case, in which the notation

$$ \mathbf{X}\sim N\left({\boldsymbol{\mu}}_X,{\boldsymbol{\Sigma}}_X\right) $$

is used, and where the bold symbols code for vectors and matrices:

$$ \mathbf{X}=\left(\begin{array}{c}{X}_1\\ {}{X}_2\\ {}\dots \\ {}{X}_n\end{array}\right),{\boldsymbol{\mu}}_X=\left(\begin{array}{c}{\mu}_{X_1}\\ {}{\mu}_{X_2}\\ {}\dots \\ {}{\mu}_{X_n}\end{array}\right),{\boldsymbol{\Sigma}}_X=\left(\begin{array}{cccc}{\sigma}_{X_1}^2& {\sigma}_{X_1{X}_2}& \dots & {\sigma}_{X_1{X}_n}\\ {}{\sigma}_{X_2{X}_1}& {\sigma}_{X_2}^2& \dots & {\sigma}_{X_2{X}_n}\\ {}\dots & \dots & \dots & \dots \\ {}{\sigma}_{X_n{X}_1}& {\sigma}_{X_n{X}_2}& \dots & {\sigma}_{X_n}^2\end{array}\right) $$

Of particular interest is the (symmetric) covariance matrix ΣX, of which the diagonal elements contain the usual univariate variances, but of which the off-diagonal elements contain the covariances that express the correlation between inputs. If \( {\sigma}_{X_1{X}_2}={\sigma}_{X_2{X}_1}=0 \), X1 and X2 are not correlated. If these elements are positive, there is a positive correlation, and if they are negative, the correlation is negative.

All three cases, zero, positive and negative elements, are likely to show up in LCA. For instance:

  • fuel into a car and emission out of the same car will be positively correlated;

  • solar electricity into and fossil electricity into a house will be negatively correlated;

  • electricity into a house and fuel into a car will be uncorrelated.

The specification of a covariance matrix is likely to be difficult in practice, given that even the diagonal elements, the univariate variances, are often difficult to find. And even if we knew the covariance matrix, most, if not all, software for LCA does not offer the possibility to enter this information and use it in subsequent calculations.
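For completeness, here is a sketch of how such a covariance matrix could be specified and sampled in a general numerical environment (NumPy; the means, standard deviations and the correlation of 0.8 are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative pair: X1 = fuel into a car, X2 = exhaust emissions of the same car
mu = np.array([10.0, 25.0])       # means (units arbitrary)
s1, s2 = 1.0, 2.0                 # standard deviations
rho = 0.8                         # assumed positive correlation: more fuel, more exhaust

# Covariance matrix: variances on the diagonal, covariances off-diagonal
cov = np.array([[s1**2,         rho * s1 * s2],
                [rho * s1 * s2, s2**2        ]])

x = rng.multivariate_normal(mu, cov, size=10_000)
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])   # ~0.8: the specified correlation is recovered
```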

3.4 Correlations between an input and output variable

Correlations between inputs and outputs are trivially present. Because the model f is deterministic, there will be a correlation between X1 and Y1, in one way or another. In rare cases, probability theory can calculate the probability distribution of an output when the probability distribution of an input is specified. An example is X1~N(0, 1) and \( {y}_1=f\left({x}_1\right)={x}_1^2 \), for which it follows that Y1~χ2(1), the chi-square distribution with 1 degree of freedom. In the majority of cases, such calculations are not possible. Even for a simple case like X1~N(0, 1) and \( {y}_1=f\left({x}_1\right)=\frac{1}{x_1} \), the distribution of Y1 is not a standard named distribution (it is heavy-tailed and does not even have a mean).

Special techniques for so-called uncertainty propagation are available to approximate the distribution of Y1 in such cases (Groen et al. 2014). Important examples of these techniques are Monte Carlo simulation and Gaussian error propagation, the latter relying on a Taylor-series approximation.

Monte Carlo simulations are based on sampling the probability space spanned by the input variables X1, …, Xn, using random number generators that comply with the specified probability distribution (e.g. \( N\left({\mu}_{X_1},{\sigma}_{X_1}^2\right) \)), and calculating the output variables Y1, …, Ym for each set of sampled values. Thus, a quasi-empirical distribution of the various Y variables is obtained. Monte Carlo simulations are typically done with a large sample size, for instance 1,000 or 10,000. This makes the process computationally expensive.
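A minimal sketch of this procedure, using the analytically tractable example from above (Y1 = X1² with X1~N(0, 1)), so that the quasi-empirical Monte Carlo result can be checked against the known χ²(1) mean of 1 and variance of 2:

```python
import numpy as np

rng = np.random.default_rng(0)

x1 = rng.normal(0.0, 1.0, size=10_000)  # draw a sample for X1 ~ N(0, 1)
y1 = x1**2                              # push every draw through f(x) = x^2

# The quasi-empirical distribution should match chi-square(1): mean 1, variance 2
print(f"mean = {y1.mean():.3f} (theory: 1)")
print(f"var  = {y1.var(ddof=1):.3f} (theory: 2)")
```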

Gaussian error propagation is based on the linear approximation of the functions f1, …, fm around the working point. A typical choice for this working point is the mean value of X1, …, Xn, namely \( {\mu}_{X_1},\dots, {\mu}_{X_n} \). An approximate expression for the variance of Y1, …, Ym is then obtained through

$$ {\sigma}_{Y_i}^2\approx \sum \limits_{j=1}^n{\left({\left.\frac{\partial {f}_i}{\partial {x}_j}\right|}_{\mu_{X_1},\dots, {\mu}_{X_n}}\right)}^2{\sigma}_{X_j}^2 $$

These techniques, and in particular Monte Carlo, are increasingly available in LCA software.
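As a sketch of the Gaussian alternative, take a hypothetical model y = x1 · x2 (say, an activity level times an emission factor) with independent inputs; the first-order Taylor variance can then be checked against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical model: y = f(x1, x2) = x1 * x2, inputs independent
mu1, s1 = 10.0, 0.5
mu2, s2 = 2.0, 0.1

# First-order Taylor (Gaussian) propagation around the working point:
# df/dx1 = mu2 and df/dx2 = mu1, so var(Y) ~ (mu2*s1)^2 + (mu1*s2)^2
var_taylor = (mu2 * s1)**2 + (mu1 * s2)**2

# Monte Carlo reference
x1 = rng.normal(mu1, s1, 100_000)
x2 = rng.normal(mu2, s2, 100_000)
var_mc = (x1 * x2).var(ddof=1)

print(f"Taylor: {var_taylor:.3f}, Monte Carlo: {var_mc:.3f}")  # both ~2.0
```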

3.5 Correlations between a pair of output variables

Whenever two model outputs, say Y1 and Y2, depend on a common input, say X1, there can be a correlation between the two outputs. Just consider the case of

$$ \left\{\begin{array}{ccc}{y}_1& =& 2{x}_1+6\\ {}{y}_2& =& 4{x}_1-4\end{array}\right. $$

This example provides a case of full linear correlation: y2 = 2y1 − 16. The correlation may also be smaller, for instance, when non-linear functions are involved. Zero correlation is guaranteed only when two outputs do not rely on any common inputs. An example is

$$ \left\{\begin{array}{ccc}{y}_1& =& \sqrt{x_1}+\ln {x}_2\\ {}{y}_2& =& 4{x}_3^2-\sin {x}_4\end{array}\right. $$

LCA-related examples within one product are the emissions of CO and NOx, both depending on a variable which controls the air supply of the combustion process, or the impact scores on smog and acidification, both depending on the characterisation factor for heavy metals. An LCA-related example in the case of a product comparison is the carbon footprint of products A, B and C, all depending on an uncertain emission factor of the same power plant.
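The mechanism is easy to verify numerically; here is a sketch using the linear pair of equations above, plus a variant in which the second output also depends on an input of its own:

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(0.0, 1.0, 10_000)
x2 = rng.normal(0.0, 1.0, 10_000)

# Fully shared input: the outputs are perfectly correlated
y1 = 2 * x1 + 6
y2 = 4 * x1 - 4
print(np.corrcoef(y1, y2)[0, 1])   # 1.0

# Partially shared input: the correlation is positive but smaller
y2b = 4 * x1 - 4 + 5 * x2          # extra input specific to this output
print(np.corrcoef(y1, y2b)[0, 1])  # ~0.62 (theory: 8 / (2 * sqrt(41)))
```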

When we propagate uncertainties in the input data in a probabilistic way, for example using Monte Carlo simulation, the resulting output distributions will contain a correlation structure. However, this requires a careful set-up of the uncertainty propagation, in every iteration of the simulation:

  • sampling all input variables (so x1 for X1, x2 for X2, etc.) once;

  • calculating the output variables, for all product alternatives and/or impact categories (so y1 for Y1, y2 for Y2, etc.).

Failing to do so will lead to the problem outlined in Fig. 1.

The issue highlighted here has been described in the LCA literature as “dependent sampling”. From our analysis, it follows that we must do this dependent sampling not only across product alternatives, but also across the LCA indicators for one product, be it at the inventory level or the impact assessment level. It is often unclear whether this issue has been taken into account in published case studies and in programs for LCA. In the case of stand-alone or pre-calculated LCA studies, post hoc comparisons will probably lead to overly weak conclusions (Heijungs et al. 2017).

For non-sampling methods, like the Gauss/Taylor-based analytical uncertainty propagation, the issue is more complicated. The point is that this method for uncertainty propagation gives an expression for the variance of the output variables (so here: \( {\sigma}_{Y_1}^2 \) and \( {\sigma}_{Y_2}^2 \)), but no expression for a covariance between them (like \( {\sigma}_{Y_1{Y}_2} \)). Given the advantages of the analytical expressions over the time-intensive Monte Carlo method, there appears to be an important methodological gap that needs to be filled. Fortunately, there still seems to be progress in computational performance: Brightway2 claims to be able to do “more than 100 Monte Carlo iterations/second”.

As a final remark, observe that correlations between output variables may be the result of correlations between input variables, but not necessarily so. They can also occur when there is just one uncertain input, or when the inputs are uncorrelated. Further, while the danger of misrepresenting uncertainty of results is clear in case of correlated inputs, it is less clear if this is also the case for correlated outputs.

4 Conclusions

As we have now studied the three elementary cases, it is time to sum up and move on. Figure 2 summarizes the elements of addressing uncertainty in LCA.

Fig. 2

Proposed framework for propagating and interpreting uncertainties in LCAs (adapted from Henriksson et al. 2015a)

The first element of the framework is Data collection. Correlations between input variables can be described by probability functions. Only the base case of the multivariate normal distribution is well known. It requires a covariance matrix with variances on the diagonal and covariances on the off-diagonal elements. Already for the non-correlated case, variances are often crudely represented, due to limited access to data, limited resources for a proper sampling design and data collection, and incomplete data handling and reporting. The pedigree-based approach has come to play a prominent role here in providing surrogate variances. It is also questionable whether such tricks will work for the much more challenging covariances. For example, if we have an LCA with 5,000 input variables (this is not exceptionally large: ecoinvent v3 exceeds this number), we need to specify 5,000 variances and no less than n(n − 1)/2 ≈ 12.5 million covariances. We also mention the problem that the conventional probability distribution for uncertain LCA data is not the normal but the log-normal probability distribution. Specifying correlations for non-normal multivariate distributions is a much more complicated affair, with many open questions.

The second element of the framework is Propagation. Correlations between input variables on the one hand and output variables on the other hand can be taken into account with a few precautions, depending on the uncertainty propagation method. A sampling-based method such as Monte Carlo simulation must take care to sample one full set of input variables and then calculate every model output, that is, for each indicator and for each product. Otherwise, the inflated error bars from Fig. 1 will ruin the analysis. Generating random numbers from a normal distribution including correlations is straightforward. Generating random numbers from a non-normal multivariate distribution is again much more complicated. A more economical method for propagating uncertainties, such as the analytical expressions on the basis of Gaussian uncertainty propagation, needs to include the covariance structure of the input variables. This can in theory be done by adding a covariance term to the Taylor expansion:

$$ {\sigma}_{Y_i}^2\approx \sum \limits_{j=1}^n{\left({\left.\frac{\partial {f}_i}{\partial {x}_j}\right|}_{\mu_{X_1},\dots, {\mu}_{X_n}}\right)}^2{\sigma}_{X_j}^2+2\sum \limits_{j<k}\left({\left.\frac{\partial {f}_i}{\partial {x}_j}\right|}_{\mu_{X_1},\dots, {\mu}_{X_n}}{\left.\frac{\partial {f}_i}{\partial {x}_k}\right|}_{\mu_{X_1},\dots, {\mu}_{X_n}}\right){\sigma}_{X_j{X}_k} $$

A downside is again the unavailability of the covariance data; another is the limited validity of the expression, which holds only for “small” uncertainties.
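A sketch of this extended expression for a hypothetical linear model y = a·x1 + b·x2, for which the Taylor formula is exact, checked against Monte Carlo with correlated inputs (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)

# Linear model y = a*x1 + b*x2, so the Taylor expression is exact here
a, b = 2.0, 3.0
s1, s2, cov12 = 1.0, 0.5, 0.4   # standard deviations and covariance of X1, X2

# Taylor expansion including the covariance term
var_taylor = a**2 * s1**2 + b**2 * s2**2 + 2 * a * b * cov12   # = 11.05

# Monte Carlo check with correlated normal inputs
cov = np.array([[s1**2, cov12], [cov12, s2**2]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
var_mc = (a * x[:, 0] + b * x[:, 1]).var(ddof=1)

print(f"Taylor with covariance: {var_taylor:.2f}, Monte Carlo: {var_mc:.2f}")
```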

The third element of the framework is Interpretation. It concerns the process of interpreting, visualizing and deciding on the basis of correlated model output results. Correlations between output variables are almost automatic in a Monte Carlo set-up, provided the previously mentioned precautions have been taken. For the analytical expressions, a major research gap has been identified: how to account for correlated results when using such methods. A practical basis for decision support and a series of convenient presentations has been synthesized by Mendoza Beltrán et al. (2018). It is based on tables with pairwise comparative assertions, such as “product A beats product B” or “product A is significantly better than product B”. Such schemes may seem complicated to use in a comparison of more than two products, and this is certainly so when there are not only several products but also several criteria. Nevertheless, we think there is presently no approach that better combines insight and transparency.

The final element of the framework is Communication. We will have to communicate our results carefully, as they are uncertain. Thus, instead of concluding that A is better than B, we should state something like “with 95% certainty, A is better than B”. The best way to do so will of course depend on the audience. Product information for the general public requires another strategy than information for highly specialized process engineers. For instance, for public communication purposes, a translation of probabilistic outcomes into easily digestible information is needed.

Although we have the elements in place now, there are still huge challenges. Additional work is needed to operationalize and streamline them, so that in the end they become available through the existing LCA software packages to the entire community of LCA practitioners. And last but not least, we will have to collect the required data. But this is the road we need to take: raise the bar for LCA case studies, and no longer produce LCA results without uncertainties.