Abstract
Introduction
It is widely recognized that LCA is in most cases relative and contains uncertainties due to choices and data. This paper analyses the combination of the two: comparative uncertainties.
Basic concepts
We carefully define the ideas of relativity and uncertainty within LCA. We finish by giving an example of a case in which inappropriate handling of comparative uncertainties leads to a misleading result for a decision-maker.
Correlations
We develop a generic framework for probabilistic comparative LCA and analyse at which places correlations may be present. We also discuss the most convenient approaches for handling such correlated uncertainties.
Conclusion
We put the elements discussed in a structure that provides a research agenda for dealing with comparative uncertainties in LCA.
Introduction
Benjamin Franklin’s aphorism that “in this world nothing can be said to be certain, except death and taxes” ranks along with Albert Einstein’s apocryphal expression that “everything is relative” as famous statements that apply to almost anything. Here, we will argue that they also apply to life cycle assessment (LCA). But we will do more: we will connect the two statements, and connect the uncertainty in LCA to the relative nature of LCA. A central issue in this storyline is the place of correlation: correlated LCA results have been recognized to disturb the interpretation of comparative LCA.
Our story develops around the two themes, uncertainty and relativity. We will first briefly introduce both, and then concentrate on their combination, because most LCA studies are comparative by nature and they invariably involve uncertainties. The two themes have been mostly studied in isolation, but new problems appear whenever we combine them.
The aim of this discussion article is to provide an overview of the ramifications of this combination. It will point out some of the issues in a casual, not too formal way, without overloading it with literature references. It is to a large extent based on a synthesis of a number of recent publications by the authors, in particular Henriksson et al. (2014, 2015a, 2015b), Groen and Heijungs (2017), Mendoza Beltrán et al. (2016, 2018) and Heijungs et al. (2017).
Basic concepts
In this section, we study the two main ideas, uncertainty and relativity, in relation to LCA.
Uncertainty
The topic of uncertainty in LCA was brought up almost as soon as the idea of LCA itself emerged. This is not surprising, because LCA deals with a lot of data and involves many choices. Various consensus procedures and standardization attempts have resulted in a slightly smaller spectrum of options for data and choices, but it is well known that different LCA studies on the same topic within the same geographical, temporal and technological context can still produce markedly different results, due to a difference in, for instance:

functional unit;

unit process data (including sampling design, measurement errors, temporal variability, variation in space, assumptions with respect to end-of-life infrastructure, etc.);

system boundaries;

allocation rules;

characterisation methods;

characterisation factors (including variation in space, extrapolations from laboratory to field conditions, etc.);

normalization principles;

weighting factors;

calculation principles.
To account for such differences, LCA guidebooks usually recommend reporting the choices made and, if time allows, doing a few extra calculations, using for instance another allocation principle, another system boundary or another value for one or two critical parameters, such as the product’s life time.
To an increasing extent, differences in data are processed with a probabilistic approach, where the input data are considered to have a stochastic component, which propagates into a stochastic LCA result. The Monte Carlo simulation is a widely used technique for the propagation of such input uncertainties, although some authors prefer the use of other methods.
While the incorporation of variations due to data uncertainties and methodological choices is from a scientific point of view unavoidable, from a practical and policy point of view it has definite drawbacks. We just want to know if there will be rain tomorrow or not, but science often only tells us that there will be a 30% chance of rain. Dealing with uncertain information is obviously a challenge for any decision-maker, and this applies in particular when the stakes are high.
Relativity
In the vast majority of cases, LCA answers relative questions. Is product A better than product B? Is a redesigned version of this product better than the currently available version? Is it better to outsource the electricity production or to generate it on-site? Purchase decisions, investment decisions, ecolabels, it’s all done on the basis of comparisons. While we recognize that there are perhaps a few situations where LCAs are done on a stand-alone basis, without comparison, this discussion article will further build on the typical situation of comparative assessments, in one form or another.
Conceptually, it may be important to further differentiate between comparing two systems and comparing more than two systems. In many contexts, a comparison of two systems is easier than a comparison of several systems. Just think about most sport matches, ranging from football to chess and from boxing to hockey, where two teams or players compete for priority. Whenever there are more competing teams or players, we need to set up more complicated systems for finding out a winner or a ranking. This differentiation between simple comparison and multiple comparisons is also present in scientific procedures, for instance in statistical analysis, where we have an independent-samples t-test for the case of two options and an ANOVA with an F-test for the case of more than two options. In our discussion, we will take the general point of view of comparing several systems. Occasionally, we will study the simpler case of comparing only two systems.
Uncertainty in a relative perspective
Combining the two points made, we are now invited to study how comparisons are to be done in the case of uncertain information. Here, an important complication enters the scene.
Consider the following case: we have information about the price of two cars, but the price information is not entirely accurate. Car 1 costs about 45,000$ but it might be a few thousand more or less. Let us symbolize this as 45,000$ ± 2,000$. Car 2 costs 50,000$ ± 2,000$. It seems clear: the first car is likely to be cheaper than the second car. But now, I’m living in Europe and wish to decide on the basis of prices in euro. I do not know the precise exchange rate between dollars and euros, but typically 1 dollar is approximately 1 euro, although sometimes it is 20 cents less and sometimes 20 cents more. Let us write this as 100$ = 100€ ± 20€. A straightforward calculation, adding the 20% exchange-rate margin to the price margin, now tells me that car 1 costs approximately 45,000€ ± 11,000€ and car 2 approximately 50,000€ ± 12,000€, so there is a tremendous region of overlap, and a naive conclusion would be that there is no significant difference between the two cars in terms of their price in euro. Figure 1 illustrates the case.
This is of course a weird situation, because the uncertainty of the exchange rate should apply equally to the two cars. If I want to buy a car tomorrow, I will face tomorrow’s exchange rate, and in a comparative sense car 1 will be cheaper than car 2, for sure.
This example is supposed to make clear that in a comparative analysis with uncertainties, there may be shared uncertainties as well as uncertainties that are specific to each option. In order to still see the signal, and not drown in the noise of uncertainties on top of uncertainties, we need to develop calculation procedures which can distinguish such nuances. It is unclear to what extent the currently used software allows for such more sophisticated analyses and, if it does, to what extent LCA practitioners indeed employ them.
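The car example can be mimicked with a small simulation. The sketch below is only an illustration under our own assumptions: the ± ranges are read as uniform distributions, and the function name `share_car1_cheaper` is hypothetical. It contrasts a run in which both cars face the same sampled exchange rate with a run in which each car gets its own, independently sampled rate.

```python
import random


def share_car1_cheaper(n_runs=20_000, shared_rate=True, seed=1):
    """Fraction of Monte Carlo runs in which car 1 is cheaper than car 2 in euro."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        # Prices in dollar; reading "+/- 2,000" as a uniform range is an assumption.
        car1_usd = rng.uniform(43_000, 47_000)
        car2_usd = rng.uniform(48_000, 52_000)
        # Exchange rate in euro per dollar: 1.0 +/- 0.2 (again assumed uniform).
        rate1 = rng.uniform(0.8, 1.2)
        # Shared uncertainty: both cars face the same rate within one run.
        # Otherwise car 2 gets its own, independently sampled rate.
        rate2 = rate1 if shared_rate else rng.uniform(0.8, 1.2)
        if car1_usd * rate1 < car2_usd * rate2:
            wins += 1
    return wins / n_runs


p_shared = share_car1_cheaper(shared_rate=True)
p_separate = share_car1_cheaper(shared_rate=False)
print(f"shared exchange rate:    car 1 cheaper in {p_shared:.1%} of runs")
print(f"separate exchange rates: car 1 cheaper in {p_separate:.1%} of runs")
```

With a shared rate, the dollar price ranges do not overlap, so car 1 wins in every run. With separately sampled rates, a sizeable share of the runs suggests the opposite ranking, which is the artefact described in the text.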
Correlations
The issue presented has been recognized within LCA, often under the term correlation or correlated uncertainty. Correlation, however, is a much wider concept, and an unqualified application of this term is likely to mislead the audience. We will therefore discuss the issue of correlations in the context of LCA in more detail.
General reflections
Correlation has to do with dependence; one thing depends on another thing, and the other way around. The terms imply a “betweenness”: we can only speak of a correlation “between” two or more things or a dependence of one or more things on another thing or things. In the present case, the ultimate variable is the product’s score, which can be a single number (e.g., weighted index or carbon footprint), or a set of numbers (e.g., a normalized environmental profile), for each product alternative. These numbers are the result of calculations which involve input data and choices. Some of these data and choices are common to all products compared. For instance, we usually take the same GWP list for product 1 and 2; we will not use GWP20 years from the 2007 list for product 1 and GWP100 years from the 2013 list. And we usually include or exclude capital goods for product 1 and 2. But there are also numbers and choices that usually differ per product alternative. For instance, if we compare electricity produced by fossil fuel to electricity produced by biomass, only the second product requires data on carbon sequestration. All the numbers and choices that play a role in the LCA procedure can interact to create correlations or dependencies in different ways.
Correlated uncertainties are a ubiquitous phenomenon in comparative LCA with uncertainties due to choices and data. So ubiquitous in fact that a generic treatment is impossible. In the remainder of this article, we will focus on the issue of correlated uncertainty.
Correlated inputs and correlated outputs
A basic distinction in modelling is between inputs and outputs. These two words can, certainly in the context of LCA, be misleading. LCA traditionally discerns inputs, such as materials and resources, from outputs, such as waste and emissions. A more general modelling theory, however, discerns inputs and outputs in a mathematical sense. This can be aptly described by
\( y=f(x) \)
where the input x is transformed into a model output y, by means of some model, symbolized through f. In case we have more than one input and output, say n inputs x_{1}, x_{2}, …, x_{n} and m outputs y_{1}, y_{2}, …, y_{m}, we can write this as
\( \begin{array}{l}{y}_1={f}_1\left({x}_1,{x}_2,\dots, {x}_n\right)\\ {y}_2={f}_2\left({x}_1,{x}_2,\dots, {x}_n\right)\\ \vdots \\ {y}_m={f}_m\left({x}_1,{x}_2,\dots, {x}_n\right)\end{array} \)
By recognizing multiple outputs, we should make it very clear that this embraces two kinds of multiple outputs:

one product with several LCA results, for instance a score for global warming (y_{1}), one for acidification (y_{2}) and one for smog (y_{3});

several products with one LCA result, for instance a carbon footprint for product A (y_{1}), for product B (y_{2}) and for product C (y_{3});

the combination of the two aspects above.
We will now move from the situation of deterministic inputs to stochastic inputs. We follow the usual convention in probability theory to write stochastic variables as a capital letter, and their realizations as a lowercase letter. Therefore, instead of x_{1}, etc. we will write X_{1}, etc. For example, if in a deterministic model the first input variable has a value of 5, we would write x_{1} = 5. If on the other hand a probability distribution has been specified for this first variable, say, a normal distribution with mean 5 and standard deviation 1, we can write X_{1}~N(5, 1). More generally, we write
\( {X}_1\sim N\left({\mu}_1,{\sigma}_1^2\right) \)
when X_{1} is normally distributed with mean μ_{1} and variance \( {\sigma}_1^2 \) (or equivalently, standard deviation σ_{1}).
The functions f_{1}, etc. will remain as they were. We will not assume stochastic models, only stochastic model inputs. If needed, uncertainty of the model itself due to choice uncertainty may be introduced through one of the inputs. For instance, if we have a choice between mass allocation, energy allocation and economic allocation, each of which has equal probability, we may specify this as one of the input parameters with a discrete uniform distribution between 1 and 3, so as X_{2}~U_{discrete}(1, 3), where the values 1, 2 and 3 stand for mass, energy and economic allocation, respectively.
See also Mendoza Beltrán et al. (2016).
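As a minimal sketch of this idea, the code below treats the allocation choice as a discrete uniform input next to a normally distributed data input. The allocation factors, the model `footprint` and the function `sample_footprints` are hypothetical and serve only to show the mechanism.

```python
import random

# Hypothetical allocation factors for one co-product under each rule.
ALLOCATION_FACTOR = {
    1: 0.70,  # mass allocation
    2: 0.55,  # energy allocation
    3: 0.90,  # economic allocation
}


def footprint(x1, x2):
    """Deterministic model f: emission x1 times the allocation factor selected by x2."""
    return x1 * ALLOCATION_FACTOR[x2]


def sample_footprints(n_runs=10_000, seed=1):
    """Propagate a data uncertainty (X1) and a choice uncertainty (X2) together."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        x1 = rng.gauss(5.0, 1.0)  # X1 ~ N(5, 1)
        x2 = rng.randint(1, 3)    # X2 ~ U_discrete(1, 3), both endpoints included
        results.append(footprint(x1, x2))
    return results


ys = sample_footprints()
print(f"mean footprint over {len(ys)} runs: {sum(ys) / len(ys):.3f}")
```

The model itself stays deterministic; the choice enters only through the sampled value of the second input.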
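As a result of the stochastic inputs, the deterministic outputs y will become stochastic as well; the symbol Y will be used to refer to them. So the previous system of model equations now becomes
\( \begin{array}{l}{Y}_1={f}_1\left({X}_1,{X}_2,\dots, {X}_n\right)\\ {Y}_2={f}_2\left({X}_1,{X}_2,\dots, {X}_n\right)\\ \vdots \\ {Y}_m={f}_m\left({X}_1,{X}_2,\dots, {X}_n\right)\end{array} \)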
Which correlations can now be present? Recognizing that correlations always imply two items, we can now discern:

correlations between a pair of input variables, say X_{1} and X_{2};

correlations between a pair of output variables, say Y_{1} and Y_{2};

correlations between an input and an output variable, say X_{1} and Y_{1}.
We will discuss each of these cases below.
Correlations between a pair of input variables
The first case we consider is of correlated model inputs, where, it should be recalled, inputs have a broader meaning than usual, comprising all data that is inserted into the calculation, so including emission factors, characterisation factors and allocation choices. The existence of correlations between such input data is a realistic case in LCA. Unit process data that refer to the same process will often be correlated in some way, due to the laws of physics, chemistry and biology. A less efficient engine needs more fuel and will emit more exhaust gases. A more efficient cattle-breeding farm will consume less feed and produce less waste. Similar relationships will exist at other places of the LCA. If we decide to choose allocation principle 1 (mass-based) for one process, probably we will choose it for another process as well. It also applies to impact assessment choices (like the time horizon of GWP) and uncertainties in characterisation factors (the half-life of a toxic substance will affect both human toxicity and ecotoxicity). Summing up, there may exist correlations between many of the input variables X_{1}, …, X_{n}.
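In general, expressing a multivariate probability distribution is much more cumbersome than for the univariate case. An important exception is the multivariate normal case, in which the notation
\( \mathbf{X}\sim N\left({\boldsymbol{\mu}}_{\mathbf{X}},{\boldsymbol{\Sigma}}_{\mathbf{X}}\right) \)
is used, and where the bold symbols code for vectors and matrices:
\( \mathbf{X}=\left(\begin{array}{c}{X}_1\\ \vdots \\ {X}_n\end{array}\right),\quad {\boldsymbol{\mu}}_{\mathbf{X}}=\left(\begin{array}{c}{\mu}_{X_1}\\ \vdots \\ {\mu}_{X_n}\end{array}\right),\quad {\boldsymbol{\Sigma}}_{\mathbf{X}}=\left(\begin{array}{ccc}{\sigma}_{X_1}^2& \cdots & {\sigma}_{X_1{X}_n}\\ \vdots & \ddots & \vdots \\ {\sigma}_{X_n{X}_1}& \cdots & {\sigma}_{X_n}^2\end{array}\right) \)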
Of particular interest is the (symmetric) covariance matrix Σ_{X}, of which the diagonal elements contain the usual univariate variances, but of which the off-diagonal elements contain the covariances that express the correlation between inputs. If \( {\sigma}_{X_1{X}_2}={\sigma}_{X_2{X}_1}=0 \), X_{1} and X_{2} are not correlated. If these elements are positive, there is a positive correlation, and if they are negative, the correlation is negative.
All three cases, zero, positive and negative elements, are likely to show up in LCA. For instance:

fuel into a car and emission out of the same car will be positively correlated;

solar electricity into and fossil electricity into a house will be negatively correlated;

electricity into a house and fuel into a car will be uncorrelated.
The specification of a covariance matrix is likely to be difficult in practice, given that even the diagonal elements, the univariate variances, are often difficult to find. And even if we would know the covariance matrix, most, if not all, software for LCA does not offer the possibility to enter this information and use it in subsequent calculations.
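To make the role of the off-diagonal elements concrete, the following sketch draws two correlated normal inputs from a given 2 × 2 covariance matrix, using a Cholesky factorisation written out by hand. The numbers (fuel use and exhaust emission of the same car) and the function names are hypothetical; they are not taken from any LCA database or software.

```python
import math
import random


def sample_correlated_pair(mu1, sigma1, mu2, sigma2, cov12, rng):
    """Draw (x1, x2) from a bivariate normal with covariance cov12.

    Uses the 2x2 Cholesky factor written out by hand;
    requires |cov12| <= sigma1 * sigma2.
    """
    rho = cov12 / (sigma1 * sigma2)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu1 + sigma1 * z1
    x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return x1, x2


def sample_covariance(pairs):
    """Unbiased sample covariance of a list of (x1, x2) pairs."""
    n = len(pairs)
    mean1 = sum(p[0] for p in pairs) / n
    mean2 = sum(p[1] for p in pairs) / n
    return sum((a - mean1) * (b - mean2) for a, b in pairs) / (n - 1)


rng = random.Random(1)
# Hypothetical inputs: fuel use (kg) and CO2 emission (kg) of the same car,
# with variances 1.0 and 3.5**2 and a positive covariance of 3.0.
pairs = [sample_correlated_pair(10.0, 1.0, 31.0, 3.5, 3.0, rng) for _ in range(20_000)]
print(f"sample covariance: {sample_covariance(pairs):.2f} (specified: 3.00)")
```

Setting `cov12` to zero reproduces the uncorrelated case, and a negative value gives the negatively correlated case.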
Correlations between an input and output variable
Correlations between inputs and outputs are trivially present. Because the model f is deterministic, there will be a correlation between X_{1} and Y_{1}, in one way or another. In rare cases, probability theory can calculate the probability distribution of an output when the probability distribution of an input is specified. An example is X_{1}~N(0, 1) and \( {y}_1=f\left({x}_1\right)={x}_1^2 \), for which it follows that Y_{1}~χ^{2}(1), the chi-square distribution with 1 degree of freedom. In the majority of cases, such calculations are not possible. Even for a simple case like X_{1}~N(0, 1) and \( {y}_1=f\left({x}_1\right)=\frac{1}{x_1} \), the distribution of Y_{1} is not known in mathematical form.
Special techniques for so-called uncertainty propagation are available to approximate the distribution of Y_{1} in such cases (Groen et al. 2014). Important examples of these techniques are Monte Carlo simulation and Gaussian error propagation, the latter relying on a Taylor-series approximation.
Monte Carlo simulations are based on sampling the probability space spanned by the input variables X_{1}, …, X_{n}, using random number generators that comply with the specified probability distribution (e.g. \( N\left({\mu}_{X_1},{\sigma}_{X_1}^2\right) \)), and calculating the output variables Y_{1}, …, Y_{m} for each set of sampled values. Thus, a quasi-empirical distribution of the various Y variables is obtained. Monte Carlo simulations are typically done with a large sample size, for instance 1,000 or 10,000. This makes the process computationally expensive.
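Gaussian error propagation is based on the linear approximation of the functions f_{1}, …, f_{m} around the working point. A typical choice for this working point is the mean value of X_{1}, …, X_{n}, namely \( {\mu}_{X_1},\dots, {\mu}_{X_n} \). An approximate expression for the variance of Y_{1}, …, Y_{m} is then obtained through
\( {\sigma}_{Y_j}^2\approx \sum \limits_{i=1}^n{\left(\frac{\partial {f}_j}{\partial {x}_i}\right)}^2{\sigma}_{X_i}^2,\quad j=1,\dots, m \)
where the partial derivatives are evaluated at the working point.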
These techniques, and in particular Monte Carlo, are increasingly available in LCA software.
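The two propagation techniques can be compared on a toy model. The sketch below assumes a hypothetical model (an activity level times an emission factor), independent normal inputs and small uncertainties. It estimates the output variance once by Monte Carlo sampling and once by first-order Gaussian error propagation with numerically approximated derivatives. All names and numbers are illustrative.

```python
import random
import statistics


def model(x1, x2):
    """Hypothetical LCA model: activity level times emission factor."""
    return x1 * x2


def gaussian_variance(f, means, sigmas, h=1e-6):
    """First-order Taylor approximation of the output variance (independent inputs)."""
    var = 0.0
    for i, sigma in enumerate(sigmas):
        up = list(means)
        down = list(means)
        up[i] += h
        down[i] -= h
        # Central difference approximates the partial derivative at the means.
        slope = (f(*up) - f(*down)) / (2 * h)
        var += slope**2 * sigma**2
    return var


def monte_carlo_variance(f, means, sigmas, n_runs=50_000, seed=1):
    """Sample variance of the output over n_runs independent input draws."""
    rng = random.Random(seed)
    ys = [f(*(rng.gauss(m, s) for m, s in zip(means, sigmas))) for _ in range(n_runs)]
    return statistics.variance(ys)


means, sigmas = [10.0, 2.0], [0.5, 0.1]
var_taylor = gaussian_variance(model, means, sigmas)
var_mc = monte_carlo_variance(model, means, sigmas)
print(f"Gaussian error propagation: {var_taylor:.3f}")
print(f"Monte Carlo simulation:     {var_mc:.3f}")
```

For these small uncertainties, the two estimates should be close; the analytical route needs only a handful of model evaluations, whereas the sampling route needs one per run.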
Correlations between a pair of output variables
Whenever two model outputs, say Y_{1} and Y_{2}, depend on a common input, say X_{1}, there can be a correlation between the two outputs. Just consider the case of
This example provides a case of full linear correlation: y_{2} = 2y_{1} − 16. The correlation may also be smaller, for instance, when nonlinear functions are involved. It is only when two outputs do not rely on the same inputs that there is zero correlation. An example is
LCA-related examples within one product are the emissions of CO and NO_{x}, depending on a variable which controls the air supply of a combustion process, or the impact scores on smog and acidification, depending on the characterisation factor for heavy metals. An LCA-related example in the case of a product comparison is the carbon footprint of products A, B and C, all depending on an uncertain emission factor of the same power plant.
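A small numerical illustration may help. One hypothetical pair of functions that satisfies the relation y_{2} = 2y_{1} − 16 is y_{1} = x_{1} + 8 and y_{2} = 2x_{1}; this pair is our own assumption, chosen only for the sketch. The code also adds a nonlinear function of the same input and a function of a different, independent input, and estimates the correlation of each with y_{1}.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
x1 = [rng.gauss(5.0, 1.0) for _ in range(10_000)]
x2 = [rng.gauss(5.0, 1.0) for _ in range(10_000)]  # independent of x1

y1 = [x + 8 for x in x1]           # hypothetical output 1
y2 = [2 * x for x in x1]           # hypothetical output 2, so y2 = 2*y1 - 16
y_nonlinear = [x**2 for x in x1]   # nonlinear function of the same input
y_other = [3 * x for x in x2]      # depends on a different input only

r_linear = pearson(y1, y2)
r_nonlinear = pearson(y1, y_nonlinear)
r_none = pearson(y1, y_other)
print(f"linear, same input:    {r_linear:.3f}")
print(f"nonlinear, same input: {r_nonlinear:.3f}")
print(f"different inputs:      {r_none:.3f}")
```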
When we propagate uncertainties in the input data in a probabilistic way, for example using Monte Carlo simulation, the resulting output distributions will contain a correlation structure. However, it requires a careful uncertainty propagation, in every iteration of the simulation:

sampling all input variables (so x_{1} for X_{1}, x_{2} for X_{2}, etc.) once;

calculating the output variables, for all product alternatives and/or impact categories (so y_{1} for Y_{1}, y_{2} for Y_{2}, etc.).
Failing to do so will lead to the problem outlined in Fig. 1.
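The two steps can be sketched for a comparison of two products that draw electricity from the same power plant. All numbers and names below are hypothetical. In the dependent variant, the shared emission factor is sampled once per iteration and used for both products; in the faulty variant, product B receives its own draw of that same factor.

```python
import random
import statistics


def footprint_differences(n_runs=20_000, dependent=True, seed=1):
    """Carbon footprint of product B minus product A, per Monte Carlo run."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_runs):
        # Step 1: sample every input variable once per iteration.
        ef = rng.gauss(0.5, 0.1)        # shared emission factor (kg CO2/kWh)
        use_a = rng.gauss(100.0, 5.0)   # electricity use of product A (kWh)
        use_b = rng.gauss(110.0, 5.0)   # electricity use of product B (kWh)
        # Faulty variant: product B gets a separate draw of the shared factor.
        ef_b = ef if dependent else rng.gauss(0.5, 0.1)
        # Step 2: calculate the outputs for all products from that set.
        y_a = use_a * ef
        y_b = use_b * ef_b
        diffs.append(y_b - y_a)
    return diffs


sd_dependent = statistics.stdev(footprint_differences(dependent=True))
sd_independent = statistics.stdev(footprint_differences(dependent=False))
print(f"st. dev. of B - A, dependent sampling:   {sd_dependent:.2f}")
print(f"st. dev. of B - A, independent sampling: {sd_independent:.2f}")
```

Under these assumptions, the spread of the difference is several times larger when the shared factor is sampled separately, which is how a real difference between products gets buried in noise.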
The issue highlighted here has been described in the LCA literature as “dependent sampling”. From our analysis, it follows that we must do this dependent sampling not only across product alternatives, but also across the LCA indicators for one product, be it at the inventory level or at the impact assessment level. It is often unclear whether this issue has been taken into account in published case studies and in programs for LCA. In case of stand-alone or precalculated LCA studies, post hoc comparisons will probably lead to overly weak conclusions (Heijungs et al. 2017).
For non-sampling methods, like the Gauss/Taylor-based analytical uncertainty propagation, the issue is more complicated. The point is that this method for uncertainty propagation gives an expression for the variance of the output variables (so here: \( {\sigma}_{Y_1}^2 \) and \( {\sigma}_{Y_2}^2 \)), but no expression for a covariance between them (like \( {\sigma}_{Y_1,{Y}_2} \)). Given the advantages of the analytical expressions over the time-intensive Monte Carlo method, there appears to be an important methodological gap that needs to be filled. Fortunately, there still seems to be progress in computation performance: Brightway2 claims to be able to do “more than 100 Monte Carlo iterations/second”.
As a final remark, observe that correlations between output variables may be the result of correlations between input variables, but not necessarily so. They can also occur when there is just one uncertain input, or when the inputs are uncorrelated. Further, while the danger of misrepresenting uncertainty of results is clear in case of correlated inputs, it is less clear if this is also the case for correlated outputs.
Conclusions
As we have now studied the three elementary cases, it is time to sum up and move on. Figure 2 summarizes the elements of addressing uncertainty in LCA.
The first element of the framework is Data collection. Correlations between input variables can be described by probability functions. Only the base case of the multivariate normal distribution is well known. It requires a covariance matrix with variances on the diagonal and covariances on the off-diagonal elements. Already for the non-correlated case, variances are often crudely represented due to limited access to data, limited resources for a proper sampling design and data collection, and incomplete data handling and reporting. The pedigree-based approach has taken an intractable role here for providing surrogate variances. It is also questionable if such tricks will work for the much more challenging covariances. For example, if we have an LCA with 5,000 input variables (this is not exceptionally large: ecoinvent v3 exceeds this number), we need to specify 5,000 variances and no less than 12 million covariances. We also mention the problem that the conventional probability distribution for uncertain LCA data is not the normal but the lognormal probability distribution. Specifying correlations for non-normal multivariate distributions is a much more complicated affair, with many open questions.
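As a quick check of the numbers mentioned for the 5,000-input example, and assuming that every pair of inputs may be correlated, the count of distinct covariances in a symmetric n × n covariance matrix is:

```latex
\[
n = 5000, \qquad \frac{n(n-1)}{2} = \frac{5000 \times 4999}{2} = 12\,497\,500
\]
```

This is in addition to the n = 5,000 variances on the diagonal.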
The second element of the framework is Propagation. Correlations between input variables on the one hand and output variables on the other hand can be taken into account with a few precautions, depending on the uncertainty propagation method. A sampling-based method such as the Monte Carlo simulation must take care to sample one full set of input variables and then calculate every model output, so for each indicator and for each product. Otherwise, the inflated error bars from Fig. 1 will ruin the analysis. Generating random numbers from a normal distribution including correlations is straightforward. Generating random numbers from a non-normal multivariate distribution is again much more complicated. A more economical method for propagating uncertainties, such as the analytical expressions on the basis of Gaussian uncertainty propagation, needs to include the covariance structure of the input variables. This can in theory be done by adding more terms to the Taylor expansion:
\( {\sigma}_{Y_j}^2\approx \sum \limits_{i=1}^n{\left(\frac{\partial {f}_j}{\partial {x}_i}\right)}^2{\sigma}_{X_i}^2+2\sum \limits_{i=1}^{n-1}\sum \limits_{k=i+1}^n\frac{\partial {f}_j}{\partial {x}_i}\frac{\partial {f}_j}{\partial {x}_k}{\sigma}_{X_i{X}_k},\quad j=1,\dots, m \)
A downside is again the unavailability of the covariance data, besides other downsides, for instance the limited validity of the expression, only for “small” uncertainties.
The third element of the framework is Interpretation. It concerns the process of interpreting, visualizing and deciding on the basis of correlated model output results. Correlations between output variables are almost automatic in a Monte Carlo setup, provided the previously mentioned precautions have been taken. For the analytical expressions, a major research gap has been identified: how to account for correlated results when using such methods. A practical basis for decision support and series of convenient presentations has been synthesized by Mendoza Beltrán et al. (2018). It is based on tables with pairwise comparative assertions, such as “product A beats product B” or “product A is significantly better than product B”. Such schemes may seem complicated to use in a comparison of more than two products, and this certainly is so when there are not only several products but also several criteria. Nevertheless, we think there is presently no way that better combines insight and transparency.
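A table of pairwise comparisons can be derived directly from paired Monte Carlo results. The sketch below is a simplified illustration with hypothetical products and numbers, and it is not the procedure of Mendoza Beltrán et al. (2018). For every ordered pair of products, it counts the share of runs in which the first has the lower score.

```python
import random


def pairwise_win_shares(samples):
    """Share of runs in which product i scores lower than product j.

    samples maps a product name to its list of Monte Carlo results;
    position k in every list must stem from the same run (dependent sampling).
    """
    names = list(samples)
    n_runs = len(samples[names[0]])
    shares = {}
    for i in names:
        for j in names:
            if i != j:
                wins = sum(1 for a, b in zip(samples[i], samples[j]) if a < b)
                shares[(i, j)] = wins / n_runs
    return shares


rng = random.Random(1)
samples = {"A": [], "B": [], "C": []}
for _ in range(10_000):
    ef = rng.gauss(0.5, 0.1)  # hypothetical emission factor shared by all products
    samples["A"].append(rng.gauss(100.0, 5.0) * ef)
    samples["B"].append(rng.gauss(110.0, 5.0) * ef)
    samples["C"].append(rng.gauss(125.0, 5.0) * ef)

shares = pairwise_win_shares(samples)
for (i, j), share in sorted(shares.items()):
    print(f"{i} beats {j} in {share:.1%} of runs")
```

Whether a given share counts as "significantly better" depends on a threshold that the analyst has to choose and report.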
The final element of the framework is Communication. We will have to communicate our results carefully as they are uncertain. Thus, instead of concluding that A is better than B, we should state something like “with 95% certainty, A is better than B”. The best way to do so will of course depend on the audience. Product information for the general public requires another strategy than for highly specialized process engineers. For instance, for public communication purposes, a translation of probabilistic outcomes into digested information is needed.
Although we have the elements in place now, there are still huge challenges. Additional work is needed to operationalize and streamline them so that in the end they will become available through the existing LCA software packages to the entire community of LCA practitioners. And last but not least, we will have to collect the required data. But we need to go this way, raise the bar for LCA case studies, and not produce LCA results without uncertainties anymore.
References
Groen EA, Heijungs R (2017) Ignoring correlation in uncertainty and sensitivity analysis in life cycle assessment: what is the risk? Environ Impact Assess Rev 62:98–109
Groen EA, Heijungs R, Bokkers EAM, de Boer IJM (2014) Methods for uncertainty propagation in life cycle assessment. Environ Model Softw 62:316–325
Heijungs R, Henriksson PJG, Guinée JB (2017) Pre-calculated LCI systems with uncertainties cannot be used in comparative LCA. Int J Life Cycle Assess 22:461
Henriksson PJG, Guinée JB, Heijungs R, de Koning A, Green DM (2014) A protocol for horizontal averaging of unit process data. Including estimates for uncertainty. Int J Life Cycle Assess 19:429–436
Henriksson PJG, Rico A, Zhang W, al Nahid SKA, Newton R, Phan LT, Zhang Z, Jaithiang J, Dao HM, Phu TM, Little DC, Murray FJ, Satapornvanit K, Liu L, Liu Q, Haque MM, Kruijssen F, de Snoo GR, Heijungs R, van Bodegom PM, Guinée JB (2015a) A comparison of Asian aquaculture products using statistically supported LCA. Environ Sci Technol 49:14176–14183
Henriksson PJG, Heijungs R, Dao HM, Phan LT, de Snoo GR, Guinée JB (2015b) Product carbon footprints and their uncertainties in comparative decision contexts. PLoS One 10:e0121221
Mendoza Beltrán MA, Heijungs R, Guinée JB, Tukker A (2016) A pseudo-statistical approach to treat choice uncertainty: the example of partitioning allocation methods. Int J Life Cycle Assess 21:252–264
Mendoza Beltrán MA, Prado V, Font Vivanco D, Henriksson PJG, Guinée JB, Heijungs R (2018) Quantified uncertainties in comparative life cycle assessment: what can be concluded? Environ Sci Technol 52:2152–2161
Acknowledgements
The main ideas have been presented at the SETAC Europe conference in Rome, Italy, and at the LCA Food conference in Bangkok. This paper was initially selected for the special issue on Sustainable Food Production and Consumption, but the editor and reviewers observed that it was more appropriately placed as a general commentary and discussion article.
Additional information
Responsible editor: Mary Ann Curran
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Heijungs, R., Guinée, J.B., Mendoza Beltrán, A. et al. Everything is relative and nothing is certain. Toward a theory and practice of comparative probabilistic LCA. Int J Life Cycle Assess 24, 1573–1579 (2019) doi:10.1007/s11367-019-01666-y
Keywords
 Comparative analysis
 Correlations
 Life cycle assessment
 Uncertainty