1 Introduction

Uncertainty in LCA is pervasive, and it is widely acknowledged that uncertainty analyses should be carried out in LCA to give the conclusions of a study a more rigorous status (ISO 2006; JRC-IES 2010). The most popular approach for carrying out an uncertainty analysis in LCA is the Monte Carlo approach (Lloyd and Ries 2007), partly because it has been implemented in many of the major software programs for LCA, typically as the only available way of carrying out an uncertainty analysis (for instance, in SimaPro, GaBi, Brightway2, and openLCA).

The Monte Carlo method is a sampling-based method, in which the calculation is repeated a number of times in order to estimate the probability distribution of the result (see, e.g., Helton et al. 2006; Burmaster and Anderson 1994). This distribution is then typically used to inform decision-makers about characteristics such as the mean value, the standard deviation, or quantiles (such as the 2.5th and 97.5th percentiles). In LCA, the results are typically inventory results (e.g., emissions of pollutants) or characterization/normalization results (e.g., climate change, human health, etc.). In comparative LCA, such distributions form the basis of paired comparisons and hypothesis tests (Mendoza Beltran et al. 2018). Many programs and studies offer or present visual aids for interpreting the results, including histograms and boxplots (Helton et al. 2006; McCleese and LaPuma 2002).

A disadvantage of the Monte Carlo method is that it can be computationally expensive. Present-day LCA studies can easily include 10,000 or more unit processes, and calculating such a system can take some time. Repeating this calculation for a new configuration takes the same time again, and this is repeated a large number of times. Finally, the stored results must be analyzed in terms of means, standard deviations, p values, and visual representations. Altogether, if we use the symbol Nrun to refer to the number of Monte Carlo runs, the symbol Tcal for the CPU time needed to do one LCA calculation, and Tana for the time needed to process the Monte Carlo results, the total time needed, Ttot, is simply

$$ {T}_{\mathrm{tot}}={N}_{\mathrm{run}}\times {T}_{\mathrm{cal}}+{T}_{\mathrm{ana}} $$

Usually, Tcal > Tana and certainly Nrun × Tcal ≫ Tana, so that we can write

$$ {T}_{\mathrm{tot}}\approx {N}_{\mathrm{run}}\times {T}_{\mathrm{cal}} $$

and henceforth ignore Tana.

The time needed for a Monte Carlo analysis is thus determined by two factors: Tcal, which is typically in the order of seconds or minutes, and Nrun. A normal practitioner has little influence on Tcal, as it is dictated by the combination of algorithm, hardware, and the size of the database. Typically, it is between 1 s and 5 min. (This is a personal guess; there is no literature on comparative timings using a standardized LCA system.) A practitioner has much more influence on the number of Monte Carlo runs, Nrun. So, the trick is often to take Nrun not excessively high, say 100 or 1000. On the other hand, it has been claimed that this number must be large, for instance 10,000 or even 100,000. For instance, Burmaster and Anderson (1994) suggest that “the analyst should run enough iterations (commonly ≥10,000),” and the authoritative Guide to the Expression of Uncertainty in Measurement (BIPM 2008) writes that “a value of M = 10⁶ can often be expected to deliver [a result that] is correct to one or two significant decimal digits.” In the LCA literature, we find similar statements, for instance by Hongxiang and Wei (2013) (“more than 2000 simulations should be performed”) and by Xin (2006) (“[it] should run at least 10,000 times”). Such claims also end up in reviewer comments: We recently received the comment “Monte Carlo experiments are normally run 5000 or 10,000 times. In the paper, Monte Carlo experiments are only run 1000 times. Explain why?”. With the pessimistic Tcal = 5 min, using Nrun = 100,000 runs would require almost 1 year. With the short calculation time of Tcal = 1 s, we would still need more than one full day. And even Brightway2’s (https://brightwaylca.org/) claim of “more than 100 Monte Carlo iterations/second” (though we do not know whether this also applies to today’s huge systems) would take more than 16 min. Such waiting times may be acceptable for Big Science, investigating fundamental questions on the Higgs boson or the human genome. But for a day-by-day LCA consultancy firm, even 1 h is much too long.

In this study, we investigate the role of Nrun. We focus in particular on the original purpose of the Monte Carlo technique vis-à-vis its use in LCA, and we consider the fact that in LCA the input probability distributions are often based on small samples or on pedigree-style rules of thumb, as well as the fact that in LCA we are in most cases interested in making comparative statements (“product A is significantly better than product B”).

The next section discusses the elements of the analysis: the mathematical model and its probabilistic form, the description of probabilistic (“uncertain”) data, the estimation of input data, and the estimation of output results. Section 3 provides two numerical examples. Section 4, finally, discusses the findings and concludes.

2 Probability theory

In this section, we discuss a few background topics from probability theory. The interested reader is referred to general textbooks, such as Ross (2010) and Ghahramani (2005).

2.1 Mathematical models

When a model needs several input variables to compute an output variable, we can abstractly write the model relation as

$$ y=f\left({x}_1,{x}_2,\dots \right) $$

Here, x1, x2, … represent the values of the input variables (the data, for instance CO2 coefficients and electricity requirements) and y is the output (the result, for instance a carbon footprint). The function f(·) is a specification of the LCA algorithm (Heijungs and Suh 2002). We will assume that this algorithm is known and fixed, and that it has been implemented in software in a reliable way and therefore does not introduce any uncertainty (however, see Heijungs et al. 2015).
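As a concrete (and deliberately trivial) illustration, the following R sketch implements such a model; the two inputs and their values are made-up numbers, not data from any real study:

  # Toy model y = f(x1, x2): a "carbon footprint" computed from an electricity
  # requirement x1 (kWh) and a CO2 emission factor x2 (kg CO2/kWh).
  f <- function(x1, x2) x1 * x2
  y <- f(100, 0.5)  # 50 kg CO2 for these illustrative inputs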

2.2 Probabilistic models

Uncertainty can enter the scene in different ways:

  • When the input data is not exactly known (for instance, the effect of glyphosate on human health is not fully known)

  • When the input data displays variability (for instance, the lifetime of identical light bulbs is not exactly equal)

  • When choices must be made by the analyst (for instance, allocation factors can be based on mass or on economic value)

Sometimes, additional sources of uncertainty are mentioned (Huijbregts 1998), such as model uncertainty. Here, we restrict the discussion to those types of uncertainty that can be phrased as inputs (x1, x2, etc.) to the model equation (f(·)). Our analysis can, however, easily be broadened to cover such cases. For instance, we can include allocation choices as an extra input parameter of f(·) (Heijungs et al. 2019).

2.3 Probability distributions of input variables

In a probabilistic model, we can specify the input data as a probability distribution (continuous or discrete). So, from now on, we will assume that x1, x2, … are not fixed numbers, but that they are stochastic (random) numbers, following some probability distribution. We will use the convention from probability theory to indicate stochastic variables with capital letters, like X1, X2, … Further, the symbol ~ indicates that a stochastic variable is distributed according to some probability distribution. For instance,

$$ \left\{\begin{array}{l}{X}_1\sim N\left({\mu}_{X_1},{\sigma}_{X_1}\right)\\ {}{X}_2\sim N\left({\mu}_{X_2},{\sigma}_{X_2}\right)\\ {}\cdots \sim \cdots \end{array}\right. $$

where N(μ, σ) is the normal (Gaussian) probability distribution with parameters μ and σ. We might choose other probability distributions (uniform, log-normal, binomial, etc.), but at this stage we want to keep the discussion simple. The numbers that specify the numerical details of the probability distribution (here μ and σ in general, and more specifically \( {\mu}_{X_1} \), \( {\mu}_{X_2} \), \( {\sigma}_{X_1} \), \( {\sigma}_{X_2} \), etc.) are referred to as parameters. So, it is not x1 that is a parameter (as the usual terminology in LCA has it), but rather \( {\mu}_{X_1} \) and \( {\sigma}_{X_1} \) are the parameters of the distribution of X1. Other types of distributions are usually specified with different types of parameters (for instance, the uniform distribution with a parameter for the lower limit and a parameter for the upper limit) or even with a different number of parameters (for instance, the Poisson distribution requires only one parameter, while the asymmetric triangular distribution requires three parameters).
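In R, such specifications map directly onto the built-in random number generators, as the following sketch shows; all parameter values are illustrative assumptions:

  # Drawing samples from specified input distributions.
  set.seed(1)
  x1 <- rnorm(1000, mean = 10, sd = 1)          # X1 ~ N(10, 1)
  x2 <- rlnorm(1000, meanlog = 2, sdlog = 0.2)  # a log-normal alternative
  x3 <- runif(1000, min = 8, max = 12)          # uniform: lower and upper limit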

2.4 Probability distributions of output variables

Recognizing that (some of) the input variables of the model f(·) are stochastic, a logical consequence is that the model output is also stochastic. Thus, we write

$$ Y=f\left({X}_1,{X}_2,\dots \right) $$

See Heijungs et al. (2019). With this change of y into Y, our task shifts from calculating the value of y to calculating the distribution of Y. More specifically, we may want to know:

  • The shape of the distribution of Y (i.e., normal, uniform, log-normal, binomial, etc.)

  • The value or values of the parameter or parameters (e.g., μY and σY)

Probability theory offers methods to calculate the probability distribution of Y when those of X1, X2, … are given, but only for a few cases of f(·) and only for a few input distributions. For instance, when Y = f(X1, X2) = X1 + X2 and X1 and X2 are normal, every textbook shows that

$$ Y\sim N\left({\mu}_{X_1}+{\mu}_{X_2},\sqrt{\sigma_{X_1}^2+{\sigma}_{X_2}^2}\right) $$

In words, the sum of two normal variables is itself normally distributed, and the parameters μY and σY can easily be calculated from the parameters of the input distributions. Another case is \( Y=f\left({X}_1,{X}_2\right)={X}_1^2+{X}_2^2 \). In general, this is pretty complicated, but when we take the special case of \( {\mu}_{X_1}={\mu}_{X_2}=0 \) and \( {\sigma}_{X_1}={\sigma}_{X_2}=1 \), there is a well-known result:

$$ Y\sim {\chi}^2(2) $$

where χ2(ν) is the chi-squared distribution with parameter ν. In general, most choices of f(·) with less trivial combinations of X1, X2, … (such as \( f\left({X}_1,{X}_2\right)={X}_1{X}_2^2+\frac{\ln {X}_1}{4+\sin {X}_2} \)) are not analytically tractable with probability theory. It is therefore important to have an alternative way to determine the probability distribution of such more complicated functions of stochastic variables. The same applies to situations where f(·) is straightforward, but where the input distributions of X1, X2, etc. are not normal.

The Monte Carlo approach (Metropolis and Ulam 1949; Shonkwiler and Mendivil 2009) can be used as an alternative way of constructing the probability distribution of Y when the mathematical approach is too hard. It is based on artificially sampling values from Y, and using this sample to reconstruct (the technical term is estimate) the shape and the parameter values of the distribution of Y. We will spend the next section on the topic of estimating a probability distribution from a sample of values. This is a topic of more general interest than Monte Carlo simulations, so we will keep the discussion quite general, also covering the case of estimating the distribution of input variables like X1 and X2.
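To illustrate, the following R sketch estimates the distribution of the analytically intractable function mentioned above; the input distributions are our own illustrative assumptions (X1 is taken log-normal so that ln X1 is defined):

  # Monte Carlo estimation of Y = X1*X2^2 + ln(X1)/(4 + sin(X2)).
  set.seed(1)
  N_run <- 10000
  x1 <- rlnorm(N_run, meanlog = 0, sdlog = 0.5)  # positive, so log(x1) is defined
  x2 <- rnorm(N_run, mean = 2, sd = 1)
  y <- x1 * x2^2 + log(x1) / (4 + sin(x2))       # 4 + sin(x2) >= 3, never zero
  mean(y); sd(y)        # estimated parameters of the distribution of Y
  hist(y, breaks = 50)  # estimated shape of the distribution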

2.5 Estimating a probability distribution in general

We will discuss the question of estimating a probability distribution Z (including its parameters), given a sample of data, z1, z2, …, zn. This task is known as the estimation problem, and it is one of the central topics of inferential statistics. See, for instance, Rice (2007) and Casella and Berger (2002) for general textbooks.

Suppose we have a sample of data from an unknown stochastic process, Z. Let the sampled values be indicated by zi, for i = 1, …, n. If we want to estimate the probability distribution belonging to the stochastic process that generated this sample, we must first make an assumption about the type of distribution. Is it a normal distribution, a uniform distribution, a log-normal distribution, a Weibull distribution? This choice is one of the trickiest parts of the entire estimation process, because there is no clear guidance. Different aspects can play a role here:

  • Evidence: the data (e.g., a histogram or a boxplot) may suggest a certain distribution.

  • Conventions and compatibility with software: the log-normal distribution has a longer and more widespread history in LCA than the Erlang distribution.

  • Familiarity and simplicity: if the histogram looks approximately bell-shaped, a normal distribution is more natural than the Cauchy distribution.

  • Statistical criteria: we can use statistical tests (such as those by Kolmogorov-Smirnov and Anderson-Darling) to assess the goodness-of-fit with a number of probability distributions.

Clearly, there are also cases where none of the conventional model distributions provides a satisfactory fit with the empirical data. We will not further discuss such cases, because the usual procedure in LCA is to model input uncertainties in terms of just a few distributions: lognormal, normal, uniform, or triangular (Frischknecht et al. 2004) or perhaps a few more (gamma and beta PERT; see Muller et al. 2016).

Once we have selected a probability distribution, the next task is to estimate the parameter value or values of that distribution. Suppose we have selected a normal distribution, so

$$ Z\sim N\left({\mu}_Z,{\sigma}_Z\right) $$

where μZ and σZ are the distribution’s parameters, which are still unknown at this stage of the analysis. Then, our task is to estimate the values of μZ and σZ that correspond best with the sampled data. Different estimation principles are available in the statistical literature to do this. Two widely used principles are the method of moments and the method of maximum likelihood. For the case of a normal distribution, these two principles yield the same estimates of μZ and σZ, but for some distributions, there is a difference in the outcome of the estimation procedure. In any case, the theory of statistics offers formulas for estimators, which are functions of the observations. We use the symbol of the parameter to be estimated with a hat on top to indicate the estimator: \( \hat{\mu} \) is an estimator of μ and \( \hat{\sigma} \) is an estimator of σ. In the case of a normal distribution, both estimation principles (method of moments and method of maximum likelihood) suggest

$$ {\hat{\mu}}_Z=\frac{1}{n}\sum \limits_{i=1}^n{Z}_i $$

and

$$ {\hat{\sigma}}_Z=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({Z}_i-{\hat{\mu}}_Z\right)}^2} $$

as estimators for μZ and σZ. When applied to a concrete data set, z1, z2, …, zn, these estimators produce concrete values, because we insert the observed values zi in place of the stochastic variables Zi. These concrete values are the estimates, which we will hereafter indicate as \( \overline{z} \) and sZ.
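In R, these estimators take one line each; note that the estimator above divides by n, whereas R’s built-in sd() divides by n − 1 (the data below are simulated for illustration):

  set.seed(1)
  z <- rnorm(16, mean = 10, sd = 1)  # a small sample from Z ~ N(10, 1)
  z_bar <- mean(z)                   # estimate of mu_Z
  s_z <- sqrt(mean((z - z_bar)^2))   # estimate of sigma_Z (divides by n)
  # sd(z) would divide by n - 1 instead of n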

Of course, we cannot expect the estimates to be fully accurate when the sample size is finite. The estimate \( \overline{z} \) will hopefully be close to the true value μZ, but it will probably be a little bit off (that is also why we distinguish the symbols: in general \( \overline{z}\ne {\mu}_{\mathrm{Z}} \), but \( \overline{z}\approx {\mu}_{\mathrm{Z}} \)). The same applies to the estimate sZ of σZ.

The theory of inferential statistics not only allows us to estimate parameter values, but it also allows us to say something about the precision of such estimates. This is done through the theory of sampling distributions, standard errors, and confidence intervals.

A sampling distribution is the probability distribution of an estimator. Let us suppose we have a probability distribution Z ∼ N(μZ, σZ), with unknown parameter μZ and known parameter σZ, from which we sample n observations, and we use the estimator \( {\hat{\mu}}_{\mathrm{Z}} \) to estimate μZ by the value \( \overline{z} \). If we take another sample of size n, we can use the same estimator to estimate μZ again, but we will find a slightly different value of \( \overline{z} \), because the sample will contain different values. Repeating this again and again, always with the same sample size n, we end up with a distribution of \( \overline{z} \) values. This distribution is referred to as \( \overline{Z} \).

The famous central limit theorem states that the distribution of the estimates of the mean, \( \overline{Z} \), is normally distributed and that there is a simple relation between its parameters (\( {\mu}_{\overline{Z}} \) and \( {\sigma}_{\overline{Z}} \)) and the parameters of the parent distribution Z (μZ and σZ):

$$ \overline{Z}\sim N\left({\mu}_{\mathrm{Z}},\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}}\right) $$

So, \( {\mu}_{\overline{\mathrm{Z}}}={\mu}_{\mathrm{Z}} \) and \( {\sigma}_{\overline{\mathrm{Z}}}=\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}} \). The first fact signifies that the mean of the sample means corresponds to the mean of the parent distribution. This is a convenient property, because it allows us to use the sample mean (\( \overline{z} \)) as the best guess of μZ. The second fact tells us that the width of the distribution of \( \overline{Z} \) (so \( {\sigma}_{\overline{\mathrm{Z}}} \)) depends on the width of the distribution of Z (so on σZ) and on the size of the sample (so on n). In fact, \( {\sigma}_{\overline{\mathrm{Z}}} \) decreases without limit as n increases. The important consequence is that the estimate \( \overline{z} \) of μZ is more precise when n is large, and that we can make it as precise as we want by simply increasing the sample size. The larger the sample, the more precise the estimate.
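The theorem is easy to verify by simulation; in the following R sketch (with assumed parameters), the standard deviation of the simulated sample means comes out close to \( \frac{\sigma_Z}{\sqrt{n}}=0.25 \):

  # Simulated sampling distribution of the mean for Z ~ N(10, 1), n = 16.
  set.seed(1)
  z_bar <- replicate(10000, mean(rnorm(16, mean = 10, sd = 1)))
  mean(z_bar)  # close to mu_Z = 10
  sd(z_bar)    # close to sigma_Z / sqrt(n) = 0.25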

The quantity \( {\sigma}_{\overline{\mathrm{Z}}}=\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}} \) is known as the standard error of the mean, also known as “the” standard error. For a precise estimation of μZ, we want this \( {\sigma}_{\overline{\mathrm{Z}}} \) to be small. The only way to do so is to use a large sample size n, because σZ is fixed. The standard error is related to the concept of a confidence interval. For the case of estimating μZ, the 95% confidence interval is given by

$$ C{I}_{\mu_{\mathrm{Z}};0.95}=\left[\overline{z}-1.96{\sigma}_{\overline{\mathrm{Z}}},\overline{z}+1.96{\sigma}_{\overline{\mathrm{Z}}}\right] $$

This means that with 95% confidence, the interval CI will contain the true value μZ that we estimate by \( \overline{z} \). Observe that the confidence interval has a width of \( 2\times 1.96{\sigma}_{\overline{Z}}=3.92{\sigma}_{\overline{Z}}=3.92\frac{\sigma_Z}{\sqrt{n}} \). If we want this interval to be narrower, we need to increase the sample size n.
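As a worked sketch (assuming, as above, that σZ = 1 is known and n = 16):

  set.seed(1)
  n <- 16; sigma_z <- 1
  z <- rnorm(n, mean = 10, sd = sigma_z)
  se <- sigma_z / sqrt(n)        # standard error of the mean: 0.25
  mean(z) + c(-1.96, 1.96) * se  # 95% CI for mu_Z, width 3.92 * se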

Above, we discussed how to estimate the parameter μ when the parameter σ is known. Estimation of σ and other parameters, and estimation of μ when σ is unknown, are technically more difficult, but conceptually the idea is the same.

2.6 Estimating the probability distribution of input variables

When we want to estimate the probability distribution of an input variable (X1, etc.), we carry out the following steps:

  • We sample data (x11, x12, …, x1n) from the phenomenon (e.g., unit process).

  • We choose a convenient probability distribution shape (e.g., normal).

  • We use the formulas for the estimators (\( {\hat{\mu}}_{X_1} \), \( {\hat{\sigma}}_{X_1} \), etc.) to find estimates (\( \overline{x_1} \), \( {s}_{X_1} \), etc.).

The estimated parameter values (\( \overline{x_1} \), \( {s}_{X_1} \), etc.) are “best guesses” given the available data. However, we cannot expect them to be perfect estimates: the width of their confidence intervals decreases only as \( \frac{1}{\sqrt{n}} \), and n is usually limited. Of course, we can increase n by collecting more primary data, but site visits and measurements are usually expensive and time-consuming. For that reason, in LCA, as in most other fields of science, n is usually quite limited. The price we pay is a larger standard error and a wider confidence interval.

2.7 Estimating the probability distribution of output variables, given perfectly known inputs

Next, we move to the topic of estimating the probability distribution of an output variable (Y, etc.). Suppose, for simplicity, we have one stochastic input variable, X, normally distributed, with known parameters:

$$ X\sim N\left({\mu}_X,{\sigma}_X\right) $$

Next, we define a very simple function of that variable:

$$ Y=f(X)=X $$

Of course, the distribution of the output variable Y is trivial:

$$ Y\sim N\left({\mu}_X,{\sigma}_X\right) $$

and in particular, μY = μX. But let us pretend we are bad at probability theory and prefer to use a Monte Carlo approach. We simulate Nrun instances of X (namely \( {x}_1,{x}_2,\dots, {x}_{N_{\mathrm{run}}} \)) and use them to calculate Nrun instances of Y (namely y1 = x1, y2 = x2, etc.). These values of y are used to estimate μY as follows:

$$ \overline{y}=\frac{1}{N_{\mathrm{run}}}\sum \limits_{i=1}^{N_{\mathrm{run}}}{y}_{\mathrm{i}} $$

When the sample has been obtained in a random way, we can also be sure that the estimate will converge to the correct value:

$$ \underset{N_{\mathrm{run}}\to \infty }{\lim}\overline{y}={\mu}_{\mathrm{Y}}={\mu}_{\mathrm{X}} $$

Likewise, we can estimate the standard deviation of Y, σY. This can be used to find the standard error of the mean

$$ {s}_{\overline{\mathrm{Y}}}=\frac{s_{\mathrm{Y}}}{\sqrt{N_{\mathrm{run}}}} $$

The noteworthy aspect of this standard error is that it will go to zero when Nrun grows very large:

$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }{s}_{\overline{\mathrm{Y}}}=0 $$

As a consequence, the estimate of μY will become arbitrarily precise, if we have enough computer time:

$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }C{I}_{\mu_{\mathrm{Y}};0.95}=\left[{\mu}_{\mathrm{Y}},{\mu}_{\mathrm{Y}}\right]=\left[{\mu}_{\mathrm{X}},{\mu}_{\mathrm{X}}\right] $$

That is not surprising. Had we been more thoughtful, we could have saved the computer expenses and directly deduced that μY = μX, with infinite precision. The situation is comparable to computing \( \frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+\dots \) for a large number of terms, instead of being more thoughtful and directly writing this as \( \frac{\frac{1}{2}}{1-\frac{1}{2}}=1 \). Both approaches yield approximately the same result. So, when we want to use a Monte Carlo approach to estimate the parameters of a probability distribution, we must use a large sample size Nrun to find a reliable answer. The recommendations quoted in the introduction (1000, 10,000, 100,000) are based on the situation described here: accurately estimating an output distribution on the basis of perfect knowledge of the input distributions.
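The following sketch illustrates this for the trivial model Y = f(X) = X with a perfectly known X ∼ N(10, 1): the estimate converges and its standard error shrinks as \( \frac{1}{\sqrt{N_{\mathrm{run}}}} \):

  # With perfectly known inputs, more runs buy arbitrarily high precision.
  set.seed(1)
  for (N_run in c(100, 10000, 1000000)) {
    y <- rnorm(N_run, mean = 10, sd = 1)            # Y = f(X) = X
    cat(N_run, mean(y), sd(y) / sqrt(N_run), "\n")  # estimate and its standard error
  }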

2.8 Estimating the probability distribution of output variables, given imperfectly known inputs

But now, take the next case: a normal distribution with parameters μX and σX, but under the proviso that the first parameter is slightly off, because we did not know μX but used its imperfect estimate \( \overline{x} \). So, we consider

$$ X\sim N\left(\overline{x},{\sigma}_{\mathrm{X}}\right) $$

Next, we again study the trivial function

$$ Y=f(X)=X $$

first analytically, using probability theory, and then through a Monte Carlo simulation.

Analytically, we find

$$ Y\sim N\left(\overline{x},{\sigma}_{\mathrm{X}}\right) $$

The essential point to observe is that the mean of Y is not μX but \( \overline{x} \), which is likely to be somewhat wrong.

Next, let us try this with a Monte Carlo simulation. We use \( \overline{y} \) to estimate μY. It will be close to \( \overline{x} \), rather than to μX. Moreover, the standard error of this estimate is still \( {s}_{\overline{Y}}=\frac{s_{\mathrm{Y}}}{\sqrt{N_{\mathrm{run}}}} \), so as close to 0 as we like. In fact,

$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }C{I}_{\mu_{\mathrm{Y}};0.95}=\left[\overline{x},\overline{x}\right] $$

Summarizing, probability theory and the Monte Carlo approach will both give you the wrong value (\( \overline{x} \) instead of μX) when estimating μY, and the Monte Carlo approach will in addition suggest that this estimate is very precise due to a vanishing standard error, at least when Nrun is very large.

Observe that this is not a mistake or limitation of the Monte Carlo approach; in fact, it performs very well. The mistake is entirely due to the analyst, who uses an imperfectly estimated input parameter (\( \overline{x} \) instead of μX) to run an infinite-precision method. Also observe that this situation is ubiquitous in LCA: Most LCA data on unit processes are obtained from limited samples. Even a sample size of 1 is not uncommon. There is even a widely used approach, referred to as the pedigree approach and popularized by the ecoinvent database, whose purpose is to estimate a probability distribution from limited data (Frischknecht et al. 2004; Weidema et al. 2013). We devote a longer discussion to this problem toward the end of this paper.

3 Numerical illustration

To test and illustrate these ideas, we carried out two simulation experiments, first for one stand-alone system, and then for two systems in a comparative analysis.

To illustrate the situation for one system, we wrote a small piece of R code (Fig. 1) and used it to simulate the following case:

  • The parent distribution is X ∼ N(10, 1).

  • We sample n = 16 observations, and estimate μX by \( \overline{x} \).

  • We draw from \( Y\sim N\left(\overline{x},{\sigma}_{\mathrm{X}}\right) \) a Monte Carlo sample of size Nrun = 100,000.

  • From this sample, we estimate μY by \( \overline{y} \).

Fig. 1 R code for generating a large Monte Carlo sample (Nrun = 100,000) from an input distribution with limited precision
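Since the figure itself is not reproduced here, the following is a minimal sketch of what such code could look like, reconstructed from the description above (not the original Fig. 1 code; the seed is arbitrary):

  # Small input sample (n = 16), then a large Monte Carlo sample (Nrun = 100,000).
  set.seed(1)
  n <- 16; N_run <- 100000
  mu_x <- 10; sigma_x <- 1
  x <- rnorm(n, mu_x, sigma_x)       # parent distribution X ~ N(10, 1)
  x_bar <- mean(x)                   # imperfect estimate of mu_X
  y <- rnorm(N_run, x_bar, sigma_x)  # Monte Carlo draws from N(x_bar, sigma_X)
  x_bar + c(-1.96, 1.96) * sigma_x / sqrt(n)      # wide 95% CI for mu_X
  mean(y) + c(-1.96, 1.96) * sd(y) / sqrt(N_run)  # very narrow 95% CI for mu_Y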

In our simulation, the results were as follows:

  • \( \overline{x}=10.31 \), \( {\sigma}_{\overline{\mathrm{X}}}=0.25 \), so the 95% confidence interval for μX is [9.819, 10.799].

  • \( \overline{y}=10.31 \), \( {\sigma}_{\overline{\mathrm{Y}}}=0.0031 \), so the 95% confidence interval for μY is [10.305, 10.318].

The interpretation of these results is as follows:

  • We misestimate μX (10.31 instead of 10.00).

  • But, we acknowledge that it may be wrong, and in fact, our 95% confidence interval contains the correct value (it suggests a value somewhere between 9.8 and 10.8).

  • We misestimate μY (10.31 instead of 10.00).

  • But, we deny that it may be wrong, because our 95% confidence interval is pretty sure about a value somewhere between 10.30 and 10.32.

In conclusion, the Monte Carlo approach will yield a very precise, but inaccurate, result.

The precision of an estimate plays an important role in testing statistical hypotheses. When we would like to test a statement like μX = 10, the null hypothesis significance testing procedure would not reject the null hypothesis, because the hypothesized value of 10 is in the 95% confidence interval [9.819, 10.799]. On the other hand, the same procedure when applied to the null hypothesis μY = 10 would lead to a rejection, because 10 is not in the 95% confidence interval [10.305, 10.318].

The second example is about two systems, A and B, in a comparative LCA: Seemingly precise estimates of the impact of products A and B can lead to the conclusion that A is better than B, while the real situation is that B is better than A. Or we find that A is better than B, although they do not differ. To test and illustrate this phenomenon, we made another computer experiment (Fig. 2). We generate n = 16 samples from XA ∼ N(10, 1) and XB ∼ N(10, 1). From these two samples, we estimate \( {\mu}_{X_{\mathrm{A}}} \) through \( {\overline{x}}_{\mathrm{A}} \) and \( {\mu}_{X_{\mathrm{B}}} \) through \( {\overline{x}}_{\mathrm{B}} \) and perform a two-sample t test of the hypothesis \( {\mu}_{X_{\mathrm{A}}}={\mu}_{X_{\mathrm{B}}} \). Next, we use YA = f(XA) = XA and YB = f(XB) = XB, and sample Nrun = 100,000 values from YA and YB. From this Monte Carlo sample, we test the null hypothesis \( {\mu}_{Y_{\mathrm{A}}}={\mu}_{Y_{\mathrm{B}}} \). The p value of the first test was 0.67, so the hypothesis of equal \( {\mu}_{X_{\mathrm{A}}} \) and \( {\mu}_{X_{\mathrm{B}}} \) was not rejected. The second test yielded a p value around 10⁻¹⁶, pointing to seemingly overwhelming evidence that \( {\mu}_{Y_{\mathrm{A}}}\ne {\mu}_{Y_{\mathrm{B}}} \).

Fig. 2 R code for testing the hypothesis of equality of means in the input data XA and XB, generated from small samples (\( {n}_{X_A}={n}_{X_B}=16 \)), and of equality of means in the output results YA and YB, generated with a large Monte Carlo sample (Nrun = 100,000)
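Again a minimal sketch, reconstructed from the description in the text (not the original Fig. 2 code); the exact p values vary with the random seed:

  set.seed(1)
  n <- 16; N_run <- 100000
  x_a <- rnorm(n, 10, 1); x_b <- rnorm(n, 10, 1)  # small input samples
  t.test(x_a, x_b)                   # mu_XA = mu_XB: not rejected
  y_a <- rnorm(N_run, mean(x_a), 1)  # Y_A = X_A, sampled around x_bar_A
  y_b <- rnorm(N_run, mean(x_b), 1)  # Y_B = X_B, sampled around x_bar_B
  t.test(y_a, y_b)                   # mu_YA = mu_YB: rejected with tiny p value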

This comparative case is even more interesting than the first example, because decisions about purchases, ecolabels, etc. are often taken on the basis of comparative assessments: Is there evidence that one product is significantly better than another product? Statistical hypothesis testing can provide an answer to such questions, but the example shows that inaccurately specified parameters of the parent distributions may give a seemingly convincing wrong answer, because an excessive number of Monte Carlo runs will optimize precision, ignoring inaccurate inputs.

4 Discussion and conclusions

Let us be a bit more explicit about the terminology: An estimate can be imprecise, or it can be inaccurate. The difference has been illustrated in various ways (Fig. 3). In our analysis of example 1, we have an inaccurate estimate (\( \overline{y} \) can be off quite a bit due to the small n in determining \( \overline{x} \)) with arbitrarily high precision (\( {\sigma}_{\overline{\mathrm{Y}}} \) is almost zero due to the very large Nrun). By reporting a very small standard error of the mean, we suggest that we have done a high-quality calculation.

Fig. 3 Illustration of the difference between precision and accuracy. The left figure illustrates both; the middle one is an example of low precision, and the right one is an example of low accuracy. Source: https://en.wikipedia.org/wiki/Accuracy_and_precision

The discussion above took a very trivial function, namely Y = f(X) = X, as its starting point. The storyline is no different for more complicated cases, such as \( Y=f\left({X}_1,{X}_2\right)={X}_1{X}_2^2+\frac{\ln {X}_1}{4+\sin {X}_2} \), or for functions of hundreds of input distributions Y = f(X1, X2, …). Likewise, we started from a normal distribution with known standard deviation. If the standard deviation is unknown, or if the parent distribution is of a different type (log-normal, binomial, ...), the mathematics is more difficult, but the take-home message remains the same: with an imprecise estimate of the input parameters, we can make a very precise but probably inaccurate estimate of the output parameters. Garbage in, garbage out, but the type of garbage has changed: from imprecise to inaccurate. That is a problem, because imprecision is visible through a large standard error of the mean (\( \overline{x}=10.31\pm 0.25 \)), while inaccuracy is not visible (\( \overline{y}=10.31\pm 0.0031 \)). As a result, the estimate will appear to be of high quality when it is not.

Superficially, it sounds better to make precise statements than imprecise ones. But when the statements concern inaccurate values, this is not necessarily true.

In a statistical analysis, we can always draw wrong conclusions (type I errors: rejecting a correct null hypothesis; type II errors: not rejecting an incorrect null hypothesis), but here we face a completely different type of error: rejecting a null hypothesis for which we have no appropriate data. The root of the problem is that we sample from inaccurately specified distributions. While we would naively expect this to lead to inaccurate results, the statistical analysis neglects the inaccuracy and concentrates on the precision. The imprecision declines with the number of Monte Carlo runs, but the inaccuracy does not. And imprecision is visible, while inaccuracy is invisible.

The remedy is to maintain the imprecision in the estimate of the input parameters. As long as the parameters of the input distributions are imprecise, we should not be allowed to increase the precision of the output distribution estimates without limit. How can this be done? One simple way is to put an upper limit on the number of Monte Carlo runs. If the estimate of the input parameter μX is based on a sample of n = 16 data points, perhaps we should not do more than Nrun = 16 Monte Carlo runs. While this sounds fair, a complication is that we need more guidance for the case of more complicated functions than just Y = X, for instance \( Y={X}_1{X}_2^2+\frac{\ln {X}_1}{4+\sin {X}_2} \). If X1 has been sampled with \( {n}_{X_1}=16 \) and X2 with \( {n}_{X_2}=9 \), what should we take for the number of Monte Carlo runs, Nrun? Perhaps the weakest link defines our maximum quality, so our Monte Carlo analysis could do with just 9 runs in this case. The result is a very imprecise estimate of μY, but visibly imprecise. Incidentally, the solution of taking a small number of Monte Carlo runs also solves the problem of overly significant results (Heijungs et al. 2016).
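A sketch of this hypothetical “weakest link” rule (our illustration, not an established procedure), applied to the complicated function above:

  set.seed(1)
  n_x1 <- 16; n_x2 <- 9
  N_run <- min(n_x1, n_x2)      # weakest link: only 9 runs
  x1 <- rlnorm(N_run, meanlog = 0, sdlog = 0.5)
  x2 <- rnorm(N_run, mean = 2, sd = 1)
  y <- x1 * x2^2 + log(x1) / (4 + sin(x2))
  mean(y); sd(y) / sqrt(N_run)  # visibly imprecise estimate of mu_Y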

Another remedy is of course to determine the parameters of the input distributions with more precision, that is, using larger sample sizes \( {n}_{X_1} \), \( {n}_{X_2} \), etc. In practice, however, this is not easy. Many of the millions of data points in an LCA model come from general-purpose generic databases, and recollecting these data from multiple sites and on multiple days would be a horrendous task.

A final point is the case of probability distributions with parameters that have not been estimated from data, but for which a procedural estimation has been used. An important example is the earlier-mentioned pedigree approach, where data quality indicators, for instance for representativeness and age, define default standard deviations. The popular ecoinvent database is a major example here (Frischknecht et al. 2004; Weidema et al. 2013), but the approach is also becoming popular in other areas (Laner et al. 2016). For such data, it is often unclear what the sample size is, so it is not possible to express the precision of the mean in terms of a standard error. But it will be clear that the parameters of the input distribution are not at all accurate, so propagating them into almost infinitely precise Monte Carlo output results is just as misleading as the sample-based procedure on which our main argument was built. An ultimate consequence is that such pedigree-based probability distributions are incompatible with large-scale Monte Carlo simulations. This is an important take-home message of our analysis, because the pedigree approach has grown into a major paradigm for estimating standard deviations of LCA data, and Monte Carlo has become the default procedure for propagating uncertainties in LCA. The incompatibility of the two has, as far as we know, not been recognized before, and our analysis does not suggest any way out. This points to a major area of research in dealing with uncertainty in LCA.