
Beyond pedigree—optimizing and measuring representativeness in large-scale LCAs



Purpose

Data sampling strategies in large-scale life-cycle assessments (LCAs) are often developed informally, based on a combination of sector expertise, common sense, resource restrictions, and politics. The representativeness of the acquired sample is then assessed ex post, either qualitatively or with a semi-quantitative approach based on pedigree matrices. The purpose of this paper is twofold: to provide a structured framework for designing a representative sample for these types of studies, and for assessing the representativeness of the sample that was actually obtained.


Methods

For sample design, we propose proportionate stratified sampling. The strata are defined by identifying those population characteristics that can introduce a relevant bias into the average specific environmental burdens of the product system under study. To assess the representativeness of the acquired sample, we propose two metrics: one for technological and geographical representativeness, based on the weighted average deviation between population and sample across the identified strata, and one for temporal representativeness, based on a weighting scale applied to the years in which the data were collected.
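As a rough illustration of proportionate stratified sampling and of comparing sample and population composition, the following Python sketch may help. It is not the paper's metric: the stratum names, the site counts, and the `share_deviation` score (half the summed absolute difference between population and sample shares) are assumptions chosen only to make the idea concrete.

```python
from math import floor


def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n sites across strata in proportion to stratum size.

    Largest-remainder rounding keeps the allocations summing exactly to n.
    """
    total = sum(stratum_sizes.values())
    exact = {k: n * size / total for k, size in stratum_sizes.items()}
    alloc = {k: floor(v) for k, v in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def share_deviation(population, sample):
    """Hypothetical stand-in score, NOT the metric defined in the paper.

    Half the summed absolute gap between population and sample shares:
    0 means the shares match exactly, 1 means they do not overlap at all.
    """
    pop_total = sum(population.values())
    smp_total = sum(sample.values())
    return 0.5 * sum(
        abs(population[k] / pop_total - sample.get(k, 0) / smp_total)
        for k in population
    )


# Hypothetical population: number of production sites per stratum.
population = {"large_plants": 50, "medium_plants": 30, "small_plants": 20}

# Planned sample of 10 sites: 5 large, 3 medium, 2 small.
planned = proportionate_allocation(population, 10)

# Hypothetical sample actually acquired, skewed towards large plants.
acquired = {"large_plants": 7, "medium_plants": 2, "small_plants": 1}

print(planned)
print(share_deviation(population, planned))
print(round(share_deviation(population, acquired), 3))
```

A planned sample that mirrors the population shares scores zero under this stand-in score, while the skewed acquired sample scores higher, which is the kind of gap a quantitative representativeness measure is meant to expose.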

Results and discussion

The proposed approach is pragmatic and helps to improve representativeness compared to simple random sampling. Its general principles can inform discussions about how many and which sites to sample, even if detailed data on the composition of the population are missing. Its key strength is that it is not a one-size-fits-all methodology: it can and must be adapted to the product system under study, which in turn requires transparent documentation of all rationales and value choices along the way.


Conclusions

The proposed approach provides practitioners with a flexible framework for planning data collection in a way that increases representativeness compared to simple random sampling. Representativeness can be quantified and discussed using a defined scale based on quantitative measures rather than on qualitative descriptions or pedigrees. If the underlying rationales and value choices are transparently documented and justified, the framework can help to improve how the representativeness of primary data is addressed in large-scale LCAs.




  1. To further simplify this task for the practitioner, a variety of online calculators are available that calculate the sample size n from the population size N, the desired confidence level, and the margin of error (for example, Creative Research Systems 2012; National Statistical Service 2016).

  2. We recommend the use of k-means clustering as it is relatively easy to apply. However, other clustering algorithms may be more appropriate in some cases.
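
The two notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the paper: `sample_size` assumes the common Cochran formula with a finite population correction and a worst-case proportion of 0.5 (the cited calculators may differ in detail), and `kmeans_1d` is a bare-bones Lloyd's algorithm on a single, hypothetical site characteristic (here, annual production volume) with a deterministic initialization.

```python
from math import ceil
from statistics import NormalDist


def sample_size(population_size, confidence=0.95, margin_of_error=0.05, p=0.5):
    """Sample size n for a population of size N (assumed Cochran formula).

    p = 0.5 is the conservative worst-case proportion.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / margin_of_error**2
    # Finite population correction for a population of limited size.
    n = n0 / (1 + (n0 - 1) / population_size)
    return ceil(n)


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalar values (assumes k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids evenly over the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical population of 200 sites, 95 % confidence, 5 % margin of error.
print(sample_size(200))  # 132

# Hypothetical annual production volumes (arbitrary units) grouped into 3 strata.
volumes = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2]
centroids, clusters = kmeans_1d(volumes, 3)
print(clusters)
```

For real studies with several site characteristics, an established implementation such as scikit-learn's `KMeans` is likely a better choice than this hand-rolled version.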



Author information



Corresponding author

Correspondence to Christoph Koffler.

Additional information

Responsible editor: Adisa Azapagic

Electronic supplementary material


(XLSX 502 kb)


About this article


Cite this article

Koffler, C., Shonfield, P. & Vickers, J. Beyond pedigree—optimizing and measuring representativeness in large-scale LCAs. Int J Life Cycle Assess 22, 1065–1077 (2017).




  • Data collection
  • Data quality
  • Representativeness
  • Sampling strategies
  • Stratified sampling