1 Introduction

This chapter addresses how you can determine the analytical quality of your stable isotope tree-ring data for reporting in publications and with the data when made public. Jardine and Cunjak (2005) summarized the wide disparity in reporting analytical error of stable isotopic data in the ecological literature. We provide guidance on the necessary data and methods for estimating the uncertainty around stable isotope data for one’s research project. Our approach is from the perspective of the researcher designing a project that uses the stable isotopes contained within tree rings, rather than from the perspective of a laboratory conducting stable isotope analysis for a wide array of projects and individuals. Ultimately, the researcher is responsible for quantifying the quality of the data generated within the project and needs to report that uncertainty in final reports and published papers. Ideally, measures of uncertainty should stay with the data when the data are made publicly available.

1.1 What is QA/QC?

Quality assurance (QA) is the overall framework or plan that describes your experimental design to address your research questions and the steps you will take to detect and minimize errors in the data you generate within the project. Quality control (QC) is the system of procedures, or actual steps, you take to quantify and minimize potential errors and assure data integrity. In other words, quality assurance is the plan you develop, whereas quality control is implementing that plan and documenting those steps. For example, your quality assurance plan might call for a certain frequency of sample duplication (see Sect. 6.2.3.1) in each sample set analyzed on the isotope ratio mass spectrometer (IRMS) for calculating precision, and for a QC standard in each sample set to test the accuracy of the calibration procedure. The quality control would be the actual inclusion of those duplicates and QC standards, followed by an estimation of precision from the duplicates and of accuracy from the QC standard. A good quality assurance plan with appropriate quality control measures is a critical component of every study and needs to be developed at the very beginning of every project.

A QA plan should be developed around a project’s objectives and experimental design, and its key information should be included in the methods section of one’s publication(s). The QA plan is not static; it evolves as the project proceeds and is updated to address new or emerging issues not foreseen at the beginning of the project. Individual procedures used in the study, such as methods for ring separation and cellulose extraction, should be written into standard operating procedures (SOPs), which provide step-by-step instructions for carrying out each procedure. The purpose of SOPs is to ensure that everyone is following the same methods for every sample within the study, and to limit variation that may be introduced by different people conducting the procedure. All personnel conducting those procedures must be required to read and follow the study SOPs. Developing a good QA plan and SOPs is the first step to ensuring your data are of the quality you need to address your research objectives.

Quality control steps should be described in both the QA plan and in individual SOPs. These steps are measures that allow you to quantify the variation introduced by a procedure. For example, including a QC sample, such as a uniformly homogenized wood sample of sufficient volume for many samples, in each set of samples processed or analyzed together would allow for quantification of the uncertainty associated with sample processing and measurement (Porter and Middlestead 2012). This chapter will discuss many of the sources of uncertainty in tree-ring stable isotope analysis, and QC steps to assess the quality of your data even if you are not conducting the stable isotope measurements yourself.

1.2 Taking Ownership of Your Data Quality

The following questions should be considered while developing your QA plan, along with the QC steps needed to address them:

  • Can I detect the isotopic differences I am expecting with my study design?

This is the primary study question being asked of the isotopic data. Do you have sufficient replication and sample precision so that the signal variation within the study is much larger than the variation between sample replicates? To answer this question, you must understand some more basic levels of isotopic variation and what might be causing that variation. Signal variation is what you are hoping to show using stable isotopes. Noise variation is unrelated to your study question and could be natural variation of the system or measurement noise. A good experimental design might be able to control for natural system variation, and good QA/QC should minimize the measurement noise.

  • What is the range of isotopic variation within my study?

Is your isotopic variation within the study sufficiently large relative to analytical error in order to detect biologically meaningful patterns? Are the isotopic values a useful measure to address the question you have? Is the variation between replicates (two trees from the same stand expected to show similar patterns) small relative to the variation across time? Generally, background literature of similar studies will provide some guiding expectations and will help in designing experiments that will maximize the signal around your study questions.

  • How variable are my samples (variation within a sample)?

This is an area over which you have control. It relates to sample homogenization (see Sect. 6.3.1.1) and is measured by the precision of sample duplicates (see Sect. 6.2.3).

  • Have my samples been isotopically altered since collection?

This question addresses potential problems with sample extraction methods or storage issues that alter the isotopic ratio of the sample in unintended ways. Clear and detailed SOPs will help ensure that all personnel are conducting analyses correctly and minimizing mistakes. The use of replicates, duplicates and a study standard can be useful in detecting any issues that do occur (see Sects. 6.2.3.1, 6.3.1.1 and 6.4.2.3).

  • Do the stable isotope values I received from a lab accurately reflect the isotopic values of the samples I submitted?

This question relates to whether the IRMS was functioning normally, and the samples were accurately calibrated to the international scale (see Sect. 6.3). Generally, you are not in control of how your samples are calibrated but obtaining information about the calibration standards and independent QC standards as well as your sample duplicates should enable you to answer this question by calculating the accuracy and precision as outlined below. We encourage you to compile your QC data from each set of samples submitted for analysis on the IRMS and calculate your own study accuracy and precision to be reported in your final manuscripts as well as kept with the data when made publicly available.

2 Measurements of Uncertainty

All measurements have some uncertainty around a reported value, and this uncertainty should be reported with the value to allow for accurate interpretation. In the Eurachem/CITAC guide (https://www.eurachem.org/index.php/publications/guides/quam), the term measurement ‘uncertainty’ is defined as the “parameter associated with the result of a measurement that characterizes the dispersion of the measured values that could reasonably be attributed to the measurand” (Ellison and Williams 2012). In this section, we outline the steps to accurately quantify uncertainty around tree-ring stable isotope values used in a research project.

2.1 Identical Treatment Principle

The Identical Treatment Principle (IT Principle) is the principle that standards (both for calibration and QC purposes) shall be treated the same as the samples (Werner and Brand 2001). Following this principle, the standards will go through the same transformations as the sample from cellulose to the final gas that is measured in the IRMS, and any alteration to the isotopic composition will affect both standards and samples equally. For example, calibration and QC standards (and preferably one or two cellulose standards) need to be transformed from sample matrix into the measured gases along with each set of samples analyzed on the IRMS for accurate isotopic measurements. Isotopic reference gas, which is injected directly to the IRMS without the transformation step, should never be used alone to establish the calibration. By using this principle, calibration standards should accurately correct for isotopic alterations during the transformation process. The IT Principle should be standard practice in all IRMS laboratories, but it can also be applied within a study to other transformation steps such as cellulose extraction (see Chap. 5). However, no wood standard with a certified isotopic ratio for cellulose exists, so applying the IT Principle to cellulose extraction must be modified. The key is to develop a study standard that can be used to assess the influence of the process or transformation that the samples go through (see Sect. 6.4.2.3 in this chapter). For example, Porter and Middlestead (2012) developed a study standard using a large amount of homogenized wood particles similar in particle size to their samples, and included one sample in every sample set extracted for cellulose within a study to account for the variance the extraction process may have imposed on their samples. 
While they didn’t know a priori the isotopic composition of cellulose within this wood study standard, they could calculate the uncertainty introduced by variation in cellulose extraction and isotopic measurement over the duration of their study. Another approach would be to construct an “artificial” wood sample by mixing wood components (lignin, cellulose and other materials) using reagent grade cellulose, so that the isotopic composition could be measured before and after extraction (see Richter et al. 2009 for an example of an artificial leaf). For hydrogen isotopes in cellulose samples, hydrogen exchange of hydroxyl groups with local water vapor is another important isotopic transformation that must be corrected for using standards that have been treated identically to the samples (see Chap. 11). To apply the IT Principle, researchers should include some form of standards along with their samples whenever samples are transformed from their original state to the final gas introduced into the IRMS.

2.2 Accuracy

Accuracy is a measure of systematic bias and is calculated as the difference between a measured value and the “true” value (measured − true). For stable isotope measurements, accuracy is determined from a QC standard with a known isotopic value. A QC standard cannot be used for any calibration or normalization of the isotopic results but is used as an independent test of the calibration and normalization process (see Sect. 6.3). Generally, one or two QC standards are included in each set of samples analyzed by the IRMS, following the IT Principle.

A research study is generally composed of multiple sets of samples that have been measured on the IRMS over time, and thus contains multiple measurements of a QC standard (Table 6.1). To calculate accuracy across the entire study, researchers should calculate the average and the standard deviation of the difference value (measured − true). The average difference value is the systematic bias within the study, and if the isotope data are adequately calibrated, the average should be very close to zero. The standard deviation of the difference values is the random error around the accuracy measurement, and it should be similar to sample precision (see next section) or smaller. In Table 6.1, accuracy, or study bias, is the average difference (µ) and was estimated to be −0.01 ‰, while random error, represented by the standard deviation (σ), was 0.12 ‰, which is typical for carbon isotope values of homogenized internal laboratory standards. If the study bias were larger than the standard deviation, then the study data could have a significant bias that the calibration procedures did not correct; however, a low number of QC samples could lead to a false indication of bias as well. For the example in Table 6.1, the accuracy of the study should be reported as −0.01 ± 0.12 ‰ SD.

Table 6.1 Example calculation of accuracy from QC standards from six IRMS sets of study samples. A set is a sequence of samples analyzed continuously on the IRMS, usually overnight. In this example, QC samples were analyzed at the 68th position in the sample set, and the 93rd position
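The accuracy calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the function name `study_accuracy` and the QC values are hypothetical and are not the data from Table 6.1.

```python
from statistics import mean, stdev


def study_accuracy(measured, true_value):
    """Return (bias, sd) of the differences (measured - true) across all sets."""
    diffs = [m - true_value for m in measured]
    return mean(diffs), stdev(diffs)  # stdev is the sample SD (n - 1 denominator)


# Hypothetical QC standard results (delta13C, per mil); NOT the Table 6.1 data.
qc_true = -25.00
qc_measured = [-25.10, -24.95, -25.05, -24.90, -25.12, -24.88]

bias, sd = study_accuracy(qc_measured, qc_true)
possible_bias = abs(bias) > sd  # a bias larger than its SD deserves a closer look
print(f"accuracy: {bias:.2f} +/- {sd:.2f} per mil SD (n = {len(qc_measured)})")
```

Pooling the QC results from every IRMS set before taking the mean and standard deviation is what makes this a study-level estimate rather than a single-run estimate.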

2.3 Precision

Precision is the random error of the measurement for study samples, generally calculated as the standard deviation of repeated measures from study samples. Standard deviation is a statistical description of population variance, and for precision, that population is repeated measures of the same study sample. One standard deviation away from the mean captures approximately 68% of the observations (assuming a normal distribution), and two standard deviations capture approximately 95% of the observations. Another common measurement for precision is the coefficient of variation (CV), which is calculated as σ/µ. However, stable isotope measures are ratios (see Chap. 8) referenced to a standard, so a value of zero simply means that the sample has the same ratio as the scaling standard. The mean µ therefore depends on an arbitrary reference point and can be close to zero or negative, and CV should never be used for precision of stable isotope ratios.

Precision reported for a study should always be based on repeated measures of study samples and not from laboratory standards, or the QC standards. IRMS laboratories base their long-term analytical precision on those standards, but laboratory analytical precision is not the same as precision for a study. Precision based on study samples will include variance from sample preparation, and homogenization. If the repeated measures are from duplicates created during the tree-ring grinding process, then variation introduced from storage and cellulose extraction will also be included in the precision measurement (see Sect. 6.2.3.1).

Study precision reported in a paper should be based on the aggregate of repeated samples for an entire study and not just a typical value from the results of a single IRMS sequence, or from the long-term precision of laboratory standards from the laboratory conducting the analysis. For each set of samples analyzed on the IRMS, 2–3 samples are generally duplicated, and precision is determined from these for that set. This type of precision is a measure of repeatability: the agreement between measures of the same sample analyzed under the same conditions (the same operator and instrument, over a short period of time). To calculate precision of a study, these repeated measures need to be accumulated across sample sets, which is closer to a measure of reproducibility: the agreement between measures of the same sample analyzed under different conditions. Equation 6.1 is used to calculate precision when the duplicated samples do not share a single isotopic value but span a range of isotopic values:

$$s=\sqrt{\frac{\sum {s}^{2}\left(n-1\right)}{\sum \left(n-1\right)}}$$
(6.1)

where $s^{2}$ is the variance between the duplicates, and $n$ is the number of times the sample was analyzed, generally two. In the case of a study standard (see Sect. 6.4.2.3) that is included in every sample set, $n$ would be much larger, equaling the number of sample sets in which the study standard was included. Equation 6.1 sums the variance of all duplicated samples, weighted by the degrees of freedom ($n-1$), before taking the square root to estimate standard deviation. Table 6.2 contains an example of how precision for a study should be calculated. Precision for this set of samples was ±0.06 ‰, slightly better than the uncertainty around the accuracy estimate in Table 6.1.

Table 6.2 Example calculation of precision for a study compiled from four sets of samples on an IRMS for δ13C. Three duplicates were analyzed within each sample set, and sample sequence indicates when in the order of samples that the duplicates were analyzed. Note that the first replicate within a sequence is split between the beginning (Sample Position 7) and the end of the IRMS set (Sample Position 92)
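Equation 6.1 can be applied directly to the duplicate results compiled across sample sets. The Python sketch below is one possible way to do this. The function name `pooled_sd` and the duplicate values are hypothetical and are not the data from Table 6.2.

```python
from math import sqrt
from statistics import variance


def pooled_sd(groups):
    """Eq. 6.1: pool the variances of repeated measures, weighted by (n - 1)."""
    weighted = sum(variance(g) * (len(g) - 1) for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return sqrt(weighted / dof)


# Hypothetical duplicate pairs (delta13C, per mil), one tuple per duplicated sample.
duplicates = [
    (-24.31, -24.39),
    (-26.02, -25.96),
    (-22.75, -22.85),
    (-25.50, -25.46),
]

precision = pooled_sd(duplicates)
print(f"study precision: +/- {precision:.2f} per mil")
```

Because each group contributes its own variance, the duplicated samples may have very different isotopic values without inflating the estimate. A study standard measured in many sets can be passed as one longer group.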

2.3.1 Duplication Versus Replication

Duplicate samples are not statistical replicates but are considered the same sample collected twice or more, and are used to calculate precision only. Duplicates are collected to answer the questions “how much variation exists within the same sample” and “have my samples been unintentionally fractionated since collection”. Tree-ring samples are often created by combining a particular year (or sub-year) of growth from multiple cores collected from the same tree. These multiple core pieces are then homogenized to make a sample (see Chap. 4). For sample duplication, the sample should be split after homogenization and stored in separate containers. Variance between these samples will include variance from lack of homogenization, cellulose extraction, sample storage, and analysis on the IRMS. Samples that are split just prior to analysis contain only the variance of homogenization and the analysis on the IRMS.

We recommend creating a duplicate for every twenty samples. Sample size for creating duplicates may be an issue depending on the growth rate of the sampled trees and the sectioning requirements of the study (e.g. annual increments, separating late- and earlywood). In these cases, duplicates might be created for only the larger growth rings, and a lower frequency of duplicates may be necessary.

Only one value from the duplicates should be analyzed in any statistical or other analysis as part of the study. Duplicates are for quality assurance only, and to use them in statistical analysis would be considered pseudo-replication (Hurlbert 1984). The duplicate value used should not be the mean of the two values as that value would be less variable than all the other samples that were not duplicated. Statistical replicates are independent samples collected from a population or group that is to be compared with another population or group. For tree-ring stable isotopes, individual trees are generally considered replicates, but that depends on the study objective.

2.4 Study Uncertainty and the Propagation of Error

In the examples from Tables 6.1 and 6.2, which are from the same study, accuracy was −0.01 ± 0.12 ‰, and our measured precision was ±0.06 ‰, which was lower than the uncertainty around the accuracy estimate. Both of these values should be reported in the methods section of a study, but the larger value of uncertainty is a better reflection of actual study uncertainty. In this example, because the uncertainty around accuracy was larger than precision, it was the better reflection of the study uncertainty for comparing numbers within the study to each other.

Uncertainty is also used to determine the number of significant digits to report. Because the uncertainty within the study in Tables 6.1 and 6.2 was approximately 0.1 ‰ for δ13C, values of δ13C reported for this study should not include digits below 0.1 ‰, even though values with more digits are often provided by output from the IRMS. We included more digits in Tables 6.1 and 6.2 so that variation between values was apparent.
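As a small illustration of this rule, the Python sketch below rounds a reported value to the decimal place of the first significant digit of the study uncertainty. The helper name `round_to_uncertainty` and the example value are hypothetical, and other rounding conventions exist.

```python
from math import floor, log10


def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal place of the uncertainty's first significant digit."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    decimals = -floor(log10(uncertainty))  # 0.12 -> 1 decimal, 0.06 -> 2 decimals
    return round(value, decimals)


# Hypothetical raw IRMS output (delta13C, per mil) with a study uncertainty of 0.12.
reported = round_to_uncertainty(-24.3168, 0.12)
print(reported)
```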

For data included in a meta-analysis, where the δ13C values are to be compared to δ13C values from other studies analyzed in other laboratories, additional uncertainty must be considered. The values assigned to the QC standards have uncertainty around them, and that uncertainty must be propagated along with the uncertainty described above. The independent uncertainties are combined as the square root of the sum of their squares (Eq. 6.2).

$${S}_{Meta}= \sqrt{{S}_{QC}^{2}+{S}_{Study}^{2}}$$
(6.2)

If the assigned values of the QC standards had uncertainties of 0.2 ‰ when calibrated to the internationally certified standard reference material (see Sect. 6.4.2), and the study uncertainty was 0.12 ‰, then the combined uncertainty using Eq. 6.2 would be 0.23 ‰. This uncertainty value should be used with the data for any meta-analysis across studies as it reflects the uncertainty to the international scale (see Sect. 6.3.3). The patterns and trends within a single study with samples all calibrated to the same standards in the same way by the same lab have inherently less uncertainty (0.12 ‰ in this case) relative to each other, as compared to samples that originated from different studies with different calibration procedures and standards.
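The arithmetic behind this example follows Eq. 6.2 and can be checked with a short Python sketch. The function name `combined_uncertainty` is ours, and the inputs are the two values quoted in the text.

```python
from math import sqrt


def combined_uncertainty(*components):
    """Eq. 6.2: combine independent uncertainties as the root sum of squares."""
    return sqrt(sum(c ** 2 for c in components))


s_qc = 0.2      # uncertainty of the assigned QC standard values (per mil)
s_study = 0.12  # study uncertainty (per mil)

s_meta = combined_uncertainty(s_qc, s_study)
print(round(s_meta, 2))  # 0.23
```

Because the terms are squared before summing, the larger component dominates: here the combined value is only slightly larger than the 0.2 ‰ uncertainty of the QC standard itself.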

3 IRMS Errors and Calibration

Errors introduced into isotopic analysis can come from sample collection, cross-dating, sub-sectioning cores, sample handling, extraction processes, the conversion of the sample into gases introduced to the IRMS, or analysis on the IRMS directly. These errors have random and systematic components. Random error cannot be corrected and is the major component of uncertainty. Systematic errors can potentially be corrected with accurate calibration if standards were also subjected to the same process. Errors from sample collection, cross-dating, sub-sectioning cores, handling and extraction are mostly random and cannot be corrected. However, sample conversion to gases and isotopic analysis of the gases each contain both random and systematic components of error.

The IRMS and associated peripherals (e.g. elemental analyzers) need to be optimized for precision and accuracy for the samples being measured; thus, laboratory accuracy and precision can vary dramatically depending on instrument maintenance and attention to measurement details. For example, sample volumes need to be within the linear working range of the instrument, where the ratio of output to input signal is constant and in direct proportion over the range of instrument voltage output generated by the samples. The introduced gases need to produce appropriate peak shapes. The reasons for increasing errors in isotopic analysis are extensive, and can be related to the ionizing process in the ion source of the mass spectrometer or to problems with the conversion of the sample to the measuring gas. We advise carefully selecting an IRMS laboratory with experience in the type of analysis required for a study, and with documented and defensible QA/QC procedures. In addition, we advise including an independent QC standard with a known isotopic value with each set of samples if possible. A study standard (see Sect. 6.4.2) can serve as that independent QC standard if its isotopic value has been determined by multiple IRMS laboratories.

Measurements of δ13C of wood or cellulose and δ18O of cellulose are relatively standard isotopic measurements that can be made with high precision and accuracy (see Chaps. 9 and 10). Both wood and cellulose have consistent stoichiometry for C and O; thus, sample weights can be optimized for isotopic analysis. However, measurements of δ15N and δ2H in tree rings can be particularly challenging. For δ15N, the challenge comes from the low amount of nitrogen contained in wood compared to the abundance of carbon (C:N ratio of ~300, see Chap. 12), and for δ2H, the problem is isotopic exchange of hydroxyl H atoms on cellulose with the last water with which the sample was in contact (see Chap. 11). For measuring δ15N in wood samples, CO2 volumes far exceed N2, and large sample sizes are required to obtain enough N atoms for accurate analysis. As a result, laboratories need to take special action to ensure that the large sample is completely converted to CO2 and N2 and that the volume of CO2 from the previous sample does not interfere with the N2 peak of the current sample for accurate measures of δ15N (e.g. Brooks et al. 2003). Most laboratories making δ15N analyses on wood will optimize for δ15N and not measure δ13C on the same sample, and thus require two separate analyses to provide δ13C and δ15N values. Below, we briefly describe some of the potential errors and calibration procedures so that readers understand some of the complexity involved in making accurate and precise isotopic analyses.

3.1 Random Measurement Error

Random errors influence the measurement in unpredictable ways, hence showing no statistical pattern, and thus cannot be corrected. For example, unknown parameters in the IRMS such as electronic noise in the detector can introduce random error into the isotopic measurement. This random variance is quantified within the precision measurements described above. Increases in random error within and between sample sets analyzed on the IRMS are often a sign of needed maintenance for the IRMS, as they are often caused by instrument parameters influencing the robustness and ruggedness of an isotopic measurement method. Increased random error must be recognized by the IRMS operator, and the causal factor repaired to improve precision. Laboratories conducting IRMS measurements should monitor the long-term stability of the measurement system or method, by regular measurement of QC standards in every sample set. The long-term instrument stability should be visualized in quality control charts showing QC standard variation over time to allow detection of possible systematic deviations, or increased noise. When such problems are noticed, laboratory personnel should have rules of action for maintenance and troubleshooting possible problems. IRMS laboratories can vary greatly in the degree of quality assurance exercised contributing to overall uncertainty in isotopic analysis. The analysis of duplicate samples and QC standards over time should allow for quantification of random error in precision and accuracy measures described above.
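A quality control chart can be approximated numerically by comparing each new QC result with limits derived from an earlier baseline period. The Python sketch below assumes Shewhart-style limits of the baseline mean ± 3 standard deviations, which is a common convention but is our assumption rather than a rule from this chapter. The function names and all values are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as baseline mean -/+ k sample SDs."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - k * spread, centre + k * spread


def out_of_control(values, limits):
    """Return the indices of QC results that fall outside the limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if not (lower <= v <= upper)]


# Hypothetical QC standard results (delta13C, per mil), one per IRMS sample set.
baseline = [-25.02, -24.97, -25.05, -24.98, -25.01, -24.99, -25.03, -24.95]
new_sets = [-25.00, -24.96, -24.70, -25.04]

limits = control_limits(baseline)
flagged = out_of_control(new_sets, limits)
print(flagged)  # [2]
```

A flagged set does not identify the cause. It only signals that the operator should check the instrument before more study samples are analyzed.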

3.1.1 Errors from Sample Tracking, Preparation, and Homogenization

Sample tracking, homogenization and preparation can cause significant random error if good QA practices are not followed. The preparation of tree-ring samples for isotopic analysis requires many steps, each of which can introduce errors and mistakes. The process of accurately dating cores, subdividing cores, and combining increments from multiple cores from a single tree into a single sample leads to many points where irreversible mistakes can happen. Duplicates will not capture this type of error because duplicates are generally created after these steps. However, replicate trees within a study design will. Because of sample sizes or cost of analysis, some studies end up pooling replicate trees, but doing so can mask these types of errors. Liñán et al. (2011) found good agreement between individual trees and samples pooled across trees, but pooling sacrificed the ability to quantify uncertainty between trees. Sample homogenization and extraction will also lead to random variance, but adequate duplication will account for this error. As IRMS technical advancements allow for analysis of smaller and smaller samples, homogenization becomes more and more important to reduce variance between duplicate samples. Borella et al. (1998) recommended particle sizes of 0.1 mm or smaller for samples weighing 1.5 mg (~250 particles per sample) in δ13C analysis. The nested nature of tree-ring stable isotope analysis, with many years (or sub-years) of samples from multiple trees, can also lead to sample tracking problems, particularly if samples are moved through multiple containers for grinding and extraction. A good quality assurance plan should include a system to make sure that samples are uniquely identifiable so that they are not easily confused with one another, allowing for accurate chain of custody and traceability for all samples.

3.2 Systematic Measurement Error

Systematic errors influence the accuracy of the result in the same direction, in a reproducible fashion. Generally, they can be detected and corrected by using calibration standards (see Sect. 6.3.3). Systematic errors are normally caused by erratic instruments, wrong handling of instruments, or changing environmental conditions during analysis of a sample set on the IRMS. Systematic errors are classified into two types: offset and scale factor errors (Fig. 6.1). Offset errors are considered constant across the range of isotopic values being measured. However, the offset may not be constant over time, and is then known as instrument drift. Drift can be defined as a slow change over time of an output signal to the same input parameter. Within the IRMS, drift can occur because the ionizing conditions in the ion source of the mass spectrometer are not constant over time. While a constant offset can be detected by calibration standards measured at one time within the sample set, drift can only be detected and corrected by calibration standards that were analyzed throughout the entire sequence of samples (see below).

Fig. 6.1 Typical calibration for adjusting measured sample values to the correct δ scale illustrating the two types of systematic error affecting δ13C. The intercept represents the offset correction error, and the slope the scale factor error. The gray dashed line is a 1:1 line between measured and calibrated isotopic values

Scale factor errors are associated with measurement compression or expansion of the actual range in isotopic values, and can be corrected by measuring calibration standards that span an appropriate isotopic range, ideally a range larger than the isotopic range of samples being analyzed, and that brackets the sample measurements. Scale factor errors (often scale compression effects) in isotope ratio mass spectrometry are often caused by mass discriminatory effects (the heavy isotope moves more slowly than the light isotope) during the transport of the sample gases towards the ion detection in the mass spectrometer (Meier-Augenstein and Schimmelmann 2019). Examples of mass discriminatory effects are peak tailing on GC columns and adsorption in the ion source. Large compressions of the isotopic scale can lead to lower accuracy because the relative variation between samples is reduced, and the overall signal is reduced relative to the noise of the measurement. Scale-factor errors influence the slope of the calibration curve (Fig. 6.1).

3.3 Calibration

Proper calibration of isotopic measurements is essential to providing accurate results. In analytical sciences, the term “calibration” describes a series of operations that connect the measured value of a sample with its real analytical value (true value) using calibration standards that have followed the IT Principle (see Sect. 6.2.1). Calibration standards are certified standard reference materials (SRMs) with known composition (true value including a statement on the measurement uncertainty if applicable). Mathematical equations relating the difference between the certified SRMs measured values and the analytical “true” value are used to correct the measured sample values to deliver accurate results in a reproducible fashion (Fig. 6.1).

Isotopic analysis and data calibration must be performed in an accurate and reliable way under specified conditions (constant environmental conditions like temperature, air humidity etc.), and using certified SRMs spanning a quantified range of isotopic values. As mentioned above, the IRMS needs to be optimally adjusted for accurate and precise measurements, and the amount of sample introduced needs to be within the linear working range of the instrument. Introducing sample volumes that are too small or too large can produce results outside the limits of detection or quantification. Thus, the IRMS and peripheral equipment need to be routinely maintained in working order for samples to be accurately calibrated, correcting for offset problems, and adjusted to the appropriate δ scale.

The first step in calibration is to correct the data for systematic offset errors that change over time or with sample volume, such as drift. For drift, calibration standards need to be analyzed throughout the set of samples. A systematic long-term drift might be correctable if the trend can be mathematically estimated, whereas a highly variable drift is not correctable and becomes random error, increasing the measurement uncertainty. Once these systematic drift errors are corrected, the data can then be calibrated to the appropriate isotopic δ scale. Additionally, separating duplicates within the sample set so that one is analyzed early and one is analyzed towards the end of a set will help capture the variance not removed by drift corrections in the precision estimate (Table 6.2).
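As one illustration of a mathematically estimated trend, the Python sketch below fits a straight line to the offset (measured − true) of a single calibration standard analyzed at several positions in a sample set, then subtracts the predicted offset from each sample. The linear drift model, the function names, and all values are assumptions made for this example. Laboratories may use other models, and this step does not replace calibration to the δ scale.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_drift(positions, values, std_positions, std_values, std_true):
    """Subtract the linearly modelled offset of a repeated standard from samples."""
    offsets = [v - std_true for v in std_values]
    slope, intercept = fit_line(std_positions, offsets)
    return [v - (slope * p + intercept) for p, v in zip(positions, values)]


# Hypothetical standard (true value -25.00 per mil) drifting upward through the run.
std_positions = [0, 25, 50, 75, 100]
std_values = [-25.00, -24.95, -24.90, -24.85, -24.80]

# Hypothetical samples measured at positions 50 and 90 in the same set.
corrected = correct_drift([50, 90], [-23.90, -26.32], std_positions, std_values, -25.00)
print([round(c, 2) for c in corrected])  # [-24.0, -26.5]
```

If the standard offsets scatter widely around the fitted line, the drift is not well described by a trend, and the residual scatter should be treated as random error.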

The last step in the calibration process is calibrating to the appropriate international scale (Sect. 6.4.2.1). The isotopic δ scale is determined by two primary scale-defining international standards (Table 6.3), where one standard sets the zero ‰ value and the second sets the ‰ distance. For example, Vienna Standard Mean Ocean Water (VSMOW) is defined as 0 ‰ and Standard Light Antarctic Precipitation (SLAP) has a δ2H value of −428 ‰, so the distance between these two standards is 428 ‰, which sets the hydrogen isotopic δ scale. All standard reference materials need to be calibrated to these scale-defining international standards for isotopic measurements to be meaningful and comparable across laboratories, across studies, and through time.

Table 6.3 International scales for isotope ratio determination (Accepted isotope ratios and R values for certified SRM can change over time. The most current values can be obtained from the CIAAW website: www.ciaaw.org)

This final scaling correction can address both scale-factor and offset errors and requires at least two independent calibration standards with isotope ratios that span the isotopic range of sample values but have similar chemical properties. In the example in Fig. 6.1, three calibration standards were used that spanned 22.1 ‰ on the Vienna Pee Dee Belemnite (VPDB) δ13C scale. The measured span between the calibration standards was only 20 ‰, indicating a scale compression error. The correction equation reflects this compression with a slope greater than 1, which stretches the data back to the correct δ values (Coplen 1988); this stretching also increases the error associated with each measurement in the set. The measured δ values were also higher than the actual δ values, leading to an intercept of −0.58 ‰, which represents the offset factor. In addition to the three calibration standards, this example included one independent QC sample that was not used to calculate any correction equations and is the only standard used to calculate accuracy (see Sect. 6.2.2). The advantage of three calibration standards that span the range of samples is that it would be apparent if one standard had become compromised for any reason. This provides an additional QC check on the calibration equation.
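The scale correction described above can be sketched as a linear regression of accepted against measured values for the calibration standards. The numbers below are invented so that they mimic the compression (22.1 ‰ accepted span, 20 ‰ measured span) and the −0.58 ‰ intercept described in the text; they are not the data behind Fig. 6.1, and the function names and the QC values are hypothetical.

```python
import numpy as np

def fit_calibration(measured_std, accepted_std):
    """Fit accepted = slope * measured + intercept from calibration standards."""
    slope, intercept = np.polyfit(measured_std, accepted_std, 1)
    return slope, intercept

def apply_calibration(measured, slope, intercept):
    """Convert measured delta values to the international scale."""
    return slope * np.asarray(measured, dtype=float) + intercept

# Hypothetical calibration standards (per mil); measured span 20, accepted span 22.1
measured_std = [-10.0, -20.0, -30.0]
accepted_std = [-11.63, -22.68, -33.73]
slope, intercept = fit_calibration(measured_std, accepted_std)

# Independent QC standard: calibrated like a sample, never used in the fit
qc_measured, qc_accepted = -22.0, -24.85
qc_bias = float(apply_calibration(qc_measured, slope, intercept)) - qc_accepted
```

Because the QC standard is kept out of the regression, its bias is an independent check on accuracy rather than a measure of how well the line fits its own calibration points.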

3.3.1 Calibration Error

Improper and inconsistent calibration of data to the correct δ scale is a major source of error and data discrepancy between laboratories (Wassenaar et al. 2018; Meier-Augenstein and Schimmelmann 2019). Wassenaar et al. (2018) found that laboratories reporting inaccurate data during a round-robin comparison had four common problems: incorrect data calibration to the δ scale (sometimes called normalization), insufficient coverage of the δ scale, instrument problems, and/or compromised standard reference material. While their study compared laboratories analyzing water samples, similar problems can be found in laboratories measuring isotopic ratios of other materials. This potential difference between laboratories measuring the same sample is a concern for studies that switch laboratories partway through, and for meta-analyses of isotopic data. For studies that need to switch IRMS laboratories mid-study, we highly recommend conducting repeat analyses at both IRMS laboratories for 10% of samples, spanning a range of isotopic values. The researcher should calculate any systematic differences between the laboratories and correct all data to one laboratory scale. In addition, the researcher should quantify the remaining random differences (after correcting for systematic differences) between the 10% repeated measures using Eq. 6.1 and include that uncertainty using Eq. 6.2. Anyone conducting a meta-analysis using stable isotope data from tree rings needs to be aware of this problem as well, and should increase the uncertainty around the compiled data to reflect the actual reproducibility rather than the repeatability, which is precision based on results from a single laboratory over a short period of time.
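One possible way to carry out such a laboratory comparison is sketched below. It assumes that the systematic difference between the two laboratories can be described by a straight line, and it uses the residual standard deviation around that line as a stand-in for the random difference; this is not necessarily the form of Eq. 6.1, which should be used for the final estimate. The function name and all values are hypothetical.

```python
import numpy as np

def align_laboratories(lab_a, lab_b):
    """Map lab B values onto the lab A scale using samples measured at both labs.

    Returns slope and intercept of the mapping (systematic difference) and the
    residual standard deviation (remaining random difference).
    """
    a = np.asarray(lab_a, dtype=float)
    b = np.asarray(lab_b, dtype=float)
    slope, intercept = np.polyfit(b, a, 1)
    residuals = a - (slope * b + intercept)
    # Two parameters were fitted, so n - 2 degrees of freedom remain
    residual_sd = np.sqrt(np.sum(residuals**2) / (len(a) - 2))
    return slope, intercept, residual_sd

# Hypothetical repeat analyses of the same five samples (per mil)
lab_b = [-26.00, -25.00, -24.00, -23.00, -22.00]
lab_a = [-25.75, -24.85, -23.80, -22.75, -21.85]
slope, intercept, residual_sd = align_laboratories(lab_a, lab_b)

# All remaining lab B data would then be converted with the same mapping
lab_b_on_a_scale = slope * np.asarray(lab_b) + intercept
```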

4 Traceability and Standards

4.1 Traceability

Traceability is defined as “the value of a standard, whereby it can be related to stated references, usually national or international standards, through an unbroken chain of comparisons all having stated uncertainties” (ISO 15189). For isotope ratio mass spectrometry without real SI units, traceability implies a connection of all measured δ values to an internationally accepted measurement scale related to a scale-defining primary reference material (de Bièvre et al. 1997). This unbroken calibration chain must be connected to carefully measured and estimated uncertainties of all intermediate materials and procedures involved. The combined uncertainty of the whole comparison chain defines the quality of the connection to the internationally agreed scale unit (Table 6.3) and consequently is also a quality factor for the measured values themselves. To arrive at a combined uncertainty budget (see Sect. 6.2.4 on uncertainty) for any certified reference material, the uncertainty value must include the uncertainty around the international certified standard reference materials that were used in the calibration as well as the uncertainty estimated from all the steps in the isotopic measurement (from conversion to gases for analysis, to the actual measurement of the ion current ratios of a sample, to the reported δ values). This list can contain typical laboratory corrections such as 17O correction, linearity, blank, drift, memory and/or offset correction as well as normalization (Dunn et al. 2015). Every certified reference material should have its isotopic value and associated uncertainty reevaluated and connected back to the internationally certified scale-defining standard reference material regularly, and this uncertainty should be propagated forward to the measurement of samples. Ideally, studies should report the certified reference materials used in calibration and their associated values, as the values assigned to certified reference materials can change over time.
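A minimal sketch of a combined uncertainty budget follows, assuming that the individual components are independent standard uncertainties that can be added in quadrature. The component names and values are hypothetical, and a real budget would list every step of the chain described above.

```python
import math

def combined_uncertainty(components):
    """Root-sum-of-squares of independent standard uncertainties (per mil)."""
    return math.sqrt(sum(u**2 for u in components.values()))

# Hypothetical budget for one laboratory reference material (per mil)
budget = {
    "international_reference_material": 0.03,
    "normalization": 0.04,
    "measurement_repeatability": 0.12,
}
u_combined = combined_uncertainty(budget)
```

Adding in quadrature means the largest component dominates the total, which helps identify the step in the chain where improvements would matter most.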

Traceability also relates to the chain of custody and treatment of samples from collection through analysis to the final data analysis and publication (see Sect. 6.3.1.1). Significant error and uncertainty can be introduced without a proper plan for tracking samples from collection to publication of the data. Assigning sample IDs that avoid confusion and errors is critical and, given the nested nature of tree-ring samples, can be very challenging. Additionally, samples go through many steps: collection, accurate dating of rings, separation of rings, homogenization, cellulose extraction, and weighing for isotopic analysis. Sample identity and accurate sample tracking need to be maintained throughout all these steps. Details of assigning sample IDs and sample tracking should be thought out carefully and explained clearly in a QA plan and appropriate SOPs.

4.2 Types of Isotopic Standards for Tree-Ring Analysis

Three classes of standards are used in isotopic studies: international certified standard reference materials (SRMs) (Sect. 6.4.2.1), laboratory standard reference materials (Sect. 6.4.2.2) and study standards (Sect. 6.4.2.3). International certified SRMs are reference materials with certified accuracy and uncertainty, including a stated confidence level, and are supplied by supranational organizations such as the International Atomic Energy Agency (IAEA) in highly controlled and limited quantities. Laboratory SRMs are SRMs used in day-to-day operation of the IRMS for calibration and QC. These laboratory SRMs must be routinely calibrated to the appropriate δ scale using international certified SRMs and assigned a level of accuracy and uncertainty for their assessed isotopic value. Study standards are large quantities of homogenized material for which the isotopic value may or may not have been previously determined. These standards are used to check for consistency and to estimate the influence of processing on study precision.

4.2.1 International Certified Standard Reference Material

Measured isotope ratios are typically reported as the deviation from a primary reference material that defines the zero point of the respective scale for each element. These primary SRMs are the scale-defining international standards (Table 6.3) and are often defined without uncertainty. Using carbon as an example, the scale-defining primary international SRM is Vienna Pee Dee Belemnite (VPDB, Table 6.3); thus, a measured ratio of 13C/12C (13Rsample) would be compared to the 13C/12C of VPDB (13RVPDB, Eq. 6.3).

$$\delta^{13}\text{C}_{\text{VPDB}} = \frac{{}^{13}\text{R}_{\text{Sample}} - {}^{13}\text{R}_{\text{VPDB}}}{{}^{13}\text{R}_{\text{VPDB}}} = \frac{{}^{13}\text{R}_{\text{Sample}}}{{}^{13}\text{R}_{\text{VPDB}}} - 1$$
(6.3)

The resulting δ value is typically reported in parts per thousand (‰) by multiplying by 1000. This “‰ unit” is not compatible with the SI system; in recent years, the term milli-Urey (mUr) was suggested as an attributed SI unit to replace “‰” (Brand and Coplen 2012). The storage and distribution of international certified scale-defining SRMs are under the control of the International Union of Pure and Applied Chemistry (IUPAC), Commission for Isotope Abundances and Atomic Weights (CIAAW, www.ciaaw.org, Brand et al. 2014), and each laboratory can order them from the IAEA only once every 3 years.
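Equation 6.3 and the conversion to ‰ can be sketched in a few lines. The reference ratio used below is an assumed placeholder for illustration only; the currently accepted value should be taken from Table 6.3 or the CIAAW website. The function and constant names are hypothetical.

```python
def delta_permil(r_sample, r_reference):
    """Delta value in per mil from isotope ratios, following Eq. 6.3."""
    return (r_sample / r_reference - 1.0) * 1000.0

# Assumed 13C/12C ratio for VPDB, for illustration only (check current values)
R_VPDB_ASSUMED = 0.011180

# A sample whose 13C/12C ratio is 2.5 % lower than the reference ratio
d13c = delta_permil(0.975 * R_VPDB_ASSUMED, R_VPDB_ASSUMED)
```

Because the δ value depends only on the ratio of the two isotope ratios, a sample with the same ratio as the reference gives exactly 0 ‰ regardless of the reference value chosen.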

Interestingly, the corresponding isotope ratios of the international scale-defining primary SRMs are not always known exactly. The original reference material Pee Dee Belemnite (PDB), which acted as the scale anchor for 13C/12C and 18O/16O since the 1950s, was used up a long time ago. To preserve the scale in a consistent way, the virtual standard VPDB was defined via the reference material NBS 19. The original Standard Mean Ocean Water (SMOW) for 18O/16O and 2H/1H never existed as a physical material but was defined via reference material NBS-1. As a replacement for the exhausted NBS-1, a large batch of water with an isotopic composition close to SMOW was prepared and termed “VSMOW”, with the V standing for Vienna, where the IAEA is located. Recently, a new reference water set, VSMOW-2 and SLAP-2, was produced to be available as the SRM set for the oxygen and hydrogen isotope scales. VSMOW-2 and SLAP-2 are now associated with an uncertainty relative to VSMOW and SLAP. AIR-N2 serves as the scale anchor for 15N/14N measurements.

To overcome the constraints of limited supply of the primary SRMs and to ensure a more general applicability of SRMs, several materials of natural and/or synthetic origin have been produced to act as secondary SRMs. These reference materials must satisfy special conditions:

  • Homogeneous material in the isotope range used for measurements and certified as homogeneous to a certain sample size.

  • High purity of the reference material. No extra purification needed when SRM is used with described dedicated analysis techniques. Single chemical compound preferred.

  • Stable and inert material. No need for special treatment during storage or handling of the SRM (no reaction “autodecomposing” with water, air-O2 under normal conditions). Non-hygroscopic.

  • No change of isotope ratio when stored or handled properly under normal conditions. In case SRM should work for oxygen and hydrogen isotope ratios: no or low (and documented) exchangeable hydrogen and/or oxygen atoms in the SRM molecule.

  • Easy handling of SRM (no autodecomposition, no “electrostatic” property, not toxic or explosive). Easily replaceable when exhausted.

  • SRM with chemical form similar to samples available (carbonate for carbonate analysis, water for water analysis, organic when organic samples should be analyzed).

  • Identical chemical form preferably with different isotope ratios available (calibration).

The δ values of these internationally available secondary SRMs are derived from careful calibration against the primary scale-defining SRMs. For this reason, all secondary SRMs have uncertainties assigned to their internationally adopted and agreed δ values, which can change over time. All these properties are also desirable for any reference material used for calibration or QC in isotopic analysis. However, not all certified SRMs can be accurately measured on all peripheral devices used for IRMS analysis. For example, Schimmelmann et al. (2016) found that δ2H values of caffeine cannot be measured accurately in a TC/EA but require a chromium reactor, so caffeine would be an inappropriate SRM for TC/EA analysis. Nitrogen-bearing organic compounds interfere with δ2H analysis during pyrolysis (Nair et al. 2015). The CIAAW provides a list of certified SRMs with their isotopic values (http://ciaaw.org/reference-materials.htm).

4.2.2 Laboratory Standard Reference Materials

Because of the limited availability of international certified scale-defining SRMs and most secondary SRMs, IRMS laboratories use “laboratory SRMs” for calibration and quality control during routine isotopic analysis. IRMS laboratories must calibrate all of their laboratory SRMs with international certified SRMs. Many laboratories have a large range of laboratory SRMs because the chemical properties of the SRMs used for correcting and calibrating isotopic measurements should be as similar as possible to the samples under investigation, following the IT Principle (Werner and Brand 2001; Brand et al. 2009). To the extent possible, these laboratory SRMs should be composed of material that adheres to the special conditions listed above for international certified SRMs. For tree-ring cellulose analysis, IAEA-CH-3 is a certified cellulose standard that laboratories can use. Laboratories are always searching for reference materials that can span a range of isotopic values for proper calibration across the relevant δ scales (see Sect. 6.3). To that end, Qi et al. (2016) developed three whole-wood reference materials that span a range of isotopic values for δ2H, δ18O, δ13C and δ15N, and they provide the fraction of exchangeable hydrogen for each. These would be ideal laboratory SRMs for tree-ring studies for both calibration and QC. However, they were not certified for their cellulose isotopic values, so they would require in-house calibration with other international SRMs before being carried through cellulose extraction in accordance with the IT Principle. In addition to IAEA-CH-3, certified wood samples are an excellent start at providing appropriate SRMs for isotopic analysis for any study using wood or cellulose from tree rings.

4.2.3 Study Standards

Study standards provide additional information to assess error or to calculate precision for a study (Table 6.2). A study standard is a bulk sample with a chemical composition similar to that of the samples within the study. For tree-ring analysis, this would be a large, highly homogenized wood sample that can be used as a QC sample throughout all the steps of preparation, such as cellulose extraction (Porter and Middlestead 2012). QC standards added by the IRMS laboratory during isotopic analysis will quantify the accuracy of isotopic analysis but would not capture the uncertainty associated with sample processing or storage. Sample duplicates will capture some processing and storage uncertainty, but duplicates are often processed and analyzed together in the same sample set, from cellulose extraction through analysis on the IRMS. A study standard allows error to be assessed across the entire study. By including one or more study standards in each set of samples that are processed as a unit throughout the study, inter-set variation can be quantified, and study standards can potentially identify batches that were compromised during processing (Porter and Middlestead 2012). Comparing a single result with a longer record of data can additionally help researchers detect isotope fractionation problems during processing of tree-ring samples.
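The use of a study standard to quantify inter-set variation and to flag suspect batches can be sketched as follows. The sketch assumes one study-standard result per processing set and a tolerance chosen by the researcher; the set names, the δ18O values, the 0.3 ‰ tolerance and the function name are all hypothetical.

```python
import statistics

def check_study_standard(set_values, tolerance):
    """Summarize study-standard results across processing sets.

    Returns the inter-set standard deviation and the sets whose study-standard
    value differs from the median of all sets by more than the tolerance.
    """
    values = list(set_values.values())
    center = statistics.median(values)
    inter_set_sd = statistics.stdev(values)
    flagged = [name for name, v in set_values.items() if abs(v - center) > tolerance]
    return inter_set_sd, flagged

# Hypothetical d18O results (per mil) for one study standard in four sets
study_standard = {"set01": 27.10, "set02": 27.05, "set03": 27.15, "set04": 27.80}
inter_set_sd, flagged = check_study_standard(study_standard, tolerance=0.3)
```

The median is used as the reference here so that a single compromised set does not pull the comparison value towards itself; flagged sets would then be inspected and, if necessary, reprocessed.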

5 Conclusions

A proper quality assurance plan along with proper quality control steps and measures should be a part of every research project. Our purpose in this chapter was to provide tools for researchers to quantify the accuracy and precision of their isotopic data within tree-ring studies, although many of the principles apply to all studies. We discussed some of the challenges with making high quality isotopic measurements, but this chapter was not intended to guide anyone through the details of how to properly operate an IRMS. Instead, we discussed these challenges to highlight how critical it is to use a high-quality laboratory for conducting stable isotope analysis, and how critical instrument condition and proper use of standards are to accurate and precise isotopic data. The use of duplicates and study standards gives researchers the ability to quantify the precision of their analysis as outlined in Sect. 6.2. Following the principles outlined in this chapter, researchers should be able to design a quality assurance plan to minimize potential errors in tree-ring isotopic analysis and quantify the uncertainty around the isotopic values in their study for reporting in the final published manuscripts and to provide with the data when made public. In the methods and materials section of manuscripts, researchers should include at a minimum the accuracy as µ ± σ of the QC standards (Table 6.1) and the precision based on sample duplicates (Table 6.2) along with the number of QC standards and duplicates, the standard reference material used for the QC standard, and the isotopic range of calibration standards used during analysis.