1 Introduction

A proxy is an observable variable, X, that can be utilized to predict an unobservable variable, Y. An ideal relation for the task of prediction is a mathematical one-to-one relation between X and Y, where one X value corresponds to one unique Y value, and vice versa (Kotz et al. 1986, p. 323 therein). The simplest and most easily interpreted one-to-one relation is the linear model,

$$\begin{aligned} Y = \beta _0 + \beta _1 \cdot X, \end{aligned}$$
(1)

where \(\beta _0\) and \(\beta _1\) are the intercept and slope parameter, respectively.

Proxy models are widely employed in natural sciences, social sciences, medicine, and engineering (Cronin 2010; Mahnken et al. 2014; Falcy et al. 2016). The application in the present paper is from paleoclimatology. Here, natural archives (e.g., marine sediments or corals) are sampled, and proxy variables (e.g., chemical concentrations or isotopic compositions) are measured on the samples to predict climate variables (e.g., precipitation or temperature) back in time—over thousands and millions of years, when no devices (e.g., rain gauges or thermometers) were available (Bradley 1999). Typically for paleoclimatology, but for other application fields as well, the proxy relation is established for the recent time period, where paired data for X and Y are available. The determined proxy relation is then called a calibration, and the estimated calibration parameters (\(\beta _0\) and \(\beta _1\) in the linear case) are taken to predict the response variable, Y, through observations of the predictor variable, X.

In the real world, however, a perfect calibration model is never realized, since the calibration data are affected by observational errors (McClelland et al. 2021). In the case of the paleoclimatic application (Sect. 4), the Y data (temperature) of the calibration show errors stemming from the imperfect device (thermometer, which also may not always be available on-site but only somewhere nearby) and the X data (oxygen isotopic composition) reveal fluctuations due to the counting errors in the mass spectrometer and other sources (e.g., imperfect standard material used by the device, biological effects in the sample, or chronological uncertainties of the archive).

Observational error is usually described by a zero-mean stochastic process, and in climate time series analysis (Mudelsee 2014) it has become common practice to decompose the process into a variability component (possibly dependent on time, T), such as \(S_X(T)\), which is multiplied by a zero-mean and unit-standard deviation noise process, for example, \(X_\text {noise}(T)\); the combined error process is then given by \(S_X(T) \cdot X_\text {noise}(T)\), and the time-dependent standard deviation of that process equals \(S_X(T)\). If one recognizes that in practice, the observations are available not at continuous time, T, but rather at a finite number, n, of time points, \(T(i), i = 1, \ldots , n\), and if one further considers that the response variable, Y, in the calibration period is also observed with error, then one arrives at the real-world (i.e., with noise) linear calibration model,

$$\begin{aligned} Y(i) = \beta _0 + \beta _1 \cdot \left[ X(i) - S_X(i) \cdot X_\text {noise}(i) \right] + S_Y(i) \cdot Y_\text {noise}(i). \end{aligned}$$
(2)

Here, i is a counter which runs from 1 to n, the size of the bivariate calibration time series sample \(\left\{ t(i), x(i), y(i)\right\} _{i=1}^{n}\), and the discrete notation, such as X(i) for X(T(i)), is a handy abbreviation. We note that the temporal spacing, which is given by \([t(i) - t(i-1)], i = 2, \ldots , n\), need not be constant. Note also that this paper follows statistical convention and denotes a theoretical process with a capital letter, such as X(i), and an observed value with a small letter, such as x(i).

Our statistical task is estimation, namely, to determine \(\beta _0\) and \(\beta _1\) on the basis of \(\left\{ t(i), x(i), y(i)\right\} _{i=1}^{n}\). Since the calibration observations are influenced by errors, the parameter estimates (\({\widehat{\beta }}_0\) and \({\widehat{\beta }}_1\)) are not exactly the same as the true values, and the “hat” reminds the researcher of that fact. We note that the noise term, \(S_Y(i) \cdot Y_\text {noise}(i)\), also accommodates proxy errors. For example, in coral paleoclimatology, the predictor variable oxygen isotopic composition may be influenced not only by the desired response variable sea-surface temperature (SST) but also by other climate variables such as the oxygen isotopic composition of seawater, a parameter that is closely related to salinity (Brocas et al. 2019). The problem of predictor noise also plagues climate modeling. For field reconstructions, two versions of a climate variable (e.g., air temperature) are related to each other: one version is the instrumental measurement, the other the climate model output (see, e.g., Mann et al. 2007; Ammann et al. 2010; Tingley et al. 2012, Sect. 6 therein).

The calibration model (Eq. 2) is called an errors-in-variables regression, and because there is predictor noise (i.e., \(S_X(i) \ne 0\)), the estimation of the calibration parameters \(\beta _0\) and \(\beta _1\) is not straightforward (Mudelsee 2014). In particular, ignored predictor noise may lead to a negative bias of the absolute value of the slope estimate, \({\widehat{\beta }}_1\). This would further mean that in the case of a paleoclimatic coral proxy record, the inferred temperature amplitudes would be too small. Section 2 presents two powerful (i.e., unbiased) procedures for estimation and prediction; one applies to the situation of homoscedastic errors (both \(S_X(i)\) and \(S_Y(i)\) constant), the other to the more difficult situation of heteroscedastic errors (\(S_X(i)\) or \(S_Y(i)\) not constant). Section 3 introduces a bootstrap resampling technique that is able to obtain uncertainty measures that are also reliable in the presence of non-Gaussian distributional shapes and autocorrelations in the noise components \(X_\text {noise}(i)\) and \(Y_\text {noise}(i)\)—features that are rather the norm than the exception in climate sciences (Mudelsee 2014). Section 4 illustrates the mathematical methods for estimation, prediction, and uncertainty determination on paleoclimatic data from the coral archive. The conclusions (Sect. 5) are directed to the practitioner who has to find the optimal calibration line in the presence of predictor noise. The Appendix gives a brief description of the linear calibration software, LINCAL, which is used to implement the concepts described in the present paper.

2 Estimation and Prediction

An estimation is a mathematical procedure, a recipe, to determine the parameters, \(\beta _0\) and \(\beta _1\), of the errors-in-variables calibration model (Eq. 2) on the basis of a time series sample, \(\left\{ t(i), x(i), y(i)\right\} _{i=1}^{n}\). A prediction is the estimation for a new predictor value, \(x(n + 1)\), of the response value, \({\widehat{y}}(n + 1)\), which is achieved by means of the parameter estimates, \({\widehat{\beta }}_0\) and \({\widehat{\beta }}_1\), via the formula

$$\begin{aligned} {\widehat{y}}(n + 1) = {\widehat{\beta }}_0 + {\widehat{\beta }}_1 \cdot x(n + 1). \end{aligned}$$
(3)

Typically, one uses many \(x(n + 1)\) values over a large range to construct a calibration curve.

For the homoscedastic error type, this paper investigates the ordinary least-squares (OLS) estimation procedure, which is enhanced by a bias correction (Sect. 2.1). For the heteroscedastic error type, it employs a weighted estimation procedure, which allows more accurate data points (i.e., with smaller standard deviation) to contribute more strongly to the estimation than less accurate data (Sect. 2.2). A crucial point is that—irrespective of whether there is homo- or heteroscedasticity—some prior knowledge about the size of the standard deviation(s), \(S_X(i)\) and \(S_Y(i)\), has to be available. The temporal aspect is relevant only for uncertainty determination (Sect. 3), not for estimation. This means that the estimation procedures presented herein are also applicable to experimental situations where no time values, t(i), are available. These four cases—homoscedastic/heteroscedastic errors and availability/unavailability of time values—are also treated separately in the LINCAL software (Appendix). We briefly mention two other procedures (Sect. 2.3) which are occasionally found in practical applications. However, both have exhibited poor performance as regards the major target set in the present paper, namely, the delivery of an unbiased calibration slope.

2.1 Ordinary Least-Squares Estimation with Bias Correction

Assume homoscedasticity, that is, \(S_X(i) = S_X\) is a constant. The simple OLS estimation minimizes the unweighted sum of squares,

$$\begin{aligned} S\hspace{-0.08em}S\hspace{-0.05em}Q(\beta _0, \beta _1) = \sum _{i=1}^{n} \left[ y(i) - \beta _0 - \beta _1 \cdot x(i) \right] ^2. \end{aligned}$$
(4)

This yields the estimators

$$\begin{aligned} {\widehat{\beta }}_{1, \text {OLS}}&= \left\{ \left[ \sum _{i=1}^{n} x(i) \right] \cdot \left[ \sum _{i=1}^{n} y(i) \right] \Big / n - \sum _{i=1}^{n} x(i) \cdot y(i) \right\} \nonumber \\&\qquad \qquad \times \left\{ \left[ \sum _{i=1}^{n} x(i) \right] ^2 \Big / n - \sum _{i=1}^{n} x(i)^2 \right\} ^{-1} \end{aligned}$$
(5)

and

$$\begin{aligned} {\widehat{\beta }}_{0, \text {OLS}}&= \left[ \sum _{i=1}^{n} y(i) - {\widehat{\beta }}_{1, \text {OLS}} \sum _{i=1}^{n} x(i) \right] \Big / n. \end{aligned}$$
(6)

However, if the noise components in the calibration model, \(X_\text {noise}(i)\) and \(Y_\text {noise}(i)\), are statistically independent—which is typically the case since the observations usually stem from two independent sources—and also of zero mean and Gaussian shape, then standard textbooks (Draper and Smith 1981, Sect. 2.14 therein) explain that the slope estimator (Eq. 5) is biased toward zero (attenuated) as \(E\left( {\widehat{\beta }}_1\right) = \kappa \cdot \beta _1\), where E is the expectation operator and \(0 < \kappa \le 1\) is the attenuation factor. The latter is given by

$$\begin{aligned} \kappa = \left( 1 + {S}_X^2 \big / V\hspace{-0.20em}AR\left[ X_\text {true}(i) \right] \right) ^{-1}, \end{aligned}$$
(7)

where \(V\hspace{-0.20em}AR\) is the variance operator, and \(\left\{ X_\text {true}(i)\right\} _{i=1}^{n}\) are the true (but observed only with error) predictor points. From error propagation (Mudelsee 2014) it follows that \(V\hspace{-0.20em}AR[X(i)] = V\hspace{-0.20em}AR[X_\text {true}(i)] + S_X^2\), and hence \(\kappa = 1 - S_X^2 \big / V\hspace{-0.20em}AR[X(i)]\). This, finally, leads to the unbiased slope estimator and intercept estimator,

$$\begin{aligned} {\widehat{\beta }}_{1, \text {OLSBC}}&= {\widehat{\beta }}_{1, \text {OLS}} \big / \left\{ 1 - S_X^2 \big / V\hspace{-0.20em}AR\left[ X(i) \right] \right\} , \end{aligned}$$
(8)

and

$$\begin{aligned} {\widehat{\beta }}_{0, \text {OLSBC}}&= \left[ \sum _{i=1}^{n} y(i) - {\widehat{\beta }}_{1, \text {OLSBC}} \sum _{i=1}^{n} x(i) \right] \Big / n, \end{aligned}$$
(9)

respectively. The acronym OLSBC stands for OLS with bias correction. It is evident that bias correction for the slope (Eq. 8) requires prior knowledge about \(S_X\).
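To make the recipe concrete, the OLSBC estimators (Eqs. 5, 8, and 9) can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy; the function name and the choice of the sample variance for \(V\hspace{-0.20em}AR[X(i)]\) are conventions of this sketch, not of the LINCAL software (Appendix).

```python
import numpy as np

def olsbc_fit(x, y, s_x):
    """OLSBC sketch: OLS slope (Eq. 5) with attenuation correction (Eq. 8)
    and the corresponding intercept (Eq. 9); s_x is the prior knowledge
    about the homoscedastic predictor-noise standard deviation, S_X."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    kappa = 1.0 - s_x**2 / np.var(x, ddof=1)  # attenuation, 1 - S_X^2/VAR[X(i)]
    beta1 = beta1_ols / kappa                 # bias-corrected slope, Eq. (8)
    beta0 = y.mean() - beta1 * x.mean()       # intercept, Eq. (9)
    return beta0, beta1
```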

2.2 Weighted Least-Squares for Both Variables Estimation

The linear calibration model (Eq. 2) shows that the two noise components can be combined as \(S_Y(i) \cdot Y_\text {noise}(i) - \beta _1 \cdot S_X(i) \cdot X_\text {noise}(i)\). This combination motivates an estimation approach that attaches weights to the observations of both variables (Deming 1943; Lindley 1947). Over the years, the variant suggested by York (1966) and others became the standard, namely, the minimization of

$$\begin{aligned} S\hspace{-0.08em}S\hspace{-0.05em}Q\hspace{-0.05em}W\hspace{-0.15em}XY(\beta _0, \beta _1) = \sum _{i=1}^{n} \frac{\left[ y(i) - \beta _0 - \beta _1 \cdot x(i) \right] ^2}{S_Y(i)^2 + \beta _1^2 \cdot S_X(i)^2}. \end{aligned}$$
(10)

We abbreviate this estimation procedure as WLSXY. Also, this estimation type requires prior knowledge, namely about \(S_Y(i)\) and \(S_X(i)\).

However, WLSXY minimization of \(S\hspace{-0.08em}S\hspace{-0.05em}Q\hspace{-0.05em}W\hspace{-0.15em}XY(\beta _0, \beta _1)\) is numerically difficult because the slope, \(\beta _1\), appears in the denominator of the least-squares sum. The routine Fitexy (Press et al. 1992) parameterizes the slope as \(\beta _1^\prime = \tan ^{-1}(\beta _1)\), scales the y(i) values, and uses Brent’s search with a starting value for the slope from an initial OLS estimation. Other relevant papers were published by Reed (1989, 1992) and Squire (1990). We follow those authors in the use of WLSXY for estimation, but not for parameter error determination—for that purpose, bootstrap resampling (Sect. 3) is employed instead.
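A minimal sketch of the WLSXY minimization, assuming NumPy and SciPy, is given below. For a fixed slope, the intercept that minimizes Eq. (10) has a closed form (a weighted mean offset), so the problem reduces to a one-dimensional Brent search over \(\beta _1\); unlike Fitexy, this sketch does not use the arctangent reparameterization or the scaling of the y(i) values.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def wlsxy_fit(x, y, s_x, s_y):
    """WLSXY sketch: minimize SSQWXY (Eq. 10) over beta1, with beta0 profiled
    out; s_x and s_y hold the prior standard deviations S_X(i) and S_Y(i)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sx2, sy2 = np.asarray(s_x, float)**2, np.asarray(s_y, float)**2

    def intercept(beta1):
        w = 1.0 / (sy2 + beta1**2 * sx2)              # weights from Eq. (10)
        return np.sum(w * (y - beta1 * x)) / np.sum(w)

    def ssqwxy(beta1):
        w = 1.0 / (sy2 + beta1**2 * sx2)
        return np.sum(w * (y - intercept(beta1) - beta1 * x)**2)

    b_ols = np.polyfit(x, y, 1)[0]                    # OLS starting value
    beta1 = minimize_scalar(ssqwxy, bracket=(b_ols - 1.0, b_ols + 1.0)).x
    return intercept(beta1), beta1
```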

2.3 Other Estimation Procedures

OLS regression (Sect. 2.1) can be geometrically described as a procedure for finding the best-fit regression line via the minimization of the sum of the squares of the vertical distances (i.e., in the Y-direction) between data points and the regression line. OLS estimation can also be applied in the other direction (i.e., of X on Y), which would utilize the horizontal distances (i.e., in the X-direction) and yield a different estimation, as standard textbooks (Davis 1986) explain. Reduced major axis (RMA) regression can be seen as a compromise between the OLS variants, since it involves the minimization of the sum of the products of the vertical and horizontal distances (Davis 1986). The RMA estimators (with the slope estimate taking the sign of the sample correlation between the x(i) and y(i) values), finally, are

$$\begin{aligned} {\widehat{\beta }}_{1, \text {RMA}}&= \left( S\hspace{-0.08em}S_Y\big / S\hspace{-0.08em}S_X\right) ^{1/2}, \end{aligned}$$
(11)
$$\begin{aligned} {\widehat{\beta }}_{0, \text {RMA}}&= {\bar{Y}} - {\widehat{\beta }}_{1, \text {RMA}} \cdot {\bar{X}}, \end{aligned}$$
(12)

where

$$\begin{aligned} S\hspace{-0.08em}S_Y&= \sum _{i=1}^{n} y(i)^2 - \left[ \sum _{i=1}^{n} y(i) \right] ^2 \Big / n, \end{aligned}$$
(13)
$$\begin{aligned} S\hspace{-0.08em}S_X&= \sum _{i=1}^{n} x(i)^2 - \left[ \sum _{i=1}^{n} x(i) \right] ^2 \Big / n, \end{aligned}$$
(14)
$$\begin{aligned} {\bar{Y}}&= \sum _{i=1}^{n} y(i) \Big / n, \end{aligned}$$
(15)

and

$$\begin{aligned} {\bar{X}}&= \sum _{i=1}^{n} x(i) \Big / n. \end{aligned}$$
(16)

Another estimation approach, also geometrically straightforward to understand, is the Wald–Bartlett (WB) procedure (Mudelsee 2014). The idea is to divide the set of data points into three groups of the same size according to the size of the x(i) values. The center for the first group (smallest x(i) values) is given via the respective x(i) and y(i) means for the group, and analogously for the third group (largest x(i) values). The line that connects both centers then defines the WB slope estimate. The problem with the RMA and WB procedures is that they deliver biased estimations for the calibration slope, \(\beta _1\). This was previously shown in the case of WB (Mudelsee 2014) and will be seen in the case of RMA (Sect. 3.2).
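For completeness, hedged sketches of the RMA estimators (Eqs. 11–16) and of the WB procedure are shown below; NumPy is assumed, the RMA slope sign is taken from the sample correlation, and the WB intercept convention (line centered on the overall means) is one common choice.

```python
import numpy as np

def rma_fit(x, y):
    """RMA sketch (Eqs. 11-16): slope magnitude (SS_Y/SS_X)^(1/2),
    sign taken from the sample correlation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta1 = np.sign(np.corrcoef(x, y)[0, 1]) * y.std() / x.std()
    return y.mean() - beta1 * x.mean(), beta1

def wb_fit(x, y):
    """Wald-Bartlett sketch: line through the centers of the lower and
    upper thirds of the data, ranked by x(i)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)
    k = len(x) // 3
    lo, hi = order[:k], order[-k:]
    beta1 = (y[hi].mean() - y[lo].mean()) / (x[hi].mean() - x[lo].mean())
    return y.mean() - beta1 * x.mean(), beta1
```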

3 Uncertainty Determination

In statistical language, the estimated values for the intercept and slope are called point estimates. In the real world, with noisy data of limited size, the point estimates deviate from the true values. However, one can measure the typical sizes of such deviations. Statistical methodology has therefore developed procedures to determine interval estimates, or confidence intervals (CIs) (Robinson 1982), as a type of uncertainty measure that accompanies an estimate. In fact, a serious interpretation of the significance of an estimate is difficult to make without an uncertainty measure.

Classical CIs are those that can be derived with pencil and paper. The construction of classical CIs for the linear errors-in-variables model (Eq. 2) in earlier work (York 1966; Fuller 1987) relied on some or all of the following assumptions:

  1. Gaussian distributional shapes of the noise components, \(X_\text {noise}(i)\) and \(Y_\text {noise}(i)\);

  2. absence of autocorrelation in the noise components;

  3. absence of correlation between X(i) and \(X_\text {noise}(i)\) and between Y(i) and \(Y_\text {noise}(i)\); and

  4. absence of correlation between \(X_\text {noise}(i)\) and \(Y_\text {noise}(i)\).

Some authors (York 1969; Freedman 1984; Freedman and Peters 1984; Carroll et al. 2006) treat non-Gaussian errors (point 1) and the correlation effects (points 3 and 4). However, allowance for autocorrelations (point 2) appears to have been made by none of them, and this is a crucial issue since climate fluctuations typically exhibit autocorrelation (memory) besides non-Gaussian shapes.

In the present paper, the interest is in linearly relating two processes, X(i) and Y(i), and the sample, \(\left\{ t(i), x(i), y(i)\right\} _{i=1}^{n}\), may include the time values. Analyses of climate data by numerous authors document that non-Gaussian distributions and autocorrelation phenomena are typical of climate processes (see, e.g., Trenberth 1984; Perron and Sura 2013; Mudelsee 2014). We cannot expect the classical method to yield accurate results (CIs) for climate data. Therefore, this paper presents a computing-intensive CI construction method that is based on bootstrap resampling (Efron 1979; Mudelsee 2014). The computational steps are detailed at the algorithmic level (Sect. 3.1).

3.1 Bootstrap Algorithm

The algorithm for CI construction for the estimated errors-in-variables regression parameters is shown below. Explanations and illustrations for various steps are also given (Sect. 3.1.1).

1. Bivariate time series: \(\Bigl \{t(i), x(i), y(i)\Bigr \}_{i=1}^{n}\).

2. Parameter estimates, \({\widehat{\beta }}_0\) and \({\widehat{\beta }}_1\), from OLSBC or WLSXY.

3. Residuals: \(e_X(i), e_Y(i)\).

4. Fit values: \(x_\text {fit}(i) = x(i) - e_X(i)\), \(y_\text {fit}(i) = y(i) - e_Y(i)\).

5. Bias-corrected AR(1) parameters, \(\widehat{{\bar{a}}}_X^\prime \) and \(\widehat{{\bar{a}}}_Y^\prime \), estimated on the residuals; block length selection, l.

6. Resampled residuals via pairwise MBB with block length l: \(\Bigl \{e_X^{*b}(i), e_Y^{*b}(i)\Bigr \}_{i=1}^{n}\) (b, counter).

7. Resample: \(x^{*b}(i) = x_\text {fit}(i) + e_X^{*b}(i)\), \(y^{*b}(i) = y_\text {fit}(i) + e_Y^{*b}(i)\), \(i = 1, \ldots , n\).

8. Replication, parameters: \({\widehat{\beta }}^{*b}_0, {\widehat{\beta }}^{*b}_1\).

9. Replication, prediction: \({\widehat{y}}^{*b}(n+1) = {\widehat{\beta }}^{*b}_0 + {\widehat{\beta }}^{*b}_1 \cdot \Bigl [x(n+1) + S_X \cdot {\mathcal {E}}_{\text {N}(0,\, 1)}(n+1)\Bigr ]\).

10. Go to step 6 until \(b = B = 2,000\) replications exist of each: \(\Bigl \{{\widehat{\beta }}^{*b}_0\Bigr \}_{b=1}^{B}\), \(\Bigl \{{\widehat{\beta }}^{*b}_1\Bigr \}_{b=1}^{B}\), and \(\Bigl \{{\widehat{y}}^{*b}(n+1)\Bigr \}_{b=1}^{B}\).

11. Calculate CIs on the basis of the replications.

3.1.1 Remarks

In step 1, if time values, t(i), are unavailable, then in step 5, set \(l = 1\). In step 3, see Fig. 1 for the definition of the residuals.

Fig. 1 CI construction, definition of residuals. The illustration is for a certain data point, \((x(i^\prime ), y(i^\prime ))\) (filled circle), from which the distance to the linear fit (thick, tilted line) is measured via line L; the slope of L is equal to \(-\lambda /{\widehat{\beta }}_1\), where \(\lambda = [S_Y(i^\prime )/S_X(i^\prime )]^2\) (York 1967). The residuals (dashed lines) are given by \(e_X(i^\prime ) = [{\widehat{\beta }}_0 + {\widehat{\beta }}_1 \cdot x(i^\prime ) - y(i^\prime )]/ [\lambda /{\widehat{\beta }}_1 + {\widehat{\beta }}_1]\) and \(e_Y(i^\prime ) = -\lambda \cdot e_X(i^\prime )/{\widehat{\beta }}_1\); they can have positive or negative values. Modified after Mudelsee (2014, Fig. 8.5 therein)

The overall goal of step 5 is to find a block length that is suitable to preserve the autocorrelation properties of the data-generating process (Künsch 1989). The ordinary bootstrap (Efron 1979), which generates resamples with preserved distributional shape (because one randomly draws from the data with replacement), can then be augmented to preserve autocorrelation (over the length of the randomly drawn time blocks) as well. This renders moving-block bootstrap (MBB) resampling well suited for climate data (Mudelsee 2014). The AR(1) model is also applicable to unevenly spaced time series if it is formulated by means of a parameter called persistence time (here called \(\tau _X\) or \(\tau _Y\)); see Mudelsee (2002) for a description of the model and a numerical persistence time estimation procedure that includes bias correction and CI construction. With the help of the average temporal spacing, \({\bar{d}} = [t(n) - t(1)]/(n-1)\), the bias-corrected persistence-time estimates, \({\widehat{\tau }}_X^\prime \) and \({\widehat{\tau }}_Y^\prime \), can then be converted into another pair of parameters called equivalent autocorrelation coefficients, \(\widehat{{\bar{a}}}_X^\prime = \exp (-{\bar{d}}/{\widehat{\tau }}_X^\prime )\) and \(\widehat{{\bar{a}}}_Y^\prime = \exp (-{\bar{d}}/{\widehat{\tau }}_Y^\prime )\). The idea of the equivalent autocorrelation coefficient is to have for the case of uneven spacing a persistence parameter that corresponds to the usual autocorrelation coefficient in the case of even spacing. Owing to the construction of the residuals (Fig. 1), it can be shown (Mudelsee 2014) that for the algorithm, both X and Y yield the same value, \(\widehat{{\bar{a}}}_X^\prime = \widehat{{\bar{a}}}_Y^\prime \). This value is plugged into the block length selector by Carlstein (1986) and Sherman et al. (1998),

$$\begin{aligned} l = NINT \left\{ \left[ 6^{1/2} \cdot {\widehat{\bar{a}}}_X^\prime \big / \left( 1 - {\widehat{\bar{a}}}_X^{\prime \, 2} \right) \right] ^{2/3} \cdot n^{1/3} \right\} , \end{aligned}$$
(17)

where \(NINT(\cdot )\) is the nearest integer function. We finally note that there exist other block length selectors that are also based on estimated autocorrelation properties of residuals (Mudelsee 2014), but for CI accuracy (i.e., how close to the nominal confidence level a CI performs in a Monte Carlo simulation experiment), this question is of limited relevance. In other words, the blocks just have to be long enough to preserve autocorrelation.
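A short sketch of the conversion from a bias-corrected persistence-time estimate to the equivalent autocorrelation coefficient, and of the block length selector (Eq. 17), might read as follows; the numerical values are purely illustrative.

```python
import numpy as np

def block_length(a_bar, n):
    """Block length selector, Eq. (17); NINT realized via round, with a
    floor of 1 (the uncorrelated case)."""
    l = (np.sqrt(6.0) * a_bar / (1.0 - a_bar**2))**(2.0 / 3.0) * n**(1.0 / 3.0)
    return max(1, int(round(l)))

tau_prime = 0.5                     # bias-corrected persistence time (illustrative), years
d_bar = 1.0 / 12.0                  # average spacing, years (monthly data)
a_bar = np.exp(-d_bar / tau_prime)  # equivalent autocorrelation coefficient
l = block_length(a_bar, n=94)
```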

Step 6 is bootstrap resampling: draw with replacement random blocks of length l from the residuals (step 3). The blocks are allowed to overlap. The resample is filled from the left with the block elements until the sample size, n, has been reached. If necessary, some elements from the right of the last block are discarded. For example, let \(n = 8\) and \(l = 3\); a possible sequence would be \(\left\{ e_X^{*b}(i)\right\} _{i=1}^{8} = \left\{ e_X(3), e_X(4), e_X(5), e_X(1), e_X(2), e_X(3), e_X(6), e_X(7)\right\} \). This procedure is performed in a pairwise manner (Mudelsee 2014), which means that the random indices are taken for \(e_Y^{*b}(i)\); in the example, \(\left\{ e_Y^{*b}(i)\right\} _{i=1}^{8} = \left\{ e_Y(3), e_Y(4), e_Y(5), e_Y(1), e_Y(2), e_Y(3), e_Y(6), e_Y(7)\right\} \). Since the adaptation of the MBB resampling is done for the residuals in a pairwise manner, this adapted bootstrap procedure is called pairwise-MBBres.
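A minimal sketch of this pairwise block draw, assuming NumPy and a Generator-type random number generator (the function name is illustrative):

```python
import numpy as np

def pairwise_mbb(e_x, e_y, l, rng):
    """Pairwise MBB draw (step 6): overlapping blocks of length l are drawn
    with replacement, and the SAME random indices are applied to both
    residual series; the concatenation is truncated to sample size n."""
    n = len(e_x)
    starts = rng.integers(0, n - l + 1, size=n // l + 1)  # random block starts
    idx = np.concatenate([np.arange(s, s + l) for s in starts])[:n]
    return np.asarray(e_x)[idx], np.asarray(e_y)[idx]
```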

Step 9 shows that the usual step of calculating the replications (i.e., copies of the estimates) can also be done for prediction. The important new calculation step, one that goes beyond the book by Mudelsee (2014), is the addition of the random term \(S_X \cdot {\mathcal {E}}_{\text {N}(0,\, 1)}(n+1)\), where \({\mathcal {E}}_{\text {N}(0,\, 1)}(\cdot )\) denotes an independent standard Gaussian variable (mean zero, standard deviation unity). This addition serves to include the predictor uncertainty, since \(X(n+1)\) is not exactly known but measured with error (standard deviation \(S_X\)). The Gaussian assumption should apply well to most calibration problems where data are obtained from measurement devices. This predictor uncertainty is combined with the parameter estimation uncertainties in order to yield a realistic quantification of the prediction uncertainty. Note further that the formulation of step 9 corresponds to the homoscedastic error type. In the case of heteroscedastic errors, one would use \(S_X(n+1)\) instead of \(S_X\) and make some assumptions. In the provided LINCAL software, for example, the user can select the value of \(S_X(n+1)\). Evidently, if prior knowledge indicates other values, then the LINCAL source code should be adapted accordingly and recompiled.
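Steps 4 and 6–10 can then be written as a single loop. The sketch below builds on the pairwise_mbb and estimator sketches above; fit stands for any of the estimators (e.g., wlsxy_fit), and s_x_new plays the role of \(S_X(n+1)\), the user-supplied predictor-noise standard deviation for the new value.

```python
import numpy as np

def bootstrap_replications(x, y, e_x, e_y, l, x_new, s_x_new, fit,
                           B=2000, seed=0):
    """Sketch of steps 4 and 6-10: parameter and prediction replications."""
    rng = np.random.default_rng(seed)
    x_fit, y_fit = x - e_x, y - e_y                       # step 4, fit values
    b0s, b1s, preds = np.empty(B), np.empty(B), np.empty(B)
    for b in range(B):
        ex_b, ey_b = pairwise_mbb(e_x, e_y, l, rng)       # step 6
        b0s[b], b1s[b] = fit(x_fit + ex_b, y_fit + ey_b)  # steps 7-8
        x_b = x_new + s_x_new * rng.standard_normal()     # predictor noise, step 9
        preds[b] = b0s[b] + b1s[b] * x_b                  # prediction replication
    return b0s, b1s, preds
```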

The number of replications in step 10 is set as \(B = 2,000\). Monte Carlo simulation experiments (Mudelsee 2014) show that this value is clearly sufficient for an accurate determination of the CI bounds. For CI construction (step 11), there exist two basic approaches (Efron and Tibshirani 1993). One is via the percentiles of the B replication values. The other, used in the present paper, is via the standard deviation over the replications, which is called bootstrap standard error. For example, in the case of the slope, it is given by

$$\begin{aligned} \widehat{\text {se}}_{{\widehat{\beta }}^{*}_1} = \left\{ \sum _{b=1}^{B} \left[ {\widehat{\beta }}^{*b}_1 - \left\langle {\widehat{\beta }}^{*b}_1\right\rangle \right] ^2\bigg /(B-1)\right\} ^{1/2}, \end{aligned}$$
(18)

where \(\left\langle {\widehat{\beta }}^{*b}_1\right\rangle = \sum _{b=1}^{B} {\widehat{\beta }}^{*b}_1 / B\). The bootstrap Student’s t confidence interval then is (shown for the case of the slope)

$$\begin{aligned} \text {CI}_{{\widehat{\beta }}_1,1-2\alpha } = \left[ {\widehat{\beta }}_1 + t_\nu (\alpha ) \cdot \widehat{\text {se}}_{{\widehat{\beta }}^*_1}; {\widehat{\beta }}_1 - t_\nu (\alpha ) \cdot \widehat{\text {se}}_{{\widehat{\beta }}^*_1} \right] , \end{aligned}$$
(19)

where \(t_\nu (\alpha )\) is the percentage point of the Student’s t distribution function with \(\nu = n - 2\) degrees of freedom, and \(1-2\alpha \) = 95% is the confidence level. The \(\nu \) value is justified by the fact that the calibration problem comprises two parameters to be estimated. The confidence level employed is (besides values of 90% and 99%) a common choice in climate sciences (Mudelsee 2014). The LINCAL source code details the numerical calculation of \(t_\nu (\alpha )\). An interesting alternative to uncertainty determination via the pairwise-MBBres procedure, pointed out by one of the reviewers of the present paper, could be via the effective sample size (Mudelsee 2014; Hu et al. 2017).

3.2 Monte Carlo Simulation Experiment

A Monte Carlo experiment is a computer simulation of a random process (Fishman 1996). The properties of the data-generating process, such as mean and standard deviation, can be prescribed. Hence, unlike in the real world, the truth is known. The random component is brought in by means of a random number generator (Mudelsee 2020). Since the prescribed parameter values are known, Monte Carlo simulation experiments can be used as objective tests of the estimation algorithms of the parameters and of the CI construction methods. The present paper studies the linear calibration model (Eq. 2) with parameters \(\beta _0\) and \(\beta _1\). Emphasis is put on unbiasedness of the parameter estimators (Sect. 2) and accuracy of the bootstrap CIs (Sect. 3.1) under realistic conditions. Six different simulation experiments are carried out, starting with an easy setting and progressing to more challenging conditions.
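As an illustration, one artificial calibration dataset from Eq. (2) can be generated in a few lines; the prescribed values below are placeholders and do not reproduce the exact designs behind Tables 1–7.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 100, 1.0, 2.0            # prescribed truth (illustrative)
s_x, s_y = 0.25, 0.25                      # homoscedastic noise levels
x_true = rng.normal(0.0, 1.0, n)           # true predictor points
x = x_true + s_x * rng.standard_normal(n)  # observed predictor, with noise
y = beta0 + beta1 * x_true + s_y * rng.standard_normal(n)  # Eq. (2)
```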

The easy simulation setting with un-autocorrelated and homoscedastic Gaussian noise processes (Table 1) allows unbiased (i.e., within simulation noise) estimations, with both OLSBC and WLSXY, and for both intercept \(\beta _0\) and slope \(\beta _1\). Additionally, the bootstrap CIs show excellent coverage performance, which means that the nominal level of 95% is nearly achieved for sample sizes (n) as small as 50.

Table 1 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and homoscedastic Gaussian noise processes

The so-called naive OLS method already fails for the easy simulation setting (Table 1) in the case of calibration slope estimation; that is, the empirical coverage is far off the nominal value (results not shown). This is due to the negative bias of OLS slope estimation in the presence of predictor noise (Sect. 2.1).

Prescription of a slightly more challenging setting, where \(Y_\text {noise}(i)\) instead has a skewed, lognormal shape (Table 2), leads, within simulation noise, to no detectable performance reduction in terms of bias and coverage. This holds for OLSBC and WLSXY estimation, and for both intercept and slope.

Table 2 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and homoscedastic Gaussian/lognormal noise processes

However, the introduction of a heteroscedastic noise component (Table 3) demonstrates that a misapplication of the OLSBC estimation method brings a dramatic reduction in coverage accuracy for the slope. Also, the bias for \({\widehat{\beta }}_1\) by means of OLSBC increases (compared with the previous two easy settings) by a factor of between roughly 2 and 100 (dependent on n). On the other hand, use of the WLSXY estimation, which is indicated because of the heteroscedasticity, yields acceptably accurate results. In that regard, an empirical coverage of 92–94% or 96–98% instead of the nominal 95% should be acceptable for applied sciences.

Table 3 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and heteroscedastic Gaussian noise processes

Further increasing the challenges for the method by introducing a lognormal shape (Table 4) and, additionally, autocorrelation (Table 5) does not change the overall assessment: WLSXY performs acceptably well, while OLSBC fails because of the heteroscedasticity.

Table 4 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and heteroscedastic Gaussian/lognormal noise processes
Table 5 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with autocorrelated and heteroscedastic Gaussian/lognormal noise processes

The summary assessment of the performance of the estimation methods for the errors-in-variables regression model (Eq. 2) is as follows. For homoscedastic noise processes, both OLSBC and WLSXY perform well, that is, they deliver (within simulation noise) unbiased calibration slopes and bootstrap CIs with an accuracy that is acceptable for applied sciences. The bootstrap resampling (pairwise-MBBres) allows one to take into account both non-Gaussian shapes and autocorrelation of the random components. On the other hand, RMA estimation yields, even in the easy situation with un-autocorrelated and homoscedastic Gaussian noise processes (Table 6), unacceptably inaccurate calibration slopes: positive bias (i.e., overestimation) and undercoverage of CIs.

Table 6 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and homoscedastic Gaussian noise processes

Although the present paper attaches priority to an unbiased calibration slope estimation, the standard error, se\(_{{\widehat{\beta }}_1}\), should also be considered since this measures the spread (standard deviation) of the estimate. Both uncertainty measures can be readily combined to form the root mean squared error, RMSE\(_{{\widehat{\beta }}_1}\). Following Mudelsee (2014),

$$\begin{aligned} \text {RMSE}_{{\widehat{\beta }}_1} = \left( \text {se}_{{{\widehat{\beta }}_1}}^2 + \text {Bias}_{{{\widehat{\beta }}_1}}^2\right) ^{1/2}. \end{aligned}$$
(20)

For the easy Monte Carlo setting (un-autocorrelated and homoscedastic Gaussian noise processes) and large sample sizes (say, \(n \ge 500\)), the WLSXY estimation method also outperforms OLSBC and RMA in terms of RMSE\(_{{\widehat{\beta }}_1}\) (Table 7). For medium-sized samples (\(50 \le n < 500\)), however, RMA performs no worse (or even better) than WLSXY or OLSBC. And for small sample sizes (\(n \le 50\)), there are no positive effects of WLSXY or OLSBC in terms of RMSE\(_{{\widehat{\beta }}_1}\), not even when compared with OLS (Table 7). It is emphasized that these assessments apply only to the easy setting; for more complex settings, the WLSXY method should be optimal. Finally, the so-called naive OLS calibration estimation method yields completely unacceptable RMSE\(_{{\widehat{\beta }}_1}\) values (Table 7). Evidently, this poor performance is due to the negative bias of the slope via the OLS estimation (Sect. 2.1).

Table 7 Monte Carlo simulation experiment, linear errors-in-variables regression (Eq. 2) with un-autocorrelated and homoscedastic Gaussian noise processes

4 Application

Among the various natural archives where information about past climate variations is stored, the coral archive is distinguished by (i) a comparably quick growth (i.e., a high accumulation rate of the carbonate material) and (ii) the presence of detectable yearly growth bands. This has allowed paleoclimatologists to infer climate variability on subseasonal timescales with a high relative temporal accuracy (i.e., with respect to an absolute fixpoint), thereby extending the instrumental record back in time by thousands and millions of years (Felis 2020). A mild deficit of the method stems from often-reduced absolute temporal accuracy in fossil corals (i.e., the fixpoint date is uncertain) and the shortness of the records. Typical is a monthly or bimonthly spacing over a length of a few decades to centuries—corals provide accurate snapshots of subseasonal climate, possibly deeply back in time.

A frequently employed proxy variable is oxygen isotopic composition, which is measured by a mass spectrometer on the coral carbonate material. Conventionally, the isotopic values are reported in the delta notation,

$$\begin{aligned} \delta ^{18}\text {O} = \left[ ({}^{18}\text {O}/{}^{16}\text {O})_\text {sample} \big / ({}^{18}\text {O}/{}^{16}\text {O})_\text {VPDB} - 1\right] \cdot 1,000\permille , \end{aligned}$$
(21)

where \(({}^{18}\text {O}/{}^{16}\text {O})\) is the number ratio of oxygen isotopes \({}^{18}\text {O}\) and \({}^{16}\text {O}\), and VPDB stands for the Vienna Pee Dee Belemnite standard material, against which the sample is compared. Since the \(\delta ^{18}\text {O}\) of the coral carbonate depends on the temperature of the ambient seawater, corals that grew in the surface water in tropical regions can provide proxy records of SST (Felis 2020). As mentioned in Sect. 1, the real world in paleoclimatology is not perfect, and a number of uncertainties are present in coral proxy thermometry: imperfect devices and material (e.g., mass spectrometer, thermometer, or standard) and confounding climate variables (e.g., sea-surface salinity). Paired measurements, where the \(\delta ^{18}\text {O}\) data are augmented with information from another proxy (e.g., Sr/Ca elemental ratios) obtained on the same sample, can therefore improve proxy-derived inferences (Pfeiffer et al. 2019).

The paper by Brenner et al. (2017) presented an SST–\(\delta ^{18}\text {O}\) calibration obtained from modern Isopora corals on the Great Barrier Reef, off the northeastern coast of Australia. In total, the \(\delta ^{18}\text {O}\) data were measured on five corals from near Heron Island; between one and three corals supplied monthly resolved series between October 1971 and July 1976, while between one and two corals grew between September 2009 and August 2012. For these two time intervals, there are no missing data, and the sample size of the combined monthly series is \(n = 94\). Brenner et al. (2017) utilized the Extended Reconstructed Sea Surface Temperature V3b (ERSST V3b) dataset from the National Oceanic and Atmospheric Administration (NOAA, Boulder, CO, USA), which is available at monthly resolution on a 2.0 degree latitude by 2.0 degree longitude global grid; Heron Island is represented by the grid cell centered at 22\(^\circ \)S, 152\(^\circ \)E. ERSST V3b is described in detail by Smith et al. (2008).

Since the original five \(\delta ^{18}\text {O}\) records from the various corals could possibly be shifted against each other due to local biological or microclimatological factors, Brenner et al. (2017) determined five centered \(\delta ^{18}\text {O}\) series by subtracting from each series the average \(\delta ^{18}\text {O}\) calculated over the shared intervals. These centered series were then averaged to construct a composite coral record. This procedure means that the target of the inference is just the calibration slope, not the intercept.

For an individual coral \(\delta ^{18}\text {O}\) measurement data point, Brenner et al. (2017) report an accuracy of 0.11\(\permille \). We take this value for the calculation of the standard error, \(s_X(i)\), of the composite coral \(\delta ^{18}\text {O}\) by means of error propagation. This means that if three corals contribute to an x(i) value, then \(s_X(i) = 0.11\permille /\surd {3} \approx 0.064\permille \); if two corals contribute, then \(s_X(i) = 0.11\permille /\surd {2} \approx 0.078\permille \); and if one coral contributes, then \(s_X(i) = 0.11\permille \). As regards the SST standard error, Smith et al. (2008) give the SST error variances dependent on time and grid cell (available for download at https://psl.noaa.gov/data/gridded/data.noaa.ersst.v3.html, last accessed November 21, 2022); the standard error, \(s_Y(i)\), is simply the square root of the variance. The time series are shown in Fig. 2, and the full dataset \(\left\{ t(i), x(i), y(i), s_X(i), s_Y(i)\right\} _{i=1}^{94}\) is given in Table 8. The heteroscedastic error components indicate that the estimation method to be employed for the linear calibration is WLSXY (Sect. 2.2).

Fig. 2 Application, data. Filled symbols, SST; open symbols, coral \(\delta ^{18}\text {O}\). Note the inverted \(\delta ^{18}\text {O}\) axis. See the main text (Sect. 4) for further explanations and Table 8 for numerical values

Table 8 Application, data

The numerical values for the resulting calibration fit (Fig. 3) are as follows. The bias-corrected persistence-time estimates for SST and coral \(\delta ^{18}\text {O}\) are \({\widehat{\tau }}_Y^\prime \) = 1.2 years and \({\widehat{\tau }}_X^\prime \) = 0.9 years, respectively. This leads to a block length of \(l = 9\) for pairwise-MBBres resampling (Eq. 17). The typical number of \(B = 2,000\) resamplings is used for the bootstrap algorithm (Sect. 3.1). The sample size is \(n = 94\). This setting should provide highly accurate Student’s t CIs for the WLSXY-determined slope and also for the SST predictions, as the corresponding entries (for n = 100) in Table 5 indicate. Finally, the slope estimate with 95% CI is

$$\begin{aligned} {\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \ \left[ -8.0\ {}^\circ \text {C}/\permille ; -5.0\ {}^\circ \text {C}/\permille \right] . \end{aligned}$$
(22)
Fig. 3 Application, result. The data points (SST, coral \(\delta ^{18}\text {O}\)) are shown as filled symbols, the standard errors as vertical and horizontal bars, and the WLSXY-determined calibration line as a solid line with a 95% Student’s t CI band (shaded) for the prediction. Note the inverted \(\delta ^{18}\text {O}\) axis

The fitted calibration curve (Fig. 3) also serves for SST prediction from new coral \(\delta ^{18}\text {O}\) values, denoted as \(x(n+1)\) in the statistical algorithm (Sect. 3.1). The assumed predictor uncertainty is \(s_X(n+1)\) = 0.11\(\permille \), which corresponds to a situation where just one coral is available for prediction of new SST values in the area of the Great Barrier Reef. For the application, a wide range of densely spaced \(x(n+1)\) values is shown, between \(-1.5\permille \) and \(+1.5\permille \) (Fig. 3). The resulting 95% CIs for the predictions are plotted as a shaded band. The borders of the confidence band show a minimal degree of jaggedness (visible in the electronic version of the present paper by means of zooming), which stems from the fact that only \(B = 2,000\) resamplings are performed (rather than an infinite number). As with any extrapolation, a sound degree of caution should be exercised when interpreting regions that extend beyond the sampled data range.

The persistence time estimates of 1.2 years for SST and 0.9 years for coral \(\delta ^{18}\text {O}\) are rather long when compared with other instrumental series of similar length and resolution (Mudelsee 2014). Evidently, these positive autocorrelations reflect the seasonal cycles, which are clearly expressed in the time series data (Fig. 2). Although in principle a harmonic seasonal cycle model for the data could be formulated instead of the AR(1) (Sect. 3.1.1), such a model mis-specification should have negligible effects on the CI bounds owing to the fact that, in practice, the block length (determined as \(l = 9\) in this application) must only be large enough to preserve the autocorrelations. However, the autocorrelations have to be taken into account for uncertainty determination. This is illustrated by means of a numerical experiment, where \(l = 1\) (i.e., no preserved autocorrelation) is prescribed, which would lead to a clearly too narrow CI for the slope (as compared with the true CI given in Eq. 22), namely

$$\begin{aligned} {\widehat{\beta }}_{1,\; l = 1} = -6.5\ {}^\circ \text {C}/\permille \ \left[ -7.3\ {}^\circ \text {C}/\permille ; -5.7\ {}^\circ \text {C}/\permille \right] . \end{aligned}$$
(23)

Still, it is wise to keep the seasonal data for the calibration (and not to perform an annual downsampling), since this means that (i) the sample size is kept and not reduced, and (ii) the calibration range of the coral \(\delta ^{18}\text {O}\) values is kept wide.

The calibration slope—SST change per coral \(\delta ^{18}\text {O}\) change—is estimated as \({\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \). The negative sign means that an increase in coral \(\delta ^{18}\text {O}\) indicates a cooling of SST. This estimate is obtained for the correct regression model (Eq. 2), namely, of the response variable Y (SST) on the predictor variable X (coral \(\delta ^{18}\text {O}\)). The regression direction is correct because the task of the proxy, X, is to predict the response, SST. It may be instructive to quantify the effects of selecting the incorrect regression direction (of X on Y). This would lead to a slope estimate of \(-0.15\permille /{}^\circ \text {C}\), which, when inverted, corresponds to \({\widehat{\beta }}_{1,\ \text {incorrect}} = -6.7\ {}^\circ \text {C}/\permille \). This incorrect estimate is clearly within the CI for the correct estimate (Eq. 22). However, one cannot conclude that the selection of the incorrect regression direction always has negligible effects—this depends on the accuracy of the model, that is, the sizes of the standard deviations and autocorrelations of the noise components in Eq. (2). Bearing this caveat about the incorrect regression direction in mind, one can compare the slope estimate of \({\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \) with 95% CI \(\left[ -8.0\ {}^\circ \text {C}/\permille ; -5.0\ {}^\circ \text {C}/\permille \right] \) with values from previous studies. A remark on proxy system or forward modeling may be appropriate. The criticism expressed here of the incorrect regression direction is made from a purely statistical viewpoint. It has nothing to do with the emerging field of proxy system modeling, which has over the past few years led to high-resolution insights into past climates. Proxy system modelers have the vision of full mathematical/physical descriptions of how the environment (e.g., climate) influences a sensor (e.g., \(\delta ^{18}\text {O}\)) in an archive (e.g., a coral), which is then sampled and used for measurements; see, for example, the review by Evans et al. (2013).

The direct comparison partner for our result is the estimate from the paper by Brenner et al. (2017), who used the same data (kindly sent to the present author), monthly \(\delta ^{18}\text {O}\) measured on Isopora corals from Heron Island and SST from ERSST V3b. Brenner et al. (2017) gave preference to the estimate \({\widehat{\beta }}_{1,\ \text {incorrect}} = -5.4\ {}^\circ \text {C}/\permille \), which was determined via RMA regression. The other value obtained by these authors on the same data is \({\widehat{\beta }}_{1,\ \text {incorrect}} = -8.1\ {}^\circ \text {C}/\permille \), which stems from generalized least-squares regression, a method similar to OLS (Mudelsee 2014). It remains to note that, for statistical/methodological reasons, the preference should be on neither value but rather on Eq. (22) and \({\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \); however, the two values presented by Brenner et al. (2017) cover roughly the range of the 95% CI given in Eq. (22).

The same data as used here or by Brenner et al. (2017) were studied in a thesis by Lemley (2012); however, he used only a strongly reduced subset, as Fig. 12 of the thesis reveals. The statistical estimation technology was not clearly communicated, and the resulting calibration slope of \(-6.7\ {}^\circ \text {C}/\permille \), which is in excellent agreement with the estimate from the present paper (Eq. 22), could well be a mere coincidence.

Also, Felis et al. (2014) presented a calibration of \(\delta ^{18}\text {O}\) measured on Isopora, including corals from Heron Island, but deviated as follows: they (i) used bulk coral values for reef settings with different average temperatures (so-called core-top calibration) along a latitudinal transect (from Papua New Guinea to Heron Island) instead of monthly values from records; (ii) employed another SST data product; and (iii) applied OLS and RMA regression techniques. The estimates of \({\widehat{\beta }}_{1,\ \text {incorrect}} = -5.1\ {}^\circ \text {C}/\permille \) (for OLS) and \(-4.6\ {}^\circ \text {C}/\permille \) (for RMA) obtained by Felis et al. (2014) therefore deviate from the value of \({\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \) from Eq. (22).

Finally, Nishida et al. (2014) established a calibration on data from cultivated Isopora corals in laboratory experiments where the water temperature could be advantageously controlled; the range employed in that study (between 21.1 and 29.5 \({}^\circ \text {C}\)) is comparable to what is shown in Fig. 3. Their result (\({\widehat{\beta }}_{1,\ \text {incorrect}} = -6.7\ {}^\circ \text {C}/\permille \)) is apparently in excellent agreement with the result from the present paper (Eq. 22). To assess this, we consulted the JMP statistical software used by Nishida et al. (2014), which seems to perform OLS in an analysis-of-variance framework. Unfortunately, Nishida et al. (2014) presented only significance tests on the slope (but no CIs), and therefore we conclude that the excellent agreement may to some extent be spurious.

It is tempting to speculate about the general applicability of the obtained proxy calibration slope (Eq. 22) of \({\widehat{\beta }}_1 = -6.5\ {}^\circ \text {C}/\permille \). Regions other than Heron Island? Unfortunately, published results on Isopora corals from elsewhere are sparse. Other coral species? Brenner et al. (2017, Table 3 therein) give many estimates from Porites corals and other regions, and these values seem in rough agreement with the value from Eq. (22). This value is also, within uncertainty bounds, in agreement with the slope estimate (\(-6.1\ {}^\circ \text {C}/\permille \)) of a Porites calibration work from the Red Sea using the correct regression direction and OLS (Felis et al. 2000). Additionally, an often-cited calibration study on Diploria corals from the Atlantic Ocean reported similar values (Hetzinger et al. 2006). If the calibration slopes obtained from various regions and various coral species agree (within confidence limits), then one would be in a favorable position and would be allowed to calculate an overall calibration slope (from all data points), an estimate that would then enjoy wider applicability. In that regard, it is perhaps apt to remind readers of the work done by the fathers of isotope paleoclimatology: Epstein et al. (1953) went so far as to assume a universal calibration slope of \({\widehat{\beta }}_1 = -4.3\ {}^\circ \text {C}/\permille \) for all marine carbonates. (To be more precise, these authors also assumed a quadratic term.) Notably, Epstein et al. (1953) applied the correct regression direction, namely, of the response variable temperature on the predictor \(\delta ^{18}\text {O}\).

5 Conclusions

The linear calibration model is a powerful statistical tool that can be utilized to predict an unknown variable, Y, by means of observations of a proxy variable, X. Due to the nature of most prediction tasks, it is the relative change that is relevant for practical applications (i.e., the slope, which determines the change in Y per change in X), while the absolute value (i.e., the intercept) is less important. Note that the intercept is often arbitrarily constrained via use of anomaly values or employment of certain standards or units.

Since the calibration involves the estimation of regression model parameters on the basis of observed noisy datasets of a limited size, there are uncertainties involved. It is therefore of key importance to achieve a calibration slope estimation without bias. However, due to the fact that the proxy observations exhibit errors (i.e., predictor noise), advanced statistical routines have to be employed for correctly fitting the linear errors-in-variables calibration model (Eq. 2) to data. A similar quest for a state-of-the-art, data-driven uncertainty determination algorithm exists, which is rooted in the typically rather complex properties of the associated noise processes (i.e., for both Y and X), namely, non-Gaussian distributional shapes and autocorrelations. Such so-called problematic noise properties are ubiquitous in climate sciences, but they also appear in other branches of natural sciences, social sciences, medicine, or engineering.

The present paper shows that WLSXY estimation is able to deliver accurate and unbiased slope estimations under heteroscedasticity. In the case of homoscedasticity, besides WLSXY, the OLSBC estimation approach also performs well. On the other hand, estimation techniques to be avoided include OLS and RMA regression estimation, each of which may lead to severely biased calibration slope estimates. It is further necessary to take the correct regression direction into account, which is of the response variable, Y (e.g., SST), on the predictor or proxy variable, X (e.g., coral \(\delta ^{18}\text {O}\)).

Both methods of choice, however, require prior knowledge about the size of the predictor noise; this means that for homoscedasticity, OLSBC requires information about \(S_X\), and for heteroscedasticity, WLSXY about \(S_X(i)\). If such knowledge is unavailable, then there is a problem with calibration, since a biased slope estimate may result. However, the experimenter, who generates the predictor values, X(i), has some control to reduce the bias. As Eq. (8) shows, such control can be gained by making \(V\hspace{-0.20em}AR[ X(i)]\) large against a guessed value of \(S_X^2\); then the results obtained using OLS lead to only a small bias. Mudelsee (2014, Sects. 8.2.1, 8.3.2, 8.3.3, and 8.3.4 therein) presents ideas on modeling incomplete prior knowledge and several Monte Carlo simulation experiments on that theme. The Monte Carlo simulation results further demonstrate that a certain number of points (say, \(n \ge 50\)) may be necessary for the methodological advantages of OLSBC or WLSXY to take significant effect.

The present paper also introduces a powerful bootstrap resampling scheme (Sect. 3.1) which helps to obtain accurate CIs and prediction intervals under the abovementioned problematic noise properties. A Monte Carlo simulation experiment with artificial time series reveals good coverage performance of 95% Student’s t CIs under real-world conditions.

The errors-in-variables regression problem has long been known in climate sciences as well (see DeLong et al. 2007; Mann et al. 2007; Ammann et al. 2010; Mudelsee 2014, Ch. 8 therein; Thompson 2022, Sect. 3.6 therein; and references cited in these works). The two methodological novelties presented in this paper are as follows. First, the bootstrap approach for CI construction (Sect. 3.1) is introduced to the peer-reviewed literature; the core of it has already been described in the book by Mudelsee (2014), but the present paper gives the full formula for the determination of the predictor uncertainty (Sect. 3.1). Second, the Monte Carlo simulation experiment (Sect. 3.2) presents new computer runs which show quantitatively that the WLSXY and OLSBC methods perform well for the proxy calibration problem in the presence of predictor noise, while the OLS and RMA methods fail. To the best of the author’s knowledge, there exist no other Monte Carlo simulation experiments published in the peer-reviewed literature on such tests of uncertainty measures for proxy calibration. The present paper therefore takes a step beyond the state of the art as regards method validation for that estimation problem. In addition to those novelties, the present paper has a new paleoclimatic application (Sect. 4) and presents a novel calibration software tool (Appendix).

The data-analytical machinery developed in the present paper was applied to a published dataset of SST predictions by means of \(\delta ^{18}\text {O}\) measurements made on the coral climate archive. Evidently, there is room for future application of the proxy technique shown herein to new datasets. However, it should also be enlightening to reanalyze existing data series using the presented state-of-the-art methodological approach in a consistent manner. The approach itself is put into Fortran software (LINCAL), which is made publicly available as source code (see Appendix). As a reminder, paleoclimatic science has over the past few decades developed powerful laboratory-based approaches to improve the prediction quality of proxy variables (Bradley 1999; Cronin 2010; Brenner et al. 2017; Felis 2020). This means that the calibration results presented here (Fig. 3) are not the last word. Improved calibrations for coral \(\delta ^{18}\text {O}\) as an SST proxy are expected to come from paired measurements, that is, from \(\delta ^{18}\text {O}\) and Sr/Ca measured on the same sample.