
Quantifying changes and their uncertainties in probability distribution of climate variables using robust statistics

Climate Dynamics

Abstract

Robust tools are presented in this manuscript to assess changes in the probability density function (pdf) of climate variables. The approach is based on order statistics and aims at computing changes in various quantiles and related statistics, along with their standard errors. The technique, which is nonparametric and simple to compute, is developed for both independent and dependent data. For autocorrelated data, serial correlation is addressed via Monte Carlo simulations using various autoregressive models. The ratio between the average standard errors (over several quantiles) of quantile estimates for correlated and for independent data is then computed, and a simple scaling-law relationship is found to hold between this ratio and the lag-1 autocorrelation. The approach has been applied to winter monthly Central England Temperature (CET) and North Atlantic Oscillation (NAO) time series from 1659 to 1999 to quantify changes in various parameters of their pdfs. For the CET, significant changes in the median (or location) and also in low and high quantiles are observed between various time slices, in particular between the pre- and post-industrial-revolution periods. Observed changes in spread and in quartile skewness of the pdf, however, are not statistically significant (at the 95% confidence level). For the NAO index we find mainly large, significant changes in variance (or scale), yielding significant changes in low and high quantiles. Finally, the performance of the method is discussed in comparison with a few conventional approaches.




Notes

  1. The observed numerical value is normally denoted using a lower-case letter, e.g. \(s^{2}_{X}\).

  2. Differentiating and squaring Eq. 21 yields:

    $$ \left(\delta S \right)^{2} = 4 \frac{\left[ \left( \delta X_{\gamma}- \delta X_{\beta} \right) \left( X_{\eta} - X_{\beta} \right) - \left( X_{\gamma} - X_{\beta} \right) \left( \delta X_{\eta} - \delta X_{\beta} \right) \right]^{2}}{\left( X_{\eta} - X_{\beta} \right)^{4}}, $$

    where the variables \(X_{\alpha}\), \(\alpha = \beta, \gamma, \eta\), are fixed at \(\xi_{\alpha} = F^{-1}(\alpha)\), and \(\delta X_{\alpha}\) denote random perturbations. Taking expectations of both sides, and noting that, for example, \(E\left(\delta X_{\gamma} - \delta X_{\beta}\right)^{2} = \sigma^{2}_{\beta,\gamma}\) and (see Appendix, Eq. 40), for \(\beta < \gamma\), \(E \left(\delta X_{\gamma} \delta X_{\beta}\right) = {\rm cov}\left(\delta X_{\gamma}, \delta X_{\beta}\right) = \frac{\beta(1-\gamma)}{n f(\xi_{\beta}) f(\xi_{\gamma})},\) i.e.

    $$E \left(\delta X_{\gamma} \delta X_{\beta}\right) = \sigma_{\beta} \sigma_{\gamma} \sqrt{\frac{\beta (1-\gamma)}{\gamma (1-\beta)}},$$

    yields precisely Eq. 22.

  3. Not every sequence of numbers \(\rho_{1}, \rho_{2}, \ldots, \rho_{p}\) constitutes the first p lagged autocorrelations of a stationary time series; these must satisfy a set of consistency relationships, see e.g. Box et al. (1994).

  4. The indicator of a set is a function that is one inside the set and zero elsewhere. \(I_{k}(\cdot)\) can also be written using the Heaviside function \(\mathcal{H}(\cdot)\), which is one for positive numbers and zero otherwise, yielding \(I_{k} (x) = 1 - \mathcal{H} (X_{k} - x).\)

  5. http://ftp.ncep.noaa.gov/pub/cpc/wd52dg/data/indices/sstoi.indices
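Note 3 can be made concrete. One standard way to state the consistency conditions is that the (p + 1) × (p + 1) Toeplitz matrix with entries \(\rho_{|i-j|}\) (and \(\rho_{0} = 1\)) must be positive definite, which is closely tied to fitting stationary AR(p) models through the Yule–Walker equations. The sketch below checks this numerically with NumPy; the function name `is_valid_acf` and the tolerance are illustrative choices, not part of the original analysis.

```python
import numpy as np


def is_valid_acf(rhos, tol=1e-12):
    """Return True if rho_1..rho_p can be the first p lagged autocorrelations
    of a stationary process, i.e. if the Toeplitz matrix R[i, j] = rho_|i-j|
    (with rho_0 = 1) is positive definite."""
    r = np.concatenate(([1.0], np.asarray(rhos, dtype=float)))
    lags = np.arange(r.size)
    # Build the (p+1) x (p+1) Toeplitz autocorrelation matrix.
    R = r[np.abs(lags[:, None] - lags[None, :])]
    # eigvalsh is appropriate because R is symmetric.
    return bool(np.linalg.eigvalsh(R).min() > tol)


# An AR(1) process with phi = 0.5 has rho_k = 0.5**k, which is consistent.
print(is_valid_acf([0.5, 0.25]))  # True
# rho_1 = 0.9 with rho_2 = 0 violates rho_2 >= 2 * rho_1**2 - 1.
print(is_valid_acf([0.9, 0.0]))   # False
```

For p = 2 the eigenvalue test reduces to the familiar condition \(\rho_{2} > 2\rho_{1}^{2} - 1\) (together with \(|\rho_{1}| < 1\)), which the second example violates.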

References

  • Antoniadou T, Besse P, Fougères AL, Le Gall C, Stephenson DB (2001) L’Oscillation Atlantique Nord (NAO) et son rôle sur le climat Européen. Statistique Appliquée XLIX(3):39–60

  • Bahadur RR (1966) A note on quantiles in large samples. Ann Math Statist 37:577–580

  • Berry DA, Lindgren BW (1990) Statistics: theory and methods. Brooks/Cole, Pacific Grove, California

  • Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis: forecasting and control. Prentice Hall, New Jersey

  • Craigmile PF, Guttorp P, Percival DB (2004) Trend assessment in a long memory dependence model using the discrete wavelet transform. Environmetrics 15:313–335

  • Dai A, Trenberth KE, Karl TR (1999) Effects of clouds, soil moisture, precipitation and water vapor on diurnal temperature range. J Clim 12:2451–2473

  • David HA (1981) Order statistics, 2nd edn. Wiley, New York

  • DeGroot MH, Schervish MJ (2002) Probability and statistics. Addison-Wesley, Boston

  • Drezner Z, Wesolowsky GO (1989) On the computation of the bivariate normal integral. J Statist Comput Simul 35:101–107

  • Feldstein SB (2000) The timescale, power spectra, and climate noise properties of teleconnection patterns. J Clim 13:4430–4440

  • Ferro CAT, Hannachi A, Stephenson DB (2005) Simple non-parametric techniques for exploring changing probability distributions of weather. J Clim 18:4344–4354

  • Freeman H (1963) Introduction to statistical inference. Addison-Wesley, Massachusetts

  • Gilchrist WG (2000) Statistical modelling with quantile functions. Chapman and Hall, Boca Raton

  • Granger CWJ, Joyeux R (1980) Introduction to long-memory time series models and fractional differencing. J Time Series Anal 1:15–29

  • Hall A, Manabe S (1997) Can local linear stochastic theory explain sea surface temperature and salinity variability? Clim Dyn 13:167–18

  • Hannachi A, Stephenson DB, Sperber KR (2003) Probability-based methods for quantifying nonlinearity in the ENSO. Clim Dyn 20:241–256

  • Hannachi A, Stephenson DB, Sperber KR (2004) Erratum: probability-based methods for quantifying nonlinearity in the ENSO. Clim Dyn 22:69–70

  • Hasselmann K (1976) Stochastic climate models. Part I. Theory. Tellus 28:474–485

  • Hasselmann K (1988) PIPs and POPs: a general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns. J Geophys Res 93:11015–11020

  • Heidelberg P, Lewis PAW (1984) Quantile estimation in dependent sequences. Oper Res 32:185–209

  • Hesse CH (1990) A Bahadur-type representation for empirical quantiles of a large class of stationary, possibly infinite-variance, linear processes. Ann Statist 18:1188–1202

  • Ho HC, Hsing T (1996) On the asymptotic expansion of the empirical process of long-memory moving averages. Ann Statist 24:992–1024

  • Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176

  • IPCC (2001) Climate change 2001: synthesis report. A contribution of working groups I, II, and III to the third assessment report of the Intergovernmental Panel on Climate Change. Watson RT and the Core Writing Team (eds). Cambridge University Press, Cambridge

  • Ivchenko G, Medvedev Yu (1990) Mathematical statistics. Mir Publishers, Moscow

  • Jungo P, Beniston M (2001) Changes in the anomalies of extreme temperature anomalies in the 20th century at Swiss climatological stations located at different latitudes and altitudes. Theor Appl Climatol 69:1–12

  • Karl TR, Jones PD, Knight RW, Kukla G, Plummer N, Razuvayev V, Gallo KP, Lindseay L, Charlson RJ, Peterson TC (1993) Asymmetric trends of daily minimum and maximum temperature. Bull Amer Meteor Soc 74:1007–1023

  • Katz RW, Brown BG (1992) Extreme events in a changing climate: variability is more important than averages. Clim Change 21:289–302

  • Kendall MG, Stuart A (1973) The advanced theory of statistics: inference and relationship, vol 2, 3rd edn. Charles Griffin and Company, London

  • Kendall MG, Stuart A (1977) The advanced theory of statistics: inference and relationship, vol 1. Charles Griffin and Company, London

  • Kotz S, Johnson NL (eds) (1986) Encyclopedia of statistical sciences, vol 7. Wiley, New York

  • Kreyszig E (1970) Introductory mathematical statistics. Wiley, New York

  • Lamb HH (1977) Climate: present, past and future, vol 2: climatic history and future. Methuen, London

  • Lawrance AJ, Kottegoda NT (1977) Stochastic modelling of river flow time series. J Roy Statist Soc A 140:1–47

  • Lanzante JR (1996) Resistant, robust and non-parametric techniques for the analysis of climate data: theory and examples, including applications to historical radiosonde station data. Int J Climatol 16:1197–1226

  • Luterbacher J et al (2001) Extending North Atlantic oscillation reconstructions back to 1500. Atmos Sci Lett 2:114–124

  • Manley G (1974) Central England temperatures: monthly means 1659 to 1973. Q J Roy Meteorol Soc 100:389–405

  • Mearns LO, Katz RW, Schneider SH (1984) Extreme high-temperature events: changes in their probabilities with changes in mean temperature. J Clim Appl Meteor 23:1601–1613

  • Parker DE, Legg TP, Folland CK (1992) A new daily Central England Temperature series 1772–1991. Int J Climatol 12:317–342

  • Penland C, Sardeshmukh PD (1995) The optimal growth of tropical sea surface temperature anomalies. J Clim 8:1999–2024

  • Percival DB, Overland JE, Mofjeld HO (2001) Interpretation of North Pacific variability as a short- and long-memory process. J Clim 14:4545–4559

  • Plackett RL (1954) A reduction formula for normal multivariate probabilities. Biometrika 41:351–360

  • Sen PK (1968) Asymptotic normality of sample quantiles for m-dependent processes. Ann Math Statist 39:1724–1730

  • Sen PK (1972) On the Bahadur representation of sample quantiles for sequences of ϕ-mixing random variables. J Multiv Anal 2:77–95

  • Stephenson DB, Pavan V, Bojariu R (2002) Is the North Atlantic oscillation a random walk? Int J Climatol 20:1–18

  • Tarleton LF, Katz RW (1995) Statistical explanation for trends in extreme summer temperature at Phoenix, Arizona. J Clim 8:1704–1708

  • von Storch J (1995) Multivariate statistical modelling: POP model as a first order approximation. In: von Storch H, Navarra A (eds) Analysis of climate variability: application of statistical techniques. Springer, Berlin Heidelberg New York, pp 281–297

  • Wigley TML (1985) Impact of extreme events. Nature 316:106–107

  • Wilcox RR (1997) Introduction to robust estimation and hypothesis testing. Academic, New York

  • Wilks DS (1995) Statistical methods in the atmospheric sciences. Academic, San Diego

  • Wu WB (2005) On the Bahadur representation of sample quantiles for dependent sequences. Ann Statist 33:1934–1963

  • Wunsch C (2003) The spectral description of climate change including the 100 ky energy. Clim Dyn 20:353–363

Acknowledgments

This work was funded by the Centre for Global Atmospheric Modelling (CGAM), University of Reading. The author wishes to thank S. Pezzulli and C.A.T. Ferro for beneficial discussions and for their encouragement, and D.B. Stephenson for critically reading the manuscript and pointing to the IPCC report. Thanks are also due to three anonymous reviewers for their thorough comments, which helped improve the manuscript.

Author information


Correspondence to A. Hannachi.

Appendix


1.1 On quantile estimates and their standard errors

This appendix provides simple guidelines for computing quantiles and a few other robust and resistant measures of the scale and shape of pdfs, along with their standard errors, for independent and dependent data.

1.1.1 Independent data

We consider a time series \(x_{t}\), \(t = 1, \ldots, n\), assumed to be a realisation of a sequence \(X_{1}, X_{2}, \ldots, X_{n}\) of independent and identically distributed (iid) random variables having the same continuous distribution as some random variable X, whose cdf and pdf are respectively \(F(\cdot)\) and \(f(\cdot)\). The cdf \(F(\cdot)\) is assumed to be strictly increasing and can therefore be inverted. The number \(\xi_{p} = F^{-1}(p)\), for any 0 < p < 1, is the pth quantile of X. For a given x, let \(I_{k} (x) = 1_{X_{k} \le x}\) be the indicator of the event \(\{X_{k} \le x\}\), i.e. \(I_{k}(x)\) is one if \(X_{k} \le x\) and zero otherwise. Then \(I_{k}(x)\), k = 1, ..., n, are also iid, with the same distribution as \(I(x) = 1_{X \le x}\). The empirical distribution function (edf) of the sample is given by:

$$F_n (x) = \frac{1}{n} \sum\limits_{k=1}^{n} I_k(x). $$
(32)
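Eq. 32 is simply the fraction of observations not exceeding x. A minimal NumPy sketch, evaluating \(F_{n}\) at several points at once through the indicators \(I_{k}(x)\); the function name `edf` is our own.

```python
import numpy as np


def edf(sample, x):
    """Empirical distribution function F_n(x) of Eq. 32: the mean of the
    indicators I_k(x) = 1 if X_k <= x, else 0."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # indicators[k, j] = I_k(x_j)
    indicators = sample[:, None] <= x[None, :]
    return indicators.mean(axis=0)


# Expected values: 0, 0.5, 0.5, 1
print(edf([3.0, 1.0, 2.0, 4.0], [0.0, 2.0, 2.5, 4.0]))
```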

The use of the edf to express the sample quantile was initiated by Bahadur (1966), who derived the following asymptotic representation:

$$X_{(p)} = \xi_{p} + \frac{p - F_{n} (\xi_{p})}{f (\xi_{p})} + \mathcal{R}_{n} $$
(33)

where \(\mathcal{R}_{n}\) is a remainder that goes to zero faster than \(n^{-1/2}\) as n increases. Note that for any \(1 \le k \le n\), \(I_{k}(\xi_{p})\) is a binary random variable taking values 0 and 1 with respective probabilities 1 − p and p, so that \(F_{n}(\xi_{p})\) has mean p and variance p(1 − p)/n. Hence, by the central limit theorem, for large n, \(X_{(p)} - \xi_{p}\) is approximately normally distributed with zero mean and variance (see also Kendall and Stuart 1977; Wilcox 1997):

$$\sigma_{p}^{2} = \frac{p(1-p)}{n f^{2} (\xi_{p})}. $$
(34)
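Eq. 34 needs the density at the quantile, so it can only be evaluated once f is known or assumed. Purely as an illustration, the sketch below evaluates it for a normal distribution using Python's standard-library `statistics.NormalDist`; the normal choice and the function name are our assumptions.

```python
import math
from statistics import NormalDist


def asymptotic_quantile_se(p, n, dist=NormalDist()):
    """Large-sample standard error of the p-th sample quantile for iid
    data (Eq. 34): sigma_p = sqrt(p (1 - p) / n) / f(xi_p).
    `dist` must provide inv_cdf and pdf; a normal is assumed here."""
    xi_p = dist.inv_cdf(p)   # population quantile xi_p = F^{-1}(p)
    f_xi = dist.pdf(xi_p)    # density at the quantile
    return math.sqrt(p * (1.0 - p) / n) / f_xi


# Median of 100 standard normal values: 0.05 * sqrt(2 * pi)
print(round(asymptotic_quantile_se(0.5, 100), 4))  # 0.1253
```

Because \(f(\xi_{p})\) is small in the tails, the same formula gives noticeably larger standard errors for low and high quantiles than for the median.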

Both expressions (Eqs. 33 and 34) are parametric, since they involve the density f, and they will be used later for dependent data. Here we are interested in non-parametric and resistant estimators of the quantiles and their standard errors. Let \(x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}\) be the sorted sample of the time series and, similarly, let \(X_{1:n} \le X_{2:n} \le \ldots \le X_{n:n}\) be the order statistics of the sequence \(X_{1}, X_{2}, \ldots, X_{n}\). These order statistics can be used to construct a resistant, robust and non-parametric estimator \(X_{(p)}\) of the pth quantile, namely:

$$X_{(p)} = X_{p_{0}:n} $$
(35)

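where \(p_{0} = [np + 0.5]\), i.e. the integer part of np + 0.5 (Kendall and Stuart 1973; David 1981). For ranks r < s, the probability that the interval \([X_{r:n}, X_{s:n}]\) contains the true quantile, i.e. that \(X_{r:n} \le \xi_{p} \le X_{s:n}\), does not depend on F; setting it equal to the confidence level 1 − α, where α is the significance level, gives (see e.g. Kendall and Stuart 1973):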

$$1 - \alpha = J_{p} (r, n-r+1) - J_{p} (s, n-s+1), $$
(36)

where \(J_p (a,b) = \frac{1}{B(a,b)} \int_{0}^{p} t^{a-1} (1 - t)^{b-1} {\rm d}t\), with B(a, b) the beta function, is the cdf of the beta distribution with parameters a and b, evaluated at p.

For a given n, α and 0 < p < 1, Eq. 36 is solved by choosing the interval [r, s] to be centred at the rank \(p_{0}\) of \(X_{(p)} = X_{p_{0}:n}\). Hence, taking \(p_{0} = [np + 0.5]\), we choose \(s = 2p_{0} - r\). The parameter r is then obtained by solving the nonlinear equation:

$$N(x) = J_{p} (x, n-x+1) - J_{p} (2p_{0} - x, n-2p_{0} + x + 1) -1 + \alpha = 0 $$
(37)

Note that in Eq. 37 the beta parameters \(n - x + 1\), \(2p_{0} - x\) and \(n - 2p_{0} + x + 1\) must all be at least one, and that \(1 \le x \le n\). This readily yields:

$$\max \left(1,2p_{0}-n \right) \le x \le \min \left(n,2p_{0} -1 \right). $$
(38)

Equation 37 is solved using the root finder fzero in MATLAB within the interval provided by Eq. 38.
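The paper solves Eq. 37 with MATLAB's fzero. A comparable sketch in Python, assuming SciPy is available, uses `scipy.special.betainc` (the regularized incomplete beta function, i.e. \(J_{p}\)) and the bracketing root finder `scipy.optimize.brentq`. It searches only the branch \(r \le p_{0}\), on which the left-hand side of Eq. 37 decreases monotonically in r, and rounds the root down so that the coverage is at least 1 − α; both choices are ours and may differ in detail from the original implementation.

```python
import math

from scipy.optimize import brentq
from scipy.special import betainc


def order_stat_ci_ranks(n, p, alpha=0.05):
    """Return (r, p0, s): p0 is the rank of the point estimate X_{p0:n}
    (Eq. 35), and [X_{r:n}, X_{s:n}] with s = 2 p0 - r is a
    distribution-free interval for xi_p with coverage >= 1 - alpha."""
    p0 = math.floor(n * p + 0.5)

    def N(x):
        # Eq. 37; betainc(a, b, p) is the Beta(a, b) cdf evaluated at p.
        return (betainc(x, n - x + 1, p)
                - betainc(2 * p0 - x, n - 2 * p0 + x + 1, p)
                - 1.0 + alpha)

    lower = max(1, 2 * p0 - n)  # lower bound from Eq. 38
    # N(p0) = alpha - 1 < 0, so a root in [lower, p0] needs N(lower) > 0.
    if N(lower) <= 0:
        raise ValueError("requested coverage not attainable for this n and p")
    x = brentq(N, lower, p0)
    r = math.floor(x)  # rounding down widens the interval
    return r, p0, 2 * p0 - r


r, p0, s = order_stat_ci_ranks(100, 0.5)
print(r, p0, s)  # 40 50 60
```

With 1-based ranks, the point estimate and the interval are then `x[p0 - 1]`, `x[r - 1]` and `x[s - 1]` of the sorted sample. For quantiles very close to 0 or 1 in short samples no symmetric interval may reach the requested coverage, which the sketch reports as an error.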

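Since, for large n, \(X_{(p)}\) is approximately normal (Eq. 34), a (1 − α) confidence interval for \(\xi_{p}\) has length of about \(2\sigma_{p}\Phi^{-1}(1 - \alpha/2)\). Equating this with the expected length of \([X_{r:n}, X_{s:n}]\) gives an approximation of the standard error \(\sigma_{p}\) of the estimator \(X_{(p)}\) of \(\xi_{p}\):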

$$2 \sigma_{p} \Phi^{-1} \left(1 - \frac{\alpha}{2} \right) = E \left( X_{s:n} - X_{r:n} \right), $$
(39)

where Φ is the cdf of the standard normal distribution and E(·) denotes the expectation operator. The standard error is estimated from the observed sample by replacing \(X_{s:n}\) and \(X_{r:n}\) in Eq. 39 with the observed order statistics \(x_{s:n}\) and \(x_{r:n}\), respectively. In Table 1 we show values of r for α = 5%, sample sizes ranging from 100 to 2,000, and various values of the probability p.
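The sample version of Eq. 39 is a one-line computation once the ranks r and s are known. A minimal sketch using the standard library, where the function name is our own and the ranks are passed in as 1-based integers; the values r = 40 and s = 60 in the example are those obtained from Eq. 37 for n = 100, p = 0.5 and α = 5%.

```python
from statistics import NormalDist


def quantile_se_from_ranks(sample, r, s, alpha=0.05):
    """Standard error of X_(p) from Eq. 39, with E(X_{s:n} - X_{r:n})
    replaced by the observed x_{s:n} - x_{r:n}; r < s are 1-based ranks."""
    x = sorted(sample)                       # x_{1:n} <= ... <= x_{n:n}
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Phi^{-1}(1 - alpha/2), about 1.96
    return (x[s - 1] - x[r - 1]) / (2 * z)


values = [float(k) for k in range(1, 101)]
print(round(quantile_se_from_ranks(values, 40, 60), 3))  # 5.102
```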

In a similar manner, it is possible to compute the standard errors of other quantities, such as the quantile difference \(D_{p,q}\) (see Eq. 19). In this case, it follows from Eq. 33 that \(f(\xi_{p}) f(\xi_{q}) {\rm cov} \left( X_{(p)}, X_{(q)} \right) = \frac{1}{n^{2}} \sum_{k,l} {\rm cov} \left(I_{k} (\xi_{p}), I_{l} (\xi_{q}) \right),\) and since, for \(p \le q\), \({\rm cov}\left(I_{k}(\xi_{p}), I_{l}(\xi_{q})\right) = p(1-q)\,\delta_{kl}\), where \(\delta_{kl}\) is the Kronecker delta, one gets:

$${\rm cov} \left(X_{(p)}, X_{(q)} \right) = \frac{p(1-q)}{nf(\xi_{p}) f(\xi_{q})}, $$
(40)

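which reproduces Eq. 34 when p = q. The nonparametric expression of \(\sigma^{2}_{p,q} = E(D_{p,q})^{2}\) (see Eq. 20) is then obtained from \(\sigma^{2}_{p}\), \(\sigma^{2}_{q}\) and Eq. 40 after eliminating \(f(\xi_{p})\) and \(f(\xi_{q})\) using Eq. 34, which gives, for p < q, \(\sigma^{2}_{p,q} = \sigma^{2}_{p} + \sigma^{2}_{q} - 2\sigma_{p}\sigma_{q}\sqrt{\frac{p(1-q)}{q(1-p)}}\).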
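The elimination step can be checked numerically: dividing Eq. 40 by \(\sigma_{p}\sigma_{q}\) from Eq. 34 leaves the correlation \(\sqrt{p(1-q)/(q(1-p))}\), which no longer involves f. A minimal sketch follows; taking \(D_{p,q} = X_{(q)} - X_{(p)}\) is our assumption (Eq. 19 is not reproduced here), although the variance is the same for either sign.

```python
import math


def quantile_difference_se(sigma_p, sigma_q, p, q):
    """Standard error of D_{p,q} = X_(q) - X_(p) for p < q, combining the
    quantile standard errors with the covariance of Eq. 40 after f(xi_p)
    and f(xi_q) have been eliminated using Eq. 34."""
    if not 0 < p < q < 1:
        raise ValueError("need 0 < p < q < 1")
    # corr(X_(p), X_(q)) = sqrt(p (1 - q) / (q (1 - p)))
    rho = math.sqrt(p * (1 - q) / (q * (1 - p)))
    var = sigma_p ** 2 + sigma_q ** 2 - 2 * rho * sigma_p * sigma_q
    return math.sqrt(var)


# Interquartile range with unit quartile standard errors: rho = 1/3,
# so the variance is 1 + 1 - 2/3 = 4/3.
print(quantile_difference_se(1.0, 1.0, 0.25, 0.75) ** 2)
```

The positive correlation between the two quantile estimates makes the standard error of their difference smaller than it would be for independent estimates.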

1.1.2 Dependent data

Since the work of Bahadur (1966), the computation of quantiles from data has attracted many researchers. Eq. 33 has been extended to sequences of random variables (stochastic processes) with dependence structures, including a broad class of linear processes

$$X_{t} = \sum\limits_{k=1}^{\infty} a_{k} \varepsilon_{t - k}. $$
(41)

with iid \(\varepsilon_{k}\) satisfying certain regularity conditions. Sen (1968, 1972) extended Bahadur’s result to m-dependent and ϕ-mixing processes. Hesse (1990) extended the result to a broad class of short-range-dependent linear processes, whereas Ho and Hsing (1996) extended it to include long-range dependence. Wu (2005) generalised and refined those results for linear and some nonlinear processes.

In summary, Eq. 33 holds for a large class of linear and nonlinear time series with short- or long-range dependence. Note that in the long-range case a correction term

$$-\frac{1}{2} \overline{X_{n}}^{2} \frac{{\rm d}Log f}{{\rm d}x} (\xi_{p})$$

is included in Eq. 33 (Wu 2005).

From Eq. 33, the variance \(\sigma^{2}_{p}\) of \(X_{(p)}\) now involves \({\rm var} \left[\frac{1}{n} \sum_{k=1}^{n} I_{k} (\xi_{p}) \right]\); precisely:

$$n \sigma_{p}^{2} f^{2}(\xi_{p})= \frac{1}{n} {\rm var} \left[ \sum_{k=1}^{n} I_{k} (\xi_{p}) \right]. $$
(42)

The expression of \(\sigma^{2}_{p}\) for a Gaussian process is given below. Using the stationarity of the sequence \(I_{k}(x)\), k = 1, 2, ..., n, yields, for large n, the approximation:

$$n \sigma_{p}^{2} f^{2} (\xi_{p}) \approx \sum\limits_{k=-(n-1)}^{n-1} {\rm cov} \left[I_{n} (\xi_{p}), I_{n+k} (\xi_{p}) \right]. $$
(43)

Heidelberg and Lewis (1984) interpret the right-hand side of Eq. 43 as the value at zero frequency of the spectrum:

$$h(\omega, x) = \sum\limits_{k= -\infty}^{\infty} {\rm cos} (2 \pi \omega k) {\rm cov} \left[I_{n} (x), I_{n+k}(x) \right], $$
(44)

of the binary process \(I_{k}(x)\), k = 1, 2, ..., for \(-1/2 \le \omega \le 1/2\). The variance \(\sigma^{2}_{p}\) is then given by:

$$\sigma_{p}^{2} = \frac{h(0, \xi_{p})}{n f^{2} (\xi_{p})}. $$
(45)

The ratio \(R^{2}_{p}\) between the variances of quantile estimates for dependent and for independent data is then obtained as the ratio of Eq. 45 to Eq. 34, i.e. \(R^{2}_{p} = h(0, \xi_{p}) / [p(1-p)]\). Here, as in Heidelberg and Lewis (1984), the log-periodogram:

$$L(k, p) = \hbox{Log}\,\left[ \frac{1}{n} \left| \sum\limits_{j=1}^{n} I_{j} (\xi_{p}) {\rm e}^{-2\pi{\rm i}(j-1)k/n}\right|^{2} \right], $$
(46)

estimated from the binary sequence, is used to compute \(h(0, \xi_{p})\) from the intercept of a least-squares regression of the log-periodogram (Eq. 46) on frequency. Note that the log-periodogram provides an estimator of the logarithm of the spectrum.
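A simplified sketch of this step in Python with NumPy follows; it is not necessarily the exact procedure of Heidelberg and Lewis (1984). Here \(\xi_{p}\) is replaced by the sample quantile, the raw periodogram of the centred binary sequence is computed at the lowest `n_freq` nonzero Fourier frequencies, its logarithm is regressed linearly on frequency, and the intercept is corrected for the bias of the raw log-periodogram (approximately minus Euler's constant, 0.5772). The band width `n_freq`, the AR(1) test series and the function name are illustrative assumptions.

```python
import numpy as np


def h0_log_periodogram(x, p, n_freq=None):
    """Estimate h(0, xi_p) (Eq. 44 at zero frequency) for the binary
    sequence I_j = 1{x_j <= xi_p}, with xi_p replaced by the sample quantile."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xi_p = np.quantile(x, p)
    I = (x <= xi_p).astype(float)
    I -= I.mean()                    # centring removes the zero-frequency spike
    if n_freq is None:
        n_freq = max(10, n // 20)    # arbitrary low-frequency band
    # Periodogram (1/n)|sum_j I_j exp(-2 pi i (j-1) k / n)|^2, cf. Eq. 46
    per = np.abs(np.fft.fft(I)) ** 2 / n
    k = np.arange(1, n_freq + 1)     # skip k = 0 (zero after centring)
    slope, intercept = np.polyfit(k / n, np.log(per[k]), 1)
    # E[log(periodogram / spectrum)] is about -0.5772 for raw ordinates.
    return float(np.exp(intercept + np.euler_gamma))


rng = np.random.default_rng(0)
n, p = 4096, 0.5
noise = rng.standard_normal(n)
ar1 = np.empty(n)
ar1[0] = noise[0]
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]
ratio_iid = h0_log_periodogram(noise, p) / (p * (1 - p))  # R_p^2, near 1
ratio_ar1 = h0_log_periodogram(ar1, p) / (p * (1 - p))    # clearly above 1
print(ratio_iid, ratio_ar1)
```

Dividing the estimate by p(1 − p) gives the variance ratio \(R^{2}_{p}\) defined above; for white noise it should be close to one, while positive serial correlation inflates it.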

Now suppose that the sample is drawn from a zero-mean Gaussian process with covariance and correlation functions \(\gamma_{k}\) and \(\rho_{k}\), k = 0, 1, ..., n − 1. In this case the joint distribution of any subset \((X_{1}, \ldots, X_{p})\) is completely specified by the covariance matrix \(\mbox{\boldmath$\Sigma$}_{p} = \left(\gamma_{i-j} \right)\) of the random vector \(\mathbf{X} = (X_{1}, \ldots, X_{p})^{T}\). It can be verified that:

$${\rm cov} \left[I_{k} (\xi_{p}), I_{l} (\xi_{p}) \right] = {\rm Pr} \left(X_{k} \le \xi_{p}, X_{l} \le \xi_{p} \right) - p^{2} = \int\limits_{-\infty}^{\xi_{p}} \int\limits_{-\infty}^{\xi_{p}} g_{k,l} (x,y) {\rm d}x\,{\rm d}y - p^{2}, $$
(47)

which involves the joint pdf

$$g_{k,l}(x,y) = \frac{1}{2\pi\left|\mbox{\boldmath$\Sigma$} \right|^{1/2}} \hbox{exp}\,\left[-\frac{\gamma_{0}}{2\left|\mbox{\boldmath$\Sigma$}\right|} \left(x^{2} - 2 \rho_{k-l} xy + y^{2} \right) \right] $$
(48)

of \(X_{k}\) and \(X_{l}\) for \(k \ne l\), which is a bivariate normal density with zero mean and covariance matrix

$$ \mbox{\boldmath$\Sigma$} = \left(\begin{array}{*{20}c} \gamma_{0} & \gamma_{k-l} \\ \gamma_{k-l} & \gamma_{0} \\ \end{array}\right). $$

Note that for k = l one gets \({\rm var}\left(I_{k}(\xi_{p})\right) = p - p^{2}\). Now, using Eqs. 42 and 47, and letting

$$\Psi(k) = \frac{1}{2 \pi \sqrt{1 - \rho_{k}^{2}}} \int\limits_{-\infty}^{\xi_{p}/\sqrt{\gamma_{0}}} \int\limits_{-\infty}^{\xi_{p}/\sqrt{\gamma_{0}}} \hbox{exp}\,\left[ -\frac{1}{2(1 - \rho_{k}^{2})} \left(x^{2} - 2 \rho_{k} xy + y^{2}\right)\right] {\rm d}x\,{\rm d}y,$$
(49)

the variance of the pth quantile estimate becomes:

$$\sigma_{p}^{2}=\frac{1}{n f^{2} (\xi_{p})} \left[2 \sum\limits_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) \Psi (k) + p - n p^{2}\right].$$
(50)
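For a Gaussian process, each Ψ(k) in Eq. 49 is a bivariate normal orthant probability. Instead of the double integral, the sketch below uses the one-dimensional representation obtained by integrating the derivative of the bivariate normal cdf with respect to the correlation (in the spirit of Plackett 1954): with \(h = \xi_{p}/\sqrt{\gamma_{0}}\), \(\Psi(k) - p^{2} = \frac{1}{2\pi}\int_{0}^{\rho_{k}} \frac{e^{-h^{2}/(1+r)}}{\sqrt{1-r^{2}}}\,{\rm d}r\). The bracket of Eq. 50 is evaluated in the algebraically equivalent form \(p(1-p) + 2\sum_{k=1}^{n-1}(1-k/n)[\Psi(k) - p^{2}]\), which avoids cancellation. SciPy's `quad`, the AR(1) autocorrelation \(\rho_{k} = \phi^{k}\) and the function names are our assumptions.

```python
import math
from statistics import NormalDist

from scipy.integrate import quad

STD_NORMAL = NormalDist()


def indicator_cov(h, rho):
    """Psi(k) - p**2: covariance of 1{Z1 <= h} and 1{Z2 <= h} for a standard
    bivariate normal pair with correlation rho (|rho| < 1), using
    d Phi2(h, h; r)/dr = exp(-h^2 / (1 + r)) / (2 pi sqrt(1 - r^2))."""
    if rho == 0.0:
        return 0.0

    def integrand(r):
        return math.exp(-h * h / (1.0 + r)) / math.sqrt(1.0 - r * r)

    value, _ = quad(integrand, 0.0, rho)
    return value / (2.0 * math.pi)


def gaussian_quantile_variance(p, n, acf, gamma0=1.0):
    """sigma_p^2 of Eq. 50 for a zero-mean Gaussian process with
    autocorrelation function acf(k) and variance gamma0."""
    h = STD_NORMAL.inv_cdf(p)                     # xi_p / sqrt(gamma0)
    f_xi = STD_NORMAL.pdf(h) / math.sqrt(gamma0)  # f(xi_p) for N(0, gamma0)
    # Equivalent form of the bracket in Eq. 50 (avoids cancellation).
    bracket = p * (1 - p) + 2 * sum(
        (1 - k / n) * indicator_cov(h, acf(k)) for k in range(1, n))
    return bracket / (n * f_xi ** 2)


def variance_ratio(p, n, acf):
    """R_p^2: Eq. 50 divided by the iid result of Eq. 34 (unit variance)."""
    h = STD_NORMAL.inv_cdf(p)
    iid = p * (1 - p) / (n * STD_NORMAL.pdf(h) ** 2)
    return gaussian_quantile_variance(p, n, acf) / iid


phi = 0.5
print(variance_ratio(0.5, 200, lambda k: phi ** k))  # roughly 2.3 for AR(1)
print(variance_ratio(0.5, 200, lambda k: 0.0))       # 1.0 for white noise
```

At the median (h = 0) the integral reduces to \(\arcsin(\rho_{k})/(2\pi)\), a convenient check on the numerics; repeating the calculation for several lag-1 autocorrelations is one way to explore the kind of ratio-versus-autocorrelation relationship discussed in the main text.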


Cite this article

Hannachi, A. Quantifying changes and their uncertainties in probability distribution of climate variables using robust statistics. Clim Dyn 27, 301–317 (2006). https://doi.org/10.1007/s00382-006-0132-x
