Abstract
This manuscript presents robust tools for assessing changes in the probability density function (pdf) of climate variables. The approach is based on order statistics and aims at computing changes in various quantiles and related statistics, along with their standard errors. The technique, which is nonparametric and simple to compute, is developed for both independent and dependent data. For autocorrelated data, serial correlation is addressed via Monte Carlo simulations using various autoregressive models. The ratio between the average standard errors of quantile estimates for correlated and independent data, with the average taken over several quantiles, is then computed. A simple scaling-law type relationship is found to hold between this ratio and the lag-1 autocorrelation. The approach has been applied to winter monthly Central England Temperature (CET) and North Atlantic Oscillation (NAO) time series from 1659 to 1999 to assess and quantify changes in various parameters of their pdfs. For the CET, significant changes in the median (or location) and also in low and high quantiles are observed between various time slices, in particular between the pre- and post-industrial revolution periods. Observed changes in spread and in quartile skewness of the pdf, however, are not statistically significant (at the 95% confidence level). For the NAO index we find mainly large significant changes in variance (or scale), yielding significant changes in low and high quantiles. Finally, the performance of the method compared with a few conventional approaches is discussed.
Notes
The observed numerical value is normally denoted by a lower-case letter, e.g. \(s^{2}_{X}\).
Differentiating and squaring Eq. 21 yields:
$$ \left(\delta S \right)^{2} = 4 \frac{\left[ \left( \delta X_{\gamma}- \delta X_{\beta} \right) \left( X_{\eta} - X_{\beta} \right) - \left( X_{\gamma} - X_{\beta} \right) \left( \delta X_{\eta} - \delta X_{\beta} \right) \right]^{2}}{\left( X_{\eta} - X_{\beta} \right)^{4}}, $$
where the variables \(X_{\alpha}\), \(\alpha = \beta, \gamma, \eta\), are to be fixed to \(\xi_{\alpha} = F^{-1}(\alpha)\), and the \(\delta X_{\alpha}\) denote random perturbations. Now taking the expectation of both sides, keeping in mind that, for example, \(E\left(\delta X_{\gamma} - \delta X_{\beta}\right)^{2} = \sigma^{2}_{\beta,\gamma}\), and (see Appendix, Eq. 40) \(E \left(\delta X_{\gamma} \delta X_{\beta}\right) = {\rm cov}\left(\delta X_{\gamma}, \delta X_{\beta}\right) = \frac{\beta(1-\gamma)}{n f(\xi_{\beta}) f(\xi_{\gamma})},\) i.e.
$$\sigma_{\beta} \sigma_{\gamma} \sqrt{\frac{\beta (1-\gamma)}{\gamma (1-\beta)}},$$
yields precisely (Eq. 22).
Not every sequence of numbers \(\rho_{1}, \rho_{2}, \ldots, \rho_{p}\) constitutes the first p lagged autocorrelations of a stationary time series. The autocorrelations \(\rho_{1}, \ldots, \rho_{p}\) must satisfy a set of consistency relationships; see e.g. Box et al. (1994).
The indicator of a set is a function that is one inside the set and zero elsewhere. \(I_{k}(\,)\) can also be written using the Heaviside function \(\mathcal{H}(\,),\) which is one for positive numbers and zero otherwise, yielding \(I_{k} (x) = 1 - \mathcal{H} (X_{k} - x).\)
References
Antoniadou T, Besse P, Fougères AL, Le Gall C, and Stephenson DB (2001) L’Oscillation Atlantique Nord (NAO) et son role sur le climat Européen. Statistique Appliquée XLIX(3):39–60
Bahadur RR (1966) A note on quantiles in large samples. Ann Math Statist 37:577–580
Berry DA, Lindgren BW (1990) Statistics: theory and methods. Pacific Grove, Brooks/Cole California
Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis: forecasting and control. Prentice Hall, New Jersey
Craigmile PF, Guttorp P, Percival DB (2004) Trend assessment in a long memory dependence model using the discrete wavelet transform. Environmetrics 15:313–335
Dai A, Trenberth KE, Karl TR (1999) Effects of clouds, soil moisture, precipitation and water vapor on diurnal temperature range. J Clim 12:2451–2473
David HA (1981) Order statistics, 2nd edn. Wiley, New York
DeGroot MH, Schervish MJ (2002) Probability and statistics. Addison Wesley, Boston
Drezner Z, Wesolowsky GO (1989) On the computation of the bivariate normal integral. J Statist Comput Simul 35:101–107
Feldstein SB (2000) The timescale, power spectra, and climate noise properties of teleconnection patterns. J Clim 13:4430–4440
Ferro CAT, Hannachi A, Stephenson DB (2005) Simple non-parametric techniques for exploring changing probability distributions of weather. J Clim 18:4344–4354
Freeman H (1963) Introduction to statistical inference. Addison-Wesley, Massachusetts
Gilchrist WG (2000) Statistical modelling with quantile functions. Chapman and Hall, Boca Raton
Granger CWJ, Joyeux R (1980) Introduction to long-memory time series models and fractional differencing. J Time Series Anal 1:15–29
Hall A, Manabe S (1997) Can local linear stochastic theory explain sea surface temperature and salinity variability? Clim Dyn 13:167–18
Hannachi A, Stephenson DB, Sperber KR (2003) Probability-based methods for quantifying nonlinearity in the ENSO. Clim Dyn 20:241–256
Hannachi A, Stephenson DB, Sperber KR (2004) Erratum: probability-based methods for quantifying nonlinearity in the ENSO. Clim Dyn 22:69–70
Hasselmann K (1976) Stochastic climate models. Part I. Theory. Tellus 28:474–485
Hasselmann K (1988) PIPs and POPs-A general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns. J Geophys Res 93:11015–11020
Heidelberg P, Lewis PAW (1984) Quantile estimation in dependent sequences. Oper Res 32:185–209
Hesse CH (1990) A Bahadur-type representation for empirical quantiles of a large class of stationary, possibly infinite-variance, linear processes. Ann Statist 18:1188–1202
Ho HC, Hsing T (1996) On the asymptotic expansion of the empirical process of long-memory moving averages. Ann Statist 24:992–1024
Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176
IPCC (2001) Climate change 2001: synthesis report. A Contribution of working groups I, II, and III to the third assessment report of the intergovernmental panel on climate change: (Eds) Watson RT and the core writing team, Cambridge University Press, Cambridge
Ivchenko G, Medvedev Yu (1990) Mathematical statistics. Mir Publishers, Moscow
Jungo P, Beniston M (2001) Changes in the anomalies of extreme temperature anomalies in the 20th century at Swiss climatological stations located at different latitudes and altitudes. Theor Appl Climatol 69:1–12
Karl TR, Jones PD, Knight RW, Kukla G, Plummer N, Razuvayev V, Gallo KP, Lindseay L, Charlson RJ, Peterson TC (1993) Asymmetric trends of daily minimum and maximum temperature. Bull Amer Meteor Soc 74:1007–1023
Katz RW, Brown BG (1992) Extreme events in a changing climate: variability is more important than averages. Clim Change 21:289–302
Kendall MG, Stuart A (1973) The advanced theory of statistics: inference and relationship, vol 2, 3rd edn. Charles Griffin and Company, London
Kendall MG, Stuart A (1977) The advanced theory of statistics: inference and relationship, vol 1. Charles Griffin and Company, London
Kotz S, Johnson NL (eds) (1986) Encyclopedia of statistical sciences, vol 7. Wiley, New York
Kreyszig E (1970) Introductory mathematical statistics. Wiley, New York
Lamb HH (1977) Climate-present, past and future, Volume 2: Climatic history and future. Methuen, London
Lawrance AJ, Kottegoda NT (1977) Stochastic modelling of river flow time series. J Roy Statist Soc A 140:1–47
Lanzante JR (1996) Resistant, robust and non-parametric techniques for the analysis of climate data: theory and examples, including applications to historical radiosonde station data. Int J Climatol 16:1197–1226
Luterbacher J et al (2001) Extending North Atlantic oscillation reconstructions back to 1500. Atmos Sci Lett 2:114–124
Manley G (1974) Central England temperatures: monthly means 1659 to 1973. Q J Roy Meteorol Soc 100:389–405
Mearns LO, Katz RW, Schneider SH (1984) Extreme high-temperature events: changes in their probabilities with changes in mean temperature. J Clim Appl Meteor 23:1601–1613
Parker DE, Legg TP, Folland CK (1992) A new daily Central England Temperature series 1772–1991. Int J Climatol 12:317–342
Penland C, Sardeshmukh PD (1995) The optimal growth of tropical sea surface temperature anomalies. J Clim 8:1999–2024
Percival BD, Overland JW, Mofjeld HO (2001) Interpretation of North Pacific variability as a short- and long-memory process. J Clim 14:4545–4559
Plackett RL (1954) A reduction formula for normal multivariate probabilities. Biometrika 41:351–360
Sen PK (1968) Asymptotic normality of sample quantiles for m-dependent processes. Ann Math Statist 39:1724–1730
Sen PK (1972) On the Bahadur representation of sample quantiles for sequences of ϕ-mixing random variables. J Multiv Anal 2:77–95
Stephenson DB, Pavan V, Bojariu R (2002) Is the North Atlantic oscillation a random walk? Int J Climatol 20:1–18
Tarleton LF, Katz RW (1995) Statistical explanation for trends in extreme summer temperature at Phoenix, Arizona. J Clim 8:1704–1708
Von Storch J (1995) Multivariate statistical modelling: POP model as a first order approximation. In: von Storch H, Navarra A (eds) Analysis of climate variability: application of statistical techniques. Springer, Berlin Heidelberg New York, pp 281–297
Wigley TML (1985) Impact of extreme events. Nature 316:106–107
Wilcox RR (1997) Introduction to robust estimation and hypothesis testing. Academic, New York
Wilks DS (1995) Statistical methods in the atmospheric sciences. Academic, San Diego
Wu WB (2005) On the Bahadur representation of sample quantiles for dependent sequences. Ann Statist 33:1934–1963
Wunsch C (2003) The spectral description of climate change including the 100 ky energy. Clim Dyn 20:353–363
Acknowledgments
This work was funded by the Centre for Global Atmospheric Modelling (CGAM), the University of Reading. The author wishes to thank S. Pezzulli and C.A.T. Ferro for beneficial discussions and for their encouragement, and D.B. Stephenson for critically reading the manuscript and pointing to the IPCC report. Thanks are also due to three anonymous reviewers for their thorough comments that helped improve the manuscript.
Appendix
1.1 On quantile estimates and their standard errors
This appendix provides a simple guideline for computing quantiles and a few other robust and resistant measures of scale and shape of pdfs, along with their standard errors, for independent and dependent data.
1.1.1 Independent data
We consider a time series \(x_{t}\), \(t = 1, \ldots, n\), assumed to be a realisation of a sequence \(X_{1}, X_{2}, \ldots, X_{n}\) of independent and identically distributed (iid) random variables having the same continuous distribution as some random variable X, whose cdf and corresponding pdf are respectively F( ) and f( ). The cdf F( ) is monotonically increasing and can be inverted. The number \(\xi_{p} = F^{-1}(p)\), for any 0<p<1, is the pth quantile of X. For a given x let \(I_{k} (x) = 1_{X_{k} \le x}\) be the indicator of \(\{X_{k} \le x\}\), i.e. \(I_{k}(x)\) is one if \(X_{k} \le x\), and zero otherwise. Then \(I_{k}(x)\), \(k = 1, \ldots, n\), are also iid as \(I(x) = 1_{X \le x}\). The edf of the sample is given by:
$$ F_{n}(x) = \frac{1}{n} \sum_{k=1}^{n} I_{k}(x). $$
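The use of the edf to express the sample quantile was initiated by Bahadur (1966) using the following asymptotic representation:
$$ X_{(p)} = \xi_{p} + \frac{p - \frac{1}{n} \sum_{k=1}^{n} I_{k}(\xi_{p})}{f(\xi_{p})} + \mathcal{R}_{n}, \tag{33} $$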
where \(\mathcal{R}_{n}\) is a remainder that goes to zero as n increases. Note that for any \(1 \le k \le n\), \(I_{k}(\xi_{p})\) is a binary random variable taking values 0 and 1 with respective probabilities 1−p and p. Now, for large n, \(X_{(p)} - \xi_{p}\) is approximately normally distributed with zero mean and variance (see also Kendall and Stuart 1977; Wilcox 1997):
$$ \sigma^{2}_{p} = \frac{p(1-p)}{n f^{2}(\xi_{p})}. \tag{34} $$
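As a numerical illustration of the large-sample variance \(p(1-p)/\left(n f^{2}(\xi_{p})\right)\) of a sample quantile under iid sampling, the following minimal Python sketch compares the formula with a small Monte Carlo experiment on standard normal data. The function names, the sample size, the number of trials and the rank rule used to pick the sample quantile are illustrative assumptions, not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def asymptotic_quantile_se(p, n, density_at_quantile):
    """Large-sample standard error sqrt(p(1-p)/n) / f(xi_p) for iid data."""
    return math.sqrt(p * (1.0 - p) / n) / density_at_quantile


def simulated_quantile_se(p, n, trials=2000, seed=0):
    """Spread of the sample pth quantile over repeated standard normal samples."""
    rng = random.Random(seed)
    rank = min(max(math.ceil(n * p), 1), n)  # illustrative 1-based rank choice
    estimates = []
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        estimates.append(sample[rank - 1])
    mean = sum(estimates) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (trials - 1))


std_normal = NormalDist()
p, n = 0.5, 200
xi_p = std_normal.inv_cdf(p)  # true median of N(0, 1)
theory = asymptotic_quantile_se(p, n, std_normal.pdf(xi_p))
simulated = simulated_quantile_se(p, n)
print(f"asymptotic SE = {theory:.4f}, simulated SE = {simulated:.4f}")
```

The formula requires the density at the quantile, which is known here only because the data are simulated; this is the dependence on f that the nonparametric estimator is designed to avoid.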
Both estimators (Eqs. 33 and 34) are parametric, and they will be used later for dependent data. Here we are interested in non-parametric and resistant estimators for the quantiles and their standard errors. Let \(x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}\) be the sorted sample of the time series. Similarly, let \(X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}\) be the order statistics of the sequence \(X_{1}, X_{2}, \ldots, X_{n}\). These order statistics can be used to construct a resistant, robust, and non-parametric estimator \(X_{(p)}\) for the pth quantile, namely:
$$ X_{(p)} = X_{p_{0}:n}, \tag{35} $$
where \(p_{0} = [np + 0.5]\), i.e. the integer part of np+0.5 (Kendall and Stuart 1973; David 1981). Now at a given significance level α, the probability that \(X_{r:n} \le \xi_{p} \le X_{s:n}\) can be computed, and is given by (see e.g. Kendall and Stuart 1973):
$$ P\left(X_{r:n} \le \xi_{p} \le X_{s:n}\right) = J_{p}(r, n-r+1) - J_{p}(s, n-s+1) = 1 - \alpha, \tag{36} $$
where \(J_p (a,b) = \frac{1}{B(a,b)} \int_{0}^{p} t^{a-1} (1 - t)^{b-1} {\rm d}t\) is the cdf of the beta distribution with parameters a and b.
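The coverage probability of an interval between two order statistics can be evaluated without a beta-function routine. For iid continuous data, a standard identity (stated here as background, not taken from the paper) gives \(J_{p}(r, n-r+1) = P(B \ge r)\) with B a Binomial(n, p) count, so the difference of the two beta cdfs is a finite binomial sum. The sketch below assumes that identity; the function names are illustrative.

```python
import math


def central_rank(n, p):
    """Rank p0 = [np + 0.5] of the order statistic used as the quantile estimate."""
    return int(n * p + 0.5)


def coverage(n, p, r, s):
    """P(X_{r:n} <= xi_p <= X_{s:n}) for iid continuous data.

    Equals P(r <= B <= s - 1) with B ~ Binomial(n, p), i.e. the difference
    J_p(r, n - r + 1) - J_p(s, n - s + 1) of two beta cdfs.
    """
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r, s))


n, p = 100, 0.5
p0 = central_rank(n, p)
for r in (45, 42, 40):
    s = 2 * p0 - r  # interval of ranks centred on p0
    print(r, s, round(coverage(n, p, r, s), 4))
```

For much larger n the individual terms can underflow in floating point, in which case a library implementation of the regularised incomplete beta function would be the safer choice.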
For a given n, α, and 0<p<1, Eq. 36 is solved by choosing the interval [r, s] to be centered at the rank \(p_{0}\) corresponding to \(X_{(p)} = X_{p_{0}:n}.\) Hence, taking \(p_{0} = [np + 0.5]\), we choose \(s = 2p_{0} - r\). The parameter r is then obtained by solving the nonlinear equation:
$$ J_{p}(x, n-x+1) - J_{p}(2p_{0}-x, n-2p_{0}+x+1) = 1 - \alpha. \tag{37} $$
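Note that in Eq. 37 the arguments \(n-x+1\), \(2p_{0}-x\), and \(n-2p_{0}+x+1\) should all be non-negative and that \(1 \le x \le n\). This readily yields:
$$ \max\left(1, 2p_{0}-n-1\right) \le x \le \min\left(n, 2p_{0}\right). \tag{38} $$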
Equation 37 is solved using the root finder fzero in MATLAB within the interval provided by Eq. 38.
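The paper solves for r with a continuous root finder. As a swapped-in alternative for illustration only, the sketch below searches the integer ranks directly: starting next to \(p_{0}\), it widens the symmetric interval \([r, 2p_{0}-r]\) until the binomial coverage first reaches 1−α. The function names and the discrete search are assumptions of this sketch, so its output need not coincide with the values tabulated in the paper.

```python
import math


def coverage(n, p, r, s):
    """P(r <= B <= s - 1) for B ~ Binomial(n, p): coverage of [X_{r:n}, X_{s:n}]."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r, s))


def symmetric_ranks(n, p, alpha=0.05):
    """Largest r (with s = 2*p0 - r) whose interval reaches coverage 1 - alpha.

    Returns None when no admissible pair of ranks reaches the target.
    """
    p0 = int(n * p + 0.5)
    lowest = max(1, 2 * p0 - n)  # keeps 1 <= r and s <= n
    for r in range(p0 - 1, lowest - 1, -1):
        s = 2 * p0 - r
        if coverage(n, p, r, s) >= 1.0 - alpha:
            return r, s
    return None


ranks = symmetric_ranks(100, 0.5)
print(ranks)
```

Because the ranks are integers, the achieved coverage is generally somewhat above 1−α rather than exactly equal to it.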
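An approximation of the standard error \(\sigma_{p}\) of the estimator \(X_{(p)}\) of \(\xi_{p}\) is obtained using:
$$ \sigma_{p} \approx \frac{E\left(X_{s:n} - X_{r:n}\right)}{2 \Phi^{-1}(1 - \alpha/2)}, \tag{39} $$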
where Φ is the cdf of the standard normal distribution, and E( ) stands for the expectation operator. The estimate of the standard error from the observed sample is obtained from Eq. 39 by substituting \(x_{s:n}\) and \(x_{r:n}\) for \(X_{s:n}\) and \(X_{r:n}\), respectively. In Table 1 we show values of r for α=5%, sample sizes ranging from 100 to 2,000, and various values of the probability p.
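A minimal sketch of the sample-based standard error follows. It assumes the approximation takes the usual form of the width of the order-statistic interval divided by twice the normal quantile \(\Phi^{-1}(1-\alpha/2)\); that form, the function name and the toy data are assumptions made for illustration.

```python
from statistics import NormalDist


def order_statistic_se(sample, r, s, alpha=0.05):
    """Nonparametric standard error (x_{s:n} - x_{r:n}) / (2 z_{1 - alpha/2})."""
    ordered = sorted(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (ordered[s - 1] - ordered[r - 1]) / (2.0 * z)  # ranks are 1-based


toy_sample = list(range(1, 101))  # placeholder data, not CET or NAO values
se_median = order_statistic_se(toy_sample, r=40, s=60)
print(round(se_median, 3))
```

Only two order statistics enter the estimate, so it is unaffected by outliers in the tails of the sample as long as the ranks r and s stay away from the extremes.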
In a similar manner it is possible to compute the standard errors of other quantities such as the quantile difference \(D_{p,q}\) (see Eq. 19). In this case, it can be seen from Eq. 33 that \(f(\xi_{p}) f(\xi_{q}) {\rm cov} \left( X_{(p)}, X_{(q)} \right) = \frac{1}{n^{2}} \sum_{k,l} {\rm cov} \left(I_{k} (\xi_{p}), I_{l} (\xi_{q}) \right),\) and since, for \(p \le q\), \({\rm cov} \left(I_{k} (\xi_{p}), I_{l} (\xi_{q}) \right) = p(1-q)\, \delta_{kl}\), where \(\delta_{kl}\) is the Kronecker delta, one gets:
$$ {\rm cov} \left( X_{(p)}, X_{(q)} \right) = \frac{p(1-q)}{n f(\xi_{p}) f(\xi_{q})}, \qquad p \le q, \tag{40} $$
which reproduces (Eq. 34) when p=q. The nonparametric expression of \(\sigma^{2}_{p,q} = E(D_{p,q})^{2}\) (see Eq. 20) is then obtained from \(\sigma^{2}_{p}\), \(\sigma^{2}_{q}\), and Eq. 40 after eliminating \(f(\xi_{p})\) and \(f(\xi_{q})\) using Eq. 34.
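The elimination of the densities can be made concrete. Writing the covariance between two quantile estimates as \(\sigma_{p} \sigma_{q} \sqrt{p(1-q)/\left(q(1-p)\right)}\) for \(p \le q\), as in the Notes, the variance of the difference \(X_{(q)} - X_{(p)}\) is \(\sigma^{2}_{p} + \sigma^{2}_{q}\) minus twice that covariance. The sketch below implements this combination; the function name and the unit standard errors in the example are hypothetical.

```python
import math


def quantile_difference_variance(sigma_p, sigma_q, p, q):
    """Variance of X_(q) - X_(p) for iid data, with p <= q."""
    if not 0.0 < p <= q < 1.0:
        raise ValueError("expected 0 < p <= q < 1")
    covariance = sigma_p * sigma_q * math.sqrt(p * (1.0 - q) / (q * (1.0 - p)))
    return sigma_p**2 + sigma_q**2 - 2.0 * covariance


# Interquartile range with (hypothetical) unit standard errors on both quartiles.
var_iqr = quantile_difference_variance(1.0, 1.0, 0.25, 0.75)
print(var_iqr)
```

Since the two quantile estimates are positively correlated, the variance of their difference is smaller than the sum of the two variances.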
1.1.2 Dependent data
Interest in computing quantiles from data has, since the work of Bahadur (1966), attracted many researchers. Extensions of Eq. 33 to sequences of random variables (or stochastic processes) with dependency structures have been performed to include a broad class of linear processes
with iid \(\varepsilon_{k}\) satisfying certain regularity properties. Sen (1968, 1972) extended Bahadur’s result to m-dependent and strongly mixing processes. Hesse (1990) extended the result to a broad class of linear short-range dependent processes, whereas Ho and Hsing (1996) extended it to include long-range dependence. Wu (2005) generalised and refined those results for linear and some nonlinear processes.
In summary, Eq. 33 is found to apply to a large class of linear and nonlinear time series including short- and long-range dependence. Note that for the latter case a correction term
is included in Eq. 33 (Wu 2005).
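From Eq. 33 the variance \(\sigma^{2}_{p}\) of \(X_{(p)}\) now involves \({\rm var} \left[\frac{1}{n} \sum_{k=1}^{n} I_{k} (\xi_{p}) \right],\) precisely:
$$ \sigma^{2}_{p} = \frac{1}{f^{2}(\xi_{p})} {\rm var} \left[\frac{1}{n} \sum_{k=1}^{n} I_{k} (\xi_{p}) \right]. \tag{42} $$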
Later we give the expression of \(\sigma^{2}_{p}\) when the process is Gaussian. Using the stationarity of the sequence \(I_{k}(x)\), \(k = 1, 2, \ldots, n\), yields, for large n, the approximation:
Heidelberg and Lewis (1984) interpret the right hand side of Eq. 43 as the value at zero frequency of the spectrum:
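of the binary process \(I_{k}(x)\), \(k = 1, 2, \ldots\), for \(-1/2 \le \omega \le 1/2\). The variance \(\sigma^{2}_{p}\) is then given by:
$$ \sigma^{2}_{p} = \frac{h(0, \xi_{p})}{n f^{2}(\xi_{p})}. \tag{45} $$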
The ratio \(R^{2}_{p}\) between variances of quantile estimates is then obtained as the ratio between Eqs. 45 and 34. Here, as in Heidelberg and Lewis (1984), the log-periodogram:
estimated from the binary sequence, is used to compute \(h(0, \xi_{p})\) using the intercept of a least-squares regression of the log-periodogram (Eq. 46). Note that the log-periodogram provides an estimator of the logarithm of the spectrum.
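The variance ratio can also be estimated without a periodogram. The sketch below is a swapped-in, simpler technique and not the log-periodogram regression described above: it uses the fact that the zero-frequency spectrum of the indicator sequence is the sum of its autocovariances, so the ratio to the iid value is one plus twice the sum of the indicator autocorrelations, truncated here at a fixed lag. The AR(1) generator, the truncation lag and all names are assumptions for illustration.

```python
import random


def ar1_series(n, rho, seed=0):
    """Stationary Gaussian AR(1) series with unit variance and lag-1 autocorrelation rho."""
    rng = random.Random(seed)
    innovation_sd = (1.0 - rho * rho) ** 0.5
    x = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + innovation_sd * rng.gauss(0.0, 1.0)
    return series


def variance_ratio(series, p, max_lag=30):
    """Estimate R_p^2 as 1 + 2 * (sum of indicator autocorrelations up to max_lag)."""
    n = len(series)
    threshold = sorted(series)[max(int(n * p + 0.5), 1) - 1]  # sample pth quantile
    indicator = [1.0 if x <= threshold else 0.0 for x in series]
    mean = sum(indicator) / n
    centred = [i - mean for i in indicator]
    c0 = sum(c * c for c in centred) / n
    total = 0.0
    for lag in range(1, max_lag + 1):
        ck = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / n
        total += ck / c0
    return 1.0 + 2.0 * total


ratio_iid = variance_ratio(ar1_series(5000, 0.0, seed=1), p=0.5)
ratio_ar1 = variance_ratio(ar1_series(5000, 0.8, seed=1), p=0.5)
print(round(ratio_iid, 2), round(ratio_ar1, 2))
```

The truncation lag trades bias against noise: too few lags miss part of the dependence, while too many add sampling noise from poorly estimated autocorrelations.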
Now suppose that the sample is drawn from a zero-mean Gaussian process with covariance and correlation functions \(\gamma_{k}\) and \(\rho_{k}\), \(k = 0, 1, \ldots, n-1\). In this case the joint distribution of any subset \((X_{1}, \ldots, X_{p})\) is completely specified by the covariance matrix \(\boldsymbol{\Sigma}_{p} = \left(\gamma_{i-j} \right)\) of the random vector \(\mathbf{X} = (X_{1}, \ldots, X_{p})^{T}\). It can be verified that:
which involves the joint pdf
of \(X_{k}\) and \(X_{l}\), when \(k \ne l\), which is a bivariate normal with zero mean and covariance matrix
Note that for k=l one gets \({\rm var}(I_{k}(\xi_{p})) = p - p^{2}\). Now using Eqs. 42 and 47, and letting
the variance of the pth quantile estimates becomes:
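One special case of the Gaussian setting can be checked by hand. For the median (p = 0.5) of a zero-mean Gaussian process, the bivariate normal orthant probability has the closed form \(P(X_{k} \le 0, X_{l} \le 0) = \frac{1}{4} + \frac{\arcsin \rho_{k-l}}{2\pi}\), a standard result that is not stated in the paper. The covariance of the two indicators is then \(\arcsin(\rho_{k-l})/(2\pi)\), and for large n the variance ratio relative to iid data becomes \(1 + \frac{4}{\pi} \sum_{k \ge 1} \arcsin \rho_{k}\). The sketch below evaluates this for an AR(1) process with \(\rho_{k} = \rho^{k}\); the function name and the truncation of the sum are illustrative assumptions.

```python
import math


def median_variance_ratio_ar1(rho, max_lag=2000):
    """Large-n ratio R^2 at p = 0.5 for a Gaussian AR(1) process, rho_k = rho**k."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    lag_sum = sum(math.asin(rho**k) for k in range(1, max_lag + 1))
    return 1.0 + (4.0 / math.pi) * lag_sum


for rho in (0.0, 0.2, 0.5, 0.8):
    print(rho, round(median_variance_ratio_ar1(rho), 3))
```

The ratio equals one for uncorrelated data and grows with the lag-1 autocorrelation, which is the kind of dependence on ρ that the Monte Carlo experiments in the paper quantify for general quantiles.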
Cite this article
Hannachi, A. Quantifying changes and their uncertainties in probability distribution of climate variables using robust statistics. Clim Dyn 27, 301–317 (2006). https://doi.org/10.1007/s00382-006-0132-x