Abstract
The efficiencies of widely used objective homogenisation methods (OHOMs) are evaluated and compared using several test-datasets and efficiency measures. Problems related to the choice of efficiency measure, the creation of appropriate test-datasets and the parameterisation of OHOMs are discussed. Only the detection parts of the OHOMs are examined. Power of detection, false alarm rate, detection skill and skill of linear trend estimation are calculated and compared for eight OHOMs and six test-datasets. Each test-dataset comprises 10,000 artificially simulated time series, each 100 years long. In the simplest test-dataset, each time series contains one inhomogeneity (IH), while the most complex simulated dataset includes a structure of inhomogeneities similar to that of real central European temperature time series. Particular attention is given to OHOMs that include (1) a cutting algorithm, (2) a semihierarchic algorithm, (3) direct detection of multiple IHs, and (4) detection of change-point and trend-line shaped IHs. Results show that the Caussinus–Mestre method and Multiple Analysis of Series for Homogenization are the most powerful tools for detecting and correcting IHs in climatic time series.
References
Aguilar E, Auer I, Brunet M, Peterson TC, Wieringa J (2003) WMO Guidelines on climate metadata and homogenization. WMO, Geneva, WCDMP-No. 53, WMO-TD No 1186
Alexandersson H (1986) A homogeneity test applied to precipitation data. J Climatol 6:661–675
Alexandersson H, Moberg A (1997) Homogenization of Swedish temperature data. Part I: homogeneity test for linear trends. Int J Climatol 17:25–34
Auer I et al (2005) A new instrumental precipitation dataset for the greater Alpine region for the period 1800-2002. Int J Climatol 25:139–166. doi:10.1002/joc.1135
Beaulieu C, Seidou O, Ouarda TBMJ, Zhang X, Boulet G, Yagouti A (2008) Intercomparison of homogenization techniques for precipitation data. Water Resour Res 44:W02425. doi:10.1029/2006WR005615
Brunet M, Saladié O, Jones P, Sigró J, Aguilar E, Moberg A, Lister D, Walther A, Lopez D, Almarza C (2006) The development of a new dataset of Spanish daily adjusted temperature series (SDATS) (1850-2003). Int J Climatol 26:1777–1802. doi:10.1002/joc.1338
Buishand TA (1982) Some methods for testing the homogeneity of rainfall records. J Hydrol 58:11–27
Caussinus H, Lyazrhi F (1997) Choosing a linear model with a random number of change-points and outliers. Ann Inst Stat Math 49(4):761–775
Caussinus H, Mestre O (2004) Detection and correction of artificial shifts in climate series. J Roy Stat Soc Series C 53:405–425
DeGaetano AT (2006) Attributes of several methods for detecting discontinuities in mean temperature series. J Climate 19:838–853. doi:10.1175/JCLI3662.1
Domonkos P (2006a) Testing of homogenisation methods: purposes, tools and problems of implementation. In: Szalai S (ed) Proceedings of the fifth seminar for homogenization and quality control in climatological databases. Hungarian Meteorological Service, Budapest, pp 126–145
Domonkos P (2006b) Application of objective homogenization methods: inhomogeneities in time series of temperature and precipitation. Időjárás 110:63–87
Domonkos P, Štěpánek P (2009) Statistical characteristics of detectable inhomogeneities in observed meteorological time series. Studia Geoph et Geod 53:239–260. doi:10.1007/s11200-009-0015-9
Drogue G, Mestre O, Hoffmann L, Iffly J-F, Pfister L (2005) Recent warming in a small region with semi-oceanic climate, 1949-1998: what is the ground truth? Theor Appl Climatol 81:1–10. doi:10.1007/s00704-004-0088-x
Ducré-Robitaille J-F, Vincent LA, Boulet G (2003) Comparison of techniques for detection of discontinuities in temperature series. Int J Climatol 23:1087–1101. doi:10.1002/joc.924
Easterling DR, Peterson TC (1995) A new method for detecting undocumented discontinuities in climatological time series. Int J Climatol 15:369–377
Gérard-Marchant PGF, Stooksbury DE, Seymour L (2008) Methods for starting the detection of undocumented multiple changepoints. J Climate 21:4887–4899. doi:10.1175/2008JCLI1956.1
Hawkins DM (1972) On the choice of segments in piecewise approximation. J Inst Math Appl 9:250–256
Lanzante JR (1996) Resistant, robust and non-parametric techniques for the analysis of climate data: theory and examples, including applications to historical radiosonde station data. Int J Climatol 16:1197–1226
Menne MJ, Williams CN Jr (2005) Detection of undocumented changepoints using multiple test statistics and composite reference series. J Climate 18:4271–4286. doi:10.1175/JCLI3524.1
Menne MJ, Williams CN Jr (2009) Homogenization of temperature series via pairwise comparisons. J Climate 22:1700–1717. doi:10.1175/2008JCLI2263.1
Mestre O, Domonkos P, Lebarbier E, Picard F, Robin S (2008) Comparison of change-point detection methods in the mean of Gaussian processes. In: Sixth seminar for homogenization and quality control in climatological databases (in print)
Moberg A, Alexandersson H (1997) Homogenization of Swedish temperature data. Part II: homogenized gridded air temperature compared with a subset of global gridded air temperature since 1861. Int J Climatol 17:35–54
Peterson TC et al (1998) Homogeneity adjustments of in situ atmospheric climate data: a review. Int J Climatol 18:1493–1517
Sneyers R (1997) Climate chaotic instability. Statistical determination – theoretical backgrounds. Environmetrics 8:517–532
Syrakova M (2003) Homogeneity analysis of climatological time series – experiments and problems. Időjárás 107:31–48
Szentimrey T (1999) Multiple Analysis of Series for Homogenization (MASH). In: Szalai S, Szentimrey T, Szinell CS (eds) Proceedings of the second seminar for homogenization of surface climatological data. World Meteorological Organization, WCDMP-41, WMO-TD 932:27–46
Titchner HA, Thorne PW, McCarthy MP, Tett SFB, Haimberger L, Parker DE (2009) Critically reassessing tropospheric temperature trends from radiosondes using realistic validation experiments. J Climate 22:465–485. doi:10.1175/2008JCLI2419.1
Vincent LA (1998) A technique for the identification of inhomogeneities in Canadian temperature series. J Climate 11:1094–1104
Wang XL, Wen QH, Wu Y (2007) Penalized maximal t test for detecting undocumented mean change in climate data series. J Appl Meteor Climatol 46(6):916–931. doi:10.1175/JAM2504.1
Acknowledgements
The research was partially funded by the COST ES0601 project. The author thanks Matthew Menne and an anonymous reviewer for their useful comments.
Appendices
Appendix I
Simulation of the standard test-dataset
1. 196-year-long series are generated, and the slice covering years 48–147 is always taken as the target series.
2. IHs and noise terms are introduced in each year (although their values can, naturally, be 0).
3. Types of terms introduced into the time series: (a) long-term IH (y), (b) short-term IH (z) and (c) white noise (w). A certain part of the y- and z-type terms is handled as noise (cf. step 10).
4. Forms of the IHs: (a) sudden shift, (b) gradual change, (c) platform-like change, (d) bias for one specific year. Form (d) is a special case of form (c).
5. Introduction of long-term IHs.
5.1: Size and direction of the IH
This term combines an IH whose magnitude can be large, occurring with the probability set by \( K_1 \), and a small IH, occurring with the probability set by \( K_2 \):
$$ \Delta y'_i = K_1(q_1) \cdot \operatorname{sign}(0.5 - q_2) \cdot (8 + 4p) \cdot q_3^{6 + 4p} + K_2(q_4) \cdot G_1, \tag{A1} $$
where \( K_1(a) = 1 \) if \( a < 0.012 \) and \( K_1(a) = 0 \) otherwise; \( K_2(a) = 1 \) if \( a < 0.07 \) and \( K_2(a) = 0 \) otherwise; q (with any index) is a random variable uniformly distributed over the interval [0,1); p has the same distribution as q, but it is constant for a given time series. The symbol Δ indicates that (A1) does not replace but modifies the earlier value of \( y_i \). The prime on y indicates that the values obtained from (A1) are modified in certain cases (see below) before \( \Delta y_i \) is introduced. If \( \Delta y'_i = 0 \), steps 5.2 and 5.3 are omitted.
5.2: Form of the IH
The form of \( \Delta y'_i \) is (A) sudden shift, (B) gradual change or (C) platform-like change, with probabilities 0.4, 0.25 and 0.35, respectively.
For (A)- and (B)-form IHs a negative autocorrelation is present:
$$ \Delta y_i = \sqrt{1 - r^2} \cdot \Delta y'_i + r \cdot F, \tag{A2} $$
where \( F = 0 \) for the first (A)- or (B)-form IH of the series and \( F = \Delta y_k \) otherwise, k indicates the year in which the previous (A)- or (B)-form IH was introduced, and \( r = -0.5 \).
For (C)-form IHs:
$$ \Delta y_i = \Delta y'_i \tag{A3} $$
5.3: Calculation of the \( y_i \) components of the series
(A)-form IHs:
$$ y_j = y_{j,-1} + \Delta y_i \quad \text{for each } j \in [i, n], \tag{A4} $$
where \( y_{j,-1} \) denotes the value of the term \( y_j \) before the ongoing modification.
For (B)- and (C)-form IHs, a duration must first be assigned. For (B)-form IHs, the duration \( D_1 \) is:
$$ D_1 = 5 + 2 \cdot \operatorname{Int}\left( 48 \cdot q_5^{1.5} \right) \tag{A5} $$
("Int" denotes the integer part), and the IH appears as:
$$ y_j = y_{j,-1} + \frac{(j - i + 0.5 D_1) \, \Delta y_i}{D_1} \quad \text{for each } j \in [i - 0.5 D_1, \; i + 0.5 D_1 - 1], \tag{A6} $$
while for (C)-form IHs:
$$ D_2 = \operatorname{Int}\left( 30 \cdot q_6^{1.5} \right), \tag{A7} $$
$$ y_j = y_{j,-1} + \Delta y_i \quad \text{for each } j \in [i, \; i + D_2]. \tag{A8} $$
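The deterministic parts of step 5 can be sketched in Python as follows. This is a minimal illustration rather than the author's code: the function names are hypothetical, years are mapped to 0-based array indices, and a platform that runs past the end of the series is simply truncated, which the text does not specify. The gradual-change ramp (A6) is left out because its half-integer endpoints would need a rounding convention that the appendix does not give.

```python
import math
import numpy as np

R = -0.5  # autocorrelation used in (A2)

def correlated_size(raw, previous, r=R):
    """(A2): mix the raw size with the previous (A)/(B)-form IH size.

    Pass previous=0.0 for the first (A)- or (B)-form IH of a series.
    """
    return math.sqrt(1.0 - r * r) * raw + r * previous

def gradual_duration(q5):
    """(A5): duration D1 of a gradual change, q5 uniform on [0, 1)."""
    return 5 + 2 * int(48 * q5 ** 1.5)

def platform_duration(q6):
    """(A7): duration D2 of a platform-like change, q6 uniform on [0, 1)."""
    return int(30 * q6 ** 1.5)

def apply_shift(y, i, dy):
    """(A4): add dy to every year from i to the end of the series."""
    y[i:] += dy

def apply_platform(y, i, dy, d2):
    """(A8): add dy to years i .. i + d2 inclusive (truncated at the end)."""
    y[i:i + d2 + 1] += dy

# Toy 12-year series: one platform starting in year 2, one shift in year 8.
y = np.zeros(12)
apply_shift(y, 8, correlated_size(1.0, 0.0))        # first shift, so F = 0
apply_platform(y, 2, -0.5, platform_duration(0.2))  # D2 = Int(30 * 0.2**1.5) = 2
```

With \( r = -0.5 \), a raw size of 1.0 that follows a previous shift of 2.0 becomes \( \sqrt{0.75} - 1.0 \approx -0.13 \), which shows how (A2) tends to pull consecutive shifts in opposite directions.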
6. Introduction of short-term IHs
The size and direction of this term are calculated with the same function as for long-term IHs (A1), but the frequencies (determined by the K-functions) are different:
$$ \Delta z'_i = K_3(q_7) \cdot \operatorname{sign}(0.5 - q_8) \cdot (8 + 4p) \cdot q_9^{6 + 4p} + K_4(q_{10}) \cdot G_2, \tag{A9} $$
where \( K_3(a) = 1 \) if \( a < 0.04 - 0.03p \) and \( K_3(a) = 0 \) otherwise; \( K_4(a) = 1 \) if \( a < 0.5 - 0.4p \) and \( K_4(a) = 0 \) otherwise. The ongoing modification has a negative autocorrelation (\( r = -0.5 \)) with the previously accumulated value of z:
$$ \Delta z_i = \sqrt{1 - r^2} \cdot \Delta z'_i + r \cdot z_{i,-1} \tag{A10} $$
The form of this term is always a platform-like change. Its duration is given by \( D_3 \):
$$ D_3 = \operatorname{Int}\left( \frac{12 \cdot q_{11}^3}{1 + 0.3 \left| \Delta z_i \right|} \right), \tag{A11} $$
$$ z_j = z_{j,-1} + \Delta z_i \quad \text{for each } j \in [i, \; i + D_3]. \tag{A12} $$
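Formulae (A1) and (A9) share one functional form and differ only in the thresholds of the K-functions, so a single helper can draw both. The sketch below is an illustration under stated assumptions: all names are hypothetical, and because the distribution of \( G_1 \) and \( G_2 \) is not given in this appendix, the code treats them as zero-mean Gaussian variables with a placeholder standard deviation `g_sd`.

```python
import numpy as np

def raw_size(rng, p, thr_large, thr_small, g_sd=1.0):
    """Raw IH size in the common form of (A1) and (A9).

    thr_large, thr_small: thresholds of the two K-functions.
    g_sd: ASSUMED spread of the G term (not specified in the appendix).
    """
    q_k, q_sign, q_mag, q_g = rng.random(4)
    large = 0.0
    if q_k < thr_large:  # the K-function switches the large term on
        large = np.sign(0.5 - q_sign) * (8 + 4 * p) * q_mag ** (6 + 4 * p)
    small = rng.normal(0.0, g_sd) if q_g < thr_small else 0.0
    return float(large + small)

def short_term_thresholds(p):
    """Thresholds of K3 and K4 in (A9); both depend on p."""
    return 0.04 - 0.03 * p, 0.5 - 0.4 * p

def short_term_duration(q11, dz):
    """(A11): duration D3 of a short-term platform."""
    return int(12 * q11 ** 3 / (1 + 0.3 * abs(dz)))

rng = np.random.default_rng(0)
p = rng.random()                                      # constant for one series
dy_raw = raw_size(rng, p, 0.012, 0.07)                # long-term term, (A1)
dz_raw = raw_size(rng, p, *short_term_thresholds(p))  # short-term term, (A9)
```

Because the thresholds of \( K_3 \) and \( K_4 \) depend on p, the frequency of short-term IHs varies from series to series, from 0.04 and 0.5 at p = 0 down towards 0.01 and 0.1 as p approaches 1.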
7. Introduction of the white noise term:
$$ w_i = G_3 \tag{A13} $$
8. The three terms are summed:
$$ \mathbf{X} = \mathbf{Y} + \mathbf{Z} + \mathbf{W} \tag{A14} $$
9. The serial correlation of X is calculated. The series is added to the test-dataset if the value is not lower than 0.4; otherwise it is discarded.
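Step 9 can be sketched as a simple filter. The text does not say which estimator of serial correlation is used, so the lag-1 autocorrelation below is an assumption, and the function names are illustrative only.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series (ASSUMED estimator for step 9)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # A constant series (zero variance) is not handled in this sketch.
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def keep_series(x, threshold=0.4):
    """Keep the series if its serial correlation is not lower than 0.4."""
    return lag1_autocorrelation(x) >= threshold

trend_like = np.arange(100.0)       # strongly persistent series
alternating = [1.0, -1.0] * 50      # strongly anti-persistent series
print(keep_series(trend_like), keep_series(alternating))
```

In a full simulation, a rejected series would be discarded and steps 1 to 8 repeated until the test-dataset holds the required number of series.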
10. Part of the long-term IHs (Y) and short-term IHs (Z) is not regarded as error in the candidate series and is therefore handled as noise. The share of this type of noise increases with decreasing IH magnitude, and it is higher for platform-like changes than for change-points and gradual changes. As a consequence of these noise terms, the model series of the standard dataset is
$$ \mathbf{X} = \mathbf{H} + \mathbf{W} + \mathbf{W}^{*} \tag{A15} $$
where
$$ \mathbf{W}^{*} = \mathbf{Y}_{\mathbf{w}} + \mathbf{Z}_{\mathbf{w}} \tag{A16} $$
$$ \mathbf{H} = \mathbf{Y} - \mathbf{Y}_{\mathbf{w}} + \mathbf{Z} - \mathbf{Z}_{\mathbf{w}} \tag{A17} $$
The index w denotes the noise part. The probability (P) that a given term is treated as noise is determined by the rules below.
For platform-like IHs, the probability \( P_1 \) is given by:
$$ P_1 = \max \left( 0.6 - 0.4 \cdot \left| \Delta y_i \right|, \; 0 \right), \tag{A18} $$
where \( \Delta y_i \) is determined by Formulae (A1) and (A3). (A18) is also applied to Δz-type IHs.
For change-points and gradual changes:
$$ P_2 = \max \left( 0.3 - 0.4 \cdot \left| \Delta y_i \right|, \; 0 \right), \tag{A19} $$
where \( \Delta y_i \) is determined by Formulae (A1) and (A2).
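The noise rule of step 10 reduces to two clipped linear functions of the IH magnitude. A minimal sketch, with hypothetical function names and a random draw added only to illustrate how the probability might be applied:

```python
import numpy as np

def noise_probability(delta, platform):
    """(A18) for platform-like IHs, (A19) for change-points and gradual changes."""
    base = 0.6 if platform else 0.3
    return max(base - 0.4 * abs(delta), 0.0)

def is_noise(rng, delta, platform):
    """Decide by a uniform draw whether an IH is moved to the noise part W*."""
    return rng.random() < noise_probability(delta, platform)

rng = np.random.default_rng(0)
for delta in (0.1, 0.5, 1.0):
    print(delta,
          noise_probability(delta, platform=True),
          noise_probability(delta, platform=False),
          is_noise(rng, delta, platform=True))
```

It follows from (A18) and (A19) that platform-like IHs with \( |\Delta y_i| \geq 1.5 \) and change-points or gradual changes with \( |\Delta y_i| \geq 0.75 \) are never treated as noise.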
Appendix II
Simulation of the quasi-standard test-dataset
The procedure is the same as for the standard dataset, except that K 1 is always equal to 0 in formula (A1). As a result of this change, the frequency of persistent large IHs is much lower in this dataset than in the standard dataset.
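As a rough illustration of what switching off \( K_1 \) removes, the threshold 0.012 in (A1) gives the expected number of years, within a 100-year target series of the standard dataset, in which the large-magnitude long-term term is active. The symbol \( N_{\text{large}} \) is introduced here only for this illustration:

$$ E\left[ N_{\text{large}} \right] = 100 \times 0.012 = 1.2 $$

In the quasi-standard dataset this expectation is 0, so long-term IHs arise only from the \( K_2 \cdot G_1 \) term of (A1).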
About this article
Cite this article
Domonkos, P. Efficiency evaluation for detecting inhomogeneities by objective homogenisation methods. Theor Appl Climatol 105, 455–467 (2011). https://doi.org/10.1007/s00704-011-0399-7
DOI: https://doi.org/10.1007/s00704-011-0399-7