Efficiency evaluation for detecting inhomogeneities by objective homogenisation methods

  • Original Paper
  • Theoretical and Applied Climatology

Abstract

The efficiencies of widely used objective homogenisation methods (OHOMs) are evaluated and compared using several test-datasets and efficiency measures. Problems related to the choice of efficiency measure, the creation of appropriate test-datasets and the parameterisation of OHOMs are discussed. Only the detection parts of the OHOMs are examined. Power of detection, false alarm rate, detection skill and skill of linear trend estimation are calculated and compared for eight OHOMs and six test-datasets. Each test-dataset comprises 10,000 artificially simulated, 100-year-long time series. In the simplest test-dataset, each time series contains one inhomogeneity (IH), while the most complex simulated dataset includes a structure of inhomogeneities similar to that of real central European temperature time series. Particular attention is given to OHOMs that contain (1) a cutting algorithm, (2) a semihierarchic algorithm, (3) direct detection of multiple IHs, or (4) detection of change-point and trend-line shaped IHs. Results show that the Caussinus–Mestre method and Multiple Analysis of Series for Homogenization are the most powerful tools in detecting and correcting IHs in climatic time series.

References

  • Aguilar E, Auer I, Brunet M, Peterson TC, Wieringa J (2003) WMO Guidelines on climate metadata and homogenization. WMO, Geneva, WCDMP-No. 53, WMO-TD No 1186
  • Alexandersson H (1986) A homogeneity test applied to precipitation data. J Climatol 6:661–675
  • Alexandersson H, Moberg A (1997) Homogenization of Swedish temperature data. Part I: homogeneity test for linear trends. Int J Climatol 17:25–34
  • Auer I et al (2005) A new instrumental precipitation dataset for the greater Alpine region for the period 1800–2002. Int J Climatol 25:139–166. doi:10.1002/joc.1135
  • Beaulieu C, Seidou O, Ouarda TBMJ, Zhang X, Boulet G, Yagouti A (2008) Intercomparison of homogenization techniques for precipitation data. Water Resour Res 44:W02425. doi:10.1029/2006WR005615
  • Brunet M, Saladié O, Jones P, Sigró J, Aguilar E, Moberg A, Lister D, Walther A, Lopez D, Almarza C (2006) The development of a new dataset of Spanish daily adjusted temperature series (SDATS) (1850–2003). Int J Climatol 26:1777–1802. doi:10.1002/joc.1338
  • Buishand TA (1982) Some methods for testing the homogeneity of rainfall records. J Hydrol 58:11–27
  • Caussinus H, Lyazrhi F (1997) Choosing a linear model with a random number of change-points and outliers. Ann Inst Stat Math 49(4):761–775
  • Caussinus H, Mestre O (2004) Detection and correction of artificial shifts in climate series. J Roy Stat Soc Series C 53:405–425
  • DeGaetano AT (2006) Attributes of several methods for detecting discontinuities in mean temperature series. J Climate 19:838–853. doi:10.1175/JCLI3662.1
  • Domonkos P (2006a) Testing of homogenisation methods: purposes, tools and problems of implementation. In: Szalai S (ed) Proceedings of the fifth seminar for homogenization and quality control in climatological databases. Hungarian Meteorological Service, Budapest, pp 126–145
  • Domonkos P (2006b) Application of objective homogenization methods: inhomogeneities in time series of temperature and precipitation. Időjárás 110:63–87
  • Domonkos P, Štěpánek P (2009) Statistical characteristics of detectable inhomogeneities in observed meteorological time series. Studia Geoph et Geod 53:239–260. doi:10.1007/s11200-009-0015-9
  • Drogue G, Mestre O, Hoffmann L, Iffly J-F, Pfister L (2005) Recent warming in a small region with semi-oceanic climate, 1949–1998: what is the ground truth? Theor Appl Climatol 81:1–10. doi:10.1007/s00704-004-0088-x
  • Ducré-Robitaille J-F, Vincent LA, Boulet G (2003) Comparison of techniques for detection of discontinuities in temperature series. Int J Climatol 23:1087–1101. doi:10.1002/joc.924
  • Easterling DR, Peterson TC (1995) A new method for detecting undocumented discontinuities in climatological time series. Int J Climatol 15:369–377
  • Gérard-Marchant PGF, Stooksbury DE, Seymour L (2008) Methods for starting the detection of undocumented multiple changepoints. J Climate 21:4887–4899. doi:10.1175/2008JCLI1956.1
  • Hawkins DM (1972) On the choice of segments in piecewise approximation. J Inst Math Appl 9:250–256
  • Lanzante JR (1996) Resistant, robust and non-parametric techniques for the analysis of climate data: theory and examples, including applications to historical radiosonde station data. Int J Climatol 16:1197–1226
  • Menne MJ, Williams CN Jr (2005) Detection of undocumented changepoints using multiple test statistics and composite reference series. J Climate 18:4271–4286. doi:10.1175/JCLI3524.1
  • Menne MJ, Williams CN Jr (2009) Homogenization of temperature series via pairwise comparisons. J Climate 22:1700–1717. doi:10.1175/2008JCLI2263.1
  • Mestre O, Domonkos P, Lebarbier E, Picard F, Robin S (2008) Comparison of change-point detection methods in the mean of Gaussian processes. In: Sixth seminar for homogenization and quality control in climatological databases (in print)
  • Moberg A, Alexandersson H (1997) Homogenization of Swedish temperature data. Part II: homogenized gridded air temperature compared with a subset of global gridded air temperature since 1861. Int J Climatol 17:35–54
  • Peterson TC et al (1998) Homogeneity adjustments of in situ atmospheric climate data: a review. Int J Climatol 18:1493–1517
  • Sneyers R (1997) Climate chaotic instability. Statistical determination – theoretical backgrounds. Environmetrics 8:517–532
  • Syrakova M (2003) Homogeneity analysis of climatological time series – experiments and problems. Időjárás 107:31–48
  • Szentimrey T (1999) Multiple Analysis of Series for Homogenization (MASH). In: Szalai S, Szentimrey T, Szinell CS (eds) Proceedings of the second seminar for homogenization of surface climatological data. World Meteorological Organization, WCDMP-41, WMO-TD 932, pp 27–46
  • Titchner HA, Thorne PW, McCarthy MP, Tett SFB, Haimberger L, Parker DE (2009) Critically reassessing tropospheric temperature trends from radiosondes using realistic validation experiments. J Climate 22:465–485. doi:10.1175/2008JCLI2419.1
  • Vincent LA (1998) A technique for the identification of inhomogeneities in Canadian temperature series. J Climate 11:1094–1104
  • Wang XL, Wen QH, Wu Y (2007) Penalized maximal t test for detecting undocumented mean change in climate data series. J Appl Meteor Climatol 46(6):916–931. doi:10.1175/JAM2504.1

Acknowledgements

The research was partially funded by the COST ES0601 project. The author thanks Matthew Menne and an anonymous reviewer for their useful comments.

Author information

Correspondence to Peter Domonkos.

Appendices

Appendix I

Simulation of the standard test-dataset

  1. Series of 196 years are generated; the slice covering years 48–147 is always used as the target series.

  2. IHs and noise are introduced in each year (although their values can, naturally, be 0).

  3. Types of terms introduced into the time series: (a) long-term IH (y), (b) short-term IH (z) and (c) white noise (w). A certain part of the y- and z-type terms is handled as noise (cf. step 10).

  4. Forms of the IHs: (a) sudden shift, (b) gradual change, (c) platform-like change, (d) bias for one specific year. Form (d) is a special case of form (c).

  5. Introduction of long-term IHs.

    5.1 Size and direction of the IH

      This term includes an IH whose magnitude can be large, with the probability given by \( K_1 \), as well as a small IH, with the probability given by \( K_2 \):

      $$ \Delta y'_i = K_1(q_1) \cdot \mathrm{sign}(0.5 - q_2) \cdot (8 + 4p) \cdot q_3^{6 + 4p} + K_2(q_4) \cdot G_1, $$
      (A1)

      where \( K_1(a) = 1 \) if \( a < 0.012 \) and \( K_1(a) = 0 \) otherwise; \( K_2(a) = 1 \) if \( a < 0.07 \) and \( K_2(a) = 0 \) otherwise; q (with any index) is a random variable uniformly distributed over the interval [0,1); p has the same distribution as q, but p is constant for a given time series. Δ indicates that (A1) does not replace, but modifies, the earlier value of \( y_i \). The prime on y indicates that the values given by (A1) are modified in certain cases (see below) before \( \Delta y_i \) is introduced. If \( \Delta y'_i = 0 \), steps 5.2 and 5.3 are omitted.

    5.2 Form of the IH

      The form of \( \Delta y'_i \) is (A) sudden shift, (B) gradual change or (C) platform-like change, with probabilities 0.4, 0.25 and 0.35, respectively.

      For (A)- and (B)-form IHs, a negative autocorrelation is present:

      $$ \Delta y_i = \sqrt{1 - r^2} \cdot \Delta y'_i + r \cdot F, $$
      (A2)

      where \( F = 0 \) for the first (A)- or (B)-form IH of the series and \( F = \Delta y_k \) otherwise, k is the year in which the previous (A)- or (B)-form IH was introduced, and \( r = -0.5 \).

      For (C)-form IHs:

      $$ \Delta y_i = \Delta y'_i $$
      (A3)

    5.3 Calculation of the \( y_i \) components of the series

      (A)-form IHs:

      $$ y_j = y_{j,-1} + \Delta y_i \quad \text{for each } j \in [i, n], $$
      (A4)

      where \( y_{j,-1} \) denotes the value of term \( y_j \) before the ongoing modification.

      For (B)- and (C)-form IHs, a duration must first be assigned. For (B)-form IHs, the duration \( D_1 \) is:

      $$ D_1 = 5 + 2 \cdot \mathrm{Int}\left( 48 \cdot q_5^{1.5} \right) $$
      (A5)

      (“Int” denotes the integer part), and the IH appears as:

      $$ y_j = y_{j,-1} + \frac{(j - i + 0.5 D_1)\, \Delta y_i}{D_1} \quad \text{for each } j \in [i - 0.5 D_1,\; i + 0.5 D_1 - 1], $$
      (A6)

      while for (C)-form IHs:

      $$ D_2 = \mathrm{Int}\left( 30 \cdot q_6^{1.5} \right), $$
      (A7)
      $$ y_j = y_{j,-1} + \Delta y_i \quad \text{for each } j \in [i, i + D_2]. $$
      (A8)
  6. Introduction of short-term IHs

    The size and direction of this term are calculated by the same function as for long-term IHs (A1), but the frequencies (determined by the K-functions) differ:

    $$ \Delta z'_i = K_3(q_7) \cdot \mathrm{sign}(0.5 - q_8) \cdot (8 + 4p) \cdot q_9^{6 + 4p} + K_4(q_{10}) \cdot G_2, $$
    (A9)

    where \( K_3(a) = 1 \) if \( a < 0.04 - 0.03p \) and \( K_3(a) = 0 \) otherwise; \( K_4(a) = 1 \) if \( a < 0.5 - 0.4p \) and \( K_4(a) = 0 \) otherwise. The ongoing modification has a negative autocorrelation (\( r = -0.5 \)) with the previously accumulated z value:

    $$ \Delta z_i = \sqrt{1 - r^2} \cdot \Delta z'_i + r \cdot z_{i,-1} $$
    (A10)

    The form of this term is always a platform-like change. Its duration is given by \( D_3 \):

    $$ D_3 = \mathrm{Int}\left( \frac{12 \cdot q_{11}^3}{1 + 0.3 \left| \Delta z_i \right|} \right), $$
    (A11)
    $$ z_j = z_{j,-1} + \Delta z_i \quad \text{for each } j \in [i, i + D_3]. $$
    (A12)
  7. Introduction of the white noise term:

    $$ w_i = G_3 $$
    (A13)
  8. The three terms are summed:

    $$ \mathbf{X} = \mathbf{Y} + \mathbf{Z} + \mathbf{W} $$
    (A14)
  9. The serial correlation of X is calculated; the series is added to the test-dataset if the value is not lower than 0.4, and is discarded otherwise.

  10. A part of the long-term IHs (Y) and short-term IHs (Z) is not considered to be errors of the candidate series, so it is handled as noise. The proportion of this type of noise increases with decreasing IH magnitude, and it is higher for platform-like changes than for change-points and gradual changes. As a consequence of these noise terms, the model series of the standard dataset is

    $$ \mathbf{X} = \mathbf{H} + \mathbf{W} + \mathbf{W}^* $$
    (A15)

    where

    $$ \mathbf{W}^* = \mathbf{Y}_w + \mathbf{Z}_w $$
    (A16)
    $$ \mathbf{H} = \mathbf{Y} - \mathbf{Y}_w + \mathbf{Z} - \mathbf{Z}_w $$
    (A17)

    The index w denotes the noise part. The probability (P) that a given term is considered to be noise is determined by the rules below.

    For platform-like IHs, the probability \( P_1 \) is given by:

    $$ P_1 = \max\left( 0.6 - 0.4 \cdot \left| \Delta y_i \right|,\, 0 \right), $$
    (A18)

    where \( \Delta y_i \) is determined by Formulae (A1) and (A3). (A18) is also applied to Δz-type IHs.

    For change-points and gradual changes:

    $$ P_2 = \max\left( 0.3 - 0.4 \cdot \left| \Delta y_i \right|,\, 0 \right), $$
    (A19)

    where \( \Delta y_i \) is determined by Formulae (A1) and (A2).
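The deterministic parts of steps 5, 6 and 10 can be illustrated with a short Python sketch. It is not the author's code: the function names are hypothetical, and it only restates formulae (A2)/(A10), (A5), (A7), (A11), (A18) and (A19) as written above. The uniform variables q are passed in explicitly, so the Gaussian-looking terms G, whose distributions are not specified in this appendix, are not needed.

```python
import math


def autocorrelated_size(raw_size, previous, r=-0.5):
    """(A2) and (A10): combine the raw size with the previous value (r = -0.5)."""
    return math.sqrt(1.0 - r * r) * raw_size + r * previous


def gradual_duration(q5):
    """(A5): duration D_1 of a gradual change; q5 is uniform on [0, 1)."""
    return 5 + 2 * int(48 * q5 ** 1.5)


def platform_duration(q6):
    """(A7): duration D_2 of a long-term platform-like change."""
    return int(30 * q6 ** 1.5)


def short_term_duration(q11, dz):
    """(A11): duration D_3 of a short-term IH of size dz."""
    return int(12 * q11 ** 3 / (1 + 0.3 * abs(dz)))


def noise_probability(size, platform_like):
    """(A18) for platform-like IHs, (A19) for change-points and gradual changes."""
    base = 0.6 if platform_like else 0.3
    return max(base - 0.4 * abs(size), 0.0)


if __name__ == "__main__":
    # Illustrative inputs only; they are not values taken from the paper.
    print(gradual_duration(0.0), platform_duration(0.0))
    print(short_term_duration(0.5, 0.0), short_term_duration(0.5, 10.0))
    print(noise_probability(2.0, True), noise_probability(1.0, False))
```

Under these formulae, D_1 is always odd and lies between 5 and 99, D_2 lies between 0 and 29, and a platform-like IH larger than 1.5 in absolute value is never treated as noise, because 0.6 − 0.4 · 1.5 = 0.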

Appendix II

Simulation of the quasi-standard test-dataset

The procedure is the same as for the standard dataset, except that \( K_1 \) is always equal to 0 in formula (A1). As a result of this change, the frequency of persistent large IHs is much lower in this dataset than in the standard dataset.
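The difference between the two datasets can be sketched as a single switch in formula (A1). The snippet below is a hedged illustration, not the author's implementation: `raw_ih_size`, `quasi_standard` and `g1` are hypothetical names, the uniform draws are passed in explicitly so the result is reproducible, and the value of G_1 is supplied by the caller because its distribution is not specified in these appendices.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def raw_ih_size(q1, q2, q3, q4, p, g1, quasi_standard=False):
    """Raw long-term IH size following (A1).

    q1..q4 and p are uniform draws on [0, 1); g1 is the caller-supplied
    value of G_1 (assumption: its distribution is not given here).
    With quasi_standard=True, K_1 is forced to 0 as in Appendix II.
    """
    k1 = 0 if quasi_standard else int(q1 < 0.012)
    k2 = int(q4 < 0.07)
    large = k1 * sign(0.5 - q2) * (8 + 4 * p) * q3 ** (6 + 4 * p)
    return large + k2 * g1


if __name__ == "__main__":
    # Same draws, two datasets: the large term vanishes in the quasi-standard case.
    draws = dict(q1=0.005, q2=0.2, q3=0.5, q4=0.5, p=0.0, g1=0.3)
    print(raw_ih_size(**draws))
    print(raw_ih_size(**draws, quasi_standard=True))
```

With these illustrative draws, only the large term is active in the standard case, and it is suppressed when `quasi_standard=True`; the small term controlled by K_2 is unaffected by the switch.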

Cite this article

Domonkos, P. Efficiency evaluation for detecting inhomogeneities by objective homogenisation methods. Theor Appl Climatol 105, 455–467 (2011). https://doi.org/10.1007/s00704-011-0399-7
