Impact of missing data on the efficiency of homogenisation: experiments with ACMANTv3

Original Paper
Theoretical and Applied Climatology


The impact of missing data on the efficiency of homogenisation with ACMANTv3 is examined with simulated monthly surface air temperature test datasets. The homogeneous database is derived from an earlier benchmarking of daily temperature data in the USA, and then outliers and inhomogeneities (IHs) are randomly inserted into the time series. Three inhomogeneous datasets are generated and used: one with relatively few and small IHs, another with IHs of medium frequency and size, and a third with large and frequent IHs. All of the inserted IHs are changes to the means. Most of the IHs are single sudden shifts or pairs of shifts resulting in platform-shaped biases. Each test dataset consists of 158 time series, each 100 years long, and their mean spatial correlation is 0.68–0.88. For examining the impacts of missing data, seven experiments are performed, in which 18 series are left complete, while variable quantities (10–70%) of the data of the other 140 series are removed.

The results show that data gaps have a greater impact on the monthly root mean squared error (RMSE) than the annual RMSE and trend bias. When data with a large ratio of gaps is homogenised, the reduction of the upper 5% of the monthly RMSE is the least successful, but even there, the efficiency remains positive. In terms of reducing the annual RMSE and trend bias, the efficiency is 54–91%. The inclusion of short and incomplete series with sufficient spatial correlation in all cases improves the efficiency of homogenisation with ACMANTv3.


  • Acquaotta F, Fratianni S (2014) The importance of the quality and reliability of the historical time series for the study of climate change. Rev Bras Climatol 10:20–38

  • Aguilar E, Auer I, Brunet M, Peterson TC, Wieringa J (2003) Guidelines on climate metadata and homogenization. World Meteorological Organization (WMO)-TD No. 1186, WCDMP No. 53, Geneva, Switzerland, 55 pp

  • Auer I, Böhm R, Jurkovic A, Orlik A, Potzmann R, Schöner W, Ungersböck M, Brunetti M, Nanni T, Maugeri M, Briffa K, Jones P, Efthymiadis D, Mestre O, Moisselin J-M, Begert M, Brazdil R, Bochnicek O, Cegnar T, Gajic-Capka M, Zaninovic K, Majstorovic Z, Szalai S, Szentimrey T, Mercalli L (2005) A new instrumental precipitation dataset for the Greater Alpine Region for the period 1800–2002. Int J Climatol 25:139–166.

  • Auer I, Böhm R, Jurkovic A, Lipa W, Orlik A, Potzmann R, Schöner W, Ungersboeck M, Matulla C, Briffa K, Jones P, Efthymiadis D, Brunetti M, Nanni T, Maugeri M, Mercalli L, Mestre O, Moisseline JM, Begert M, Muller-Westermeier G, Kveton V, Bochnicek O, Stastny P, Lapin M, Szalai S, Szentimrey T, Cegnar T, Dolinar M, Gajic-Capka M, Zaninovic K, Majstorovic Z, Nieplova E (2007) HISTALP—historical instrumental climatological surface time series of the Greater Alpine Region HISTALP. Int J Climatol 27:17–46.

  • Borges P, Franke J, Tanaka M, Weiss H, Bernhofer C (2013) Spatial interpolation of climatological information: comparison of methods for the development of precipitation distribution in Distrito Federal, Brazil. Atmos Clim Sci 3(2):208–217.

  • Borges P, Franke J, Santos Silva FD, Weiss H, Bernhofer C (2014) Differences between two climatological periods (2001–2010 vs. 1971–2000) and trend analysis of temperature and precipitation in Central Brazil. Theor Appl Climatol 116:191–202.

  • Brunet M, Saladié O, Jones P, Sigró J, Aguilar E, Moberg A, Lister D, Walther A, Lopez D, Almarza C (2006) The development of a new dataset of Spanish daily adjusted temperature series (SDATS) (1850–2003). Int J Climatol 26:1777–1802.

  • Coll J, Curley M, Walsh S, Sweeney J (2018) HOMERUN: relative homogenisation of the Irish precipitation network. EPA Research Report 2012-CCRP-FS.11, Report No. 242. Environmental Protection Agency, Wexford, 32 pp

  • Costa AC, Soares A (2009) Homogenization of climate data: review and new perspectives using geostatistics. Math Geosci 41(3):291–305

  • Domonkos P (2011a) Efficiency evaluation for detecting inhomogeneities by objective homogenisation methods. Theor Appl Climatol 105:455–467.

  • Domonkos P (2011b) Adapted Caussinus-Mestre algorithm for networks of temperature series (ACMANT). Int J Geosci 2:293–309.

  • Domonkos P (2015) Homogenization of precipitation time series with ACMANT. Theor Appl Climatol 122:303–314.

  • Domonkos P, Coll J (2017a) Homogenisation of temperature and precipitation time series with ACMANT3: method description and efficiency tests. Int J Climatol 37:1910–1921.

  • Domonkos P, Coll J (2017b) Time series homogenisation of large observational datasets: the impact of the number of partner series on the efficiency. Clim Res 74:31–42.

  • Gimmi U, Luterbacher J, Pfister C, Wanner H (2007) A method to reconstruct long precipitation series using systematic descriptive observations in weather diaries: the example of the precipitation series for Bern, Switzerland (1760–2003). Theor Appl Climatol 87:185–199.

  • Gubler S, Hunziker S, Begert M, Croci-Maspoli M, Konzelmann T, Brönnimann S, Schwierz C, Oria C, Rosas G (2017) The influence of station density on climate data homogenization. Int J Climatol 37:4670–4683.

  • Guentchev G, Barsugli JJ, Eischeid J (2010) Homogeneity of gridded precipitation datasets for the Colorado River Basin. J Appl Meteorol Climatol 49:2404–2415.

  • Guijarro JA (2014) User’s Guide to Climatol.

  • Guijarro JA, López JA, Aguilar E, Domonkos P, Venema V, Sigró J, Brunet M (2017) Comparison of homogenization packages applied to monthly series of temperature and precipitation: the MULTITEST project. Ninth Seminar for Homogenization and Quality Control in Climatological Databases (Ed Szentimrey T, Lakatos M, Hoffmann L) WMO WCDMP-85:46–62

  • Hannart A, Mestre O, Naveau P (2014) An automatized homogenization procedure via pairwise comparisons with application to Argentinean temperature series. Int J Climatol 34:3528–3545.

  • Hausfather Z, Menne MJ, Williams CN, Masters T, Broberg R, Jones D (2013) Quantifying the effect of urbanization on U.S. Historical Climatology Network temperature records. J Geophys Res Atmos 118:481–494.

  • Hua W, Shen SSP, Weithmann A, Wang H (2017) Estimation of sampling error uncertainties in observed surface air temperature change in China. Theor Appl Climatol 129:1133–1144.

  • Huang J, van den Dool HM, Barnston AG (1996) Long-lead seasonal temperature prediction using optimal climate normals. J Clim 9:809–817

  • Hunziker S, Brönnimann S, Calle J, Moreno I, Andrade M, Ticona L, Huerta A, Lavado-Casimiro W (2018) Effects of undetected data quality issues on climatological analyses. Clim Past 14:1–20.

  • Huth R, Nemesova I (1995) Estimation of missing daily temperatures: can a weather categorization improve its accuracy? J Clim 8:1901–1916

  • Jones PD, Lister DH (2010) The urban heat island in Central London and urban-related warming trends in Central London since 1900. Weather 64:323–327.

  • Kemp WP, Burnell DG, Everson DO, Thomson AJ (1983) Estimating missing daily maximum and minimum temperatures. J Clim Appl Meteorol 22:1587–1593

  • Killick RE (2016) Benchmarking the performance of homogenisation algorithms on daily temperature data. PhD thesis, University of Exeter, UK.

  • Klok EJ, Klein Tank AMG (2009) Updated and extended European dataset of daily climate observations. Int J Climatol 29:1182–1191.

  • Lindau R, Venema V (2016) The uncertainty of break positions detected by homogenization algorithms in climate records. Int J Climatol 36:576–589.

  • Menne MJ, Williams CN (2009) Homogenization of temperature series via pairwise comparisons. J Clim 22:1700–1717.

  • Menne MJ, Williams CN, Vose RS (2009) The U.S. Historical Climatology Network monthly temperature data, version 2. Bull Am Meteorol Soc 90:993–1007.

  • Mestre O, Gruber C, Prieur C, Caussinus H, Jourdain S (2011) SPLIDHOM: a method for homogenization of daily temperature observations. J Appl Meteorol Climatol 50:2343–2358.

  • Mestre O, Domonkos P, Picard F, Auer I, Robin S, Lebarbier E, Böhm R, Aguilar E, Guijarro J, Vertacnik G, Klancar M, Dubuisson B, Štěpánek P (2013) HOMER: homogenization software in R—methods and applications. Idojaras Q J Hung Meteorol Serv 117:47–67

  • Oyler JW, Ballantyne A, Jencso K, Sweet M, Running SW (2015) Creating a topoclimatic daily air temperature dataset for the conterminous United States using homogenized station data and remotely sensed land skin temperature. Int J Climatol 35:2258–2279.

  • Peterson TC, Easterling DR (1994) Creation of homogeneous composite climatological reference series. Int J Climatol 14:671–679

  • Peterson TC, Easterling DR, Karl TR, Groisman P, Nicholls N, Plummer N, Torok S, Auer I, Böhm R, Gullett D, Vincent L, Heino R, Tuomenvirta H, Mestre O, Szentimrey T, Salinger J, Førland EJ, Hanssen-Bauer I, Alexandersson H, Jones P, Parker D (1998) Homogeneity adjustments of in situ atmospheric climate data: a review. Int J Climatol 18:1493–1517

  • Prohom M, Barriendos M, Sanchez-Lorenzo A (2016) Reconstruction and homogenization of the longest instrumental precipitation series in the Iberian Peninsula (Barcelona, 1786–2014). Int J Climatol 36:3072–3087.

  • Ribeiro S, Caineta J, Costa AC (2016) Review and discussion of homogenisation methods for climate data. Phys Chem Earth 94:167–179.

  • Rienzner M, Gandolfi C (2011) A composite statistical method for the detection of multiple undocumented abrupt changes in the mean value within a time series. Int J Climatol 31:742–755.

  • Spinoni J, Szalai S, Szentimrey T, Lakatos M, Bihari Z, Andrea Nagy A, Németh Á, Kovács T, Mihic D, Dacic M, Petrovic P, Kržič A, Hiebl J, Auer I, Milkovic J, Štepánek P, Zahradnícek P, Kilar P, Limanowka D, Pyrc R, Cheval S, Birsan M-V, Dumitrescu A, Deak G, Matei M, Antolovic I, Nejedlík P, Štastný P, Kajaba P, Bochnícek O, Galo D, Mikulová K, Nabyvanets Y, Skrynyk O, Krakovska S, Gnatiuk N, Tolasz R, Antofie T, Vogt J (2015) Climate of the Carpathian region in the period 1961–2010: climatologies and trends of 10 variables. Int J Climatol 35:1322–1341.

  • Szentimrey T (1999) Multiple analysis of series for homogenization (MASH). In: Szalai S, Szentimrey T, Szinell Cs (eds) Proc 2nd Seminar for Homogenization of Surface Climatological Data. WMO WCDMP 41, pp 27–46

  • Szentimrey T (2010) Methodological questions of series comparison. In: Lakatos M, Szentimrey T, Bihari Z, Szalai S (eds) 6th Seminar for Homogenization and Quality Control in Climatological Databases. WMO WCDMP-76, pp 1–7

  • Tardivo G (2015) Spatial and time correlation of thermometers and pluviometers in a weather network database. Theor Appl Climatol 120:19–28.

  • Tardivo G, Berti A (2012) A dynamic method for gap filling in daily temperature datasets. J Appl Meteorol Climatol 51:1079–1086.

  • Thorne PW, Menne MJ, Williams CN, Rennie JJ, Lawrimore JH, Vose RS, Peterson TC, Durre I, Davy R, Esau I, Klein-Tank AMG, Merlone A (2016) Reassessing changes in diurnal temperature range: a new data set and characterization of data biases. J Geophys Res Atmos 121:5115–5137.

  • Thorne PW, Allan R, Ashcroft L, Brohan P, Dunn R, Menne M, Pearce P, Picas J, Willett K, Benoy M, Bronnimann S, Canziani P, Coll J, Crouthamel R, Compo G, Cuppett D, Curley M, Duffy C, Gillespie I, Guijarro J, Jourdain S, Kent E, Kubota H, Legg T, Li Q, Matsumoto J, Murphy C, Rayner N, Rennie J, Rustemeier E, Slivinski L, Slonosky V, Squintu A, Tinz B, Valente M, Walsh S, Wang X, Westcott N, Wood K, Woodruff S, Worley S (2017) Towards an integrated set of surface meteorological observations for climate science and applications. Bull Am Meteorol Soc 98:2689–2702.

  • Venema V, Mestre O, Aguilar E, Auer I, Guijarro JA, Domonkos P, Vertacnik G, Szentimrey T, Štěpánek P, Zahradnicek P, Viarre J, Müller-Westermeier G, Lakatos M, Williams CN, Menne M, Lindau R, Rasol D, Rustemeier E, Kolokythas K, Marinova T, Andresen L, Acquaotta F, Fratianni S, Cheval S, Klancar M, Brunetti M, Gruber C, Duran MP, Likso T, Esteban P, Brandsma T (2012) Benchmarking monthly homogenization algorithms. Clim Past 8:89–115.

  • Willett KM, Williams CN, Jolliffe I, Lund R, Alexander L, Brönnimann S, Vincent LA, Easterbrook S, Venema V, Berry D, Warren R, Lopardo G, Auchmann R, Aguilar E, Menne M, Gallagher C, Hausfather Z, Thorarinsdottir T, Thorne PW (2014) A framework for benchmarking of homogenisation algorithm performance on the global scale. Geosci Instrum Method Data Syst 3:187–200.

  • WMO (2016). Web site of the Task Team on HOMOGENIZATION (OPACE2, WMO Commission for Climatology). Accessed Aug 2017


The authors thank Kate Willett and her colleagues for giving open access to the temperature database they developed.


The second author was funded by the Irish Environmental Protection Agency under project 2012-CCRP-FS.11.

Correspondence to Peter Domonkos.


I Gap filling within ACMANTv3

i) Concepts and definitions

The dataset consists of N monthly time series of n years length, but the series are incomplete. Series s (s = 1,2,…N) can be presented as

$$ {\mathbf{X}}_{\mathbf{s}}={x}_{s,1},{x}_{s,2},\dots {x}_{s,h}\dots {x}_{s,12n} \tag{A1} $$

without indicating possible data gaps. Here h stands for the serial number of the month from the beginning of the time series. The relation between h, the serial number of the year from the beginning of the time series (i) and the serial number of the calendar month (m) is

$$ h=12\left(i-1\right)+m\kern0.5em i\in \left\{1,2,\dots n\right\} \tag{A2} $$
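The index relation above can be sketched in a few lines of Python. This is only an illustration: the function names `month_index` and `year_and_month` are hypothetical and not part of ACMANTv3, and both use the 1-based numbering of the text.

```python
def month_index(i, m):
    """Serial month number h for year i (1-based) and calendar month m (1..12)."""
    if i < 1 or not 1 <= m <= 12:
        raise ValueError("expected i >= 1 and 1 <= m <= 12")
    return 12 * (i - 1) + m


def year_and_month(h):
    """Inverse mapping: recover (i, m) from the serial month number h (1-based)."""
    if h < 1:
        raise ValueError("expected h >= 1")
    years_before, month_offset = divmod(h - 1, 12)
    return years_before + 1, month_offset + 1
```

For example, May of the third year corresponds to `month_index(3, 5)`, and `year_and_month` maps that serial number back to the pair `(3, 5)`.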

We will denote the cluster of observed values (distinguishing from missing values) of series s with Js, and its sub-cluster for month m with Js,m, and the number of elements in there with Ks and Ks,m, respectively.

Before any other operation with the data, the seasonal cycle is removed by subtracting the monthly climatic normal (Us,m) from the observed values; the deseasonalised series are denoted with As (Eqs. A3–A5).

$$ {\mathbf{A}}_{\mathbf{s}}={a}_{s,1},{a}_{s,2},\dots {a}_{s,h}\dots {a}_{s,12n} \tag{A3} $$
$$ {a}_{s,h}={x}_{s,h}-{U}_{s,m} \tag{A4} $$
$$ {U}_{s,m}=\frac{1}{K_{s,m}}{\sum}_{{\mathrm{J}}_{\mathrm{s},\mathrm{m}}}{x}_{s,h} \tag{A5} $$

Missing data of a candidate series (Ag) will be filled with interpolation using the synchronous values of partner series (As). In the selection of partner series and for weighting their contribution, spatial correlations are considered.

The spatial correlation between series g and series s (rg,s) is defined by the Pearson correlation coefficient. The sample size for its calculation includes each h for which both series have observed values. When the sample size is lower than 50, the correlation is zero by our definition.
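A minimal sketch of the deseasonalisation and of the correlation rule described above, assuming that each series is stored as a NumPy array of length 12n starting in January, with NaN marking missing values. The function names `deseasonalise` and `spatial_correlation` are hypothetical and the code is not taken from ACMANTv3.

```python
import numpy as np

MIN_PAIRS = 50  # minimum common sample size stated in the text


def deseasonalise(x):
    """Return anomalies a (length 12n) and the 12 monthly normals U.

    Missing values (NaN) are ignored when the normals are computed
    and remain NaN in the anomaly series.
    """
    monthly = np.asarray(x, dtype=float).reshape(-1, 12)  # rows: years
    normals = np.nanmean(monthly, axis=0)                 # U_{s,m}
    anomalies = (monthly - normals).reshape(-1)           # a_{s,h}
    return anomalies, normals


def spatial_correlation(a_g, a_s, min_pairs=MIN_PAIRS):
    """Pearson correlation over months observed in both series.

    Returns 0.0 when fewer than min_pairs common observations exist.
    """
    a_g = np.asarray(a_g, dtype=float)
    a_s = np.asarray(a_s, dtype=float)
    both = ~np.isnan(a_g) & ~np.isnan(a_s)
    if both.sum() < min_pairs:
        return 0.0
    return float(np.corrcoef(a_g[both], a_s[both])[0, 1])
```

If a calendar month has no observed value at all, `np.nanmean` returns NaN for that normal and emits a warning; how ACMANTv3 treats that case is not stated in this section.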

ii) Gap filling

The method of gap filling has remained similar to that in the first version of ACMANT (Domonkos 2011b), but some details have been changed since then.

The interpolation for a missing value of month h0 in the candidate series relies on the synchronous observed values of surrounding stations, but the values of the partner series are tuned to a section mean value characterising the common effect of local climate and possible inhomogeneity effects at the timing of the missing value in the candidate series. For this purpose, window [h1, h2] around h0 is constructed. This window must be wide enough to have sufficient data for the reliable estimation of the difference between the section means for the candidate series Ag and its partner series As, but narrow enough to exclude the effects of temporally distant IHs.

The window width can be regulated by parameterising it directly, or via the minimum number of value pairs for series g and s. In practice, the window width is varied according to the frequency of missing data around h0 in the time series participating in the interpolation; hence h2 – h1 is a function of both h0 and s. Table 3 shows various sets of conditions for the window construction in terms of the half window width (L) and the number of observed value pairs within the window (k). Moving down in Table 3 the conditions soften, and the strictest conditions allowed by data availability are always selected for the window construction.

Table 3 Connections between half window width (L), number of observed value pairs for the candidate series and its partner series (n’) and coefficient of weight correction (c) in the construction of window around the timing of the missing data (h). Always the strictest conditions allowed by data availability are applied

All the series with rg,s ≥ 0.4 are considered as partner series, provided they have an observed value for month h0. Eq. (A6) shows the tuning of value as,h0 to the candidate series.

$$ {a}_{s,h0}^{\prime }={a}_{s,h0}+\frac{1}{k}{\sum}_{h={h}_1}^{h_2}\left({a}_{g,h}-{a}_{s,h}\right) \tag{A6} $$

When at least one of the two series does not have observed data for month h (h ∉ Jg∩Js), then as,h = ag,h in Eq. (A6) by definition, so that month contributes nothing to the sum. The interpolated value is the weighted average of the tuned values of N* partner series (N* ≤ N − 1). The weights are the squared spatial correlations corrected by coefficient c, which depends on the window width (Table 3).

When the sum of the corrected weights is lower than 0.4, zero anomaly (ag,h0 = 0) is presumed for the missing value with partial or full weight, according to Eqs. (A7) and (A8), in which the normalising factor p is bounded below by 0.4.

$$ {a}_{g,h0}=\frac{1}{p}{\sum}_{s=1}^{N^{\ast }}{c}_s{r}_{g,s}^2{a}_{s,h0}^{\prime } \tag{A7} $$
$$ p=\max \left(0.4,{\sum}_{s=1}^{N^{\ast }}{c}_s{r}_{g,s}^2\right) \tag{A8} $$
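The interpolation of Eqs. (A6)–(A8) can be sketched as follows. This is a simplified illustration, not the ACMANTv3 code. It makes three assumptions: the same fixed half window width is used for every partner (ACMANTv3 chooses the window per partner from the conditions of Table 3, which are not reproduced here), the weight corrections c default to 1, and a partner with no common observations inside the window is skipped. The function name `fill_one_gap` and its parameters are hypothetical; indices are 0-based and NaN marks missing values.

```python
import numpy as np


def fill_one_gap(a_g, partners, r, h0, half_width, c=None,
                 r_min=0.4, p_min=0.4):
    """Estimate the missing anomaly a_g[h0] from partner anomalies.

    a_g        candidate anomalies, NaN where missing
    partners   2-D array, one row of anomalies per partner series
    r          spatial correlations r_{g,s}, one per partner
    half_width assumed fixed half window width (months) for all partners
    c          weight correction coefficients (assumed 1 if omitted)
    """
    a_g = np.asarray(a_g, dtype=float)
    partners = np.asarray(partners, dtype=float)
    r = np.asarray(r, dtype=float)
    c = np.ones(len(r)) if c is None else np.asarray(c, dtype=float)

    h1 = max(0, h0 - half_width)
    h2 = min(len(a_g), h0 + half_width + 1)

    weighted_sum = 0.0
    weight_total = 0.0
    for s in range(len(partners)):
        a_s = partners[s]
        if r[s] < r_min or np.isnan(a_s[h0]):
            continue  # not a usable partner for this month
        diff = a_g[h1:h2] - a_s[h1:h2]
        pairs = ~np.isnan(diff)        # months observed in both series
        k = pairs.sum()
        if k == 0:
            continue  # assumption of this sketch: skip if no overlap
        tuned = a_s[h0] + diff[pairs].sum() / k      # Eq. (A6)
        weight = c[s] * r[s] ** 2
        weighted_sum += weight * tuned
        weight_total += weight

    p = max(p_min, weight_total)                     # Eq. (A8)
    return weighted_sum / p                          # Eq. (A7)
```

Because p never falls below 0.4, a gap with weakly correlated partners is shrunk towards zero anomaly, and a gap with no usable partner is filled with exactly zero.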

Note that the optimal sample for the calculation of U and r (Huang et al. 1996; Tardivo 2015) may differ from the sample used in this study, which includes all available data. However, this difference from the optimal parameterisation likely has a minor effect on the accuracy of interpolation.

The gap filling is performed three times within the homogenisation, first with the use of raw data, then with the use of pre-homogenised data in the second and third stages of the procedure.

II Automatic networking

Appropriately constructed networks for homogenisation with ACMANT have three positive attributes: (i) the candidate series have a sufficient number of highly correlated partner series; (ii) each section of the candidate series is covered with a sufficient number of synchronous observed data of the partner series; (iii) the network is not unnecessarily large. The algorithm presented here is structured to give solutions with these positive attributes.

The number of partner series and the number of effective partners (see the definition in Sect. 3.4) are denoted with M and F, respectively. The spatial correlations used here (r*) are not the same as those used for the interpolation: r* is calculated from the first difference (increment) series of the deseasonalised monthly temperatures, following the approach introduced to time series homogenisation by Peterson and Easterling (1994).

For the homogenisation of each candidate series, one distinct network is constructed. First, the most highly correlated partner series are selected, up to 30 series. When the number of potential partner series with r* ≥ 0.4 is higher than 30, the following steps are performed recursively.

i) Possible improvements in F by the inclusion of any further partner series (s) are considered using score S1 for dates of F < 10 (Eqs. A9, A10).

$$ \mathrm{S}1(s)=\sum \limits_{h=1}^{12n}5{\left({r}_s^{\ast}\right)}^4{\left(12-{F}_{s,h}^{\ast}\right)}^3 \tag{A9} $$
$$ {F}^{\ast }=\left\{\begin{array}{c}F\kern0.24em \mathrm{if}\kern0.24em F<10\\ {}12\kern0.24em \mathrm{if}\kern0.24em F\ge 10\end{array}\right\}\kern0.5em \mathrm{for}\kern0.17em \mathrm{every}\;s\;\mathrm{and}\;h \tag{A10} $$

ii) Possible improvements in F by the inclusion of any further partner series are considered using score S2 for decadal sections of the candidate series where F < 10 in at least 25% of the decade. Months belonging to at least one of such decades are denoted with m in Eqs. A11–A12.

$$ \mathrm{S}2(s)=\sum \limits_m5{\left({r}_s^{\ast}\right)}^4{\left(20-{F}_{s,m}^{\ast \ast}\right)}^2 \tag{A11} $$
$$ {F}^{\ast \ast }=\left\{\begin{array}{c}F\kern0.24em \mathrm{if}\kern0.24em F<20\\ {}20\kern0.24em \mathrm{if}\kern0.24em F\ge 20\end{array}\right\}\kern0.5em \mathrm{for}\kern0.17em \mathrm{every}\;s\;\mathrm{and}\;m \tag{A12} $$

iii) The exceedance of M above 30 is penalised with score S3 (Eq. A13).

$$ \mathrm{S}3={\left(M-30\right)}^2 \tag{A13} $$

iv) Summarised score S is calculated for each s (Eq. A14).

$$ \mathrm{S}(s)=\mathrm{S}1(s)+\mathrm{S}2(s)-\mathrm{S}3 \tag{A14} $$

Then the series with the highest S is selected, and the procedure continues with step i.

If S ≤ 0 for all s, no further partner series is selected, and the procedure terminates.
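Steps i) to iv) and the stopping rule can be sketched as a greedy loop. The sketch below is one possible reading of the text, not the ACMANTv3 implementation, and it rests on several assumptions that the section does not settle: F in Eqs. (A9) and (A11) is taken to be the effective partner count of the current network, summed only over months in which series s has observed data; the "decades" of step ii) are taken to be sliding 120-month windows; M in Eq. (A13) counts the network after the tentative inclusion of s; and the number of effective partners is supplied by the caller through the hypothetical callable `effective_count`, because its definition (Sect. 3.4) is not reproduced here. All function names are hypothetical.

```python
import numpy as np


def flagged_months(F, span=120, share=0.25, f_min=10):
    """Mark months lying in at least one span-month window in which
    F < f_min holds for at least `share` of the window."""
    F = np.asarray(F, dtype=float)
    span = min(span, len(F))
    low = np.concatenate(([0], np.cumsum(F < f_min)))
    flags = np.zeros(len(F), dtype=bool)
    for start in range(len(F) - span + 1):
        if low[start + span] - low[start] >= share * span:
            flags[start:start + span] = True
    return flags


def candidate_score(r_s, obs_s, F, flags, M):
    """Score S(s) = S1 + S2 - S3 for one potential partner series."""
    f1 = np.where(F < 10, F, 12.0)                       # Eq. (A10)
    f2 = np.where(F < 20, F, 20.0)                       # Eq. (A12)
    s1 = np.sum(5 * r_s ** 4 * (12 - f1[obs_s]) ** 3)    # Eq. (A9)
    s2 = np.sum(5 * r_s ** 4 * (20 - f2[obs_s & flags]) ** 2)  # Eq. (A11)
    s3 = (M - 30) ** 2                                   # Eq. (A13)
    return s1 + s2 - s3                                  # Eq. (A14)


def build_network(r_star, observed, effective_count, base=30, r_min=0.4):
    """Return indices of the partner series selected for one candidate.

    r_star          correlations r* of each potential partner
    observed        boolean array (series x months), True where observed
    effective_count callable mapping a list of selected indices to the
                    monthly effective partner count F (caller-supplied)
    """
    r_star = np.asarray(r_star, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    order = [int(s) for s in np.argsort(-r_star) if r_star[s] >= r_min]
    selected, pool = order[:base], order[base:]
    while pool:
        F = np.asarray(effective_count(selected), dtype=float)
        flags = flagged_months(F)
        M = len(selected) + 1  # network size if one more series is added
        scores = [candidate_score(r_star[s], observed[s], F, flags, M)
                  for s in pool]
        best = int(np.argmax(scores))
        if scores[best] <= 0:
            break  # no remaining series improves the network
        selected.append(pool.pop(best))
    return selected
```

A simple stand-in for `effective_count`, used only for experimenting with the loop, is the number of selected series with an observed value in each month, for example `lambda sel: observed[sel].sum(axis=0)`. That stand-in is not the definition of effective partners used in the paper.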

The development of this algorithm is based on subjective decisions, but the important elements of the procedure can be well reasoned. Frequent occurrence of low F within a relatively short period is considered more destructive to the efficiency of homogenisation than its sporadic occurrence; therefore, a higher minimum threshold of F is applied in S2 than in S1. It is more important to raise the smallest F values (where possible) than to raise a large number of F values, which is why the second factors of (A9) and (A11) are assigned powers of 3 and 2, respectively. When several series are comparably useful in raising F, it is important to give preference to the one with relatively high correlation; therefore r* is raised to the fourth power. Note that some parameter values are close to those of the networking in PHA (Menne and Williams 2009): in PHA the 40 best correlating partner series are taken at the first step, and the correlation threshold is 0.5.

Cite this article

Domonkos, P., Coll, J. Impact of missing data on the efficiency of homogenisation: experiments with ACMANTv3. Theor Appl Climatol 136, 287–299 (2019).
