Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

  • Original Paper
  • Published in: Theoretical and Applied Climatology

Abstract

This study compares several imputation methods for completing missing values in spatio-temporal meteorological time series. Six imputation methods are assessed against criteria including accuracy, robustness, precision, and efficiency, using artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, the simple arithmetic average, the normal ratio (NR), and NR weighted with correlations are the simple ones, whereas the multilayer perceptron type neural network and a multiple imputation strategy using Monte Carlo Markov Chain based on expectation–maximization (EM-MCMC) are the computationally intensive ones. In addition, we propose a modification of the EM-MCMC method. Besides a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Based on detailed graphical and quantitative analyses, the computational methods, particularly EM-MCMC, are computationally expensive but appear favorable for imputing meteorological time series across the different missingness periods, under both measures and for both series studied. We conclude that imputing missing values with the EM-MCMC algorithm before any statistical analysis of meteorological data should reduce uncertainty and give more robust results. Moreover, the CD measure can be suggested for evaluating the performance of missing data imputation, particularly with computational methods, since it gives more precise results for meteorological time series.
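The three simple methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the correlation-based weight r²(n − 2)/(1 − r²) is one form commonly attributed to Young (1992) and is assumed here, and the squared-error measure is assumed to be RMSE divided by the mean of the observed values (CVRMSE). The article's exact definitions may differ.

```python
import numpy as np


def simple_arithmetic_average(neighbors):
    """SAA: mean of the neighboring stations' values at the missing time step."""
    return float(np.mean(np.asarray(neighbors, dtype=float)))


def normal_ratio(neighbors, neighbor_normals, target_normal):
    """NR: rescale each neighbor by the ratio of long-term means, then average."""
    neighbors = np.asarray(neighbors, dtype=float)
    neighbor_normals = np.asarray(neighbor_normals, dtype=float)
    return float(np.mean(target_normal / neighbor_normals * neighbors))


def correlation_weights(correlations, n_obs):
    """Assumed weight form: r^2 * (n - 2) / (1 - r^2) for each neighbor."""
    r2 = np.asarray(correlations, dtype=float) ** 2
    n_obs = np.asarray(n_obs, dtype=float)
    return r2 * (n_obs - 2.0) / (1.0 - r2)


def nr_weighted_with_correlations(neighbors, correlations, n_obs):
    """NRWC: weighted average of neighbors using correlation-based weights."""
    w = correlation_weights(correlations, n_obs)
    neighbors = np.asarray(neighbors, dtype=float)
    return float(np.sum(w * neighbors) / np.sum(w))


def cvrmse(observed, imputed):
    """Assumed accuracy measure: RMSE divided by the mean of the observed values."""
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    rmse = np.sqrt(np.mean((observed - imputed) ** 2))
    return float(rmse / np.mean(observed))


if __name__ == "__main__":
    # Illustrative values only (not data from the study).
    neighbors = [40.0, 50.0, 60.0]        # neighbor values for the missing month
    normals = [80.0, 100.0, 120.0]        # neighbors' long-term means
    print(simple_arithmetic_average(neighbors))
    print(normal_ratio(neighbors, normals, target_normal=100.0))
    print(nr_weighted_with_correlations(neighbors, [0.9, 0.7, 0.5], [30, 30, 30]))
    print(cvrmse([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```

In this sketch, the weighted variant pulls the estimate toward the neighbors most strongly correlated with the target station, whereas SAA and NR treat all neighbors equally.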

Abbreviations

AEG: Aegean region
C: Correlation sum
CAN: Central Anatolia
CD: Correlation dimension
CV: Coefficient of variation
CVRMSE: Coefficient of variation root mean squared error
DA: Data augmentation
EAN: Eastern Anatolia
EM: Expectation–maximization
EM-MCMC: EM Monte Carlo Markov Chain
IDWM: Inverse distance weighting method
MAR: Missing at random
MCAR: Missing completely at random
MDA: Multiple discriminant analysis
MI: Multiple imputation
MLP: Multilayer perceptron
MLPNN: MLP neural network
MLR: Multiple linear regression
MNAR: Missing not at random
MED: Mediterranean region
NDTSA: Nonlinear dynamic time series analysis
NN: Neural network
NR: Normal ratio
NRWC: NR weighted with correlations
BLS: Black Sea region
PS: Phase space
RMSE: Root mean squared error
SAA: Simple arithmetic average
SAN: Southeastern Anatolia
SOM: Self-organizing maps
TDE: Time delay embedding
MAR: Marmara region
TSMS: Turkish State Meteorological Service

Acknowledgments

This study was supported by Middle East Technical University, Ankara, Turkey, under contract number BAP-2008-01-09-02. The authors would like to thank Prof. Dr. Murat Türkeş for his valuable comments and all members of the NINLIL research group (http://www.stat.metu.edu.tr/) for their support. The authors would also like to thank Gary Conlan (School of Foreign Languages, METU, Turkey) for reviewing the language of the manuscript.

Author information

Corresponding author

Correspondence to Ceylan Yozgatligil.

About this article

Cite this article

Yozgatligil, C., Aslan, S., Iyigun, C. et al. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data. Theor Appl Climatol 112, 143–167 (2013). https://doi.org/10.1007/s00704-012-0723-x

  • DOI: https://doi.org/10.1007/s00704-012-0723-x
