Abstract
Missing values in daily rainfall data can hamper analyses aimed at solving hydrological, agricultural, and climatological problems. This study attempts to select an appropriate method for estimating missing values in the daily rainfall data of Bangladesh. For this purpose, eight imputation methods and seven comparison techniques are employed. For five regions of the country, three sets of daily rainfall data (with 1, 5, and 10% missing values) are drawn at random, each with 1000 repetitions. The sampled values are artificially set to missing and then imputed using the selected methods. The relative performance of the methods is examined using the comparison criteria. The following observations can be made regarding the choice of an appropriate missing value estimation technique. For imputing missing values of daily rainfall data, the arithmetic average method is found to be the best method for the rainfall stations Chittagong and Rajshahi, in the south-east and north-west regions, respectively. Further, the single best estimator method is identified as the best method for the rainfall stations Sylhet and Dhaka, in the north-east region and the mid-region, respectively, and the EM-MCMC method for the rainfall station Khulna in the south-west region, with respect to the Kolmogorov-Smirnov test, the lowest bias of the estimate, the value of the S index, etc.
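The evaluation procedure described in the abstract (mask a fixed share of observed values, impute them, and score the estimates over many repetitions) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the arithmetic average method means the simple mean of same-day values at neighbouring stations, and it uses bias and RMSE as two illustrative criteria; the abstract names bias but not RMSE. The function names, the synthetic series, and the choice of three neighbours are all hypothetical.

```python
import random
from statistics import mean


def arithmetic_average(neighbor_values):
    """Estimate a missing value as the simple mean of the neighbours' values."""
    return mean(neighbor_values)


def simulate_masking(target, neighbors, missing_fraction, repetitions, seed=0):
    """Repeatedly mask a random share of the target series and impute it.

    target: observed daily values at the target station.
    neighbors: list of series of the same length, one per neighbouring station.
    Returns (mean bias, mean RMSE) averaged over all repetitions.
    """
    rng = random.Random(seed)
    n = len(target)
    k = max(1, round(missing_fraction * n))  # number of values to mask
    biases, rmses = [], []
    for _ in range(repetitions):
        masked_days = rng.sample(range(n), k)  # days treated as missing
        errors = [
            arithmetic_average([series[day] for series in neighbors]) - target[day]
            for day in masked_days
        ]
        biases.append(mean(errors))
        rmses.append(mean(e * e for e in errors) ** 0.5)
    return mean(biases), mean(rmses)


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): one year at a target station
    # and three noisy neighbouring stations.
    rng = random.Random(42)
    target = [max(0.0, rng.gauss(5, 8)) for _ in range(365)]
    neighbors = [[max(0.0, v + rng.gauss(0, 2)) for v in target] for _ in range(3)]
    for fraction in (0.01, 0.05, 0.10):
        bias, rmse = simulate_masking(target, neighbors, fraction, 1000)
        print(f"{fraction:.0%} missing: bias={bias:.3f}, RMSE={rmse:.3f}")
```

Because the true values are known before masking, each method can be scored directly against them; swapping `arithmetic_average` for another estimator would allow the same loop to compare methods under identical masks.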
Acknowledgements
This study is supported under the HEQEP sub-project, CP-3293, in the Department of Applied Statistics, East West University, funded by the World Bank and implemented by the University Grants Commission of Bangladesh (UGC). The authors are also grateful to the Bangladesh Meteorological Department (BMD) for providing the data. We acknowledge the critical comments from the anonymous reviewers and the editor.
Cite this article
Jahan, F., Sinha, N.C., Rahman, M.M. et al. Comparison of missing value estimation techniques in rainfall data of Bangladesh. Theor Appl Climatol 136, 1115–1131 (2019). https://doi.org/10.1007/s00704-018-2537-y
DOI: https://doi.org/10.1007/s00704-018-2537-y