A Heuristic Gap Filling Method for Daily Precipitation Series

Kim, Jungjin; Ryu, Jae H.

doi:10.1007/s11269-016-1284-z

Published: 15 March 2016

Volume 30, pages 2275–2294, (2016)
Water Resources Management

Abstract

Gap filling is common practice for completing hydrological data series so that environmental simulations and water resources models can run without missing values in a changing climate. However, gap filling is often cumbersome because physical constraints, such as complex terrain and the density of weather stations, limit how much estimation performance can be improved. Although researchers have developed and refined several gap filling methods, it remains challenging to find a single best method for broad applications. This research explores a gap filling method that improves climate data estimates (e.g., daily precipitation) using a gamma distribution function with statistical correlation (GSC) in conjunction with cluster analysis (CA). The daily dataset at the source stations (SSs) is used to estimate missing values at the target stations (TSs) in the study area. Three standard gap filling methods, Inverse Distance Weighting (IDW), Ordinary Kriging (OK), and Gauge Mean Estimator (GME), are evaluated along with cluster analysis using statistical measures (RMSE, MAE, R) and skill scores (HSS, PSS, CSI). The results indicate that cluster analysis improves estimation performance regardless of the gap filling method used. The GSC method combined with cluster analysis, in particular, outperformed the other methods when performance was compared under rain and no-rain conditions in the study area. The proposed GSC method can therefore serve as a case toward advancing gap filling methods in the field.

References

Adhikary SK, Muttil N, Yilmaz AG (2015) Genetic programming-based ordinary kriging for spatial interpolation of rainfall. J Hydrol Eng. doi:10.1061/(ASCE)HE.1943-5584.0001300, 04015062
Ahrens B (2006) Distance in spatial interpolation of daily rain gauge data. Hydrol Earth Syst Sci 20:197–208
ASCE (1996) Hydrology handbook, 2nd edn. American Society of Civil Engineers ASCE, New York
Barry RG, Chorley RJ (1987) Atmosphere, weather and climate, 5th edn. Routledge, London
Breiger RL, Boorman SA, Arabie P (1975) An algorithm for clustering relation data with applications to social network analysis and comparison with multidimensional scaling. J Math Psychol 12(3):328–384
Bunkers MJ, Miller JR, Degaetano AT (1996) Definition of climate regions in the northern plains using an objective cluster modification technique. J Climatol 9:130–146
Chang KT (2009) Introduction to geographic information systems. McGraw Hill, New York
Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Chmielewski MS, Grzymala-Busse JW (1996) Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason 15(4):319–331
Davis JC (2002) Statistics and data analysis in geology. Wiley, New York
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95:14863–14868
Faloutsos C, Oard DW (1998) A survey of information retrieval and filtering methods. Technical Report of the Computer Science Department, University of Maryland, No. CS-TR-3514
Garcia BIL, Sentelhas PC, Tapia L, Sparovek G (2006) Filling in missing rainfall data in the Andes region of Venezuela, based on a cluster analysis approach. Rev Bras Agrometeorologia 14(2):225–233
Gardner JW (1991) Detection of vapours and odours from a multisensory array using pattern recognition Part 1. Principal and component and cluster analysis. Sens Actuators B 4:109–115
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1):191–215
Hasan MM, Croke BFW (2013) “Filling gaps in daily rainfall data: a statistical approach.” 20th International Congress on Modeling and Simulation, Adelaide, Australia, 1-6 December 2013
Hruschka ER, Ebecken NF (2003) A genetic algorithm for cluster analysis. Intell Data Anal 7(1):15–25
Isaake HE, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, Oxford
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, New York
Kanevski M, Maignan M (2004) Analysis and modeling of spatial environmental data. EPEL Press, Italy
Kim JJ, Ryu JH (2015) Quantifying a threshold of missing values for gap filling processes in daily precipitation series. Water Resour Manag 29(11):4173–4184
Kumar CNS, Ramulu VS, Reddy KS, Kotha S, Kumar CM (2012) Spatial data mining using cluster analysis. Int J Comput Sci Inf Technol (IJCSIT) 4(4):71–77
Llyod CD (2005) Assessing the effect of integrating elevation data into the estimation of monthly precipitation in Great Britain. J Hydrol 308:128–150
Lu GY, Wong DW (2008) An adaptive inverse-distance weighting spatial interpolation technique. Comput Geosci 34(9):1044–1055
Lund R, Li B (2010) Revisiting climate region definitions via clustering. J Climatol 22:1787–1800
Maarek YS, Berry DM, Kaiser GE (1991) An information retrieval approach for automatically constructing software libraries. IEEE Trans Softw Eng 17(8):800–813
MacQueen JB (1967) “Some methods for classification and analysis of multivariate observation.” Proceedings of 5-th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 281–297
Mair A, Fares A (2011) Comparison of rainfall interpolation methods in a mountainous region of a tropical island. J Hydrol Eng 16(4):371–383
McCuen RH (1998) Hydrologic analysis and design. Prentice-Hall, NJ
Pan W, Lin J, Le CT (2002) Model-based cluster analysis of microarray gene-expression data. Genome Biol 3(2):1–8
Pedrycz W (1990) Fuzzy sets in pattern recognition: methodology and methods. Pattern Recogn 23(1):121–146
Ryu J, Palmer R, Matthew W, Jeong S (2009) Mid-range streamflow forecasts based on climate modeling-Statistical correction and evaluation. J Am Water Resour Assoc 45(2):355–368
Simanton JR, Osborn HB (1980) Reciprocal-distance estimate of point rainfall. J Hydraul Eng 106:1242–1246
Simolo C, Brunetti M, Maugeri M, Nanni T (2010) Improving estimation of missing values in daily precipitation series by a probability density function-preserving approach. Int J Climatol 30(10):1564–1576
Teegavarapu RSV (2007) Use of universal function approximation in variance-dependent surface interpolation method: An application in hydrology. J Hydrol 332(1–2):16–29
Teegavarapu RSV (2012) Spatial interpolation using nonlinear mathematical programming models for estimation of missing precipitation records. Hydrol Sci 57(3):383–406
Teegavarapu RSV (2014a) Statistical corrections of spatially interpolated missing precipitation data estimates. Hydrol Process 28:3789–3808
Teegavarapu RSV (2014b) Missing precipitation data estimating using optimal proximity metric-based imputation, nearest-neighbor classification and cluster-based interpolation methods. Hydrol Sci J 59(11):2009–2026
Teegavarapu RSV, Chandramouli V (2005) Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records. J Hydrol 312:191–206
Teegavarapu RSV, Tufail M, Irmsbee L (2009) Optimal functional forms for estimation of missing precipitation records. J Hydrol 374:106–115
Teegavarapu RSV, Meskele T, Pathak CS (2012) Geo-spatial grid-based transformations of precipitation estimates using spatial interpolation methods. Comput Geosci 40(1):28–39
Turkes M, Tatli H (2011) Use of the spectral clustering to determine coherent precipitation regions in Turkey for the period 1929–2007. J Climatol 31:2055–2067
Vieux BE (2001) Distributed hydrologic modeling using GIS. Water Science and Technology Library. Kluwer Academic Publishers
Watson DF, Philip GM (1985) A refinement of inverse distance weighted interpolation. Geo-Processing 2:315–327
Westerberg I, Walther A, Guerrero JL, Coello Z, Halldin S, Xu CY, Chen D, Lundin LC (2010) Precipitation data in a mountainous catchment in Honduras: quality assessment and spatiotemporal characteristics. Theor Appl Climatol 101:381–396
Wilks DS (1995) Statistical methods in the atmospheric sciences. Academic Press
Xia Y, Fabian P, Stohl A, Winterhalter M (1999) Forest climatology: estimation of missing values for Bavaria, Germany. Agr Forest Meteorol 96:131–144

Acknowledgments

This research is supported partially by the National Institute of Food and Agriculture, United States Department of Agriculture, under ID01507. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the United States Department of Agriculture.

Author information

Authors and Affiliations

Department of Biological and Agricultural Engineering, University of Idaho, 322 E. Front St., Boise, ID, 83702, USA
Jungjin Kim & Jae H. Ryu

Corresponding author

Correspondence to Jae H. Ryu.

Appendix

1.1 Gauge Mean Estimator (GME)

The Gauge Mean Estimator (GME) uses an arithmetic average of the observations at all SSs, each scaled by the ratio of the average monthly precipitation at the TS to that at the SS. It is a special case of Inverse Distance Weighting (IDW) and is similar to the average precipitation estimation method presented by McCuen (1998). The missing precipitation is estimated by

$$ X_m = \frac{1}{n}\sum_{i=1}^{n}\frac{N_m}{N_i}\,X_i $$

(11)

where X_m = the estimated value at TS_m, n = the number of SSs, X_i = the observed value at SS_i, and N_m and N_i = the average monthly precipitation at TS_m and SS_i, respectively.
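As an illustration only, Eq. (11) can be sketched in a few lines of Python. The function name `gauge_mean_estimate` and its argument names are hypothetical and do not come from the article; the sketch assumes all inputs share the same units and that every SS has a non-zero monthly average.

```python
from typing import Sequence


def gauge_mean_estimate(
    source_obs: Sequence[float],
    source_monthly_mean: Sequence[float],
    target_monthly_mean: float,
) -> float:
    """Estimate X_m at a target station following Eq. (11).

    source_obs          -- X_i, daily observations at the n source stations
    source_monthly_mean -- N_i, average monthly precipitation at each source station
    target_monthly_mean -- N_m, average monthly precipitation at the target station
    (All names are illustrative assumptions, not the authors' code.)
    """
    n = len(source_obs)
    if n == 0 or n != len(source_monthly_mean):
        raise ValueError("need one monthly mean per source observation")
    # Scale each observation by N_m / N_i, then take the arithmetic mean.
    scaled = (
        target_monthly_mean / n_i * x_i
        for x_i, n_i in zip(source_obs, source_monthly_mean)
    )
    return sum(scaled) / n


if __name__ == "__main__":
    # Three hypothetical source stations (values invented for illustration).
    print(gauge_mean_estimate([4.0, 6.0, 10.0], [40.0, 60.0, 50.0], 50.0))
```

When every ratio N_m / N_i equals 1, the estimate reduces to the plain mean of the source observations.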

1.2 Inverse Distance Weighting (IDW) Method

Inverse Distance Weighting (IDW), proposed by Simanton and Osborn (1980) and implemented by Watson and Philip (1985), is commonly used to fill missing precipitation values and is widely used in many water resources applications (ASCE 1996). To estimate precipitation at a TS, IDW assigns each SS a weight that decreases with the distance between the TS and that SS. The missing values at TSs can be computed by Eq. (12).

$$ X_m = \frac{\sum_{i=1}^{n} X_i\,d_{mi}^{-k}}{\sum_{i=1}^{n} d_{mi}^{-k}} $$

(12)

where m = the selected TS, X_m = the estimated value at TS m, n = the number of SSs, X_i = the observed value at SS i, d_mi = the distance from station i to station m, and k = the exponent referred to as friction distance (Vieux 2001), which ranges from 1.0 to 6.0. In this study, k = 2 is used (Teegavarapu et al. 2009).
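A minimal Python sketch of Eq. (12) follows. The function name `idw_estimate` is hypothetical, and the sketch assumes strictly positive distances, so a TS that coincides with an SS is rejected rather than handled.

```python
from typing import Sequence


def idw_estimate(
    source_obs: Sequence[float],
    distances: Sequence[float],
    k: float = 2.0,
) -> float:
    """Estimate X_m at a target station following Eq. (12).

    source_obs -- X_i, observations at the n source stations
    distances  -- d_mi, distance from each source station i to the target m
    k          -- friction distance exponent (k = 2 is the value used in the study)
    (Function and argument names are illustrative assumptions.)
    """
    if len(source_obs) == 0 or len(source_obs) != len(distances):
        raise ValueError("need one distance per source observation")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be strictly positive")
    # Weight of each source station is d_mi^(-k).
    weights = [d ** (-k) for d in distances]
    weighted_sum = sum(w * x for w, x in zip(weights, source_obs))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Two hypothetical source stations at distances 1 and 2 from the target.
    print(idw_estimate([10.0, 20.0], [1.0, 2.0]))  # weights 1 and 0.25
```

With equal distances the weights cancel and the estimate reduces to the arithmetic mean of the source observations.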

1.3 Ordinary Kriging (OK) Method

Ordinary Kriging (OK) is a standard surface interpolation method based on scalar measurements at different locations (Journel and Huijbregts 1978; Isaake and Srivastava 1989). OK relies on spatially dependent variance (Vieux 2001), and the degree of spatial dependence can be determined using a semivariogram. The weights of OK depend on the distance between SS and TS and, through Eq. (14), on the fitted semivariogram. The OK method estimates missing values using Eqs. (13)–(15).

$$ X_m = \sum_{i=1}^{n} \delta\,X_i $$

(13)

$$ \delta = \tau^{-1}\gamma $$

(14)

$$ \gamma(d) = \frac{1}{2\,n(d)}\sum_{d_{ij}}\left(X_i - X_j\right)^2 $$

(15)

where δ = the weight obtained from the fitted semivariogram, τ = the gamma matrix, which holds the model semivariance for all sampled pairs, γ(d) = the semivariance defined over observations X_i and X_j separated by lag distance d, n(d) = the number of distinct pairs separated by d, and X_i and X_j = data values at spatial locations i and j, respectively.
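The sketch below illustrates Eqs. (14) and (15) with NumPy. It is an assumption-laden illustration rather than the authors' procedure: the names `empirical_semivariogram`, `kriging_weights`, and `lag_edges` are hypothetical, pairs are grouped into distance bins because real station pairs rarely share an exact lag, and the weights are solved exactly as Eq. (14) is written, without the unbiasedness constraint that full ordinary kriging implementations usually add.

```python
import numpy as np


def empirical_semivariogram(coords, values, lag_edges):
    """Empirical semivariance per lag bin, following Eq. (15).

    coords    -- (n, 2) station coordinates
    values    -- (n,) observations X_i
    lag_edges -- increasing bin edges for the pair distance d (assumed binning)
    Returns one gamma(d) per bin; bins without pairs are NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # each distinct pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(lag_edges) - 1, np.nan)
    for b in range(len(lag_edges) - 1):
        in_bin = (dist >= lag_edges[b]) & (dist < lag_edges[b + 1])
        n_d = int(in_bin.sum())                     # n(d): pairs in this lag bin
        if n_d > 0:
            gamma[b] = sq_diff[in_bin].sum() / (2.0 * n_d)
    return gamma


def kriging_weights(tau, gamma_vec):
    """Solve tau * delta = gamma for delta, i.e. Eq. (14) as written."""
    return np.linalg.solve(np.asarray(tau, float), np.asarray(gamma_vec, float))


if __name__ == "__main__":
    # Three hypothetical stations on a line (values invented for illustration).
    g = empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [1.0, 3.0, 6.0],
                                [0.5, 1.5, 2.5])
    print(g)
    print(kriging_weights([[0.0, 1.0], [1.0, 0.0]], [0.3, 0.7]))
```

In practice a model (for example spherical or exponential) would be fitted to the empirical values before building τ and γ; that fitting step is omitted here.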

1.4 RMSE, MAE, and R

$$ RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(P_{Si} - P_{Qi}\right)^2}{N}} $$

(16)

$$ MAE = \frac{1}{N}\sum_{i=1}^{N}\left|P_{Si} - P_{Qi}\right| $$

(17)

$$ R = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_{Qi}-\overline{P}_{Q}\right)\left(P_{Si}-\overline{P}_{S}\right)}{\sqrt{\frac{N\sum_{i=1}^{N}P_{Qi}^{2}-\left(\sum_{i=1}^{N}P_{Qi}\right)^{2}}{N\left(N-1\right)}}\ \sqrt{\frac{N\sum_{i=1}^{N}P_{Si}^{2}-\left(\sum_{i=1}^{N}P_{Si}\right)^{2}}{N\left(N-1\right)}}} $$

(18)

where P_Qi = the observed precipitation at time step i, P_Si = the estimated precipitation at time step i, $ \overline{P}_{Q} $ and $ \overline{P}_{S} $ = the means of the observed and estimated precipitation, respectively, and N = the sample size.
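The three measures can be sketched in plain Python as follows. The function names are hypothetical. RMSE and MAE follow Eqs. (16) and (17) directly; for R the sketch assumes the standard Pearson correlation coefficient, in which the normalization factors cancel, rather than reproducing the mixed 1/N and N − 1 factors of Eq. (18).

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, Eq. (16)."""
    n = len(observed)
    return math.sqrt(sum((s - q) ** 2 for s, q in zip(estimated, observed)) / n)


def mae(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error, Eq. (17)."""
    n = len(observed)
    return sum(abs(s - q) for s, q in zip(estimated, observed)) / n


def pearson_r(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Standard Pearson correlation (assumed reading of Eq. (18))."""
    n = len(observed)
    mean_s = sum(estimated) / n
    mean_q = sum(observed) / n
    cov = sum((q - mean_q) * (s - mean_s) for s, q in zip(estimated, observed))
    ss_s = sum((s - mean_s) ** 2 for s in estimated)
    ss_q = sum((q - mean_q) ** 2 for q in observed)
    return cov / math.sqrt(ss_s * ss_q)


if __name__ == "__main__":
    # Invented estimated (P_S) and observed (P_Q) series for illustration.
    p_s = [1.0, 2.0, 3.0]
    p_q = [1.0, 2.0, 4.0]
    print(rmse(p_s, p_q), mae(p_s, p_q), pearson_r(p_s, p_q))
```

The functions assume equal-length, non-empty series with non-zero variance; a constant series would make the denominator of `pearson_r` zero.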

About this article

Cite this article

Kim, J., Ryu, J.H. A Heuristic Gap Filling Method for Daily Precipitation Series. Water Resour Manage 30, 2275–2294 (2016). https://doi.org/10.1007/s11269-016-1284-z

Received: 05 January 2016
Accepted: 01 March 2016
Published: 15 March 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11269-016-1284-z
