Abstract
Accurate and complete data are crucial for climate, environmental, water, and agricultural research. Any data record contaminated with errors should be treated as missing and reconstructed. Contamination ("pollution") of climate records can introduce systematic errors, for example in the form of erroneous outliers. Simply removing outliers is not a reliable approach, and quality control checks are needed to establish whether the data are reliable. While methods for detecting outliers have received considerable attention, less work has addressed determining whether a given outlier is actually contaminated. We propose methods for quality control and reconstruction of incomplete rainfall data, using data from 141 stations in the Qaraqhum basin in northeastern Iran. We performed checks for gross errors, temporal consistency, and outliers. Because the probability distribution of monthly precipitation was skewed, we used a robust 3σ-rule to detect outliers. To decide whether an outlier is contaminated, we propose using information such as the number of daily precipitation events in the month, the maximum monthly rainfall, and the standardized monthly rainfall (based on the robust 3σ-rule). In addition, we performed a spatial–temporal comparison to distinguish between a missing record and a true absence of precipitation. For data reconstruction, we used the R package "mice", which imputes data by chained equations. We evaluated the performance of five imputation methods available in mice; the "norm.nob" method performed best, while the "sample" and "mean" methods performed worst.
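A minimal sketch of outlier screening with a robust 3σ-rule, assuming the rule standardizes each value with the median and the scaled median absolute deviation (MAD) instead of the mean and standard deviation; the exact scaling used in the paper is not stated here, and applying the rule separately to each calendar month is also our assumption. The January totals are made up for illustration. The example shows why a robust rule suits skewed monthly rainfall: one extreme value inflates the classical standard deviation enough that the mean ± 3σ rule misses it. Flagged values are only candidates; the abstract's further checks (number of rain days, maximum rainfall) decide whether an outlier is actually contaminated.

```python
import numpy as np

# Multiplying the MAD by this constant gives a consistent estimate of the
# standard deviation when the data are normally distributed.
MAD_TO_SIGMA = 1.4826


def robust_z(x):
    """Robust standardized values (x - median) / (1.4826 * MAD); NaNs are ignored."""
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    if mad == 0:
        raise ValueError("MAD is zero; the robust scale is undefined")
    return (x - med) / (MAD_TO_SIGMA * mad)


def flag_beyond(z, k=3.0):
    """True where |z| > k; missing (NaN) values are never flagged."""
    flags = np.zeros(z.shape, dtype=bool)
    ok = ~np.isnan(z)
    flags[ok] = np.abs(z[ok]) > k
    return flags


def classical_z(x):
    """Classical standardized values (x - mean) / std, for comparison."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)


# Hypothetical January totals (mm) at one station; NaN marks a missing month.
jan = np.array([10, 12, 8, 15, 11, 9, 13, 120, 5, 14, np.nan])

z = robust_z(jan)
robust_flags = flag_beyond(z)                     # robust 3-sigma rule
classical_flags = flag_beyond(classical_z(jan))   # mean +/- 3 std rule

print(robust_flags.nonzero()[0])   # [7]
print(classical_flags.any())       # False
```

Here the median (11.5 mm) and MAD (2.5 mm) are barely affected by the 120 mm value, so its robust z-score is about 29, whereas its classical z-score stays just below 3.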
Availability of Data and Materials
We are pleased to submit our manuscript entitled “Enhancing Rainfall Data Consistency and Completeness: A Spatiotemporal Quality Control Approach and Missing Data Reconstruction Using MICE on Large Precipitation Datasets” to be considered for publication as an original paper. The data that support the findings of this study are available from the Meteorological Organization (https://www.irimo.ir/) and the Ministry of Energy, but restrictions apply to the availability of some of the daily rainfall data, which were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the Meteorological Organization (https://www.irimo.ir/) and the Ministry of Energy.
Notes
Median Absolute Deviation (MAD).
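For reference, one common way to build a robust 3σ-rule from the MAD is sketched below for a series of monthly rainfall values $x_1, \dots, x_n$. The consistency constant 1.4826 (which makes the scaled MAD estimate the standard deviation for normally distributed data) is an assumption here; the exact form used in the study is not given in this section.

$$
\tilde{x} = \operatorname{median}_i (x_i), \qquad
\mathrm{MAD} = \operatorname{median}_i \left| x_i - \tilde{x} \right|, \qquad
\hat{\sigma} = 1.4826 \, \mathrm{MAD},
$$

$$
z_i = \frac{x_i - \tilde{x}}{\hat{\sigma}}, \qquad
x_i \text{ is flagged as an outlier if } |z_i| > 3 .
$$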
Acknowledgements
We thank our dear professor, Hojat Rezaee Pazhand, for her guidance and encouragement, and we cherish her memory.
Author information
Contributions
Nafiseh Seyyed Nezhad and Mahboobeh Farzandi contributed to the study conception and design and to material preparation. Nafiseh Seyyed Nezhad performed the data collection and analysis and wrote the first draft of the manuscript. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent to Publish
Not applicable.
Competing Interests
The authors have no relevant financial or non-financial interests to disclose. We certify that there is no actual or potential conflict of interest in relation to this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Highlights
• Not all outliers are corrupted, so outliers should not be removed automatically.
• In arid climates, monthly precipitation often has a skewed probability distribution.
• A robust 3σ-rule is recommended for detecting outlier data in monthly precipitation.
• The "norm.nob" method performed best among the mice methods tested for reconstructing missing data.
• The "sample" and "mean" methods of the mice package performed worst.
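The highlights single out the "norm.nob" method of mice, which in R is selected through the `method` argument, for example `mice(data, method = "norm.nob")`. To our understanding, norm.nob fits an ordinary least-squares regression on the observed cases and imputes the prediction plus normal noise with the residual standard deviation, without drawing the regression coefficients from their posterior (as the "norm" method does). The Python sketch below imitates that step inside a simple chained-equations loop over stations. It is not the mice implementation: a single imputed dataset (mice creates several, five by default), column-mean initialisation, and a fixed iteration count are our simplifications, and the months × stations layout and the synthetic data are hypothetical.

```python
import numpy as np


def impute_norm_nob(data, n_iter=10, seed=0):
    """Chained-equations imputation with a norm.nob-style univariate step.

    data: 2-D array (rows = months, columns = stations); NaN marks missing.
    Each incomplete column is regressed (OLS with intercept) on all other
    columns using the rows where it is observed. Missing entries are replaced
    by the prediction plus Gaussian noise with the residual standard
    deviation; uncertainty in the coefficients is ignored.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    miss = np.isnan(x)
    # Start from column means so every predictor column is complete.
    x = np.where(miss, np.nanmean(x, axis=0), x)
    for _ in range(n_iter):
        for j in range(x.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            obs = ~m
            design = np.column_stack([np.ones(len(x)), np.delete(x, j, axis=1)])
            beta, *_ = np.linalg.lstsq(design[obs], x[obs, j], rcond=None)
            resid = x[obs, j] - design[obs] @ beta
            dof = max(obs.sum() - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Prediction plus residual noise (parameter uncertainty ignored).
            x[m, j] = design[m] @ beta + rng.normal(0.0, sigma, m.sum())
    return x


# Synthetic example: three correlated stations, 40 months.
rng = np.random.default_rng(1)
a = rng.gamma(2.0, 20.0, 40)
truth = np.column_stack([a,
                         0.8 * a + rng.normal(0, 2, 40),
                         1.2 * a + rng.normal(0, 2, 40)])
data = truth.copy()
data[[3, 10, 25], 1] = np.nan
data[[5, 30], 2] = np.nan

filled = impute_norm_nob(data)
print(np.round(filled[np.isnan(data)] - truth[np.isnan(data)], 2))
```

Observed values are left unchanged and only the gaps are filled. Note that the sketch does not enforce non-negative rainfall, which a real application would need to handle.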
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Golkhatmi, N.S.N., Farzandi, M. Enhancing Rainfall Data Consistency and Completeness: A Spatiotemporal Quality Control Approach and Missing Data Reconstruction Using MICE on Large Precipitation Datasets. Water Resour Manage 38, 815–833 (2024). https://doi.org/10.1007/s11269-023-03567-0
DOI: https://doi.org/10.1007/s11269-023-03567-0