Abstract
Groundwater is a crucial source of water supply, and preserving its quality is essential. However, incomplete time series from observation wells are a major limitation in water resources studies. In the present study, 17 observation wells were selected in the Asadabad plain, and their sampled electrical conductivity (EC) records were compiled. Multivariate regression, ARIMAX, and SVM models were used to simulate and fill the missing EC data of the Dehnoush and Biaj wells. Six input structures were defined for the models by ranking candidate stations on their correlation coefficients with the Dehnoush and Biaj stations, computed with the multiple time series method at different delays. Increasing the number of model inputs reduced the simulation error and increased the correlation: in Structure 6, RMSE and NRMSE reached their minimum values and the Pearson coefficient reached its maximum. Therefore, the missing data were filled using the stations with the highest correlation at different delays, rather than adjacent stations without delay. Although the SVM and regression models achieved broadly similar simulation accuracy, the SVM model was the most accurate of the three.
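The input-selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the data are synthetic, NRMSE is assumed here to be RMSE divided by the mean of the observations (the abstract does not give its definition), and an ordinary least-squares fit stands in for the regression model. SVM and ARIMAX are not shown.

```python
import numpy as np

def lagged_corr(target, candidate, lag):
    """Pearson r between target[t] and candidate[t - lag], skipping NaNs."""
    x, y = (candidate[:-lag], target[lag:]) if lag > 0 else (candidate, target)
    ok = ~(np.isnan(x) | np.isnan(y))
    if ok.sum() < 3:
        return np.nan
    return np.corrcoef(x[ok], y[ok])[0, 1]

def best_lag(target, candidate, max_lag):
    """Return (lag, r) with the highest correlation over lags 0..max_lag."""
    corrs = [lagged_corr(target, candidate, k) for k in range(max_lag + 1)]
    k = int(np.nanargmax(corrs))
    return k, corrs[k]

def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nrmse(obs, sim):
    # Assumption: normalised by the mean of the observations.
    return rmse(obs, sim) / float(np.mean(obs))

def fill_gaps(target, predictors, lags):
    """Fit least squares on lagged predictors, then fill NaNs in target."""
    n, start = len(target), max(lags)
    cols = [p[start - k : n - k] for p, k in zip(predictors, lags)]
    X = np.column_stack([np.ones(n - start)] + cols)
    y = target[start:]
    usable = ~np.isnan(X).any(axis=1)
    known = usable & ~np.isnan(y)
    coef, *_ = np.linalg.lstsq(X[known], y[known], rcond=None)
    filled = target.copy()
    missing = usable & np.isnan(y)
    filled[start:][missing] = X[missing] @ coef  # writes through the view
    return filled

# Synthetic demo: the target well follows a neighbour with a 3-step delay.
rng = np.random.default_rng(0)
n = 120
neighbour = 1000 + 50 * np.sin(np.arange(n) / 6) + rng.normal(0, 2, n)
truth = 0.8 * np.roll(neighbour, 3) + 200
lag, r = best_lag(truth, neighbour, max_lag=6)

gappy = truth.copy()
gappy[40:50] = np.nan  # hide ten values to imitate missing records
filled = fill_gaps(gappy, [neighbour], [lag])
gap_rmse = rmse(truth[40:50], filled[40:50])
gap_nrmse = nrmse(truth[40:50], filled[40:50])
```

In this sketch the delay is chosen per candidate station by scanning lags from zero up to `max_lag`; adding further stations means passing more series and their selected lags to `fill_gaps`.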
Availability of data and materials
Some or all data are available from the corresponding author upon reasonable request.
Code availability
Models or codes that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
Not applicable.
Author information
Contributions
HN designed and directed the project and supervised the work. AV implemented the SVM model used to simulate and fill in the missing data. FF implemented the ARIMAX model used to simulate and fill in the missing data. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nozari, H., Vanaei, A. & Faraji, F. Gap-filling missing data in time series using the correlation matrix method of multiple time series in Asadabad Plain, Iran. Sustain. Water Resour. Manag. 9, 194 (2023). https://doi.org/10.1007/s40899-023-00977-1
DOI: https://doi.org/10.1007/s40899-023-00977-1