Abstract
Missing data are common in instrumental climate records and hinder statistical analyses of climate change. This article develops a novel imputation method, Imputation Based on Decomposition of Time Series (IBDTS), for climatic data with strong seasonality and spatial correlation. The method first decomposes each time series into three components and then predicts the missing values in each component: the trend component by regression analysis, the seasonal component by spectral analysis, and the remainder component by spatial interpolation. The IBDTS method produced relatively small imputation errors and preserved the real attributes of the climatic series, including the linear trend and the amplitude and phase of the 12-month cycle. Its sensitivity to station distance was relatively small. In addition, the IBDTS method can handle data sets with few or no complete series, and it may be applied not only in climatology but also in other fields, provided the data have strong seasonality and spatial correlation.
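The three-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes monthly data, fits the trend and a single 12-month harmonic jointly by ordinary least squares (a simple stand-in for the regression and spectral-analysis steps), and interpolates the remainder by inverse distance weighting (one possible choice of spatial interpolation). All function names, parameters, and the choice of weighting power are hypothetical.

```python
import numpy as np

def fit_trend_and_season(t, y, period=12.0):
    """Fit intercept + linear trend + one harmonic of the given period
    by least squares, using only the observed (non-NaN) values."""
    obs = ~np.isnan(y)
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), t, np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    trend = X[:, :2] @ coef[:2]    # intercept + slope, evaluated at every time step
    season = X[:, 2:] @ coef[2:]   # annual cycle, evaluated at every time step
    return trend, season

def idw_remainder(target_xy, neighbor_xy, neighbor_rem, power=2.0):
    """Inverse-distance-weighted average of neighbour remainders at each
    time step, skipping neighbours whose value is missing at that step."""
    d = np.linalg.norm(neighbor_xy - target_xy, axis=1)
    w = 1.0 / d ** power
    valid = ~np.isnan(neighbor_rem)
    wsum = (w[:, None] * valid).sum(axis=0)
    num = (w[:, None] * np.where(valid, neighbor_rem, 0.0)).sum(axis=0)
    # Fall back to a zero remainder where no neighbour is available.
    return np.divide(num, wsum, out=np.zeros_like(num), where=wsum > 0)

def impute_station(t, y, target_xy, neighbor_xy, neighbor_y):
    """Fill NaNs in y as trend + season + spatially interpolated remainder."""
    trend, season = fit_trend_and_season(t, y)
    neighbor_rem = np.empty_like(neighbor_y)
    for i, ny in enumerate(neighbor_y):
        n_trend, n_season = fit_trend_and_season(t, ny)
        neighbor_rem[i] = ny - n_trend - n_season
    rem = idw_remainder(target_xy, neighbor_xy, neighbor_rem)
    filled = y.copy()
    miss = np.isnan(y)
    filled[miss] = trend[miss] + season[miss] + rem[miss]
    return filled
```

Here `t` is a float array of month indices, `y` is the target series with `NaN` at the gaps, and `neighbor_y` has one row per neighbouring station. Because each station's trend and seasonal cycle are fitted from its own observed values, no station needs a complete record, which matches the abstract's point about data sets with few or no complete series.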
Acknowledgements
This work was supported by the Chinese Ministry of Science and Technology (MOST) National Key R&D Program (No. 2018YFA0605603) and the Science Foundation Program of Guangxi University of Science and Technology (No. 1711311). The authors thank Yunxin Huang, Tianlin Zhai, and Huqiang Qin for their kind assistance during manuscript writing, and the reviewers for providing constructive comments, which greatly improved this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Qin, Y., Ren, G., Zhang, P. et al. An imputation method for the climatic data with strong seasonality and spatial correlation. Theor Appl Climatol 144, 203–213 (2021). https://doi.org/10.1007/s00704-021-03537-9
DOI: https://doi.org/10.1007/s00704-021-03537-9