Abstract
Time-series prediction has been widely studied and applied in many fields. For time series with a high acquisition frequency and heavy noise, it is very difficult to build a prediction model directly on the raw data. It is therefore necessary to first extract the change trend of the series accurately and then build a prediction model on that trend. To this end, this paper proposes a novel prediction method for complex univariate time series based on K-means clustering. The method first extracts the change trend information of the original time series using the K-means clustering idea; a gated recurrent unit with an input attention mechanism is then used to build a prediction model for the extracted trend. Extensive experiments on an electromagnetic radiation dataset that we collected and on two datasets published online, AEP_hourly and Wind Turbine Scada, demonstrate that the proposed K-means clustering method can effectively reduce noise interference and accurately capture the change trend of the time series. Comparative experiments with different prediction models show that our model achieves the best prediction accuracy among the models compared, and that the proposed complex univariate time-series prediction algorithm has great practical value.
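The abstract does not spell out how the clustering step turns a noisy series into a trend. As a rough illustration only, the sketch below assumes one plausible reading: the series is cut into fixed-length, non-overlapping windows, the values in each window are grouped by a one-dimensional K-means, and the centroid of the largest cluster is kept as the trend value for that window. The window length, the number of clusters, the deterministic initialisation and the names `kmeans_1d` and `trend_from_windows` are all assumptions made for this example and are not taken from the paper. The second stage (the gated recurrent unit with input attention) is not sketched here.

```python
from statistics import mean


def _assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


def kmeans_1d(values, k=2, iters=20):
    """Plain Lloyd iterations on scalars; returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: spread the k centroids evenly over the value range.
    if k > 1:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centroids = [mean(values)]
    for _ in range(iters):
        labels = _assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, _assign(values, centroids)


def trend_from_windows(series, window=5, k=2):
    """Assumed trend rule: one value per window, the centroid of the
    most populated cluster, so isolated spikes are ignored."""
    trend = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        centroids, labels = kmeans_1d(chunk, k)
        counts = [labels.count(j) for j in range(k)]
        trend.append(centroids[counts.index(max(counts))])
    return trend


# Toy series: two flat levels, each disturbed by a single spike.
series = [1.0, 1.1, 0.9, 9.0, 1.0, 2.0, 2.1, 1.9, 2.0, -7.0]
trend = trend_from_windows(series, window=5, k=2)
print(trend)
```

In this toy series each window contains one spike (9.0 and -7.0). The spike ends up alone in its own cluster, so the trend value for the window comes only from the remaining four points.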
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61672522 and 61976216).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Informed consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the 1975 Declaration of Helsinki, as revised in 2008. Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, Y., Ding, S. & Jia, W. A novel prediction method of complex univariate time series based on k-means clustering. Soft Comput 24, 16425–16437 (2020). https://doi.org/10.1007/s00500-020-04952-2
DOI: https://doi.org/10.1007/s00500-020-04952-2