Abstract
As more hydrological sensors are installed, the data they produce increasingly contain abnormal values caused by network congestion, equipment failure, or environmental influence. A series of algorithms has been proposed to detect anomalies in hydrological sensor data at larger scale. However, these algorithms are usually based on distance or classification, which typically incurs high time complexity. To solve this problem, a detection algorithm called AR-iForest is proposed: an anomaly detection algorithm for hydrological time series based on the isolation forest. First, features of the hydrological data are extracted and mapped into a high-dimensional space. Before the isolation forest is applied in this space, an Auto-Regressive (AR) model predicts the current value and computes a confidence interval; only data falling outside the confidence interval need to be examined by the isolation forest. Second, a measure of the effectiveness of the trees in the isolation forest is proposed, and the method iteratively selects the trees with the best classification effect. Finally, the proposed algorithm is integrated into the window processing of the big data platform Flink for performance evaluation. Experimental results show that the proposed algorithm increases the AUC from 90.60% to 96.72% and reduces detection time by 52.23%.
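The two-stage idea in the abstract (an AR confidence-interval pre-filter followed by isolation-forest scoring) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The AR order (1), the 3-sigma interval, the feature map (value, first difference, AR residual), the 0.6 score threshold, and the names `ar_iforest_detect`, `fit_ar1`, `build_tree` and `path_length` are all hypothetical choices made for illustration. The tree-effectiveness selection and the Flink windowing are omitted. The isolation forest follows the standard construction of Liu et al. (ICDM 2008), including its suggested defaults of 100 trees and a subsample size of 256.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points, used to
    normalise isolation path lengths. c(2) = 1 exactly; for n > 2 the harmonic
    number H(n-1) is approximated by ln(n-1) + Euler's gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def fit_ar1(xs):
    """Least-squares AR(1) fit x_t = c + phi * x_{t-1} + e_t (assumed order 1).
    Returns (c, phi, sigma), where sigma is the residual standard deviation."""
    prev, curr = xs[:-1], xs[1:]
    mp, mc = sum(prev) / len(prev), sum(curr) / len(curr)
    var = sum((p - mp) ** 2 for p in prev)
    cov = sum((p - mp) * (q - mc) for p, q in zip(prev, curr))
    phi = cov / var if var > 0 else 0.0
    c = mc - phi * mp
    resid = [q - (c + phi * p) for p, q in zip(prev, curr)]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
    return c, phi, sigma


def build_tree(points, depth, max_depth, rng):
    """Isolation tree: pick a random non-constant attribute and a random split
    value between its min and max, then recurse until isolation or max_depth."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:  # all remaining points are identical
        return ("leaf", len(points))
    d = rng.choice(dims)
    split = rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus c(leaf size) for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, d, split, left, right = tree
    return path_length(left if x[d] < split else right, x, depth + 1)


def ar_iforest_detect(xs, z=3.0, threshold=0.6, n_trees=100, sample_size=256, seed=0):
    """Hypothetical AR + iForest detector over one window of a series.
    Returns the indices t whose isolation score exceeds `threshold`."""
    if len(xs) < 3:
        raise ValueError("need at least 3 points")
    c, phi, sigma = fit_ar1(xs)
    # Hypothetical feature map: (value, first difference, AR residual) for t >= 1.
    feats = {}
    for t in range(1, len(xs)):
        resid = xs[t] - (c + phi * xs[t - 1])
        feats[t] = (xs[t], xs[t] - xs[t - 1], resid)
    # Stage 1: AR pre-filter; only points outside c + phi*x_{t-1} +/- z*sigma go on.
    candidates = [t for t, f in feats.items() if abs(f[2]) > z * sigma]
    if not candidates:
        return []
    # Stage 2: standard isolation forest trained on all feature vectors of the window.
    rng = random.Random(seed)
    data = list(feats.values())
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    flagged = []
    for t in candidates:  # only pre-filtered points are scored
        mean_h = sum(path_length(tree, feats[t]) for tree in forest) / n_trees
        if 2.0 ** (-mean_h / norm) > threshold:  # anomaly score s = 2^(-E[h]/c(psi))
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    series = [10.0 + math.sin(i / 5.0) for i in range(200)]
    series[120] = 40.0  # injected spike
    print(ar_iforest_detect(series))
```

On a smooth sine series with one injected spike, only the spike should fall outside the AR interval and receive a high isolation score; on the clean series nothing passes the pre-filter, so no forest is built or scored at all. This illustrates the intuition behind the pre-filter: the comparatively expensive isolation scoring is confined to points the cheap AR model already finds suspicious.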
Acknowledgments
This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. B200202185, the 2018 Jiangsu Province Key Research and Development Program (Modern Agriculture) Project under Grant No. BE2018301, the 2017 Jiangsu Province Postdoctoral Research Funding Project under Grant No. 1701020C, the 2017 Six Talent Peaks Endorsement Project of Jiangsu under Grant No. XYDXX-078, and the project Research on the Analysis System of Hydrological Big Data under Grant No. 818116816.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Shao, P., Ye, F., Liu, Z., Wang, X., Lu, M., Mao, Y. (2020). Improving iForest for Hydrological Time Series Anomaly Detection. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12454. Springer, Cham. https://doi.org/10.1007/978-3-030-60248-2_12
DOI: https://doi.org/10.1007/978-3-030-60248-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60247-5
Online ISBN: 978-3-030-60248-2
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)