HAIS 2014: Hybrid Artificial Intelligence Systems, pp. 697–708
YASA: Yet Another Time Series Segmentation Algorithm for Anomaly Detection in Big Data Problems
Abstract
Time series pattern analysis has recently attracted the attention of the research community for real-world applications. The petroleum industry is one of the application contexts where these problems arise, for instance in anomaly detection. Offshore petroleum platforms rely on heavy turbomachines for their extraction, pumping and generation operations. These machines are frequently monitored intensively, each by hundreds of sensors that send high-frequency measurements to a concentration hub. Handling these data calls for a holistic approach, as sensor data are frequently noisy, unreliable, inconsistent with a priori problem axioms, and massive in volume. For anomaly detection problems in turbomachinery, it is essential to segment the available dataset in order to automatically discover the operational regime of the machine in the recent past. In this paper we propose a novel time series segmentation algorithm that is adaptable to big data problems and capable of handling the high volume of data involved in such contexts. We describe our proposal and analyze its computational complexity. We also perform empirical studies comparing our algorithm with similar approaches on benchmark problems and on a real-life application related to anomaly detection in oil platform turbomachinery.
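The abstract does not describe how YASA itself decides where to split a series. Purely as a point of reference for what "segmenting" a sensor series into operational regimes means, the following is a minimal sketch of a generic top-down piecewise-linear segmentation, a well-known baseline and not the YASA algorithm. Each candidate segment is fitted with a least-squares line; if its squared error exceeds a threshold, it is split at the index that minimizes the combined error of the two halves, and the procedure recurses. The names `linear_fit_sse`, `segment`, `max_sse` and `min_len` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def linear_fit_sse(y: Sequence[float], start: int, end: int) -> float:
    """Sum of squared residuals of a least-squares line fitted to y[start:end].

    The sample index is used as the x coordinate. Segments with two or
    fewer points are fitted exactly by a line, so their error is 0.
    """
    n = end - start
    if n <= 2:
        return 0.0
    xs = range(start, end)
    mean_x = (start + end - 1) / 2
    mean_y = sum(y[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y[x] - (intercept + slope * x)) ** 2 for x in xs)


def segment(y: Sequence[float], max_sse: float, min_len: int = 3) -> List[Tuple[int, int]]:
    """Generic top-down segmentation (hypothetical helper, not YASA).

    Returns half-open (start, end) index ranges covering y. A segment is
    accepted when its linear-fit error is at most max_sse, or when it is
    too short to be split into two parts of at least min_len samples.
    """

    def split(start: int, end: int) -> List[Tuple[int, int]]:
        if linear_fit_sse(y, start, end) <= max_sse or end - start < 2 * min_len:
            return [(start, end)]
        # Try every admissible split point and keep the one with least total error.
        best_k, best_cost = start + min_len, float("inf")
        for k in range(start + min_len, end - min_len + 1):
            cost = linear_fit_sse(y, start, k) + linear_fit_sse(y, k, end)
            if cost < best_cost:
                best_k, best_cost = k, cost
        return split(start, best_k) + split(best_k, end)

    return split(0, len(y))


if __name__ == "__main__":
    # A level shift, e.g. a machine moving from one operating regime to another.
    series = [0.0] * 10 + [5.0] * 10
    print(segment(series, max_sse=1e-9))  # [(0, 10), (10, 20)]
```

This naive version re-fits every candidate split from scratch, so its cost grows roughly cubically with the series length in the worst case. It is therefore unsuitable for the high-volume setting the paper targets and is shown only to make the idea of regime segmentation concrete.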
Keywords
Time series segmentation · anomaly detection · big data · oil industry application