YASA: Yet Another Time Series Segmentation Algorithm for Anomaly Detection in Big Data Problems

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8480)


Time series pattern analysis has recently attracted the attention of the research community for real-world applications. The petroleum industry is one application context where these problems arise, for instance in anomaly detection. Offshore petroleum platforms rely on heavy turbomachines for their extraction, pumping and generation operations. These machines are often intensively monitored by hundreds of sensors each, which send measurements at high frequency to a concentration hub. Handling these data calls for a holistic approach, as sensor data are frequently noisy, unreliable, inconsistent with a priori problem axioms, and massive in volume. For anomaly detection problems in turbomachinery, it is essential to segment the available dataset in order to automatically discover the operational regime of the machine in the recent past. In this paper we propose a novel time series segmentation algorithm that is adaptable to big data problems and capable of handling the high volume of data involved in such problem contexts. We describe our proposal and analyze its computational complexity. We also perform empirical studies comparing our algorithm with similar approaches on benchmark problems and on a real-life application related to oil platform turbomachinery anomaly detection.
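To make the notion of segmenting a sensor series into operational regimes concrete, the following is a minimal sketch of a generic top-down piecewise-linear segmentation. It is not the YASA algorithm, whose details are not given in this abstract; the function names, the `max_error` threshold and the `min_length` parameter are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def linear_fit_error(values: Sequence[float], start: int, end: int) -> float:
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    n = end - start
    if n < 3:
        return 0.0  # a line fits one or two points exactly
    xs = range(start, end)
    mean_x = sum(xs) / n
    mean_y = sum(values[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (values[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((values[x] - (slope * x + intercept)) ** 2 for x in xs)


def top_down_segments(
    values: Sequence[float],
    max_error: float,        # hypothetical tolerance on the per-segment fit error
    start: int = 0,
    end: int = -1,
    min_length: int = 2,     # hypothetical minimum number of points per segment
) -> List[Tuple[int, int]]:
    """Recursively split values[start:end] until each segment fits a line well.

    Returns half-open index ranges (start, end), in order.
    """
    if end < 0:
        end = len(values)
    too_short = end - start < 2 * min_length
    if too_short or linear_fit_error(values, start, end) <= max_error:
        return [(start, end)]
    # Pick the split point that minimizes the combined error of both halves.
    best_split = min(
        range(start + min_length, end - min_length + 1),
        key=lambda s: linear_fit_error(values, start, s)
        + linear_fit_error(values, s, end),
    )
    left = top_down_segments(values, max_error, start, best_split, min_length)
    right = top_down_segments(values, max_error, best_split, end, min_length)
    return left + right


# A synthetic reading that jumps between two operating levels.
signal = [0.0] * 10 + [5.0] * 10
segments = top_down_segments(signal, max_error=0.5)
```

Because every candidate split point is re-fitted from scratch, this naive sketch is quadratic in the segment length at each recursion level, which makes it unsuitable for high-frequency readings from hundreds of sensors. That cost is the kind of limitation that motivates segmentation algorithms designed for large data volumes.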


Time series segmentation, anomaly detection, big data, oil industry application





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Dept. of Electrical Engineering, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
  2. Instituto de Lógica, Filosofia e Teoria da Ciência (ILTC), Niterói, Brazil
  3. Dept. of Informatics, Universidad Carlos III de Madrid, Colmenarejo, Spain
  4. ADDLabs, Fluminense Federal University, Niterói, Brazil
