Data Mining and Knowledge Discovery, Volume 30, Issue 6, pp 1427–1454

Exemplar learning for extremely efficient anomaly detection in real-valued time series

  • Michael Jones
  • Daniel Nikovski
  • Makoto Imamura
  • Takahisa Hirata
Article

Abstract

We investigate algorithms for efficiently detecting anomalies in real-valued one-dimensional time series. Past work has shown that one of the most effective anomaly detectors is a simple brute force algorithm whose anomaly score is the Euclidean distance between each subsequence of a testing time series and its nearest neighbor among the subsequences of a training time series. We investigate a very efficient implementation of this method and show that it is still too slow for most real-world applications. Next, we present a new method based on summarizing the training time series with a small set of exemplars. The exemplars we use are feature vectors that capture both the high-frequency and low-frequency information in sets of similar subsequences of the time series. We show that this exemplar-based method is much faster than both the efficient brute force method and a prediction-based method, and that it also handles a wider range of anomalies. We compare our algorithm across a large variety of publicly available time series and encourage others to do the same. Our exemplar-based algorithm is able to process time series in minutes that would take other methods days to process.
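To make the brute force baseline concrete, the following is a minimal sketch of nearest-neighbor anomaly scoring as the abstract describes it. It is not the authors' efficient implementation: the function names, the window length `w`, the stride of 1, and the absence of any normalization are all assumptions chosen for illustration.

```python
import math


def subsequences(series, w):
    """Return every contiguous window of length w (stride 1, an assumption)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_scores(train, test, w):
    """Score each test window by its distance to the nearest training window.

    Hypothetical helper: compares every test window against every
    training window, with no pruning or indexing.
    """
    train_windows = subsequences(train, w)
    return [
        min(euclidean(t, s) for s in train_windows)
        for t in subsequences(test, w)
    ]


# Toy data: a regular training pattern and a test series with one spike.
train = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
test = [0.0, 1.0, 0.0, 5.0, 0.0, 1.0]
scores = brute_force_scores(train, test, w=3)
```

Windows that overlap the spike receive a large score, while the window matching the training pattern scores zero. Every test window is compared against every training window, so the work grows with the product of the two series lengths and the window length, which is the cost that summarizing the training series with a small set of exemplars is meant to reduce.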

Keywords

Anomaly detection, Time series, Exemplar learning

Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. MERL, Cambridge, USA
  2. Mitsubishi Electric, Information Technology Center, Ofuna, Japan