Exemplar learning for extremely efficient anomaly detection in real-valued time series

Data Mining and Knowledge Discovery

Abstract

We investigate algorithms for efficiently detecting anomalies in real-valued one-dimensional time series. Past work has shown that one of the most effective anomaly detectors is a simple brute force algorithm, which scores each subsequence of a testing time series by the Euclidean distance to its nearest neighbor among the subsequences of a training time series. We investigate a very efficient implementation of this method and show that it is still too slow for most real-world applications. Next, we present a new method based on summarizing the training time series with a small set of exemplars. The exemplars we use are feature vectors that capture both the high-frequency and low-frequency information in sets of similar subsequences of the time series. We show that this exemplar-based method is much faster than both the efficient brute force method and a prediction-based method, and that it also handles a wider range of anomalies. We evaluate our algorithm on a large variety of publicly available time series and encourage others to do the same. Our exemplar-based algorithm is able to process, in minutes, time series that would take other methods days to process.
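The brute force baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window length `w`, and the synthetic sine-wave data are all assumptions made for illustration. Each length-`w` subsequence of the testing series is scored by its Euclidean distance to the nearest length-`w` subsequence of the training series.

```python
import numpy as np

def sliding_windows(series, w):
    """Return a 2-D view whose rows are all length-w subsequences of series."""
    series = np.asarray(series, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(series, w)

def brute_force_scores(train, test, w):
    """Score each test subsequence by the Euclidean distance to its
    nearest neighbor among the training subsequences (hypothetical helper)."""
    train_subs = sliding_windows(train, w)
    test_subs = sliding_windows(test, w)
    scores = np.empty(len(test_subs))
    for i, query in enumerate(test_subs):
        # Squared distances from this query to every training subsequence.
        d2 = np.sum((train_subs - query) ** 2, axis=1)
        scores[i] = np.sqrt(d2.min())
    return scores

# Illustrative data (an assumption): a clean sine wave for training,
# and a copy of its first 300 samples with one injected spike for testing.
train = np.sin(np.linspace(0, 20 * np.pi, 1000))
test = train[:300].copy()
test[150] += 3.0

scores = brute_force_scores(train, test, w=20)
print("most anomalous window starts at index", int(scores.argmax()))
```

In this sketch the work grows with the product of the number of training and testing subsequences (times the window length), which gives a feel for why even a carefully optimized version of this approach becomes expensive on long series.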

Notes

  1. We did test the z-normalized BFED algorithm and, as expected, found it to be less accurate for anomaly detection. Over all the testing time series used in Sect. 6, the z-normalized BFED algorithm has a detection rate of 31/45 with no false positives, which is worse than both the unnormalized BFED algorithm and our exemplar approach.
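For readers unfamiliar with the term, the following sketch shows what z-normalizing a subsequence does. The helper name `z_normalize` and the example data are assumptions, not taken from the article. Because z-normalization rescales each subsequence to zero mean and unit standard deviation, two subsequences that differ only in offset or amplitude become indistinguishable; that this explains the lower detection rate reported in the note is a plausible reading, not something the note itself states.

```python
import numpy as np

def z_normalize(subsequence, eps=1e-8):
    """Rescale a subsequence to zero mean and unit standard deviation.
    Near-constant subsequences are mapped to all zeros (a design choice
    of this sketch, made to avoid dividing by zero)."""
    x = np.asarray(subsequence, dtype=float)
    std = x.std()
    if std < eps:
        return np.zeros_like(x)
    return (x - x.mean()) / std

# Illustrative data (an assumption): b is a shifted and amplified copy of a.
a = np.sin(np.linspace(0, 2 * np.pi, 50))
b = 3.0 * a + 5.0

raw_distance = np.linalg.norm(a - b)
normalized_distance = np.linalg.norm(z_normalize(a) - z_normalize(b))
print(raw_distance, normalized_distance)
```

Here the raw Euclidean distance between `a` and `b` is large, while the distance after z-normalization is essentially zero, so an anomaly that consists only of a change in level or amplitude would go unnoticed.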

Author information

Corresponding author

Correspondence to Michael Jones.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Jones, M., Nikovski, D., Imamura, M. et al. Exemplar learning for extremely efficient anomaly detection in real-valued time series. Data Min Knowl Disc 30, 1427–1454 (2016). https://doi.org/10.1007/s10618-015-0449-3

  • DOI: https://doi.org/10.1007/s10618-015-0449-3
