
Detecting anomalies in sequential data augmented with new features

Artificial Intelligence Review

Abstract

This paper presents a new weighted local outlier factor (LOF) method for anomaly detection, underpinned by three novel components: (1) a piecewise linear representation built on important points, which comprise extreme points and additional points; (2) a set of new features used to identify anomalies under this piecewise linear representation; and (3) a weighting scheme that assigns different weights to the features according to their discriminant power. The underlying idea is to characterize a time series with a set of four features and then discover abnormal changes by measuring the closeness of data points augmented with these features. Comparative experiments show that the proposed piecewise representation performs well on sequential time series data, and that the weighted LOF method achieves better accuracy and RankPower in detecting anomalies on the same data sets than the conventional LOF, normalized LOF and HOT SAX (symbolic aggregate approximation) methods.
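The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Important points are reduced to the endpoints plus strict local extrema, since the abstract does not define the "additional points". Two hypothetical features per segment (duration and amplitude change) stand in for the paper's four features, which are not listed here. The weights are arbitrary placeholders, not values derived from discriminant power. The LOF computation uses exactly k neighbours with ties broken by index, a simplification of the original definition.

```python
import math

def extreme_points(series):
    """Indices of both endpoints plus every strict local maximum or minimum."""
    idx = [0]
    for i in range(1, len(series) - 1):
        left, mid, right = series[i - 1], series[i], series[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            idx.append(i)
    idx.append(len(series) - 1)
    return idx

def segment_features(series, idx):
    """Hypothetical features per linear segment: (duration, amplitude change)."""
    return [(b - a, series[b] - series[a]) for a, b in zip(idx, idx[1:])]

def weighted_distance(p, q, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, q)))

def lof_scores(points, k, weights):
    """Local outlier factor of each point under the weighted distance."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[weighted_distance(p, q, weights) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])          # simplification: exactly k neighbours
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour
    eps = 1e-12  # guards against division by zero when points coincide
    lrd = []     # local reachability density
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i])
        lrd.append(k / (reach + eps))
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Toy series: a regular zigzag with one large spike at position 7.
series = [0.0, 1.0, 0.1, 1.1, 0.0, 0.9, 0.1, 9.0, 0.0, 1.0, 0.2, 1.1, 0.0]
idx = extreme_points(series)
features = segment_features(series, idx)
scores = lof_scores(features, k=3, weights=(0.2, 0.8))  # placeholder weights
ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranked[:2])  # indices of the two segments with the highest LOF
```

In this toy series the two segments adjacent to the spike receive the highest scores, while the regular zigzag segments score close to 1. For real data, a library implementation such as scikit-learn's `LocalOutlierFactor` applied to pre-scaled features would be a more robust choice than this hand-written version.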

Acknowledgements

This work is supported by the Vice-Chancellor's Research Scholarships (VCRS) of Ulster University.

Author information

Corresponding author

Correspondence to Yaxin Bi.

About this article

Cite this article

Kong, X., Bi, Y. & Glass, D.H. Detecting anomalies in sequential data augmented with new features. Artif Intell Rev 53, 625–652 (2020). https://doi.org/10.1007/s10462-018-9671-x

  • DOI: https://doi.org/10.1007/s10462-018-9671-x
