Detecting anomalies in sequential data augmented with new features

Kong, Xiangzeng; Bi, Yaxin; Glass, David H.

doi:10.1007/s10462-018-9671-x

Published: 03 January 2019

Volume 53, pages 625–652 (2020)
Artificial Intelligence Review

Abstract

This paper presents a new weighted local outlier factor method for anomaly detection, underpinned by three novel components: (1) a piecewise linear representation built on important points, which comprise extreme points and additional points; (2) a set of new features used to identify anomalies under this piecewise linear representation; and (3) a weighting scheme that assigns different weights to the features according to their discriminant power. The underlying idea is to characterize a time series with a set of four features and then to discover abnormal changes from the closeness of the data points augmented with these features. Comparative experiments show that the proposed piecewise representation performs well on sequential time series data, and that the weighted local outlier factor method achieves better accuracy and RankPower in detecting anomalies on the same data sets than the conventional local outlier factor, normalized local outlier factor and HOT symbolic aggregate approximation methods.
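
As an illustration only, the idea of scoring feature-augmented points by a weighted local outlier factor can be sketched as follows. This is not the authors' implementation: the abstract does not specify the four features or how the weights are derived, so the two-dimensional feature vectors, the `weights` values, the neighbourhood size `k`, and the function names `weighted_distance` and `weighted_lof` are all assumptions. The sketch applies the standard local outlier factor definition with a per-feature weighted Euclidean distance.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per feature (weights are assumed)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def weighted_lof(points, weights, k):
    """Local outlier factor of each point under the weighted distance.

    Assumes fewer than k exact duplicates of any point, so that every
    k-distance is strictly positive.
    """
    n = len(points)
    dist = [[weighted_distance(p, q, weights) for q in points] for p in points]

    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # The k-neighbourhood keeps every point tied at the k-distance.
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical feature vectors: five points in a tight group, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
weights = (0.7, 0.3)  # assumed weights, standing in for discriminant power
scores = weighted_lof(points, weights, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while a clearly larger score marks a candidate anomaly. In this toy data the isolated last point receives the highest score.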

Acknowledgements

This work is supported by the Vice-Chancellor's Research Scholarships (VCRS) of Ulster University.

Author information

Authors and Affiliations

School of Computing, Ulster University at Jordanstown, Newtownabbey, BT37 0QB, Northern Ireland, UK
Xiangzeng Kong, Yaxin Bi & David H. Glass

Corresponding author

Correspondence to Yaxin Bi.

About this article

Cite this article

Kong, X., Bi, Y. & Glass, D.H. Detecting anomalies in sequential data augmented with new features. Artif Intell Rev 53, 625–652 (2020). https://doi.org/10.1007/s10462-018-9671-x

Published: 03 January 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s10462-018-9671-x
