Abstract
This paper presents a new weighted local outlier factor (LOF) method for anomaly detection, underpinned by three novel components: (1) a piecewise linear representation defined on the basis of important points, which consist of extreme points and additional points; (2) a set of new features used to identify anomalies under the new piecewise linear representation; and (3) a weighting scheme that assigns different weights to different features according to their discriminant power. The underlying idea of the proposed method is to characterize a time series with a set of four features and then discover abnormal changes by taking into account the closeness of data points augmented with the new features. Comparative experiments demonstrate that the proposed piecewise linear representation performs well on sequential time series data, and that the weighted LOF method achieves better accuracy and RankPower in detecting anomalies on the same data sets than the conventional LOF, normalized LOF and HOT symbolic aggregate approximation (HOT SAX) methods.
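The abstract describes the detector only at a high level. As a minimal sketch of how feature weights can enter a density-based detector, the code below computes the standard LOF of Breunig et al. (2000) over a feature-weighted Euclidean distance. Several things here are assumptions rather than the authors' method: the weight vector is supplied by the caller (the paper derives weights from each feature's discriminant power, which is not reproduced), applying the weights inside the distance is one plausible way to weight LOF, and the piecewise linear representation and the four features are not implemented. The names `weighted_distance` and `weighted_lof` are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_distance(p: Sequence[float], q: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (p_i - q_i)^2).

    Assumption: feature weights are applied inside the distance.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def weighted_lof(points: Sequence[Sequence[float]],
                 weights: Sequence[float], k: int = 3) -> List[float]:
    """Return one LOF score per point, using the weighted distance.

    Scores near 1 mean a point is about as dense as its neighbours;
    scores well above 1 flag candidate outliers.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # Pairwise weighted distances.
    dist = [[weighted_distance(points[i], points[j], weights) for j in range(n)]
            for i in range(n)]

    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # k-distance: distance to the k-th nearest point
        k_dist.append(kd)
        # N_k(p): every other point within the k-distance (ties included).
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF(p): mean ratio of the neighbours' densities to p's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


# Hypothetical two-feature data: a tight cluster plus one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = weighted_lof(data, weights=[0.7, 0.3], k=3)
print(max(range(len(data)), key=scores.__getitem__))  # → 5
```

Ranking points by score, rather than thresholding, matches the way rank-based measures such as RankPower evaluate a detector; the weights `[0.7, 0.3]` are arbitrary illustrative values.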
Acknowledgments
This work is supported by the Vice-Chancellor's Research Scholarships (VCRS) of Ulster University.
Author information
Yaxin Bi is the corresponding author
About this article
Cite this article
Kong, X., Bi, Y. & Glass, D.H. Detecting anomalies in sequential data augmented with new features. Artif Intell Rev 53, 625–652 (2020). https://doi.org/10.1007/s10462-018-9671-x
DOI: https://doi.org/10.1007/s10462-018-9671-x