Abstract
Outlier detection is an important research direction in data mining, with applications including fraud detection, activity monitoring, medical research, and network intrusion detection. Many outlier detection methods have been proposed; however, most of them are not suitable for complex patterns because they do not extract appropriate neighbor information and cannot estimate density accurately. Additionally, their performance is unstable and depends heavily on the number of nearest neighbors (k) selected. To overcome these defects, we propose a neighborhood weighted-based outlier detection (NWOD) algorithm that obtains correct detection results in a variety of situations. In our algorithm, the local density of an object is measured by constructing a weighted nearest neighbor graph and quantifying how difficult it is for the object and its nearest neighbors to reach each other. Furthermore, the proposed neighborhood weighted local outlier factor (NWLOF) compares the neighborhood weighted local density of a given object with that of the objects in its neighborhood, which yields the degree to which the object is an outlier. The larger the NWLOF of an object, the more likely it is to be an outlier. In addition, because the proposed algorithm is based on the concept of a natural stable structure, its performance does not rely on the value of k. Experiments conducted on both synthetic and real-world datasets show the superiority of our algorithm.
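The density-comparison idea in the abstract can be illustrated with a minimal sketch. This is not the NWOD algorithm or the NWLOF definition, neither of which is specified in the abstract. It is a simplified stand-in that assumes a fixed k, takes local density as the inverse of the mean distance to the k nearest neighbors, and scores each object by the ratio of its neighbors' average density to its own. The function name density_ratio_scores, the parameter k, and the toy data are all hypothetical.

```python
import numpy as np

def density_ratio_scores(X, k=5):
    """Simplified density-ratio outlier score (an assumption, not NWLOF)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; the diagonal is set to inf so that
    # a point is never counted as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(dist, axis=1)[:, :k]
    # Local density: inverse of the mean distance to the k nearest neighbors.
    mean_knn_dist = np.take_along_axis(dist, knn, axis=1).mean(axis=1)
    density = 1.0 / (mean_knn_dist + 1e-12)
    # Score: average density of the neighbors divided by the point's own density.
    return density[knn].mean(axis=1) / density

# Toy data (hypothetical): one tight cluster plus a single distant point.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
X = np.vstack([cluster, [[8.0, 8.0]]])
scores = density_ratio_scores(X, k=5)
print("index of the largest score:", int(np.argmax(scores)))
```

As with the NWLOF described above, a larger score suggests a more likely outlier. Unlike the proposed method, this sketch depends directly on the choice of k, which is the sensitivity the paper aims to remove.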
Acknowledgments
The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions. This work is funded by the National Natural Science Foundation of China (no. 61701051), the Fundamental Research Funds for the Central Universities (no. 2019CDCGJSJ329) and the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant no. CYS20067).
Ethics declarations
Conflict of Interest Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xiong, ZY., Long, H., Zhang, YF. et al. A neighborhood weighted-based method for the detection of outliers. Appl Intell 53, 9897–9915 (2023). https://doi.org/10.1007/s10489-022-03258-0
DOI: https://doi.org/10.1007/s10489-022-03258-0