Abstract
Outlier detection is a knowledge discovery problem that identifies data points that do not agree with the majority of data points in a dataset. In real-world datasets, the majority of data points normally line up into patterns that can be captured by some models. In this paper, we propose a new outlier detection algorithm based on a dynamically updated tree model. It consists of two steps: (1) constructing the extreme-centroid tree from a sample of the dataset, and (2) dynamically updating the extreme-centroid tree. In the construction step, the root initially identifies two extreme data points from the centroid of the sample and uses them to split the data points into groups. Splitting continues until the terminal criterion is met. A leaf node with a single data point is marked as a suspected outlier in this process. The suspected outliers are trimmed from the tree model and sent back to the rest of the dataset. In the dynamic update step, a data point from the rest of the dataset, called the newly inserted data point, is inserted into the tree model, and a single data point in the tree model, called the expired data point, is randomly removed to keep the number of current data points constant. The tree is adjusted for the newly inserted data point and the expired data point while maintaining linear time complexity. We compared our algorithm with the LOF and COF algorithms on a synthetic dataset and three UCI datasets. For the UCI datasets, a majority class is selected and data points from other classes are randomly picked as the outliers. The results show that our algorithm outperformed LOF and COF in terms of precision, recall, and F-measure.
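The construction step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract leaves several details open, so the following are assumptions made only for illustration: the first extreme point is taken as the point farthest from the centroid, the second as the point farthest from the first, each point joins the group of its nearer extreme point, and the terminal criterion is a hypothetical leaf-size threshold named `max_leaf_size`. The dynamic update step is not covered.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_from(points, ref):
    """Point in `points` with the largest Euclidean distance to `ref`."""
    return max(points, key=lambda p: math.dist(p, ref))


def suspected_outliers(points, max_leaf_size=5):
    """Recursively split `points` and collect those left alone in a leaf.

    Assumed terminal criterion (hypothetical): stop splitting a node once
    it holds at most `max_leaf_size` points.
    """
    if len(points) == 1:
        return list(points)          # singleton leaf: suspected outlier
    if len(points) <= max_leaf_size:
        return []                    # small leaf: treated as normal data
    c = centroid(points)
    e1 = farthest_from(points, c)    # assumed first extreme point
    e2 = farthest_from(points, e1)   # assumed second extreme point
    if e1 == e2:
        return []                    # all points coincide: nothing to split
    # Assign each point to its nearer extreme point (ties go to e1).
    near_e1 = [p for p in points if math.dist(p, e1) <= math.dist(p, e2)]
    near_e2 = [p for p in points if math.dist(p, e1) > math.dist(p, e2)]
    return (suspected_outliers(near_e1, max_leaf_size)
            + suspected_outliers(near_e2, max_leaf_size))


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (0.5, 0.5), (10.0, 10.0)]
    print(suspected_outliers(sample, max_leaf_size=5))
```

Because each extreme point always falls into its own group, every split produces two non-empty groups and the recursion terminates. A small threshold splits compact clusters further and can flag ordinary points as suspects, which is consistent with the abstract treating singleton leaves only as suspected outliers that are returned to the rest of the dataset.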
References
Goldstein, M., Dengel, A.: Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. KI-2012 Poster Demo Track, 59–63 (2012)
Amer, M., Abdennadher, S.: Comparison of unsupervised anomaly detection techniques. Bachelor’s Thesis (2011)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Comput. Surv. 41, 15 (2009)
Knox, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the International Conference on Very Large Data Bases, pp. 392–403. Citeseer (1998)
Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: ACM Sigmod Record, pp. 93–104. ACM (2000)
Tang, J., Chen, Z., Fu, A.W.-C., Cheung, D.W.: Enhancing effectiveness of outlier detections for low density patterns. In: Advances in Knowledge Discovery and Data Mining, pp. 535–548. Springer (2002)
Kriegel, H.-P., Zimek, A.: Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 444–452. ACM (2008)
Buthong, N., Luangsodsai, A., Sinapiromsaran, K.: Outlier detection score based on ordered distance difference. In: 2013 International Computer Science and Engineering Conference (ICSEC), pp. 157–162. IEEE (2013)
Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: LoOP: local outlier probabilities. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1649–1652. ACM (2009)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Songwattanasiri, P., Sinapiromsaran, K. (2016). Extreme-Centroid Tree for Outlier Detection. In: Lavangnananda, K., Phon-Amnuaisuk, S., Engchuan, W., Chan, J. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-27000-5_9
DOI: https://doi.org/10.1007/978-3-319-27000-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26999-3
Online ISBN: 978-3-319-27000-5
eBook Packages: Engineering, Engineering (R0)