Extreme-Centroid Tree for Outlier Detection

  • Conference paper
Intelligent and Evolutionary Systems

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 5)

Abstract

Outlier detection is a knowledge discovery problem that identifies data points that do not agree with the majority of data points in a dataset. In real-world datasets, the majority of data points normally line up into patterns that can be captured by some model. In this paper, we propose a new outlier detection algorithm based on a dynamically updated tree model. It consists of two steps: (1) constructing the extreme-centroid tree from a sample of the dataset, and (2) dynamically updating the extreme-centroid tree. In the construction step, the root first identifies two extreme data points relative to the centroid of the sample and uses them to split the data points into groups. Splitting continues until the terminal criterion is met. A leaf node containing a single data point is marked as a suspected outlier. The suspected outliers are trimmed from the tree model and returned to the rest of the dataset. In the dynamic update step, a data point from the rest of the dataset (the newly inserted data point) is inserted into the tree model, and a single data point already in the tree model (the expired data point) is randomly removed to keep the number of current data points constant. The tree is adjusted for the newly inserted and expired data points while maintaining linear time complexity. We compared our algorithm with the LOF and COF algorithms on a synthetic dataset and three UCI datasets. In the UCI datasets, a majority class is selected and data points from the other classes are randomly picked as outliers. The results show that our algorithm outperformed LOF and COF in terms of precision, recall, and F-measure.
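
The construction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two extreme points are chosen or what the terminal criterion is, so the sketch assumes that the first extreme is the point farthest from the centroid, that the second is the point farthest from the first, and that a node with fewer than `min_size` points is not split further. The function names and the `min_size` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def extreme_points(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Assumed rule (not stated in the abstract): the first extreme is the
    point farthest from the centroid, the second is farthest from the first."""
    c = centroid(points)
    e1 = max(points, key=lambda p: math.dist(p, c))
    e2 = max(points, key=lambda p: math.dist(p, e1))
    return e1, e2


def suspected_outliers(points: Sequence[Point], min_size: int = 5) -> List[Point]:
    """Split recursively and collect points that end up alone in a leaf.

    min_size is a hypothetical terminal criterion: a node holding fewer
    than min_size points becomes a leaf and is not split again.
    """
    if len(points) == 1:
        return list(points)  # single-point leaf: suspected outlier
    if len(points) < min_size:
        return []  # small leaf: treated as normal data
    e1, e2 = extreme_points(points)
    if e1 == e2:
        return []  # all points coincide, nothing to split
    # Assign every point to the nearer of the two extreme points.
    left = [p for p in points if math.dist(p, e1) <= math.dist(p, e2)]
    right = [p for p in points if math.dist(p, e1) > math.dist(p, e2)]
    return suspected_outliers(left, min_size) + suspected_outliers(right, min_size)


sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(suspected_outliers(sample))  # [(10.0, 10.0)]
```

Under these assumptions the isolated point is separated at the root and reported as a suspected outlier, while the four clustered points stay together in one leaf. The dynamic update step (insertion and random expiry) is not covered by this sketch.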

Author information

Corresponding author

Correspondence to Panote Songwattanasiri.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Songwattanasiri, P., Sinapiromsaran, K. (2016). Extreme-Centroid Tree for Outlier Detection. In: Lavangnananda, K., Phon-Amnuaisuk, S., Engchuan, W., Chan, J. (eds) Intelligent and Evolutionary Systems. Proceedings in Adaptation, Learning and Optimization, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-27000-5_9

  • DOI: https://doi.org/10.1007/978-3-319-27000-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26999-3

  • Online ISBN: 978-3-319-27000-5

  • eBook Packages: Engineering, Engineering (R0)
