A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique

New Developments in Unsupervised Outlier Detection

Abstract

Because outlier detection has important applications in data mining, many techniques have been developed for it. k-nearest neighbor-based techniques, such as distance-based and density-based algorithms, produce results that are relatively sensitive to their parameter settings. In contrast, clustering-based outlier detection algorithms regard data items in small groups as outliers and often obtain them as a by-product of clustering. Unlike K-means clustering algorithms, minimum spanning tree-based clustering algorithms can find clusters of arbitrary shapes, different sizes, and different densities. However, outlier detection approaches based on minimum spanning tree clustering may incur high computational costs. To partially circumvent these problems, this chapter proposes an efficient outlier detection technique inspired by minimum spanning tree-based clustering. Extensive performance evaluations on synthetic and real datasets show that the proposed approach identifies both global and local outliers well in comparison with state-of-the-art outlier detection methods.
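The general idea the abstract builds on, treating the members of small minimum spanning tree clusters as outliers, can be illustrated with a minimal sketch. This is not the technique proposed in the chapter; it is a generic textbook-style procedure, and the function names, the `cut_factor` rule (cut edges longer than a multiple of the mean edge weight), and the `min_size` threshold are all assumptions chosen for illustration.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) tuples, one per MST edge.
    Runs in O(n^2) time, which hints at why MST-based methods can be costly.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def small_cluster_outliers(points, cut_factor=3.0, min_size=3):
    """Flag points that fall into small MST clusters (hypothetical rule).

    Edges longer than cut_factor * mean edge weight are removed; points in
    the resulting components with fewer than min_size members are reported.
    """
    edges = mst_edges(points)
    if not edges:
        return []
    mean_w = sum(w for w, _, _ in edges) / len(edges)

    # Union-find over the edges that survive the cut.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for w, u, v in edges:
        if w <= cut_factor * mean_w:
            root[find(u)] = find(v)

    sizes = {}
    for i in range(len(points)):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return [i for i in range(len(points)) if sizes[find(i)] < min_size]
```

Called on two tight groups of five points plus one distant point, the sketch cuts the two long edges of the tree and reports only the isolated point. The quadratic cost of building the tree over all pairs is the kind of overhead the abstract refers to.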

References

  1. Hawkins, D. M. (1980). Identification of outliers. London: Chapman and Hall.

  2. Eskin, E., Arnold, A., Prerau, M., Portnoy, L., & Stolfo, S. (2002). A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In D. Barbará & S. Jajodia (Eds.), Applications of data mining in computer security. Advances in Information Security (Vol. 6, pp. 77–101).

  3. Lane, T., & Brodley, C. E. (1998). Temporal sequence learning and data reduction for anomaly detection. In Proceedings of the 1998 5th ACM Conference on Computer and Communications Security (CCS-5), San Francisco, CA, USA (pp. 150–158).

  4. Bolton, R. J., & Hand, D. J. (2002). Unsupervised profiling methods for fraud detection. Statistical Science, 17(3), 235–255.

  5. Wong, W., Moore, A., Cooper, G., & Wagner, M. (2002). Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the 18th National Conference on Artificial Intelligence, Edmonton, Alta., Canada (pp. 217–223).

  6. Sheng, B., Li, Q., Mao, W., & Jin, W. (2007). Outlier detection in sensor networks. In Proceedings of ACM International Symposium on Mobile Ad Hoc Networking and Computing (pp. 219–228).

  7. Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.

  8. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15.1–15.58.

  9. Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases (VLDB’98), New York (pp. 392–403).

  10. Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD’00), Dallas (pp. 427–438).

  11. Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki (pp. 15–26).

  12. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States (pp. 93–104).

  13. Bay, S. D., & Schwabacher, M. (2003). Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), Washington, DC, United States (pp. 29–38).

  14. Aggarwal, C., & Yu, P. (2005). An effective and efficient algorithm for high-dimensional outlier detection. The VLDB Journal, 14(2), 211–221.

  15. Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA (pp. 444–452).

  16. Huang, H., Mehrotra, K., & Mohan, C. K. (2013). Rank-based outlier detection. Journal of Statistical Computation and Simulation, 83(3), 518–531.

  17. Zahn, C. T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 64–82.

  18. Wang, X., Wang, X. L., & Wilkes, D. M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945–958.

  19. Zhong, C., Miao, D., & Wang, R. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752–766.

  20. Luo, T., & Zhong, C. (2010). A neighborhood density estimation clustering algorithm based on minimum spanning tree. In Proceedings of the 5th International Conference on Rough Set and Knowledge Technology (RSKT’10), Beijing, China (pp. 557–565).

  21. Luo, T., Zhong, C., Li, H., & Sun, X. (2010). A multi-prototype clustering algorithm based on minimum spanning tree. In Proceedings of 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD’10) (pp. 1602–1607).

  22. Zhong, C., Miao, D., & Franti, P. (2011). Minimum spanning tree based split-and-merge: A hierarchical clustering method. Information Sciences, 181(16), 3397–3410.

  23. Lin, J., Ye, D., Chen, C., & Gao, M. (2008). Minimum spanning tree based spatial outlier mining and its applications. In Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology (RSKT’08), Chengdu, China, LNAI 5009 (pp. 508–515).

  24. John Peter, S., & Victor, S. P. (2011). An integrated approach for local outlier detection using dynamic minimum spanning tree. Journal of Discrete Mathematical Sciences and Cryptography, 14(1), 89–106.

  25. John Peter, S. (2011). Minimum spanning tree based clustering for outlier detection. Journal of Discrete Mathematical Sciences and Cryptography, 14(2), 149–166.

  26. Daneshgar, A., Javadi, R., & Shariat Razavi, S. B. (2013). Clustering and outlier detection using isoperimetric number of trees. Pattern Recognition, 46(12), 3371–3382.

  27. Wang, X., Wang, X. L., & Wilkes, D. M. A spanning tree-inspired clustering based outlier detection technique. In Proceedings of the 12th Industry Conference on Data Mining, Berlin, Germany (pp. 209–223).

  28. Zhu, Q., Fan, X., & Feng, J. (2014). Outlier detection based on K-Neighborhood MST. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI’14), San Francisco, CA, United States (pp. 718–724).

  29. Cipolla, E., & Vella, F. (2014). Identification of spatio-temporal outliers through Minimum Spanning Tree. In Proceedings of the 10th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS’14), Marrakech, Morocco (pp. 248–255).

  30. Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Ickin, S., & Gustafsson, J. (2018). A minimum spanning tree clustering approach for outlier detection in event sequences. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA’18), Orlando, FL, United States (pp. 1123–1130).

  31. Wang, X., Wang, X. L., & Wilkes, D. M. (2012). Modifying iDistance for a fast CHAMELEON with application to patch based image segmentation. In Proceedings of the 9th IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2012), Crete, Greece (pp. 107–114).

  32. UCI: The UCI KDD Archive. [http://kdd.ics.uci.edu/]. Irvine, CA: University of California.

  33. Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002). Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), Taipei, Taiwan (pp. 535–548).

  34. Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore (pp. 577–593).

  35. Zhang, K., Hutter, M., & Jin, H. (2009). A new local distance-based outlier detection approach for scattered real-world data. In Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’09), Bangkok, Thailand (pp. 813–822).

  36. Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA (pp. 37–46).

  37. Meng, X., & Chen, Z. (2004). On user-oriented measurements of effectiveness of web information retrieval systems. In Proceedings of the International Conference on Internet Computing (ICIC’04), Las Vegas, Nevada, USA (Vol. 1, pp. 527–533).

  38. Wang, X., Wang, X. L., Ma, Y., & Wilkes, D. M. (2015). A fast MST-inspired kNN-based outlier detection method. Information Systems, 48, 89–112.

Acknowledgements

This chapter was modified from the paper published by our group in Information Systems [38]. The related contents are reused with permission.

Author information

Corresponding author

Correspondence to Xiaochun Wang.

Copyright information

© 2021 Xi'an Jiaotong University Press

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_5
