Enhancing Outlier Detection by Filtering Out Core Points and Border Points

  • Chapter
New Developments in Unsupervised Outlier Detection

Abstract

Outlier detection is an important data mining task with high practical value in applications such as astronomical observation, text detection, and fraud detection. Many popular outlier detection algorithms are available, including distribution-based, distance-based, density-based, and clustering-based approaches. However, traditional algorithms face several challenges. First, most distance-based and density-based methods rely on k-nearest neighbors. As a result, even though outliers make up only a small fraction of a dataset, existing approaches must perform the local outlier factor calculation for every data point, which greatly reduces their efficiency. Second, some methods can detect only global outliers and fail to detect local ones. Third, most outlier detection algorithms do not accurately distinguish boundary points from outliers. To partially address these problems, we observe that outlier detection and clustering are complementary problems. In density-based clustering, there are three kinds of data points: core points, border points, and outliers. If indicators can be extracted from the data that give outliers much larger deviation values than the other two kinds of points, outliers can be separated from them. Therefore, in this chapter, we propose augmenting classical outlier detection algorithms with boundary indicators. Experiments on both synthetic and real datasets demonstrate the efficacy of the enhanced algorithms.
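
The three kinds of points named in the abstract can be illustrated with the standard DBSCAN-style definitions [22]. The sketch below is only an illustration under stated assumptions: the function name `classify_points` and the parameters `eps` and `min_pts` are hypothetical names chosen here, and the code is not the boundary indicator proposed in the chapter, which this preview does not describe.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'outlier' (DBSCAN-style).

    Assumed definitions (not taken from the chapter itself):
      core    : at least min_pts points, itself included, lie within eps
      border  : not core, but within eps of at least one core point
      outlier : neither core nor border
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, fine for a small illustration.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist <= eps                      # eps-neighborhoods, self included
    is_core = within.sum(axis=1) >= min_pts   # dense enough to be a core point
    # For every point, check whether any core point lies in its eps-neighborhood.
    near_core = (within & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "outlier"))

# A small cross-shaped cluster plus one distant point.
points = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [10, 10]]
print(classify_points(points, eps=1.1, min_pts=4).tolist())
# ['core', 'border', 'border', 'border', 'border', 'outlier']
```

Only the last point is left once core and border points are filtered out, so a costlier outlier score would need to be computed for far fewer candidates than the full dataset.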

References

  1. Knorr, E. M., & Ng, R. T. (1997). A unified notion of outliers: Properties and computation. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD’97) (pp. 219–222). Newport Beach, CA, USA.

  2. Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: Algorithms and applications. The VLDB Journal, 8(3), 237–253.

  3. Mehnaz, S., & Bertino, E. (2017). Ghostbuster: A fine-grained approach for anomaly detection in file system accesses. In Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY’17) (pp. 3–14). Scottsdale, AZ, United States.

  4. Iturbe, M., Garitano, I., Zurutuza, U., & Uribeetxeberria, R. (2017). Towards large-scale, heterogeneous anomaly detection systems in industrial networks: A survey of current trends. Security and Communication Networks, 2017(6), Art. no. 9150965.

  5. Wang, Y., Wu, Z., Zhu, Y., & Zhang, P. (2018). Research on anomaly detection algorithm based on generalization latency of telecommunication network. Future Generation Computer Systems, 85, 9–18.

  6. Gogoi, P., Bhattacharyya, D. K., Borah, B., & Kalita, J. K. (2011). A survey of outlier detection methods in network anomaly identification. Computer Journal, 54(4), 570–588.

  7. Bhuyan, M. H., Bhattacharyya, D. K., & Kalita, J. K. (2012). Survey on incremental approaches for network anomaly detection. International Journal of Communication Networks and Information Security, 3(3), 226–239.

  8. Agarwal, D. (2005). An empirical Bayes approach to detect anomalies in dynamic multidimensional arrays. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05) (pp. 26–33). Houston, TX, United States.

  9. Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249.

  10. Avdiienko, V., Kuznetsov, K., Rommelfanger, I., Rau, A., Gorla, A., & Zeller, A. (2017). Detecting behavior anomalies in graphical user interfaces. In Proceedings of the International Conference on Software Engineering Companion (ICSE-C’17) (pp. 201–203). Buenos Aires, Argentina.

  11. Keogh, E., Lin, J., Lee, S.-H., & van Herle, H. (2010). Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems, 11(1), 1–27.

  12. Cai, L., Thornhill, N., Kuenzel, S., & Pal, B. C. (2017). Real-time detection of power system disturbances based on k-nearest neighbor analysis. IEEE Access, 5, 5631–5639.

  13. Mccarren, A., Mccarthy, S., Sullivan, C.O., & Roantree, M. (2017). Anomaly detection in agri warehouse construction. In Proceedings of 2017 Australasian Computer Science Week Multiconference (ACSW’17) (pp. 1–10). Geelong, VIC, Australia.

  14. Stojanovic, N., Dinic, M., & Stojanovic, L. (2018). A data-driven approach for multivariate contextualized anomaly detection: Industry use case. In Proceedings of the 5th IEEE International Conference on Big Data (Big Data’17) (pp. 1560–1569). Boston, MA, United States.

  15. Vidmar, G., & Blagus, R. (2014). Outlier detection for healthcare quality monitoring: A comparison of four approaches to over-dispersed proportions. Quality and Reliability Engineering International, 30(3), 347–362.

  16. Yan, K., You, X., Ji, X., Yin, G., & Yang, F. (2016). A hybrid outlier detection method for health care big data. In Proceedings of the 6th IEEE International Conference on Big Data and Cloud Computing (BDCloud’16) (pp. 157–162). Atlanta, GA, United States.

  17. Gu, F., Niu, J., Das, S. K., He, Z., & Jin, X. (2017). Detecting breathing frequency and maintaining a proper running rhythm. Pervasive and Mobile Computing, 42, 498–512.

  18. Barnett, V., & Lewis, T. (1994). Outliers in statistical data. New York: Wiley.

  19. Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases (VLDB’98) (pp. 392–403). New York.

  20. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00) (pp. 93–104). Dallas, TX, United States.

  21. Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outliers detection. Pattern Recognition Letters, 22(6–7), 691–700.

  22. Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’96) (pp. 226–231). Portland, Oregon, USA.

  23. Ankerst, M., Breunig, M. M., Kriegel, H. P., et al. (1999). OPTICS: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (SIGMOD’99), 28(2), 49–60.

  24. Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD’98) (pp. 58–65). New York, NY, USA.

  25. Duan, L., Xu, L., Liu, Y., & Lee, J. Cluster-based outlier detection. Annals of Operations Research, 168(1), 151–168.

  26. Chen, X., Liu, W., Qiu, H., & Lai, J. (2011). APSCAN: A parameter free algorithm for clustering. Pattern Recognition Letters, 32(7), 973–986.

  27. Chen, Y.Q., Wang, X., Xu, R., Bai, X., & Meng, X. (2010). An adaptive affinity propagation document clustering. In Proceedings of the 2010 7th International Conference on Informatics and Systems (INFOS’10) (pp. 1–7). Cairo, Egypt.

  28. Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492–1496.

  29. Hou, J., Gao, H., & Li, X. (2016). DSets-DBSCAN: A parameter-free clustering algorithm. IEEE Transactions on Image Processing, 25(7), 3182–3193.

  30. Qi, X., & Wang, P. (2016). A density-based clustering algorithm for high-dimensional data with feature selection. In Proceedings of the 2016 International Conference on Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII’16) (pp. 114–118). Wuhan, Hubei, China.

  31. Zhu, Y., Ting, K. M., & Carman, M. J. (2016). Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognition, 60, 983–997.

  32. Messaoud, T. A., Smiti, A., & Louati, A. (2019). A novel density-based clustering approach for outlier detection in high-dimensional data. In Proceedings of the 14th International Conference on Hybrid Artificial Intelligence Systems (HAIS’19) (pp. 322–331). León, Spain.

  33. Roffo, G., Melzi, S., & Cristani, M. (2015). Infinite feature selection. In Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV’15) (pp. 4202–4210). Santiago, Chile.

  34. Rahman, M. A., Ang, K. L.-M., & Seng, K. P. (2018). Unique neighborhood set parameter independent density-based clustering with outlier detection. IEEE Access, 6, 44707–44717.

  35. Su, S., Xiao, L., Ruan, L., Gu, F., Li, S., Wang, Z., et al. (2019). An efficient density-based local outlier detection approach for scattered data. IEEE Access, 7, 1006–1020.

  36. Wang, Y. F., Yu, J., Su, G. P., & Qian, Y. R. (2019). A new outlier detection method based on OPTICS. Sustainable Cities and Society, 45, 197–212.

  37. Nagamani, C., & Chittineni, S. (2019). Efficient neighborhood density based outlier detection inside a sub network with high dimensional data. Ingenierie des Systemes d’Information, 24(1), 107–111.

  38. Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02) (pp. 15–26). Helsinki.

  39. Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’00) (pp. 427–438). Dallas.

  40. Jin, W., Tung, A.K.H., Han, J., & Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06) (pp. 577–593). Singapore.

  41. Huang, H., Mehrotra, K., & Mohan, C. K. (2013). Rank-based outlier detection. Journal of Statistical Computation and Simulation, 83(3), 518–531.

  42. UCI: The UCI KDD Archive, University of California, Irvine, CA. http://kdd.ics.uci.edu/.

  43. Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01) (pp. 37–46). Santa Barbara, CA, USA.

  44. Li, X., Wang, X., & Wang, X.L. (2018). Enhancing outlier detection by an outlier indicator. In Proceedings of the 14th International Conference on Machine Learning and Data Mining (pp. 393–405). New York, USA.

Acknowledgements

This chapter is adapted from a paper published by our group in Machine Learning and Data Mining in Pattern Recognition [44]. The related content is reused with permission.

Author information

Corresponding author

Correspondence to Xiaochun Wang.

Copyright information

© 2021 Xi'an Jiaotong University Press

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). Enhancing Outlier Detection by Filtering Out Core Points and Border Points. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_7
