Skip to main content

Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

Abstract

Outlier detection research has been seeing many new algorithms every year that often appear to be only slightly different from existing methods along with some experiments that show them to “clearly outperform” the others. However, few approaches come along with a clear analysis of existing methods and a solid theoretical differentiation. Here, we provide a formalized method of analysis to allow for a theoretical comparison and generalization of many existing methods. Our unified view improves understanding of the shared properties and of the differences of outlier detection models. By abstracting the notion of locality from the classic distance-based notion, our framework facilitates the construction of abstract methods for many special data types that are usually handled with specialized algorithms. In particular, spatial neighborhood can be seen as a special case of locality. Here we therefore compare and generalize approaches to spatial outlier detection in a detailed manner. We also discuss temporal data like video streams, or graph data such as community networks. Since we reproduce results of specialized approaches with our general framework, and even improve upon them, our framework provides reasonable baselines to evaluate the true merits of specialized approaches. At the same time, seeing spatial outlier detection as a special case of local outlier detection, opens up new potentials for analysis and advancement of methods.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Notes

  1. 1.

    This adaptation could also be considered “Schönfinkeling” (or, tastier, “Currying”) of the original DB-outlier model.

  2. 2.

    Available at the US Census Bureau, United States Department of Commerce. http://www.census.gov/.

  3. 3.

    Available at the Arizona State University GeoDa Center, http://geodacenter.asu.edu/.

  4. 4.

    Available at Statistisches Bundesamt Deutschland, http://www.destatis.de.

  5. 5.

    http://www.dblp.org/db/.

  6. 6.

    Xapian search engine library, http://xapian.org/, GPL.

References

  1. Achtert E, Hettab A, Kriegel HP, Schubert E, Zimek A (2011) Spatial outlier detection: data, algorithms, visualizations. In: Proceedings of the 12th international symposium on spatial and temporal databases (SSTD), Minneapolis, MN, pp 512–516. doi:10.1007/978-3-642-22922-0_41

  2. Achtert E, Goldhofer S, Kriegel HP, Schubert E, Zimek A (2012) Evaluation of clusterings—metrics and visual support. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC, pp 1285–1288. doi:10.1109/ICDE.2012.128

  3. Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM international conference on management of data (SIGMOD), Santa Barbara, CA, pp 37–46

  4. Aggarwal CC, Yu PS (2008) Outlier detection with uncertain data. In: Proceedings of the 8th SIAM international conference on data mining (SDM), Atlanta, GA, pp 483–493

  5. Agyemang M, Barker K, Alhajj R (2006) A comprehensive survey of numeric and symbolic outlier mining techniques. Intell Data Anal 10:521–538

    Google Scholar 

  6. Angiulli F, Fassetti F (2009) DOLPHIN: an efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans Knowl Discov Data 3(1):4:1–4:57. doi:10.1145/1497577.1497581

    Article  Google Scholar 

  7. Angiulli F, Pizzuti C (2002) Fast outlier detection in high dimensional spaces. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery (PKDD), Helsinki, Finland, pp 15–26. doi:10.1007/3-540-45681-3_2

  8. Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM international conference on management of data (SIGMOD), Philadelphia, PA, pp 49–60

  9. Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115

    Article  Google Scholar 

  10. Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley, New York

    MATH  Google Scholar 

  11. Bay SD, Schwabacher M (2003) Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 29–38. doi:10.1145/956750.956758

  12. Breunig MM, Kriegel HP, Ng R, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 93–104

  13. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):Article 15, 1–58. doi:10.1145/1541880.1541882

  14. Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839

    Article  Google Scholar 

  15. Chawla S, Sun P (2006) SLOM: a new measure for local spatial outliers. Knowl Inf Syst 9(4):412–429. doi:10.1007/s10115-005-0200-2

    Article  Google Scholar 

  16. Chen F, Lu CT, Boedihardjo AP (2010) GLS-SOD: a generalized local statistical approach for spatial outlier detection. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 1069–1078. doi:10.1145/1835804.1835939

  17. de Vries T, Chawla S, Houle ME (2010) Finding local anomalies in very high dimensional space. In: Proceedings of the 10th IEEE international conference on data mining (ICDM), Sydney, Australia, pp 128–137. doi:10.1109/ICDM.2010.151

  18. Gao J, Tan PN (2006) Converting output scores from outlier detection algorithms into probability estimates. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 212–221. doi:10.1109/ICDM.2006.43

  19. Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 813–822. doi:10.1145/1835804.1835907

  20. Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–146

    Google Scholar 

  21. Hadi AS, Rahmatullah Imon AHM, Werner M (2009) Detection of outliers. Wiley Interdiscip Rev Comput Stat 1(1):57–70

    Article  Google Scholar 

  22. Hodge VJ, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22:85–126. doi:10.1023/B:AIRE.0000045502.10941.a9

    Article  MATH  Google Scholar 

  23. Hong SB, Nah W, Baek JH (2003) Abrupt shot change detection using multiple features and classification tree. In: Proceedings of the 4th international conference on intelligent data engineering and automated learning (IDEAL), Hong Kong, China, pp 553–560. doi:10.1007/978-3-540-45080-1_76

  24. Jagadish HV, Koudas N, Muthukrishnan S (1999) Mining deviants in a time series database. In: Proceedings of the 25th international conference on very large databases (VLDB), Edinburgh, Scotland, pp 102–113

  25. Jin W, Tung A, Han J (2001) Mining top-n local outliers in large databases. In: Proceedings of the 7th ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 293–298. doi:10.1145/502512.502554

  26. Jin W, Tung AKH, Han J, Wang W (2006) Ranking outliers using symmetric neighborhood relationship. In: Proceedings of the 10th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Singapore, pp 577–593. doi:10.1007/11731139_68

  27. Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC

  28. Kim HH, Kim YH (2010) Toward a conceptual framework of key-frame extraction and storyboard display for video summarization. J Am Soc Inf Sc Technol 61(5):927–939. doi:10.1002/asi.21317

    Article  Google Scholar 

  29. Knorr EM, Ng RT (1997) A unified notion of outliers: properties and computation. In: Proceedings of the 3rd ACM international conference on knowledge discovery and data mining (KDD), Newport Beach, CA, pp 219–222

  30. Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th international conference on very large databases (VLDB), New York City, NY, pp 392–403

  31. Knorr EM, Ng RT, Tucanov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253

    Article  Google Scholar 

  32. Kollios G, Gunopulos D, Koudas N, Berchthold S (2003) Efficient biased sampling for approximate clustering and outlier detection in large datasets. IEEE Trans Knowl Data Eng 15(5):1170–1187. doi:10.1109/TKDE.2003.1232271

    Article  Google Scholar 

  33. Kou Y, Lu CT, Chen D (2006) Spatial weighted outlier detection. In: Proceedings of the 6th SIAM international conference on data mining (SDM), Bethesda, MD

  34. Kou Y, Lu CT, Dos Santos RF (2007) Spatial outlier detection: a graph-based approach. In: Proceedings of the 19th IEEE international conference on tools with artificial intelligence (ICTAI), Patras, Greece, pp 281–288. doi:10.1109/ICTAI.2007.169

  35. Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, pp 444–452. doi:10.1145/1401890.1401946

  36. Kriegel HP, Kröger P, Schubert E, Zimek A (2009a) LoOP: local outlier probabilities. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM), Hong Kong, China, pp 1649–1652. doi:10.1145/1645953.1646195

  37. Kriegel HP, Kröger P, Schubert E, Zimek A (2009b) Outlier detection in axis-parallel subspaces of high dimensional data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok. Thailand, pp 831–838. doi:10.1007/978-3-642-01307-2_86

  38. Kriegel HP, Kröger P, Schubert E, Zimek A (2011) Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM international conference on data mining (SDM), Mesa, AZ, pp 13–24

  39. Kriegel HP, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: Proceedings of the 12th IEEE international conference on data mining (ICDM), Brussels, Belgium

  40. Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL, pp 157–166. doi:10.1145/1081870.1081891

  41. Lee W, Kim H, Kang H, Lee J, Kim Y, Jeon S (2004) Video cataloging system for real-time scene change detection of news video. In: Proceedings of Teh 10th international workshop on combinatorial image analysis (IWCIA), Auckland, New Zealand, pp 705–715

  42. Lee JG, Han J, Li X (2008) Trajectory outlier detection: a partition-and-detect framework. In: Proceedings of the 24th international conference on data engineering (ICDE), Cancun, Mexico, pp 140–149. doi:10.1109/ICDE.2008.4497422

  43. Liu X, Lu CT, Chen F (2010) Spatial outlier detection: Random walk based approaches. In: Proceedings of the 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), San Jose, CA, pp 370–379, doi:10.1145/1869790.1869841

  44. Lu CT, Chen D, Kou Y (2003) Algorithms for spatial outlier detection. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, FL, pp 597–600. doi:10.1109/ICDM.2003.1250986

  45. Money AG, Agius H (2008) Video summarisation: a conceptual framework and survey of the state of the art. J Vis Commun Image Represent 19(2):121–143. doi:10.1016/j.jvcir.2007.04.002

    Article  Google Scholar 

  46. Moran P (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23

    Article  MATH  MathSciNet  Google Scholar 

  47. Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: Proceedings of the 24th international conference on data engineering (ICDE) workshop on ranking in databases (DBRank), Cancun, Mexico, pp 600–603. doi:10.1109/ICDEW.2008.4498387

  48. Müller E, Schiffer M, Gerwert P, Hannen M, Jansen T, Seidl T (2010a) SOREX: subspace outlier ranking exploration toolkit. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Barcelona, Spain, pp 607–610. doi:10.1007/978-3-642-15939-8_44

  49. Müller E, Schiffer M, Seidl T (2010b) Adaptive outlierness for subspace outlier ranking. In: Proceedings of the 19th ACM conference on information and knowledge management (CIKM), Toronto, ON, Canada, pp 1629–1632. doi:10.1145/1871437.1871690

  50. Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: Proceedings of the 27th international conference on data engineering (ICDE), Hannover, Germany, pp 434–445. doi:10.1109/ICDE.2011.5767916

  51. Nguyen HV, Ang HH, Gopalkrishnan V (2010) Mining outliers with ensemble of heterogeneous detectors on random subspaces. In: Proceedings of the 15th international conference on database systems for advanced applications (DASFAA), Tsukuba, Japan, pp 368–383. doi:10.1007/978-3-642-12026-8_29

  52. Nguyen HV, Gopalkrishnan V, Assent I (2011) An unbiased distance-based outlier detection approach for high-dimensional data. In: Proceedings of the 16th international conference on database systems for advanced applications (DASFAA), Hong Kong, China, pp 138–152. doi:10.1007/978-3-642-20149-3_12

  53. Orair GH, Teixeira C, Wang Y, Meira W Jr, Parthasarathy S (2010) Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow 3(2):1469–1480

    Google Scholar 

  54. Papadimitriou S, Kitagawa H, Gibbons P, Faloutsos C (2003) LOCI: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering (ICDE), Bangalore, India, pp 315–326. doi:10.1109/ICDE.2003.1260802

  55. Pei Y, Zaïane O, Gao Y (2006) An efficient reference-based approach to outlier detection in large datasets. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 478–487. doi:10.1109/ICDM.2006.17

  56. Pickup L, Zisserman A (2009) Automatic retrieval of visual continuity errors in movies. In: Proceedings of the 8th ACM international conference on image and video retrieval (CIVR), Santorini, Greece. doi:10.1145/1646396.1646406

  57. Pokrajac D, Lazarevic A, Latecki LJ (2007) Incremental local outlier detection for data streams. In: Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM), Honolulu, HI, pp 504–515. doi:10.1109/CIDM.2007.368917

  58. Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 427–438

  59. Roddick JF, Spiliopoulou M (1999) A bibliography of temporal, spatial and spatio-temporal data mining research. ACM SIGKDD Explor 1(1):34–38. doi:10.1145/846170.846173

    Article  Google Scholar 

  60. Schubert E, Wojdanowski R, Zimek A, Kriegel HP (2012) On evaluation of outlier rankings and outlier scores. In: Proceedings of the 12th SIAM international conference on data mining (SDM), Anaheim, CA, pp 1047–1058

  61. Shekhar S, Lu CT, Zhang P (2003) A unified approach to detecting spatial outliers. GeoInformatica 7(2):139–166. doi:10.1023/A:1023455925009

    Article  Google Scholar 

  62. Su X, Tsai CL (2011) Outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):261–268. doi:10.1002/widm.19

    Article  Google Scholar 

  63. Sun P, Chawla S (2004) On local spatial outliers. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK, pp 209–216. doi:10.1109/ICDM.2004.10097

  64. Takeuchi J, Yamanishi K (2006) A unifying framework for detecting outliers and change points from time series. IEEE Trans Knowl Data Eng 18(4):482–492. doi:10.1109/TKDE.2006.1599387

    Article  Google Scholar 

  65. Tan PN, Steinbach M, Kumar V (2006) Introduction to Data Mining. Addison Wesley, Upper Saddle River

    Google Scholar 

  66. Tang J, Chen Z, Fu AWC, Cheung DW (2002) Enhancing effectiveness of outlier detections for low density patterns. In: Proceedings of the 6th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Taipei, Taiwan, pp 535–548. doi:10.1007/3-540-47887-6_53

  67. Vu NH, Gopalkrishnan V (2009) Efficient pruning schemes for distance-based outlier detection. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Bled, Slovenia, pp 160–175. doi:10.1007/978-3-642-04174-7_11

  68. Yamanishi K, Takeuchi JI, Williams G, Milne P (2004) On-line unsupervised outlier detection using finite mixture with discounting learning algorithms. Data Min Knowl Discov 8:275–300. doi:10.1023/B:DAMI.0000023676.72185.7c

    Article  MathSciNet  Google Scholar 

  69. Yu JX, Qian W, Lu H, Zhou A (2006) Finding centric local outliers in categorical/numerical spaces. Knowl Inf Syst 9(3):309–338. doi:10.1007/s10115-005-0197-6

    Article  Google Scholar 

  70. Zhang J, Lou M, Ling TW, Wang H (2004) HOS-miner: a system for detecting outlying subspaces of high-dimensional data. In: Proceedings of the 30th international conference on very large databases (VLDB), Toronto, Canada, pp 1265–1268

  71. Zhang K, Hutter M, Jin H (2009) A new local distance-based outlier detection approach for scattered real-world data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok, Thailand, pp 813–822. doi:10.1007/978-3-642-01307-2_84

  72. Zimek A, Schubert E, Kriegel HP (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min 5(5):363–387. doi:10.1002/sam.11161

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgments

Arthur Zimek is partially supported by NSERC.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Arthur Zimek.

Additional information

Responsible editor: M. J. Zaki.

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Schubert, E., Zimek, A. & Kriegel, HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc 28, 190–237 (2014). https://doi.org/10.1007/s10618-012-0300-z

Download citation

Keywords

  • Local outlier
  • Spatial outlier
  • Video outlier
  • Network outlier