Outlier detection from large distributed databases

Abstract

In this paper, we present DISTROD (DISTRibuted Outlier Detector), a system for detecting outliers, namely abnormal instances or observations, from multiple large distributed databases. DISTROD detects so-called global outliers from distributed databases, and its results are consistent with those produced by the centralized detection paradigm. DISTROD is equipped with a number of optimization strategies that improve its speed and reduce its communication overhead. Experimental evaluation demonstrates that DISTROD performs well in terms of both speed and communication overhead.

Author information

Corresponding author

Correspondence to Ji Zhang.

About this article

Cite this article

Zhang, J., Tao, X. & Wang, H. Outlier detection from large distributed databases. World Wide Web 17, 539–568 (2014). https://doi.org/10.1007/s11280-013-0218-4

Keywords

  • Data mining
  • Distributed database
  • Outlier detection