Outlier detection from large distributed databases

Zhang, Ji; Tao, Xiaohui; Wang, Hua

doi:10.1007/s11280-013-0218-4

Published: 01 May 2013

Volume 17, pages 539–568 (2014)

World Wide Web

Abstract

In this paper, we present DISTROD (DISTRibuted Outlier Detector), a system for detecting outliers, that is, abnormal instances or observations, in multiple large distributed databases. DISTROD detects global outliers: the outliers it reports from the distributed databases are consistent with those that the centralized detection paradigm would produce. It is equipped with a number of optimization and boosting strategies that significantly improve its speed and reduce its communication overhead. Experimental evaluation demonstrates DISTROD's good performance in terms of both speed and communication overhead.
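
To make the notion of global outliers concrete, the following is a minimal Python sketch and not DISTROD's algorithm. The abstract does not state which outlier definition the system uses, so the sketch assumes a common distance-based one: a point's score is its distance to its k-th nearest neighbour, and the n highest-scoring points are the outliers. The function names, the parameters `k` and `n`, and the sample data are all hypothetical. Each site returns only its k smallest distances to a query point, and merging those lists gives the same score a centralized scan would compute. The sketch applies none of the optimization strategies the abstract mentions.

```python
import heapq
import math


def kth_nn_distance(sites, q_site, q_idx, k):
    """Distance from one point to its k-th nearest neighbour across all sites.

    Assumption (not from the paper): each site answers with only its k
    smallest local distances, so at most k values per site are communicated.
    """
    q = sites[q_site][q_idx]
    candidates = []
    for s, site in enumerate(sites):
        # Local work at site s: distances to every stored point except q itself.
        local = [math.dist(q, p) for i, p in enumerate(site)
                 if (s, i) != (q_site, q_idx)]
        candidates.extend(heapq.nsmallest(k, local))
    # The global k nearest neighbours are always among the per-site top-k lists.
    return heapq.nsmallest(k, candidates)[-1]


def top_n_outliers_distributed(sites, k, n):
    """Top-n points by k-th nearest neighbour distance, computed site by site."""
    scored = [(kth_nn_distance(sites, s, i, k), p)
              for s, site in enumerate(sites)
              for i, p in enumerate(site)]
    return [p for _, p in heapq.nlargest(n, scored)]


def top_n_outliers_centralized(points, k, n):
    """Reference result: the same score computed over one pooled dataset."""
    scored = []
    for i, q in enumerate(points):
        dists = sorted(math.dist(q, p) for j, p in enumerate(points) if j != i)
        scored.append((dists[k - 1], q))
    return [p for _, p in heapq.nlargest(n, scored)]


# Hypothetical data: three sites, two points far away from the main cluster.
sites = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(1.0, 1.0), (10.0, 10.0)],
    [(0.5, 0.5), (-8.0, 7.0)],
]
pooled = [p for site in sites for p in site]

distributed = top_n_outliers_distributed(sites, k=2, n=2)
centralized = top_n_outliers_centralized(pooled, k=2, n=2)
```

On this sample data both functions select the same two points, which is the consistency property the abstract describes. The sketch recomputes everything for each point and needs more than `k` points in total; a practical system would have to avoid that cost.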

Author information

Authors and Affiliations

Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
Ji Zhang, Xiaohui Tao & Hua Wang

Corresponding author

Correspondence to Ji Zhang.

About this article

Cite this article

Zhang, J., Tao, X. & Wang, H. Outlier detection from large distributed databases. World Wide Web 17, 539–568 (2014). https://doi.org/10.1007/s11280-013-0218-4

Received: 28 October 2012
Revised: 23 March 2013
Accepted: 17 April 2013
Published: 01 May 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11280-013-0218-4
