
Detection of Outliers in an Unsupervised Environment

  • Conference paper
Computational Intelligence in Data Mining - Volume 2

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 32)

Abstract

Outliers are observations that deviate markedly from the rest of the data, yet they are often not clearly separable from the regular samples in a dataset. Analyzing and extracting knowledge from data that contains outliers can lead to ambiguous or misleading conclusions, so outlier detection is needed as a pre-processing stage for data mining. In multidimensional data, outlier detection is particularly challenging because an object may deviate in one subspace while appearing perfectly regular in another. This paper proposes an ensemble meta-algorithm that analyzes samples in multidimensional subspaces and votes on which of them are likely outliers. Cook's distance, a regression-based influence measure, is then applied to detect outliers among the samples voted by the ensemble meta-algorithm. Extensive experiments on real datasets demonstrate the efficiency of the proposed system in detecting outliers.
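The two-stage idea in the abstract (subspace ensemble voting, then Cook's distance) can be illustrated with a minimal NumPy sketch. The abstract does not specify the base detectors, how subspaces are chosen, the voting rule, which attribute is regressed, or the cut-off, so all of these are assumptions here: random 2-D feature subspaces (in the spirit of feature bagging), a standardized distance-to-mean score per subspace, a vote for the top 5% of samples in each subspace, an ordinary least-squares fit of one attribute on the others, and the common 4/n rule of thumb for Cook's distance. The function names `subspace_votes`, `cooks_distance` and `detect_outliers` are hypothetical and are not the authors' implementation. Cook's distance is computed from the standard closed form D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of regression parameters and s^2 the residual mean square.

```python
import numpy as np


def subspace_votes(X, n_subspaces=20, subspace_dim=2, top_frac=0.05, seed=0):
    """Count how many random feature subspaces flag each sample (hypothetical detector)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(np.ceil(top_frac * n)))  # samples voted per subspace
    votes = np.zeros(n, dtype=int)
    for _ in range(n_subspaces):
        cols = rng.choice(d, size=subspace_dim, replace=False)
        Z = X[:, cols]
        # Standardize the subspace, then score by distance to its mean.
        Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
        scores = np.linalg.norm(Z, axis=1)
        votes[np.argsort(scores)[-k:]] += 1  # vote for the k highest scores
    return votes


def cooks_distance(X, y):
    """Cook's distance of every observation in an OLS fit of y on X plus an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages: diagonal of the hat matrix H = A (A^T A)^-1 A^T.
    h = np.einsum("ij,ij->i", A @ np.linalg.pinv(A.T @ A), A)
    s2 = resid @ resid / (n - p)  # residual mean square
    return resid**2 / (p * s2) * h / (1 - h) ** 2


def detect_outliers(X, target_col=0, min_votes=None, n_subspaces=20, **kw):
    """Keep voted candidates whose Cook's distance exceeds the 4/n rule of thumb."""
    votes = subspace_votes(X, n_subspaces=n_subspaces, **kw)
    if min_votes is None:
        min_votes = n_subspaces // 2  # assumed majority-style voting threshold
    candidates = np.flatnonzero(votes >= min_votes)
    # Assumed regression: one attribute explained by all remaining attributes.
    y = X[:, target_col]
    others = np.delete(X, target_col, axis=1)
    D = cooks_distance(others, y)
    return candidates[D[candidates] > 4.0 / X.shape[0]]
```

Usage: `detect_outliers(X, target_col=0, min_votes=1)` returns the row indices that were voted at least once and also have a large influence on the regression of column 0 on the other columns. Splitting the pipeline this way keeps the cheap, subspace-local voting separate from the global regression check, so either stage can be swapped for a different detector.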

Author information

Correspondence to M. Ashwini Kumari.

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Ashwini Kumari, M., Bhargavi, M.S., Gowda, S.D. (2015). Detection of Outliers in an Unsupervised Environment. In: Jain, L., Behera, H., Mandal, J., Mohapatra, D. (eds) Computational Intelligence in Data Mining - Volume 2. Smart Innovation, Systems and Technologies, vol 32. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2208-8_51

  • DOI: https://doi.org/10.1007/978-81-322-2208-8_51

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2207-1

  • Online ISBN: 978-81-322-2208-8

  • eBook Packages: Engineering, Engineering (R0)