Detection of Outliers in an Unsupervised Environment

  • M. Ashwini Kumari
  • M. S. Bhargavi
  • Sahana D. Gowda
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 32)

Abstract

Outliers are samples that are exceptional when compared with the rest of the data, yet they are not always clearly distinguishable from the regular samples in a dataset. Analysis and knowledge extraction from data that contain outliers lead to ambiguous and confused conclusions, so outliers need to be detected as a pre-processing stage of data mining. In a multidimensional setting, outlier detection is challenging because an object may deviate in one subspace and appear perfectly regular in another. This paper proposes an ensemble meta-algorithm that analyzes the samples in multidimensional subspaces and votes on them for outlier identification. Cook’s distance, a regression-based measure, is then applied to detect outliers among the samples voted for by the ensemble meta-algorithm. Extensive experimentation on real datasets demonstrates the efficiency of the proposed system in detecting outliers.
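To make the role of Cook’s distance more concrete, the following is a minimal sketch of the standard textbook computation for a simple linear regression with one predictor. It is not the authors' implementation: the function name `cooks_distance`, the synthetic data, the injected outlier, and the 4/n cut-off (a common rule of thumb) are all assumptions made here for illustration, and the abstract does not say how the ensemble votes are combined with the distance.

```python
import random


def cooks_distance(x, y):
    """Cook's distance of every observation in a least-squares fit y ~ a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residuals of the fitted line and the residual variance estimate.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    # Leverage: diagonal of the hat matrix for simple linear regression.
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    return [(e * e / (p * s2)) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]


# Hypothetical data: a noisy line with one injected outlier at the last point.
rng = random.Random(0)
x = [10.0 * i / 29 for i in range(30)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.5) for xi in x]
y[-1] += 15.0

d = cooks_distance(x, y)
# 4/n is a widely used rule-of-thumb threshold, assumed here for illustration.
flagged = [i for i, di in enumerate(d) if di > 4.0 / len(d)]
```

In this sketch the injected point combines a large residual with high leverage, so it receives the largest distance. With several predictors, the leverages would come from the diagonal of the hat matrix of the full design matrix instead of the closed form used above.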

Keywords

Outlier detection, Outlier ensemble, Multidimensional subspace analysis, Cook’s distance

Copyright information

© Springer India 2015

Authors and Affiliations

  • M. Ashwini Kumari (1)
  • M. S. Bhargavi (1)
  • Sahana D. Gowda (1)
  1. Department of Computer Science and Engineering, BNM Institute of Technology, Bangalore, India
