Detection of Outliers in an Unsupervised Environment

Ashwini Kumari, M.; Bhargavi, M. S.; Gowda, Sahana D.

doi:10.1007/978-81-322-2208-8_51


Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 32)


Abstract

Outliers are exceptions when compared with the rest of the data, yet they are not always clearly distinguishable from the regular samples in a dataset. Analysis and knowledge extraction from data containing outliers lead to ambiguous and confused conclusions. Outlier detection is therefore needed as a pre-processing stage for data mining. In a multidimensional setting, outlier detection is challenging because an object may deviate in one subspace and appear perfectly regular in another. In this paper, an ensemble meta-algorithm is proposed to analyze the samples and vote on them for outlier identification in multidimensional subspaces. Cook’s distance, a regression-based measure, is applied to detect the outliers among the samples voted on by the ensemble meta-algorithm. Extensive experimentation on real datasets demonstrates the efficiency of the proposed system in detecting outliers.
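The abstract does not specify the base detectors, the subspaces, or how Cook's distance is applied to the votes, so the following Python sketch is illustrative only and should not be read as the authors' method. It assumes single-feature subspaces with a Tukey-fence detector for the voting step (the names tukey_flags and subspace_votes are hypothetical), and it shows the standard Cook's distance formula for a simple linear regression separately (cooks_distance, also a hypothetical name).

```python
from statistics import mean, quantiles


def tukey_flags(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: a stand-in base detector, not taken from the paper.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def subspace_votes(rows):
    """Count, per sample, how many single-feature subspaces flag it.

    Assumption: each feature is treated as its own subspace.
    """
    votes = [0] * len(rows)
    for column in zip(*rows):  # one column per feature
        for i, flagged in enumerate(tukey_flags(column)):
            votes[i] += int(flagged)
    return votes


def cooks_distance(x, y):
    """Cook's distance of each point for the least-squares fit y ~ a + b*x."""
    n, p = len(x), 2  # p = number of fitted coefficients
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage: diagonal of the hat matrix for simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        e * e / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


if __name__ == "__main__":
    # The last sample deviates only in the second feature.
    rows = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9), (1.2, 10.1),
            (1.0, 9.8), (1.1, 10.0), (0.95, 10.3), (1.05, 25.0)]
    print(subspace_votes(rows))

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 30.0]
    d = cooks_distance(x, y)
    # 4/n is a common rule-of-thumb cut-off, not a value from the paper.
    print([i for i, di in enumerate(d) if di > 4 / len(d)])
```

In the sample data, the last row looks regular in the first feature and deviates only in the second, which is the situation the abstract describes: a per-subspace vote can catch a sample that a single global view might miss.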



Author information

Authors and Affiliations

Department of Computer Science and Engineering, BNM Institute of Technology, Bangalore, India
M. Ashwini Kumari, M. S. Bhargavi & Sahana D. Gowda


Corresponding author

Correspondence to M. Ashwini Kumari.

Editor information

Editors and Affiliations

University of Canberra, Canberra, Australia and University of South Australia, Adelaide, South Australia, Australia
Lakhmi C. Jain
Department of Computer Science and Engineering, Veer Surendra Sai University of Technology, Sambalpur, Odisha, India
Himansu Sekhar Behera
Computer Science & Engineering, Kalyani University, Nadia, West Bengal, India
Jyotsna Kumar Mandal
Dept. of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India
Durga Prasad Mohapatra


Cite this paper

Ashwini Kumari, M., Bhargavi, M.S., Gowda, S.D. (2015). Detection of Outliers in an Unsupervised Environment. In: Jain, L., Behera, H., Mandal, J., Mohapatra, D. (eds) Computational Intelligence in Data Mining - Volume 2. Smart Innovation, Systems and Technologies, vol 32. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2208-8_51


DOI: https://doi.org/10.1007/978-81-322-2208-8_51
Published: 11 December 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2207-1
Online ISBN: 978-81-322-2208-8
eBook Packages: Engineering, Engineering (R0)
