Outlier Detection with Kernel Density Functions
Conference paper
Abstract
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, we propose a novel unsupervised outlier detection algorithm with a solid statistical foundation. First, we modify a nonparametric density estimate with a variable kernel to obtain a robust local density estimate. Outliers are then detected by comparing the local density of each point with the local density of its neighbors. Experiments on several simulated data sets demonstrate that the proposed approach can outperform two widely used outlier detection algorithms, LOF and LOCI.
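The two steps named in the abstract (a variable-kernel local density estimate, followed by a comparison of each point's density with that of its neighbors) can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given here. The Gaussian kernel, the choice of each neighbor's k-th nearest-neighbor distance as its bandwidth, the damping constant `c`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def knn(X, k):
    """Indices and distances of the k nearest neighbors of each row (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def local_density(X, k, h=1.0):
    """Variable-kernel density at each point, averaged over its k neighbors.

    Assumption: Gaussian kernel whose bandwidth for neighbor q is
    h * (distance from q to its own k-th nearest neighbor).
    """
    n, dim = X.shape
    idx, nd = knn(X, k)
    kdist = nd[:, -1]                          # k-distance of every point
    bw = np.maximum(h * kdist[idx], 1e-12)     # (n, k) per-neighbor bandwidths
    norm = (2.0 * np.pi) ** (dim / 2.0) * bw ** dim
    dens = np.exp(-nd ** 2 / (2.0 * bw ** 2)) / norm
    return dens.mean(axis=1), idx

def outlier_scores(X, k=5, h=1.0, c=0.1):
    """Ratio of the neighbors' mean density to the point's own density.

    Assumption: the term c * (neighbors' mean density) in the denominator
    only keeps the ratio bounded (by 1 / c) when the point's density is ~0.
    """
    dens, idx = local_density(X, k, h)
    nb_mean = dens[idx].mean(axis=1)
    return nb_mean / (dens + c * nb_mean)
```

Calling `outlier_scores(X, k=5)` on an `(n, d)` array returns one score per row. Points whose local density is much lower than that of their neighbors receive scores close to `1 / c`, while points inside a uniformly dense region receive scores near 1.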
Keywords
False Alarm Rate; Outlier Detection; Local Density Estimate; Neighbor Query; Variable Kernel
Copyright information
© Springer-Verlag Berlin Heidelberg 2007