Robust Outlier Detection Using Commute Time and Eigenspace Embedding
We present a method for finding outliers using the ‘commute distance’ computed from a random walk on a graph. Unlike Euclidean distance, the commute distance between two nodes captures both the distance between them and their local neighborhood densities. Indeed, commute distance is the Euclidean distance in the space spanned by the eigenvectors of the graph Laplacian matrix. We show by analysis and experiments that, using this measure, we can capture both global and local outliers effectively with just a distance-based method. Moreover, the method can detect outlying clusters, which traditional methods often fail to capture, and it shows higher resistance to noise than local outlier detection methods. Finally, to avoid the O(n^3) direct computation of commute distance, graph component sampling and an eigenspace approximation combined with a pruning technique reduce the time to O(n log n) while preserving the outlier ranking.
Keywords: outlier detection, commute distance, eigenspace embedding, random walk, nearest neighbor graph
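
To make the idea in the abstract concrete, here is a minimal sketch of the direct O(n^3) computation it mentions, not the sampled or approximated variant. It uses the standard identity c(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where l+ is the pseudoinverse of the graph Laplacian and V_G is the graph volume. The function names, the inverse-distance edge weights, the symmetric k-nearest-neighbor graph, and the "mean commute distance to the k closest nodes" score are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph with inverse-distance weights.

    Assumption: no duplicate points (a zero distance would give an
    infinite weight). The weighting scheme is illustrative only.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a node is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest points
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, nn.ravel()] = 1.0 / dist[rows, nn.ravel()]
    return np.maximum(W, W.T)                 # keep an edge if either end chose it

def commute_time_matrix(W):
    """Commute times from the Laplacian pseudoinverse (direct O(n^3) route).

    Assumption: the graph is connected; otherwise the values between
    different components are not meaningful.
    """
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                      # unnormalized graph Laplacian
    L_pinv = np.linalg.pinv(L)
    d = np.diag(L_pinv)
    C = deg.sum() * (d[:, None] + d[None, :] - 2.0 * L_pinv)
    return np.maximum(C, 0.0)                 # clip tiny negative rounding errors

def outlier_scores(C, k):
    """Mean commute distance from each node to its k closest nodes."""
    C = C.copy()
    np.fill_diagonal(C, np.inf)               # exclude the node itself
    return np.sort(C, axis=1)[:, :k].mean(axis=1)

# Toy data: a 5 x 5 unit grid plus one isolated point at (10, 10).
g = np.arange(5.0)
X = np.array([(x, y) for x in g for y in g] + [(10.0, 10.0)])

W = knn_graph(X, k=4)
C = commute_time_matrix(W)
scores = outlier_scores(C, k=4)
print("highest-scoring index:", int(np.argmax(scores)))
```

In this toy setup the isolated point (the last row of `X`) is attached to the grid only through weak edges, so its commute distances to every other node are large and it should receive the top score. The pseudoinverse step is what makes this route cubic in the number of points, which is the cost the approximation described in the abstract is meant to avoid.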