Online Outlier Detection Based on Relative Neighbourhood Dissimilarity
Outlier detection has many practical applications, especially in domains prone to abnormal behavior, such as fraud detection, network intrusion detection, and medical diagnosis. In this paper, we present a technique for detecting outliers and learning from data in multi-dimensional streams. Because the underlying concept in such streaming data may drift, learning approaches should be online and adapt quickly. Our technique adapts to newly arriving data points and incrementally maintains the models it builds in order to counter the effect of concept drift. Experimental results on real data sets show that our approach is effective both in detecting outliers in data streams and in maintaining model accuracy.
Keywords: Data Stream, False Alarm Rate, Outlier Detection, Concept Drift, Chunk Size
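The abstract does not define the relative neighbourhood dissimilarity measure itself, so the sketch below is a hedged illustration of the general setting only, not the authors' method. It assumes an LOF-style ratio (a point's mean distance to its k nearest neighbours divided by the average of those neighbours' own mean k-NN distances) as the dissimilarity score, and a bounded sliding window of recent inliers, updated chunk by chunk, as the incrementally maintained model. `SlidingWindowDetector`, `k`, `threshold`, and `window_size` are hypothetical names and parameters, not taken from the paper.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_dist(p: Sequence[float], ref: List[Point], k: int,
                  skip: Optional[int] = None) -> float:
    """Mean distance from p to its k nearest points in ref.

    `skip` is the index of p inside ref (if any), so a point is not its own neighbour.
    """
    dists = sorted(euclidean(p, q) for i, q in enumerate(ref) if i != skip)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


class SlidingWindowDetector:
    """Hypothetical chunk-based online outlier detector (illustrative only).

    Model = a bounded window of recent inliers; the oldest points are evicted as
    new ones arrive, which is one simple (assumed) way to follow concept drift.
    """

    def __init__(self, k: int = 3, threshold: float = 2.0, window_size: int = 200):
        self.k = k                  # neighbourhood size (assumed parameter)
        self.threshold = threshold  # score above which a point is flagged (assumed)
        self.window: Deque[Point] = deque(maxlen=window_size)

    def score(self, x: Sequence[float]) -> float:
        """Assumed LOF-style relative dissimilarity: x's mean k-NN distance divided
        by the average mean k-NN distance of x's own neighbours in the window."""
        ref = list(self.window)
        nbrs = sorted(range(len(ref)), key=lambda i: euclidean(x, ref[i]))[: self.k]
        d_x = sum(euclidean(x, ref[i]) for i in nbrs) / len(nbrs)
        d_nbrs = sum(mean_knn_dist(ref[i], ref, self.k, skip=i) for i in nbrs) / len(nbrs)
        if d_nbrs == 0.0:
            return 1.0 if d_x == 0.0 else math.inf
        return d_x / d_nbrs

    def process_chunk(self, chunk: List[Sequence[float]]) -> List[bool]:
        """Flag each point in the chunk, then update the model with the inliers."""
        # Warm-up: nothing is flagged until the window holds more than k points.
        ready = len(self.window) > self.k
        flags = [ready and self.score(x) > self.threshold for x in chunk]
        # Incremental update: append inliers; deque(maxlen=...) drops the oldest.
        for x, is_outlier in zip(chunk, flags):
            if not is_outlier:
                self.window.append(tuple(x))
        return flags


det = SlidingWindowDetector(k=3, threshold=2.0, window_size=50)
print(det.process_chunk([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]))
# -> [False, False, False, False, False]  (warm-up chunk)
print(det.process_chunk([(0.5, 0.4), (10.0, 10.0)]))
# -> [False, True]
```

In this sketch the chunk size and `window_size` trade off reaction speed against stability: a smaller window forgets old data faster and so tracks drift more quickly, but gives noisier neighbourhood estimates. Only inliers are added to the window so that flagged points do not contaminate the model.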