Online Outlier Detection Based on Relative Neighbourhood Dissimilarity

  • Nguyen Hoang Vu
  • Vivekanand Gopalkrishnan
  • Praneeth Namburi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5175)


Outlier detection has many practical applications, especially in domains where abnormal behavior is likely to occur, such as fraud detection, network intrusion detection, and medical diagnosis. In this paper, we present a technique for detecting outliers and learning from data in multi-dimensional streams. Since the underlying concept in such streaming data may drift, learning approaches should operate online and adapt quickly. Our technique adapts to newly arriving data points and incrementally maintains the models it builds in order to overcome the effect of concept drift. Experimental results on real data sets show that our approach is effective both at detecting outliers in data streams and at maintaining model accuracy.
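The abstract does not spell out how relative neighbourhood dissimilarity is computed or how the models are maintained. As a rough illustration only, the sketch below assumes one simple variant: a new point is scored by its mean distance to its k nearest neighbours, divided by the average of those neighbours' own mean k-NN distances (in the spirit of density-based local outlier scores), and a bounded sliding window stands in for incremental model maintenance under drift. This is not the authors' algorithm; the class `SlidingWindowRelativeOutlierDetector`, the parameters `k`, `window_size`, and `threshold`, and the scoring rule are all hypothetical.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SlidingWindowRelativeOutlierDetector:
    """Hypothetical sketch, not the paper's method.

    A new point is scored by its mean distance to its k nearest neighbours in
    the window, divided by the average of those neighbours' own mean k-NN
    distances. Scores near 1 mean "about as dense as its neighbourhood";
    large scores mean the point lies much farther out than its neighbours do.
    """

    def __init__(self, k=3, window_size=100, threshold=2.0):
        self.k = k
        self.threshold = threshold
        # Bounded window: the oldest point is evicted automatically, so the
        # model keeps tracking recent data (a crude response to concept drift).
        self.window = deque(maxlen=window_size)

    def _mean_knn_distance(self, point, candidates):
        """Mean distance from `point` to its k nearest `candidates`."""
        nearest = sorted(euclidean(point, c) for c in candidates)[: self.k]
        return sum(nearest) / len(nearest)

    def score(self, point):
        pts = list(self.window)
        if len(pts) <= self.k:
            return 1.0  # too little history: treat the point as normal
        # Indices of the k window points closest to the new point.
        order = sorted(range(len(pts)), key=lambda i: euclidean(point, pts[i]))
        neighbour_idx = order[: self.k]
        own = sum(euclidean(point, pts[i]) for i in neighbour_idx) / self.k
        # Each neighbour's own mean k-NN distance, excluding the neighbour itself.
        reference = sum(
            self._mean_knn_distance(pts[i], pts[:i] + pts[i + 1:])
            for i in neighbour_idx
        ) / self.k
        if reference == 0.0:
            return math.inf if own > 0.0 else 1.0
        return own / reference

    def update(self, point):
        """Score `point`, flag it if the score exceeds the threshold, then add it."""
        s = self.score(point)
        self.window.append(point)
        return s, s > self.threshold
```

Feeding a tight cluster such as (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5) and then (10, 10) flags only the last point. Appending every point, including flagged ones, lets a genuinely new cluster become "normal" once it fills enough of the window; withholding flagged points from the window would resist contamination but adapt more slowly to drift.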







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nguyen Hoang Vu, Nanyang Technological University, Singapore
  • Vivekanand Gopalkrishnan, Nanyang Technological University, Singapore
  • Praneeth Namburi, Nanyang Technological University, Singapore
