Page-Based Anomaly Detection in Large Scale Web Clusters Using Adaptive MapReduce (Extended Abstract)

  • Junsup Lee
  • Sungdeok Cha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5230)


While anomaly detection systems typically work on a single server, most commercial web sites operate in cluster environments, and user queries trigger transactions scattered across multiple servers. For this reason, anomaly detectors in the same server farm should communicate with each other to integrate their partial profiles. In this paper, we describe a real-time distributed anomaly detection system that can handle over one billion transactions per day. In our system, based on the Google MapReduce algorithm, the anomaly detector in each node shares profiles of user behavior and propagates intruder information to reduce false alarms. We evaluated our system using web log data, about 250 GB in size, containing over one billion transactions recorded in a day.
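
The abstract does not give implementation details, so the following is only a minimal sketch of how per-node partial profiles could be merged in a MapReduce style. Everything in it is an assumption made for illustration: the record layout `(page, query_length)`, the choice of query length as the profiled feature, the z-score as the anomaly measure, and all function names. It is not the authors' system.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical record layout: (page, query_length). The real features
# used by the paper's detector are not stated in the abstract.

def map_record(record):
    """Map step: emit (page, partial statistics) for one log record."""
    page, query_length = record
    return page, (1, query_length, query_length ** 2)

def merge(a, b):
    """Reduce step: combine two partial statistics for the same page."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def build_profile(records):
    """Build one node's partial profile from its local log records."""
    profile = defaultdict(lambda: (0, 0, 0))
    for record in records:
        page, stats = map_record(record)
        profile[page] = merge(profile[page], stats)
    return dict(profile)

def merge_profiles(profiles):
    """Integrate the partial profiles of several nodes into one."""
    merged = defaultdict(lambda: (0, 0, 0))
    for partial in profiles:
        for page, stats in partial.items():
            merged[page] = merge(merged[page], stats)
    return dict(merged)

def z_score(profile, page, query_length):
    """Distance of a new request from the page's mean, in standard deviations."""
    n, total, total_sq = profile[page]
    mean = total / n
    variance = max(total_sq / n - mean ** 2, 0.0)
    std = sqrt(variance)
    return 0.0 if std == 0 else abs(query_length - mean) / std

# Two nodes see different parts of the traffic for the same page.
node_a = build_profile([("/login", 10), ("/login", 12)])
node_b = build_profile([("/login", 14), ("/search", 30)])
merged = merge_profiles([node_a, node_b])
```

Because the partial statistics (count, sum, sum of squares) combine by simple addition, the merge is associative and commutative, which is what lets each node compute its share independently before the results are combined.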


  1. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Operating Systems Design and Implementation, 137–149 (2004)
  2. Ranger, C., Raghuraman, R., Penmetsa, A., Bradski, G., Kozyrakis, C.: Evaluating MapReduce for Multi-core and Multiprocessor Systems. In: Proceedings of the 13th Intl. Symposium on HPCA, Phoenix, AZ (February 2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Junsup Lee (1)
  • Sungdeok Cha (2)
  1. The Attached Institute of ETRI, Daejeon, Republic of Korea
  2. Department of CSE, Korea University, Seoul, Republic of Korea
