Fingerprinting Crowd Events in Content Delivery Networks: A Semi-supervised Methodology

  • Amine Boukhtouta
  • Makan Pourzandi
  • Richard Brunner
  • Stéphane Dault
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10980)


Crowd events, or flash crowds, are surges of access to media or web assets triggered by a popular event. Although crowd-event accesses are benign, distinguishing them from Distributed Denial of Service (DDoS) attacks is inherently difficult because the two look alike. In contrast to the rich literature on profiling and detecting DDoS attacks, the problem of distinguishing benign crowd events from DDoS attacks has received little attention. In this work, we propose a new approach for profiling crowd events and segregating them from normal accesses. We apply a first selection, based on a semi-supervised approach, to separate normal events from crowd events using the number of requests. We use density-based clustering, namely DBSCAN, to label patterns obtained from a time series. We then apply a second, more refined selection that uses the resulting clusters to classify the crowd events. To this end, we build an XGBoost classifier that detects crowd events with a high detection rate (99%) on the training dataset. We present our initial results on crowd-event fingerprinting using 8 days of log data collected from a major Content Delivery Network (CDN) as a test case. We further validate our approach by applying our models to unseen data in which abrupt changes in the number of accesses are detected, and we show that the models detect the crowd event with high accuracy. We believe that this approach can also be used in similar CDNs to detect crowd events.
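The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the request counts are synthetic, the per-window features (mean and maximum requests), the window width, and the `eps` and `min_pts` values are assumptions chosen for the example, and DBSCAN is written out by hand so the block has no dependencies. The later XGBoost classification stage is not shown.

```python
import math
import statistics


def window_features(counts, width):
    """Cut a request-count series into non-overlapping windows and
    describe each window by (mean requests, max requests).
    The feature choice is an assumption made for this sketch."""
    feats = []
    for start in range(0, len(counts) - width + 1, width):
        w = counts[start:start + width]
        feats.append((statistics.mean(w), max(w)))
    return feats


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point,
    0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Synthetic requests per minute: steady traffic with one burst in the middle.
counts = [100, 104, 98, 101, 99, 103, 97, 102, 100, 105, 96, 101,
          950, 1010, 990, 1005, 980, 1020,
          102, 99, 101, 98, 100, 103]

features = window_features(counts, width=3)
labels = dbscan(features, eps=50, min_pts=2)
print(labels)
```

In this toy series the two burst windows end up in a cluster of their own, separate from the steady-traffic windows. Cluster labels of this kind are what a supervised classifier could then be trained on.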



Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Amine Boukhtouta (1, corresponding author)
  • Makan Pourzandi (1)
  • Richard Brunner (2)
  • Stéphane Dault (3)

  1. Ericsson Security Research, Montréal, Canada
  2. Ericsson Universal Delivery Network, Montréal, Canada
  3. Ericsson Business Area Digital Services, R&D Security Operations, Montréal, Canada
