Volume 101, Issue 4, pp 339–361

Multivariate network traffic analysis using clustered patterns

  • Jinoh Kim
  • Alex Sim
  • Brian Tierney
  • Sang Suh
  • Ikkyun Kim


Traffic analysis is a core element of network operations and management, serving purposes that include change detection, traffic prediction, and anomaly detection. In this paper, we introduce a new approach to online traffic analysis based on a pattern-based representation that summarizes traffic measurement data at a high level. Unlike past online analysis techniques, which are limited to summarizing a single variable (e.g., sketches), this study focuses on capturing the network state from the multivariate attributes under consideration. To this end, we employ clustering, which aggregates multidimensional variables. The clustered result represents the state of the network with regard to the monitored variables, and it can be compared with the patterns observed in previous time windows, enabling intuitive analysis. To confirm its feasibility, we demonstrate the proposed method with two popular use cases: estimating state changes and identifying anomalous states. Our extensive experimental results with public traces and monitoring measurements collected from ESnet traffic traces show that our pattern-based approach is effective for multivariate analysis of online network traffic with visual and quantitative tools.
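The idea of summarizing each time window as clustered patterns and comparing it with the previous window can be illustrated with a minimal sketch. This is not the authors' implementation. The two normalized features per flow, the plain k-means with farthest-first initialization, the helper names `kmeans` and `pattern_change`, and the share-weighted nearest-centroid score are all assumptions chosen for illustration.

```python
import math

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; returns (centroids, share of points per cluster)."""
    # Assumption: deterministic farthest-first initialisation, for reproducibility.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    shares = [len(g) / len(points) for g in groups]
    return centroids, shares

def pattern_change(prev, curr):
    """Hypothetical change score: distance from each current centroid to the
    nearest previous centroid, weighted by the current cluster's share."""
    prev_centroids, _ = prev
    curr_centroids, curr_shares = curr
    return sum(
        share * min(math.dist(c, p) for p in prev_centroids)
        for c, share in zip(curr_centroids, curr_shares)
    )

# Toy windows of flows described by two made-up, normalized features.
window_a = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
            (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
window_b = [(0.12, 0.18), (0.18, 0.12), (0.14, 0.16),
            (0.88, 0.82), (0.82, 0.88), (0.86, 0.84)]
window_c = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
            (0.90, 0.10), (0.80, 0.20), (0.85, 0.15)]  # second group has moved

pattern_a = kmeans(window_a, k=2)
stable = pattern_change(pattern_a, kmeans(window_b, k=2))
shifted = pattern_change(pattern_a, kmeans(window_c, k=2))
print(f"A -> B change score: {stable:.3f}")
print(f"A -> C change score: {shifted:.3f}")
```

In this toy setting the score stays small when consecutive windows contain similar clusters and grows when one cluster moves, which is the kind of window-to-window comparison the abstract describes. A practical system would need a principled choice of features, number of clusters, and comparison measure.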


Network traffic analysis; Clustered patterns; Change detection; Anomaly detection; Multivariate analysis

Mathematics Subject Classification

68Uxx Computing methodologies and applications 



This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2016-0-00078, Cloud based Security Intelligence Technology Development for the Customized Security Service Provisioning).



Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Authors and Affiliations

  1. Texas A&M University, Commerce, USA
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
  3. ESnet, Berkeley, USA
  4. ETRI, Daejeon, Korea
