Multivariate network traffic analysis using clustered patterns

Abstract

Traffic analysis is a core element of network operations and management, serving purposes that include change detection, traffic prediction, and anomaly detection. In this paper, we introduce a new approach to online traffic analysis based on a pattern-based representation that summarizes traffic measurement data at a high level. Unlike past online analysis techniques, which are limited to summarizing a single variable (e.g., sketches), this study focuses on capturing the network state from the multivariate attributes under consideration. To this end, we employ clustering, which naturally aggregates multidimensional variables. The clustered result represents the state of the network with regard to the monitored variables and can be compared with the patterns observed in previous time windows, enabling intuitive analysis. To confirm its feasibility, we demonstrate the proposed method with two popular use cases: estimating state changes and identifying anomalous states. Our extensive experimental results with public traces and monitoring measurements collected from ESnet traffic traces show that our pattern-based approach is effective for multivariate analysis of online network traffic with visual and quantitative tools.
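As a rough illustration of the idea in the abstract, the sketch below clusters the flow records of each time window and then compares the resulting patterns between windows. It is not the authors' implementation. The choice of plain k-means, the two features per record (assumed to be pre-normalized), the brute-force centroid matching, and all names (`kmeans`, `pattern_distance`, the sample windows) are assumptions made only for this example.

```python
import itertools
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def pattern_distance(cents_a, cents_b):
    """Smallest mean centroid-to-centroid distance over all one-to-one matchings.

    Brute force over permutations, so only suitable for a small number of clusters.
    """
    k = len(cents_a)
    return min(
        sum(math.dist(cents_a[i], cents_b[j]) for i, j in enumerate(perm)) / k
        for perm in itertools.permutations(range(k))
    )


# Hypothetical windows; each record is (normalized bytes, normalized duration).
window_1 = [(0.10, 0.10), (0.90, 0.90), (0.12, 0.08), (0.88, 0.92)]
window_2 = [(0.11, 0.09), (0.90, 0.91), (0.10, 0.10), (0.89, 0.90)]
window_3 = [(0.10, 0.10), (0.50, 0.10), (0.12, 0.10), (0.52, 0.12)]

pattern_1 = kmeans(window_1, k=2)
pattern_2 = kmeans(window_2, k=2)
pattern_3 = kmeans(window_3, k=2)

similar = pattern_distance(pattern_1, pattern_2)  # windows with the same structure
shifted = pattern_distance(pattern_1, pattern_3)  # one cluster has moved
print(f"window 1 vs 2: {similar:.3f}, window 1 vs 3: {shifted:.3f}")
```

In this toy setting, a small distance between consecutive windows suggests a stable state, while a large one may indicate a state change. Any threshold for flagging a change would have to be chosen for the data at hand.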

Notes

  1. A flow is identified by the 5-tuple of source IP address, source port number, destination IP address, destination port number, and protocol in the TCP/IP header.

  2. The details of DTNs are described at https://fasterdata.es.net/performance-testing/DTNs/.

Acknowledgements

This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2016-0-00078, Cloud based Security Intelligence Technology Development for the Customized Security Service Provisioning).

Author information

Corresponding author

Correspondence to Jinoh Kim.

About this article

Cite this article

Kim, J., Sim, A., Tierney, B. et al. Multivariate network traffic analysis using clustered patterns. Computing 101, 339–361 (2019). https://doi.org/10.1007/s00607-018-0619-4

Keywords

  • Network traffic analysis
  • Clustered patterns
  • Change detection
  • Anomaly detection
  • Multivariate analysis

Mathematics Subject Classification

  • 68Uxx Computing methodologies and applications