Journal of Computer Science and Technology, Volume 34, Issue 2, pp 388–402

A New Approach to Multivariate Network Traffic Analysis

  • Jinoh Kim
  • Alex Sim
Regular Paper

Abstract

Network traffic analysis is one of the core functions of network monitoring and is essential for effective network operations and management. Although online traffic analysis has been widely studied, it remains challenging for several reasons. One primary challenge is the sheer volume of traffic that must be analyzed within a limited amount of time as network bandwidth continues to grow. Another important challenge is supporting multivariate functions of traffic variables that help administrators identify unexpected network events intuitively. To address these challenges, we propose a new multivariate analysis approach that offers a high-level summary of online network traffic. With this approach, the current state of the network is expressed as patterns compiled from a set of traffic variables, so that detection problems in network monitoring (e.g., change detection and anomaly detection) can be reduced to a pattern identification and classification problem. In this paper, we introduce our preliminary work on clustered patterns for online, multivariate network traffic analysis, along with the challenges and limitations we observed. We then present a grid-based model designed to overcome the limitations of the clustered pattern-based technique, and we discuss the potential of the new model with respect to technical challenges including streaming-based computation and robustness to outliers.
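The abstract describes reducing change detection to comparing patterns compiled from several traffic variables, and a grid-based model aimed at streaming computation and robustness to outliers. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' model. The function names (`grid_pattern`, `pattern_distance`, `detect_changes`), the uniform grid with `bins` cells per dimension over fixed bounds `lo`/`hi`, the use of total variation distance as the pattern similarity, and the `threshold` value are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_pattern(samples: Iterable[Sequence[float]],
                 lo: Sequence[float],
                 hi: Sequence[float],
                 bins: int) -> Dict[Cell, float]:
    """Summarize one window of d-dimensional samples as a cell-occupancy distribution.

    Hypothetical sketch: each sample is mapped to a cell of a uniform grid with
    `bins` cells per dimension over [lo, hi]. Out-of-range values are clamped to
    the boundary cells, so a single extreme sample cannot stretch the grid.
    One pass over the samples; only occupied cells are stored.
    """
    counts: Counter = Counter()
    total = 0
    for x in samples:
        cell = []
        for v, a, b in zip(x, lo, hi):
            idx = int((v - a) / (b - a) * bins)
            cell.append(min(max(idx, 0), bins - 1))  # clamp outliers into the grid
        counts[tuple(cell)] += 1
        total += 1
    if total == 0:
        return {}
    return {c: n / total for c, n in counts.items()}


def pattern_distance(p: Dict[Cell, float], q: Dict[Cell, float]) -> float:
    """Total variation distance between two grid patterns (0 = identical, 1 = disjoint)."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


def detect_changes(windows: List[List[Sequence[float]]],
                   lo: Sequence[float],
                   hi: Sequence[float],
                   bins: int = 4,
                   threshold: float = 0.5) -> List[int]:
    """Return indices of windows whose pattern differs from the previous window's by more than `threshold`."""
    changes: List[int] = []
    prev = None
    for i, w in enumerate(windows):
        cur = grid_pattern(w, lo, hi, bins)
        if prev is not None and pattern_distance(prev, cur) > threshold:
            changes.append(i)
        prev = cur
    return changes


if __name__ == "__main__":
    # Hypothetical windows of (normalized packet rate, normalized byte rate).
    windows = [
        [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1)],
        [(0.1, 0.2), (0.2, 0.15), (0.05, 0.05)],
        [(0.9, 0.8), (0.85, 0.95), (0.8, 0.9)],
        [(0.9, 0.9), (5.0, 7.0), (0.88, 0.82)],  # (5.0, 7.0) is an outlier
    ]
    print(detect_changes(windows, lo=(0.0, 0.0), hi=(1.0, 1.0)))  # prints [2]
```

In this toy run, only the shift between the second and third windows is reported; the outlier in the last window is clamped into an existing boundary cell, so it does not register as a change. Storing only occupied cells keeps memory proportional to the number of distinct cells seen rather than `bins ** d`, which is one way a grid summary can stay streaming-friendly.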

Keywords

network traffic analysis; multivariate analysis; time-series similarity; network monitoring

Notes

Acknowledgment

The authors would like to thank Brian Tierney at ESnet for the helpful discussion and support with the network traffic trace data.

Supplementary material

ESM 1: 11390_2019_1915_MOESM1_ESM.pdf (PDF, 769 kB)

Copyright information

© Springer Science+Business Media, LLC & Science Press, China 2019

Authors and Affiliations

  1. Department of Computer Science, Texas A&M University, Commerce, U.S.A.
  2. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, U.S.A.
