A New Approach to Multivariate Network Traffic Analysis
Network traffic analysis is one of the core functions of network monitoring for effective network operations and management. Although online traffic analysis has been widely studied, it remains challenging for several reasons. One primary challenge is the sheer volume of traffic that must be analyzed within a bounded amount of time as network bandwidth continues to increase. Another important challenge is supporting multivariate functions of traffic variables so that administrators can intuitively identify unexpected network events. To address these challenges, we propose a multivariate analysis approach that offers a high-level summary of online network traffic. With this approach, the current state of the network is represented by patterns compiled from a set of traffic variables, so that detection problems in network monitoring (e.g., change detection and anomaly detection) can be reduced to a pattern identification and classification problem. In this paper, we introduce our preliminary work on clustered patterns for online, multivariate network traffic analysis, together with the challenges and limitations we observed. We then present a grid-based model designed to overcome the limitations of the clustered pattern-based technique, and discuss its potential with respect to technical challenges including streaming-based computation and robustness to outliers.
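The idea of summarizing a window of multivariate traffic as a grid-based pattern, and reducing change detection to comparing patterns, can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: fixed per-variable bounds, equal-width bins, total variation distance as the pattern-similarity measure, and a fixed change threshold. The names `grid_cell`, `GridPattern`, and `detect_changes`, along with the sample data, are hypothetical; this is not the authors' model.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_cell(sample: Sequence[float], lows: Sequence[float],
              highs: Sequence[float], bins: int) -> Cell:
    """Map one multivariate sample to its grid cell (a tuple of bin indices).

    Assumption: each variable has known bounds [low, high) split into `bins`
    equal-width bins. Out-of-range values are clamped into the edge bins, so an
    extreme outlier lands in an edge cell instead of stretching the grid.
    """
    cell = []
    for x, lo, hi in zip(sample, lows, highs):
        width = (hi - lo) / bins
        idx = int((x - lo) // width)
        cell.append(min(max(idx, 0), bins - 1))
    return tuple(cell)


class GridPattern:
    """Streaming summary of one time window: sample counts per occupied cell."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float], bins: int):
        self.lows, self.highs, self.bins = lows, highs, bins
        self.counts: Counter = Counter()
        self.n = 0

    def update(self, sample: Sequence[float]) -> None:
        # O(d) work per sample for d variables; raw samples are not retained.
        self.counts[grid_cell(sample, self.lows, self.highs, self.bins)] += 1
        self.n += 1

    def distribution(self) -> Dict[Cell, float]:
        # Normalize counts so windows with different sample counts are comparable.
        if self.n == 0:
            return {}
        return {c: k / self.n for c, k in self.counts.items()}


def total_variation(p: Dict[Cell, float], q: Dict[Cell, float]) -> float:
    """Total variation distance between sparse cell distributions (0 = identical, 1 = disjoint)."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


def detect_changes(windows: Sequence[Sequence[Sequence[float]]],
                   lows: Sequence[float], highs: Sequence[float],
                   bins: int, threshold: float) -> List[int]:
    """Return indices of windows whose pattern differs from the previous window by more than `threshold`."""
    changes: List[int] = []
    prev = None
    for i, window in enumerate(windows):
        pattern = GridPattern(lows, highs, bins)
        for sample in window:
            pattern.update(sample)
        dist = pattern.distribution()
        if prev is not None and total_variation(prev, dist) > threshold:
            changes.append(i)
        prev = dist
    return changes


if __name__ == "__main__":
    # Hypothetical 2-variable samples: (throughput in Gbps, active flows in thousands).
    lows, highs, bins = (0.0, 0.0), (10.0, 10.0), 5
    normal = [(2.0, 3.0), (2.5, 3.5), (2.2, 3.1), (2.8, 3.9)]
    burst = [(9.0, 0.5), (9.5, 0.4), (8.8, 0.6), (9.9, 0.3)]
    print(detect_changes([normal, normal, burst, burst], lows, highs, bins, threshold=0.5))  # → [2]
```

Because each sample only increments one cell counter, the summary can be maintained in a single pass over the stream. Clamping keeps the grid geometry fixed regardless of outliers, in contrast to centroid-based clustering, where a few extreme points can pull a centroid away from the bulk of the data. Whether the paper's grid-based model makes these particular choices is not stated in this abstract.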
Keywords: network traffic analysis, multivariate analysis, time-series similarity, network monitoring
The authors would like to thank Brian Tierney at ESnet for helpful discussions and for support with the network traffic trace data.