NEtwork Digest Analysis Driven by Association Rule Discoverers
Abstract
Key goals of network traffic analysis are profiling communications, detecting anomalies or security threats, and identifying recurrent patterns. To these ends, the analysis can be performed on (i) packet payloads, (ii) traffic metrics, and (iii) statistical features computed on traffic flows. Data mining techniques play an important role in the network traffic domain, where association rules have been successfully exploited for anomaly identification and network traffic characterization. However, discovering potentially relevant knowledge requires enforcing a very low support threshold, which generates an unmanageably large number of rules. Addressing this issue requires efficient techniques that reduce traffic volume and discover relevant knowledge. This paper presents a NEtwork Digest framework, named NED, to efficiently support network traffic analysis. NED exploits continuous queries to perform real-time aggregation of captured network data and supports filtering operations that further reduce traffic volume by focusing on relevant data. Furthermore, NED exploits two efficient algorithms to discover both traditional and generalized association rules. The extracted knowledge provides a high-level abstraction of the network traffic by highlighting unexpected and interesting traffic rules. Experimental results on different network dumps show the efficiency and effectiveness of the NED framework in characterizing traffic data and detecting anomalies.
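To make the support-threshold trade-off mentioned in the abstract concrete, the following is a minimal sketch of support- and confidence-based association rule extraction over flow records. It is not NED's algorithm: it enumerates itemsets by brute force, and the flow records, attribute names, and function names (`frequent_itemsets`, `association_rules`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical flow records: each flow is a set of attribute=value items.
flows = [
    {"src=10.0.0.1", "dst_port=80", "proto=TCP"},
    {"src=10.0.0.1", "dst_port=80", "proto=TCP"},
    {"src=10.0.0.2", "dst_port=80", "proto=TCP"},
    {"src=10.0.0.2", "dst_port=53", "proto=UDP"},
    {"src=10.0.0.3", "dst_port=4662", "proto=TCP"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support.
    Brute-force enumeration; fine for toy data only."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(itemsets, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)
    from frequent itemsets. Every subset of a frequent itemset is
    itself frequent, so its support is always present in `itemsets`."""
    rules = []
    for itemset, supp in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                conf = supp / itemsets[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, supp, conf))
    return rules


if __name__ == "__main__":
    for threshold in (0.6, 0.2):
        found = frequent_itemsets(flows, threshold)
        print(f"min_support={threshold}: {len(found)} frequent itemsets")
    for a, c, supp, conf in association_rules(frequent_itemsets(flows, 0.6), 0.9):
        print(sorted(a), "->", sorted(c), f"support={supp:.2f} confidence={conf:.2f}")
```

Lowering `min_support` on this toy data admits the rare items (such as the single flow on port 4662) but also multiplies the number of itemsets, which is the rule explosion the abstract refers to.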
Keywords
Association Rule; Intrusion Detection; Network Traffic; Association Rule Mining; Support Threshold