Abstract
Detection of abnormal internet traffic has become a significant area of research in network security. Because of its importance, many predictive models have been designed using machine learning algorithms, and these models report high performance in detecting abnormal traffic behaviors. However, they may not guarantee reliable detection of new incoming abnormal traffic because they are built on raw features from imbalanced internet traffic data. Since internet traffic is non-stationary time-series data, abnormal traffic is difficult to identify from the raw features alone. In this study, we propose a new approach to detecting abnormal internet traffic. Our approach begins by extracting hidden but important features using the discrete wavelet transform. Statistical analysis is then performed to filter out irrelevant and less important features. Only statistically significant features are used to design a reliable predictive model with logistic regression. A comparative analysis is conducted to assess the importance of our approach by measuring accuracy, sensitivity, and the area under the receiver operating characteristic curve (AUC). From the analysis, we found that our model detects abnormal internet traffic successfully with high accuracy.
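The feature-extraction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a Haar wavelet, per-level coefficient energy as the extracted feature, and a two-level decomposition. The function names and the sample traffic values are hypothetical. The statistical filtering step, the logistic regression model, and the AUC computation are not shown.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the input length.
    The Haar wavelet is an assumption; the abstract does not name a wavelet family.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Energy of the detail coefficients at each level, then the energy of the
    final approximation. Using energy as the feature is an illustrative choice."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy and sensitivity (true positive rate) for binary labels,
    where 1 marks abnormal traffic and 0 marks normal traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


# Hypothetical per-interval packet counts: one steady flow, one with a burst.
steady = [10.0] * 8
bursty = [10.0, 10.0, 10.0, 50.0, 10.0, 10.0, 10.0, 10.0]

print(wavelet_energy_features(steady, levels=2))
print(wavelet_energy_features(bursty, levels=2))
```

In this sketch the steady flow has zero detail energy at every level, while the burst shows up as nonzero detail energy, which is the kind of structure that raw per-interval values do not expose directly. Because the Haar transform used here is orthonormal, the feature values sum to the total energy of the input signal.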
Acknowledgments
This study is based on work supported by the US Army Research Office (ARO) under Grant W911NF1310143.
Cite this article
Ji, SY., Choi, S. & Jeong, D.H. Designing an Internet Traffic Predictive Model by Applying a Signal Processing Method. J Netw Syst Manage 23, 998–1015 (2015). https://doi.org/10.1007/s10922-014-9335-3
DOI: https://doi.org/10.1007/s10922-014-9335-3