Journal of Network and Systems Management

Volume 23, Issue 4, pp 998–1015

Designing an Internet Traffic Predictive Model by Applying a Signal Processing Method


Abstract

Detection of abnormal internet traffic has become a significant area of research in network security. Because of its importance, many predictive models have been designed using machine learning algorithms. These models achieve high performance in detecting abnormal internet traffic behaviors. However, they may not guarantee reliable detection performance for new incoming abnormal internet traffic, because they are designed using raw features from imbalanced internet traffic data. Since internet traffic is non-stationary time-series data, it is difficult to identify abnormal internet traffic from the raw features alone. In this study, we propose a new approach to detecting abnormal internet traffic. Our approach begins by extracting hidden but important features using the discrete wavelet transformation. Statistical analysis is then performed to filter out irrelevant and less important features. Only statistically significant features are used to design a reliable predictive model with logistic regression. A comparative analysis is conducted to assess the value of our approach by measuring accuracy, sensitivity, and the Area Under the receiver operating characteristic Curve (AUC). From the analysis, we found that our model detects abnormal internet traffic successfully with high accuracy.
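The feature-extraction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which wavelet, decomposition depth, or derived features were used, so the Haar wavelet, the sub-band energy features, and all function names below (`haar_dwt`, `wavelet_energy_features`, `sensitivity`, `auc`) are assumptions chosen only to keep the example self-contained.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT (assumed wavelet; the abstract names none).

    Returns (approximation, detail) coefficient lists, each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Hypothetical feature vector: energy of the detail band at each level,
    followed by the energy of the final approximation band."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN), with 1 marking abnormal traffic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the Haar transform is orthonormal, the energies returned by `wavelet_energy_features` sum to the energy of the input signal, which gives a quick sanity check on the decomposition. In a full pipeline these features would then be screened for statistical significance and passed to a logistic regression model, as the abstract describes.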

Keywords

Internet traffic detection; Discrete wavelet transformation; Logistic regression; Area Under ROC Curve (AUC)


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Science, Bowie State University, Bowie, USA
  2. Department of Computer Science and Information Technology, University of the District of Columbia, Washington, USA
