Feature Evaluation for Early Stage Internet Traffic Identification

  • Lizhi Peng
  • Hongli Zhang
  • Bo Yang
  • Yuehui Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8630)


Accurately identifying network traffic at its early stage is very important for practical applications of traffic identification, and the problem has attracted considerable interest in recent years. Packet sizes and statistical features are effective features that are widely used in early stage traffic identification. However, one important question remains unaddressed: whether there are essential differences between using the packet sizes themselves and using derived features, such as statistics, in early stage traffic identification. In this paper, we set out to evaluate the effectiveness of different kinds of early stage traffic features. We first extract the packet sizes of the first 10 packets, together with their derived features, from 3 traffic data sets. We then compute the mutual information between each feature and the corresponding traffic type label to measure the effectiveness of that feature. Finally, we run a set of crossover identification experiments with different feature sets using 7 well-known classifiers. Our experimental results show that most classifiers achieve almost the same performance with packet sizes as with derived features for early stage traffic identification, and that the combined feature set selected by mutual information achieves high identification performance.
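
As an illustration of the feature scoring step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes discrete (or already discretized) feature values and uses a simple plug-in estimate of mutual information; the function names `mutual_information` and `derived_features`, the choice of statistics, and the toy flows are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(feature)
    count_x = Counter(feature)
    count_y = Counter(labels)
    count_xy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def derived_features(sizes):
    """Simple statistics derived from the packet sizes of one flow (assumed set)."""
    mean = sum(sizes) / len(sizes)
    variance = sum((s - mean) ** 2 for s in sizes) / len(sizes)
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean,
        "std": math.sqrt(variance),
    }


if __name__ == "__main__":
    # Toy flows: sizes of the first few packets plus an application label.
    flows = [
        ([60, 1500, 1500], "video"),
        ([60, 1500, 1400], "video"),
        ([60, 120, 90], "chat"),
        ([60, 100, 80], "chat"),
    ]
    labels = [label for _, label in flows]
    # Score the raw size of each packet position against the label.
    for position in range(3):
        column = [sizes[position] for sizes, _ in flows]
        print("packet", position + 1, mutual_information(column, labels))
    # Score one derived feature (the maximum packet size) against the label.
    max_sizes = [derived_features(sizes)["max"] for sizes, _ in flows]
    print("max size", mutual_information(max_sizes, labels))
```

A feature that is constant across flows, such as the first packet size in the toy data, scores zero, while a feature that separates the classes scores higher. Real packet sizes and statistics take many distinct values, so binning them before applying a plug-in estimate like this one is a common design choice.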


Feature selection, Early stage traffic classification, Machine learning





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lizhi Peng (1, 2)
  • Hongli Zhang (1)
  • Bo Yang (2)
  • Yuehui Chen (2)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, P.R. China
  2. Provincial Key Laboratory for Network Based Intelligent Computing, University of Jinan, Jinan, P.R. China
