Mining Unclassified Traffic Using Automatic Clustering Techniques

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6613)


In this paper we present a fully unsupervised algorithm to identify classes of traffic inside an aggregate. The algorithm leverages the K-means clustering algorithm, augmented with a mechanism to automatically determine the number of traffic clusters. The signatures used for clustering are statistical representations of the application-layer protocols.
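As a rough illustration of the idea of K-means with an automatically chosen number of clusters, the sketch below runs plain K-means for several candidate values of K and keeps the one with the highest mean silhouette score. The abstract does not say which selection mechanism the paper uses, so the silhouette criterion, the farthest-point initialisation, the function names (`kmeans`, `silhouette`, `choose_k`) and the 2-D toy points are all assumptions made for this example; real signatures would be higher-dimensional statistical vectors.

```python
import math


def kmeans(points, k, max_iters=100):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            continue
        a = sum(own) / len(own)  # mean distance inside the point's cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_max):
    """Assumed selection rule: keep the K in [2, k_max] with the best silhouette."""
    best = None
    for k in range(2, k_max + 1):
        labels, centroids = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centroids)
    return best[1], best[2], best[3]


# Hypothetical toy signatures: three well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
best_k, labels, centroids = choose_k(points, k_max=6)
print(best_k, labels)
```

On this toy input the selection rule settles on three clusters, one per group. Other criteria for choosing K would slot into `choose_k` without changing the rest of the sketch.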

The proposed technique is extensively tested on UDP traffic traces collected from operational networks. Performance tests show that it can cluster the traffic into a few tens of pure clusters, achieving an accuracy above 95%. The results are promising and suggest that the proposed approach might effectively be used for automatic traffic monitoring, e.g., to identify the birth of new applications and protocols, or the presence of anomalous or unexpected traffic.
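To make the notions of "pure cluster" and accuracy concrete, the following sketch computes per-cluster purity and an overall score under one common convention: each cluster is labelled with its majority ground-truth class, and accuracy is the fraction of samples that match their cluster's label. The abstract does not define how the 95% figure is computed, so this definition, the function name `cluster_purity`, and the class names in the toy data are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return (purity per cluster, overall majority-label accuracy)."""
    groups = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        groups[cid][label] += 1
    # Purity of a cluster: share of its samples in the majority class.
    per_cluster = {
        cid: max(counts.values()) / sum(counts.values())
        for cid, counts in groups.items()
    }
    # Overall score: samples agreeing with their cluster's majority class.
    overall = sum(max(c.values()) for c in groups.values()) / len(cluster_ids)
    return per_cluster, overall


# Hypothetical ground truth for ten flows assigned to three clusters.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
true_labels = ["dns", "dns", "dns", "p2p", "p2p", "p2p", "dns",
               "voip", "voip", "voip"]
per_cluster, overall = cluster_purity(cluster_ids, true_labels)
print(per_cluster, overall)
```

Here clusters 0 and 2 are fully pure, while cluster 1 contains one stray flow, which lowers both its purity and the overall score.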


Keywords: Normalize Mutual Information, Centroid Position, Pure Cluster, Packet Payload, Unsupervised Algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Politecnico di Torino, Italy
