
Tor anonymous traffic identification based on gravitational clustering


Anonymous communication technology poses new challenges for traffic analysis because it creates a private network pathway. Clustering analysis has proved efficient at grouping Internet traffic. However, traditional clustering algorithms such as K-means require the number of clusters to be specified in advance. In this paper, gravitation is introduced into the clustering process to develop an improved Tor anonymous traffic identifier, called the gravitational clustering algorithm (GCA). In the proposed method, each sample in the dataset is treated as an object in the feature space, and a new object moves into the corresponding cluster according to gravitational force and similarity. The GCA was applied to a dataset consisting of 2366 Tor network flows and 20926 other network flows. Simulation tests compared the performance of the proposed classifier with that of three state-of-the-art clustering algorithms. In these tests, the average accuracy rate, R, and FM coefficient of the proposed GCA all exceed 0.8, whereas K-means, the best of the other three clustering algorithms, reaches a detection rate of only 0.5.
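The idea of assigning a new object by gravitational force and similarity can be illustrated with a minimal sketch. This is not the paper's implementation: the force law (cluster mass divided by squared Euclidean distance), the similarity measure `1 / (1 + distance)`, the `min_similarity` threshold, and the function names are all assumptions chosen for illustration. The evaluation helper further assumes that R denotes the Rand index and FM the Fowlkes-Mallows index, both computed by pair counting.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gravitational_clustering(samples, min_similarity=0.5, g=1.0, eps=1e-9):
    """Assign each sample to the cluster that attracts it most strongly.

    Assumed rules (illustrative, not taken from the paper):
      force      = g * cluster_mass / (distance**2 + eps)
      similarity = 1 / (1 + distance)
    A sample whose strongest attractor is less similar than
    min_similarity starts a new cluster, so no cluster count is needed.
    """
    centroids, masses, labels = [], [], []
    for x in samples:
        best, best_force = None, 0.0
        for k, (c, m) in enumerate(zip(centroids, masses)):
            force = g * m / (distance(x, c) ** 2 + eps)
            if force > best_force:
                best, best_force = k, force
        too_far = (
            best is None
            or 1.0 / (1.0 + distance(x, centroids[best])) < min_similarity
        )
        if too_far:
            # Open a new cluster of mass 1 centred on the sample.
            centroids.append(list(x))
            masses.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Absorb the sample: update the running mean and the mass.
            m = masses[best]
            centroids[best] = [
                (ci * m + xi) / (m + 1) for ci, xi in zip(centroids[best], x)
            ]
            masses[best] = m + 1
            labels.append(best)
    return labels


def rand_and_fm(true_labels, pred_labels):
    """Pair-counting Rand index and Fowlkes-Mallows index."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    fm = tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0
    return rand, fm


# Toy data: two well-separated groups standing in for flow feature vectors.
flows = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
truth = [0, 0, 0, 1, 1, 1]
predicted = gravitational_clustering(flows)
print(predicted, rand_and_fm(truth, predicted))
```

In this sketch the number of clusters emerges from the similarity threshold instead of being fixed beforehand, which is the limitation of K-means that the abstract highlights. Real flow features would need scaling before distances are meaningful.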









This work was supported by the National Natural Science Foundation of China (Grant No. 61572115) and the Key Basic Research of Sichuan Province (Grant No. 2016JY0007).



Corresponding author

Correspondence to Zhihong Rao.


Cite this article

Rao, Z., Niu, W., Zhang, X. et al. Tor anonymous traffic identification based on gravitational clustering. Peer-to-Peer Netw. Appl. 11, 592–601 (2018).



Keywords

  • Anonymous communication
  • Tor
  • Clustering analysis
  • Flow similarity
  • Gravitational clustering