Profiling User Activities with Minimal Traffic Traces

  • Tiep Mai
  • Deepak Ajwani
  • Alessandra Sala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9114)


There is a need to strike a balance between the pursuit of personalized services based on fine-grained behavioral analysis and user privacy concerns. In this paper, we consider the use of web traces with truncated URLs, where each URL is trimmed to contain only the web domain, in order to remove sensitive user information. To offset the loss of accuracy in user activity profiling caused by URL truncation, we propose a statistical methodology that leverages specialized features extracted from a burst of consecutive URLs representing a micro user action. These bursts, in turn, are detected by a novel algorithm based on our observed characteristics of the inter-arrival time of HTTP records. On a real dataset of mobile web traces, consisting of more than 130 million records and 10,000 users, we show that our methodology achieves around 90% accuracy in separating URLs that represent user activities from non-representative URLs.
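The two preprocessing ideas named in the abstract, trimming each URL to its web domain and grouping consecutive records into bursts by inter-arrival time, can be illustrated with a minimal sketch. This is not the paper's detection algorithm, which the abstract does not specify; the fixed gap threshold `max_gap_seconds`, the function names, and the `(timestamp, url)` record layout are all assumptions made for illustration. The sketch also assumes each record holds an http/https URL or a bare host name.

```python
from urllib.parse import urlsplit


def truncate_to_domain(url):
    """Keep only the host part of a URL, dropping path, query, and fragment."""
    # urlsplit fills in the host only when the string starts with a scheme
    # or "//", so bare hosts such as "example.com/x" get a "//" prefix.
    has_scheme = url.startswith(("http://", "https://"))
    parts = urlsplit(url if has_scheme else "//" + url)
    return (parts.hostname or "").lower()


def group_into_bursts(records, max_gap_seconds=1.0):
    """Split time-sorted (timestamp, url) records into bursts of domains.

    A new burst starts whenever the gap to the previous record exceeds
    max_gap_seconds. The fixed threshold is a hypothetical stand-in for
    the paper's inter-arrival-time analysis.
    """
    bursts = []
    previous_time = None
    for timestamp, url in records:
        if previous_time is None or timestamp - previous_time > max_gap_seconds:
            bursts.append([])  # gap too large: open a new burst
        bursts[-1].append(truncate_to_domain(url))
        previous_time = timestamp
    return bursts


# Made-up example records: two requests close together, then one after a pause.
records = [
    (0.0, "http://news.example.com/story?id=7"),
    (0.2, "https://cdn.example.net/img.png"),
    (5.0, "http://mail.example.org/inbox"),
]
bursts = group_into_bursts(records, max_gap_seconds=1.0)
```

With these made-up records, the first two requests fall into one burst and the third starts a new one. Each burst keeps only domain names, so paths and query strings never reach the profiling step.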





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Bell Laboratories, Dublin, Republic of Ireland
