Analyzing Characteristic Host Access Patterns for Re-identification of Web User Sessions

Herrmann, Dominik; Gerber, Christoph; Banse, Christian; Federrath, Hannes

doi:10.1007/978-3-642-27937-9_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7127)

Included in the following conference series:

Nordic Conference on Secure IT Systems

Abstract

An attacker who is able to observe a web user over a long period of time learns a lot about that user's interests. Users with regularly changing IP addresses may be difficult to track, though. We show how patterns mined from web traffic can be used to re-identify a majority of users, i.e., to link multiple sessions belonging to the same user. We implement the web user re-identification attack using a Multinomial Naïve Bayes classifier and evaluate it using a real-world dataset from 28 users. Our evaluation setup reflects the limited knowledge of an attacker operating a malicious web proxy server, who can observe only the host names visited by its users. The results suggest that consecutive sessions can be linked with high probability for session durations from 5 minutes to 48 hours and that user profiles degrade only slowly over time. We also propose basic countermeasures and evaluate their efficacy.
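
The session-linking idea in the abstract can be illustrated with a minimal sketch. It assumes a plain Multinomial Naïve Bayes model with Laplace smoothing and a uniform prior over users; the abstract does not specify these details, so they should not be read as the authors' configuration. The function names `train` and `link_session`, the user labels, and the `*.example` host names are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def train(profiles, alpha=1.0):
    """Build per-user log-probabilities of host names.

    profiles: dict mapping a user label to the list of host names
    observed in that user's training session (hypothetical toy input).
    alpha: Laplace smoothing constant (an assumption, not from the paper).
    """
    vocab = {host for hosts in profiles.values() for host in hosts}
    model = {}
    for user, hosts in profiles.items():
        counts = Counter(hosts)
        total = sum(counts.values()) + alpha * len(vocab)
        model[user] = {h: math.log((counts[h] + alpha) / total) for h in vocab}
    return model, vocab

def link_session(model, vocab, hosts):
    """Return the user whose profile best explains the observed hosts.

    Hosts never seen in training are ignored. The prior is uniform,
    so only the multinomial likelihood decides.
    """
    counts = Counter(h for h in hosts if h in vocab)
    scores = {
        user: sum(n * logp[h] for h, n in counts.items())
        for user, logp in model.items()
    }
    return max(scores, key=scores.get)

# Toy training sessions: one list of requested host names per user.
training = {
    "alice": ["news.example", "mail.example", "news.example", "cdn.example"],
    "bob": ["forum.example", "cdn.example", "video.example", "forum.example"],
}

model, vocab = train(training)

# A later session observed under a different IP address.
later_session = ["news.example", "cdn.example", "mail.example"]
print(link_session(model, vocab, later_session))
```

Only host names enter the model, which matches the limited view of a proxy operator described in the abstract. A realistic evaluation would need far more users and sessions than this toy input.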

Author information

Authors and Affiliations

Research Group Security in Distributed Systems, Department of Informatics, University of Hamburg, 22527, Hamburg, Germany
Dominik Herrmann, Christoph Gerber, Christian Banse & Hannes Federrath

Editor information

Editors and Affiliations

School of Science and Technology, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
Tuomas Aura
Department of Information and Computer Science, Aalto University, Konemiehentie 2, 02150, Aalto, Finland
Kimmo Järvinen & Kaisa Nyberg

Cite this paper

Herrmann, D., Gerber, C., Banse, C., Federrath, H. (2012). Analyzing Characteristic Host Access Patterns for Re-identification of Web User Sessions. In: Aura, T., Järvinen, K., Nyberg, K. (eds) Information Security Technology for Applications. NordSec 2010. Lecture Notes in Computer Science, vol 7127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27937-9_10

DOI: https://doi.org/10.1007/978-3-642-27937-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27936-2
Online ISBN: 978-3-642-27937-9
eBook Packages: Computer Science, Computer Science (R0)
