Gold Mining in a River of Internet Content Traffic

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8406)


With the advent of Over-The-Top content providers (OTTs), Internet Service Providers (ISPs) saw their portfolio of services shrink to the low-margin role of data transporters. To counter this effect, some ISPs have started to follow big OTTs such as Facebook and Google in trying to turn their data into a valuable asset. In this paper, we explore what meaningful information can be extracted from network data and what insights it can provide. To this end, we tackle the first challenge of detecting “user-URLs”, i.e., links that were clicked by users, as opposed to objects automatically downloaded by browsers and applications. We devise algorithms to pinpoint such URLs and validate them on manually collected ground-truth traces. We then apply them to a three-day-long traffic trace spanning more than 19,000 residential users who generated around 190 million HTTP transactions. We find that only 1.6% of the observed URLs were actually clicked by users. As a first application of our methods, we answer the question of which platforms contribute most to promoting Internet content. Surprisingly, we find that, despite its prominence, Google Search accounts for only 11% of user-URL visits.
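To make the distinction between user-URLs and automatically downloaded objects concrete, the following Python sketch shows one naive way such a filter could look. It is not the algorithm devised in the paper, whose details are not given here. The `HttpRequest` record, the `candidate_user_urls` function, and the heuristic itself (keep an HTML document only if other observed requests name it in their Referer header) are illustrative assumptions, as are the example URLs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HttpRequest:
    """Hypothetical record for one observed HTTP transaction."""
    url: str
    referer: Optional[str]   # value of the Referer header, if present
    content_type: str        # value of the Content-Type response header


def candidate_user_urls(requests: List[HttpRequest]) -> List[str]:
    """Assumed toy heuristic, not the paper's method.

    Keep a URL when it is an HTML document AND at least one other
    observed request names it as its Referer, i.e. it behaved like a
    page that pulled in embedded objects.
    """
    # Every URL that some request reported as the page it came from.
    parents = {r.referer for r in requests if r.referer}
    selected = []
    for r in requests:
        is_html = r.content_type.startswith("text/html")
        has_children = r.url in parents
        if is_html and has_children:
            selected.append(r.url)
    return selected


# Made-up sample trace: one page, two embedded objects, one ad frame.
sample = [
    HttpRequest("http://news.example/story",
                "http://search.example/?q=x", "text/html"),
    HttpRequest("http://news.example/style.css",
                "http://news.example/story", "text/css"),
    HttpRequest("http://news.example/logo.png",
                "http://news.example/story", "image/png"),
    HttpRequest("http://ads.example/frame",
                "http://news.example/story", "text/html"),
]

clicked = candidate_user_urls(sample)
share = len(clicked) / len(sample)  # fraction of transactions kept
```

On this sample only `http://news.example/story` is kept: the stylesheet and image are not HTML, and the ad frame is HTML but nothing refers back to it. A heuristic this simple would misclassify many real cases (for example pages without embedded objects, or frames that load their own resources), which is why validation against ground-truth traces, as the abstract describes, matters.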





Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  1. Alcatel-Lucent Bell Labs, France
  2. Politecnico di Torino, Italy
