Abstract
With the advent of Over-The-Top content providers (OTTs), Internet Service Providers (ISPs) saw their role shrink to that of low-margin data transporters. To counter this effect, some ISPs have begun to follow large OTTs such as Facebook and Google in trying to turn their data into a valuable asset. In this paper, we explore what meaningful information can be extracted from network data and what insights it can provide. To this end, we tackle a first challenge: detecting “user-URLs”, i.e., the links that users actually clicked, as opposed to the objects that browsers and applications download automatically. We devise algorithms to pinpoint such URLs and validate them on manually collected ground-truth traces. We then apply them to a three-day traffic trace covering more than 19,000 residential users, who generated around 190 million HTTP transactions. We find that only 1.6% of the observed URLs were actually clicked by users. As a first application of our methods, we ask which platforms contribute most to promoting Internet content. Surprisingly, we find that, despite its prominence, Google Search accounts for only 11% of user-URL visits.
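To make the distinction between user-URLs and automatically fetched objects concrete, the sketch below shows a deliberately simple heuristic over a list of HTTP transactions. It is not the algorithm from the paper, which the abstract does not describe. The record layout (`HttpTransaction`), the blocklist `BLOCKED_HOSTS`, the function `find_user_urls`, and the `min_gap` threshold are all hypothetical, chosen only for illustration. The rules are: keep only HTML responses, drop requests to known ad/tracker hosts, and treat HTML fetched within `min_gap` seconds of its referring page (for example, an embedded iframe) as automatic rather than clicked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HttpTransaction:
    """One observed HTTP request/response pair (hypothetical record layout)."""
    timestamp: float         # seconds since the start of the trace
    url: str
    host: str
    content_type: str        # value of the response Content-Type header
    referer: Optional[str]   # value of the request Referer header, if present


# Hypothetical stand-in for an ad/tracker blocklist.
BLOCKED_HOSTS = {"ads.example.com", "tracker.example.net"}


def find_user_urls(transactions: List[HttpTransaction], min_gap: float = 1.0) -> List[str]:
    """Toy heuristic: return URLs that look like pages the user navigated to.

    Not the paper's method; the rules and the min_gap threshold are assumptions.
    """
    last_seen: Dict[str, float] = {}  # url -> time it was last requested
    user_urls: List[str] = []
    for tx in sorted(transactions, key=lambda t: t.timestamp):
        # Images, scripts, and stylesheets are embedded objects, not clicks.
        is_html = tx.content_type.lower().startswith("text/html")
        blocked = tx.host in BLOCKED_HOSTS
        parent_time = last_seen.get(tx.referer) if tx.referer else None
        # HTML requested within min_gap seconds of its referring page is
        # treated as automatically loaded (e.g., an iframe), not clicked.
        too_soon = parent_time is not None and tx.timestamp - parent_time < min_gap
        if is_html and not blocked and not too_soon:
            user_urls.append(tx.url)
        last_seen[tx.url] = tx.timestamp
    return user_urls
```

A timing threshold like `min_gap` is a crude proxy for human reaction time: real traces contain redirects, prefetching, and AJAX-loaded HTML that such simple rules would misclassify, which is why validation against manually collected ground truth matters.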
This work was supported by the European Commission under the FP7 IP Project “An Intelligent Measurement Plane for Future Network and Application Management” (mPlane).
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ben Houidi, Z., Scavo, G., Ghamri-Doudane, S., Finamore, A., Traverso, S., Mellia, M. (2014). Gold Mining in a River of Internet Content Traffic. In: Dainotti, A., Mahanti, A., Uhlig, S. (eds) Traffic Monitoring and Analysis. TMA 2014. Lecture Notes in Computer Science, vol 8406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54999-1_8
DOI: https://doi.org/10.1007/978-3-642-54999-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54998-4
Online ISBN: 978-3-642-54999-1
eBook Packages: Computer Science (R0)