
Modeling Traffic on the Web Graph

  • Mark R. Meiss
  • Bruno Gonçalves
  • José J. Ramasco
  • Alessandro Flammini
  • Filippo Menczer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6516)

Abstract

Analysis of aggregate and individual Web requests shows that PageRank is a poor predictor of traffic. We use empirical data to characterize properties of Web traffic not reproduced by Markovian models, including both aggregate statistics such as page and link traffic, and individual statistics such as entropy and session size. As no current model reconciles all of these observations, we present an agent-based model that explains them through realistic browsing behaviors: (1) revisiting bookmarked pages; (2) backtracking; and (3) seeking out novel pages of topical interest. The resulting model can reproduce the behaviors we observe in empirical data, especially heterogeneous session lengths, reconciling the narrowly focused browsing patterns of individual users with the extreme variance in aggregate traffic measurements. We can thereby identify a few salient features that are necessary and sufficient to interpret Web traffic data. Beyond the descriptive and explanatory power of our model, these results may lead to improvements in Web applications such as search and crawling.
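The abstract names three browsing behaviors but does not spell out their mechanics. The Python sketch below shows one hypothetical way to simulate an agent that combines them. It is not the authors' model, and every rule and parameter in it is an assumption made for illustration:

- Each session starts at a bookmarked page, drawn in proportion to past visits.
- A back-button stack lets the agent backtrack with probability `p_back`.
- Out-links are followed uniformly at random.
- An energy budget loses `cost` on every request and gains `gain` times a page's relevance when that page is new to the session. A session therefore ends once the agent's interest runs out.

The names `simulate_session`, `simulate_agent`, `visit_entropy`, `p_back`, `cost`, `gain` and `max_steps` are all hypothetical. `visit_entropy` computes the Shannon entropy, in bits, of a visit sequence. This is one plausible reading of the per-user entropy statistic mentioned above.

```python
import math
import random
from collections import Counter

# Hypothetical sketch, not the model from the paper: all rules and
# parameter values below are illustrative assumptions.


def visit_entropy(visits):
    """Shannon entropy, in bits, of the distribution of page visits."""
    counts = Counter(visits)
    n = len(visits)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def simulate_session(graph, relevance, bookmarks, rng,
                     p_back=0.2, cost=0.1, gain=0.3, max_steps=1000):
    """Simulate one session of a hypothetical browsing agent.

    graph:     dict mapping each page to the list of pages it links to
    relevance: dict mapping each page to an assumed topical relevance in [0, 1]
    bookmarks: Counter of past visits, used to choose the start page
    Returns the list of page requests made during the session.
    """
    # (1) Start from a bookmark chosen in proportion to past visits;
    #     with no bookmarks yet, start from a uniformly random page.
    if bookmarks:
        pages, weights = zip(*bookmarks.items())
        start = rng.choices(pages, weights=weights)[0]
    else:
        start = rng.choice(sorted(graph))

    history = [start]  # back-button stack
    visits = [start]
    seen = {start}
    energy = 1.0
    while energy > 0 and len(visits) < max_steps:
        current = history[-1]
        if (rng.random() < p_back or not graph[current]) and len(history) > 1:
            # (2) Backtrack to the previous page on the stack.
            history.pop()
        elif graph[current]:
            # Follow an out-link chosen uniformly at random.
            nxt = rng.choice(graph[current])
            history.append(nxt)
            if nxt not in seen:
                # (3) Pages new to this session replenish energy
                #     in proportion to their relevance.
                energy += gain * relevance[nxt]
                seen.add(nxt)
        else:
            break  # dead end with nothing to backtrack to
        visits.append(history[-1])
        energy -= cost  # every request costs energy
    return visits


def simulate_agent(graph, relevance, n_sessions, seed=0):
    """Run several sessions; visited pages accumulate as weighted bookmarks."""
    rng = random.Random(seed)
    bookmarks = Counter()
    sessions = []
    for _ in range(n_sessions):
        visits = simulate_session(graph, relevance, bookmarks, rng)
        bookmarks.update(visits)
        sessions.append(visits)
    return sessions


if __name__ == "__main__":
    # Toy four-page graph with made-up relevance scores (illustrative only).
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
    relevance = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}
    sessions = simulate_agent(graph, relevance, n_sessions=5)
    print("session sizes:", [len(s) for s in sessions])
    print("entropy (bits):", visit_entropy([p for s in sessions for p in s]))
```

In this sketch, energy is replenished only by pages the agent has not yet seen in the session. Session length therefore depends on how much unexplored relevant content is reachable from the start page. That is one simple mechanism by which session sizes could vary widely from one session to the next.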

Keywords

Session Tree, User Interest, Page Content, Topical Locality, Relevant Page
(These keywords were added by machine and not by the authors.)



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mark R. Meiss (1, 3)
  • Bruno Gonçalves (1, 2, 3)
  • José J. Ramasco (4)
  • Alessandro Flammini (1, 2)
  • Filippo Menczer (1, 2, 3, 4)

  1. School of Informatics and Computing, Indiana University, Bloomington, USA
  2. Center for Complex Networks and Systems Research, Indiana University, Bloomington, USA
  3. Pervasive Technology Institute, Indiana University, Bloomington, USA
  4. Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Turin, Italy
