World Wide Web, Volume 2, Issue 1–2, pp 29–45

Distributions of surfers' paths through the World Wide Web: Empirical characterizations

  • Peter L.T. Pirolli
  • James E. Pitkow


Surfing the World Wide Web (WWW) involves traversing hyperlink connections among documents. The ability to predict surfing patterns could solve many problems facing producers and consumers of WWW content. We analyzed WWW server logs for a WWW site, collected over ten days, to compare different path reconstruction methods and to investigate how past surfing behavior predicts future surfing choices. Since log files do not explicitly contain user paths, various methods have evolved to reconstruct user paths. Session times, number of clicks per visit, and Levenshtein Distance analyses were performed to show the impact of various reconstruction methods. Different methods for measuring surfing patterns were also compared. Markov model approximations were used to model the probability of users choosing links conditional on past surfing paths. Information‐theoretic (entropy) measurements suggest that information is gained by using longer paths to estimate the conditional probability of link choice given surf path. The improvements diminish, however, as one increases the length of path beyond one. Information‐theoretic (total divergence to the average entropy) measurements suggest that the conditional probabilities of link choice given surf path are more stable over time for shorter paths than longer paths. Direct examination of the accuracy of the conditional probability models in predicting test data also suggests that shorter paths yield more stable models and can be estimated reliably with less data than longer paths.
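The conditional-probability and entropy measurements described in the abstract can be illustrated with a minimal sketch. It assumes that surfing paths have already been reconstructed as ordered lists of page identifiers; the function names (`conditional_counts`, `conditional_entropy`) and the toy paths are hypothetical and are not taken from the paper. It estimates the probability of the next page given the previous k pages, then computes the conditional entropy of that choice in bits.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_counts(paths, k):
    """Count next-page choices following each length-k path context."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            context = tuple(path[i - k:i])  # the k pages visited just before
            counts[context][path[i]] += 1
    return counts


def conditional_entropy(counts):
    """Return H(next page | context) in bits.

    Each context is weighted by its share of all observed transitions.
    """
    total = sum(sum(nexts.values()) for nexts in counts.values())
    entropy = 0.0
    for nexts in counts.values():
        n = sum(nexts.values())
        for c in nexts.values():
            # joint probability c/total times -log2 of conditional c/n
            entropy -= (c / total) * log2(c / n)
    return entropy


# Toy paths, invented purely for illustration.
paths = [
    ["home", "products", "widget"],
    ["home", "products", "gadget"],
    ["search", "products", "gadget"],
    ["home", "about"],
]

h1 = conditional_entropy(conditional_counts(paths, 1))
h2 = conditional_entropy(conditional_counts(paths, 2))
print(f"k=1: {h1:.3f} bits, k=2: {h2:.3f} bits")
```

On this toy data the longer context gives a lower conditional entropy, which is the kind of information gain the abstract refers to. Longer contexts also leave fewer observations per context, so their estimates are noisier, which is consistent with the abstract's point that shorter paths can be estimated reliably with less data.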





Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Peter L.T. Pirolli (1)
  • James E. Pitkow (1)
  1. Xerox Palo Alto Research Center, Palo Alto, USA
