Distributions of surfers' paths through the World Wide Web: Empirical characterizations


Abstract

Surfing the World Wide Web (WWW) involves traversing hyperlink connections among documents. The ability to predict surfing patterns could solve many problems facing producers and consumers of WWW content. We analyzed WWW server logs for a WWW site, collected over ten days, to compare different path reconstruction methods and to investigate how past surfing behavior predicts future surfing choices. Since log files do not explicitly contain user paths, various methods have evolved to reconstruct user paths. Session times, number of clicks per visit, and Levenshtein Distance analyses were performed to show the impact of various reconstruction methods. Different methods for measuring surfing patterns were also compared. Markov model approximations were used to model the probability of users choosing links conditional on past surfing paths. Information‐theoretic (entropy) measurements suggest that information is gained by using longer paths to estimate the conditional probability of link choice given surf path. The improvements diminish, however, as one increases the length of path beyond one. Information‐theoretic (total divergence to the average entropy) measurements suggest that the conditional probabilities of link choice given surf path are more stable over time for shorter paths than longer paths. Direct examination of the accuracy of the conditional probability models in predicting test data also suggests that shorter paths yield more stable models and can be estimated reliably with less data than longer paths.
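The entropy comparison described in the abstract can be illustrated with a minimal sketch. It estimates the conditional entropy of the next page given the previous k pages from a handful of surfing paths. The function name `conditional_entropy`, the toy `paths` data, and the plain maximum-likelihood estimator are assumptions made for illustration; the abstract does not specify the authors' actual estimator or data.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(paths, k):
    """Estimate H(next page | previous k pages) in bits.

    Illustrative maximum-likelihood estimate; not the authors' code.
    `paths` is a list of page-ID sequences, `k` is the context length.
    """
    # For each length-k context, count which page was visited next.
    next_counts = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            context = tuple(path[i - k:i])
            next_counts[context][path[i]] += 1

    total = sum(sum(c.values()) for c in next_counts.values())
    if total == 0:
        return 0.0  # no transitions of this length were observed

    entropy = 0.0
    for counts in next_counts.values():
        n_context = sum(counts.values())
        for n in counts.values():
            # Joint probability p(context, next) times -log2 p(next | context).
            entropy -= (n / total) * log2(n / n_context)
    return entropy


# Hypothetical toy paths: after "B", the next page depends on where
# the surfer came from, so a longer context removes uncertainty.
paths = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["D", "B", "E"],
    ["D", "B", "E"],
]

for k in (0, 1, 2):
    print(k, round(conditional_entropy(paths, k), 3))
```

In this toy data the estimated entropy falls as k grows, because the page before "B" fully determines the page after it. Note that each k is estimated from a different number of observed transitions, and longer contexts leave fewer observations per context, which is consistent with the abstract's point that longer paths need more data to estimate reliably.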




Cite this article

Pirolli, P.L., Pitkow, J.E. Distributions of surfers' paths through the World Wide Web: Empirical characterizations. World Wide Web 2, 29–45 (1999). https://doi.org/10.1023/A:1019288403823


  • DOI: https://doi.org/10.1023/A:1019288403823
