Abstract
Surfing the World Wide Web (WWW) involves traversing hyperlink connections among documents. The ability to predict surfing patterns could solve many problems facing producers and consumers of WWW content. We analyzed server logs for one WWW site, collected over ten days, to compare different path reconstruction methods and to investigate how past surfing behavior predicts future surfing choices. Since log files do not explicitly contain user paths, various methods have evolved to reconstruct them. Session times, number of clicks per visit, and Levenshtein distance analyses were performed to show the impact of the various reconstruction methods. Different methods for measuring surfing patterns were also compared. Markov model approximations were used to model the probability of users choosing links conditional on past surfing paths. Information-theoretic (entropy) measurements suggest that information is gained by using longer paths to estimate the conditional probability of link choice given surf path. The improvements diminish, however, as path length increases beyond one. Information-theoretic (total divergence to the average entropy) measurements suggest that the conditional probabilities of link choice given surf path are more stable over time for shorter paths than for longer paths. Direct examination of the accuracy of the conditional probability models in predicting test data also suggests that shorter paths yield more stable models and can be estimated reliably with less data than longer paths.
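The entropy measurement mentioned above can be illustrated with a minimal sketch. It estimates the conditional entropy of the next page given the previous k pages from a set of reconstructed sessions, treating the data as a k-th order Markov approximation. The function name `conditional_entropy` and the toy sessions are assumptions made for illustration; they are not the authors' code or data, and the paper's actual estimation procedure may differ.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(sessions, k):
    """Estimate H(next page | previous k pages) in bits.

    `sessions` is a list of paths, each a list of page identifiers.
    Hypothetical helper for illustration only.
    """
    # For every length-k context, count which page was visited next.
    next_counts = defaultdict(Counter)
    for path in sessions:
        for i in range(k, len(path)):
            context = tuple(path[i - k:i])
            next_counts[context][path[i]] += 1

    total = sum(sum(c.values()) for c in next_counts.values())
    if total == 0:
        return 0.0  # no path is long enough to supply a context

    entropy = 0.0
    for counts in next_counts.values():
        n_context = sum(counts.values())
        for n in counts.values():
            # Accumulate -p(context, next) * log2 p(next | context).
            entropy -= (n / total) * log2(n / n_context)
    return entropy


# Toy sessions (invented): the page before "B" determines what follows it.
sessions = [
    ["A", "B", "C"],
    ["D", "B", "E"],
]

print(conditional_entropy(sessions, 1))  # 0.5
print(conditional_entropy(sessions, 2))  # 0.0
```

In this toy data, knowing only the current page leaves half a bit of uncertainty about the next link, while knowing the previous two pages removes it. With real logs, longer contexts are observed less often, so their estimates rest on fewer counts, which is consistent with the abstract's remark that shorter paths can be estimated reliably with less data.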
Cite this article
Pirolli, P.L., Pitkow, J.E. Distributions of surfers' paths through the World Wide Web: Empirical characterizations. World Wide Web 2, 29–45 (1999). https://doi.org/10.1023/A:1019288403823