Prefetching Based on Web Usage Mining

  • Daby M. Sow
  • David P. Olshefski
  • Mandis Beigi
  • Guruduth Banavar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2672)


This paper introduces a new technique for prefetching web content by learning the access patterns of individual users. The prediction scheme for prefetching is based on a learning algorithm, called Fuzzy-LZ, which mines the history of user access and identifies patterns of recurring accesses. This algorithm is evaluated analytically via a metric called learnability and validated experimentally by correlating learnability with prediction accuracy. A web prefetching system that incorporates Fuzzy-LZ is described and evaluated. Our experiments demonstrate that Fuzzy-LZ prefetching provides a gain of 41.5 % in cache hit rate over pure caching. This gain is highest for those users who are neither highly predictable nor highly random, which turns out to be the vast majority of users in our workload. The overhead of our prefetching technique for a typical user is 2.4 prefetched pages per user request.
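To make the idea of mining an access history for recurring patterns concrete, here is a minimal sketch of a next-page predictor built on plain LZ78-style incremental parsing. It is not the paper's Fuzzy-LZ algorithm, whose details are not given in this abstract; the class name `LZ78Predictor`, its methods, and the fallback to root-level counts are all assumptions made for illustration.

```python
from collections import defaultdict


class LZ78Predictor:
    """Hypothetical next-page predictor using LZ78-style parsing (not Fuzzy-LZ)."""

    def __init__(self):
        # Trie stored as: node id -> {page: child node id}; node 0 is the root.
        self.children = defaultdict(dict)
        # counts[node][page] = how often `page` followed the context `node`.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.next_id = 1
        self.node = 0  # current parsing context

    def update(self, page):
        """Record one page request and advance the parsing context."""
        self.counts[self.node][page] += 1
        child = self.children[self.node].get(page)
        if child is None:
            # Unseen phrase: add a leaf, then restart parsing from the root.
            self.children[self.node][page] = self.next_id
            self.next_id += 1
            self.node = 0
        else:
            self.node = child

    def predict(self, k=1):
        """Return up to k pages seen most often after the current context."""
        # Assumption: fall back to root-level counts when the current
        # context has no statistics yet.
        seen = self.counts.get(self.node) or self.counts.get(0, {})
        ranked = sorted(seen, key=lambda p: (-seen[p], p))
        return ranked[:k]
```

A prefetcher along these lines would call `update` on every request and fetch the pages returned by `predict(k)`; a larger `k` trades more prefetched pages per request for a better chance of a cache hit.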


Keywords (machine-generated, not supplied by the authors): Access Pattern, Proxy Server, Learnability Measure, Access Behavior, Soft Link



Copyright information

© IFIP International Federation for Information Processing 2003

Authors and Affiliations

  • Daby M. Sow (1)
  • David P. Olshefski (1)
  • Mandis Beigi (1)
  • Guruduth Banavar (1)

  1. IBM T. J. Watson Research Center, Hawthorne
