Mining Web Logs to Improve Web Caching and Prefetching

  • Qiang Yang
  • Henry Haining Zhang
  • Ian T. Y. Li
  • Ye Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2198)


Caching and prefetching are well-known strategies for improving the performance of Internet systems. The heart of a caching system is its page replacement policy, which selects the pages to be replaced in a proxy cache when a request arrives. Likewise, the essence of a prefetching algorithm lies in its ability to accurately predict future requests. In this paper, we present a method for caching variable-sized web objects using n-gram-based prediction of future web requests. Our method mines a prediction model of document access patterns from web logs and uses the model to extend the well-known GDSF caching policy. In addition, we present a new method to integrate this caching algorithm with a prediction-based prefetching algorithm. We empirically show that the integrated approach greatly improves system performance.
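
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `order` parameter, and the toy sessions are hypothetical. The standard GDSF key is taken to be K(p) = L + F(p) · C(p) / S(p), where L is the cache's inflation (clock) value, F the access frequency, C the retrieval cost, and S the object size. Adding a predicted access probability to the frequency term is an assumption made here for illustration only; the abstract does not state how the paper extends GDSF.

```python
from collections import Counter, defaultdict


def train_ngram_model(sessions, order=2):
    """Count which page follows each run of `order` consecutive pages."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            context = tuple(session[i:i + order])
            model[context][session[i + order]] += 1
    return model


def predict_next(model, recent, order=2):
    """Return (most likely next page, estimated probability) or (None, 0.0)."""
    counts = model.get(tuple(recent[-order:]))
    if not counts:
        return None, 0.0
    page, hits = counts.most_common(1)[0]
    return page, hits / sum(counts.values())


def gdsf_key(clock, frequency, cost, size, predicted=0.0):
    """GDSF priority; objects with the lowest key are evicted first.

    `predicted` is a hypothetical extension (an assumption, not from the
    paper): the estimated probability of a future access is added to the
    observed frequency. With predicted=0.0 this is plain GDSF.
    """
    return clock + (frequency + predicted) * cost / size


# Toy sessions extracted from a web log (hypothetical data).
sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
model = train_ngram_model(sessions, order=2)
page, prob = predict_next(model, ["A", "B"], order=2)

plain = gdsf_key(clock=0.0, frequency=2, cost=1.0, size=4.0)
boosted = gdsf_key(clock=0.0, frequency=2, cost=1.0, size=4.0, predicted=prob)
```

In this toy run the model predicts page "C" after the sequence A, B, and the predicted page receives a higher key than it would under plain GDSF, so it is less likely to be evicted. The same prediction could also drive a prefetch decision, for example by fetching the page when its estimated probability exceeds a chosen threshold.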


Web log mining; web caching and prefetching





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Qiang Yang (1)
  • Henry Haining Zhang (1)
  • Ian T. Y. Li (1)
  • Ye Lu (1)
  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
