Abstract
Caching and prefetching are well-known strategies for improving the performance of Internet systems. The heart of a caching system is its page replacement policy, which selects the pages to be evicted from a proxy cache when a new request arrives. By the same token, the essence of a prefetching algorithm lies in its ability to accurately predict future requests. In this paper, we present a method for caching variable-sized web objects using n-gram based prediction of future web requests. Our method mines a prediction model of document access patterns from web logs and uses that model to extend the well-known GDSF caching policy. In addition, we present a new method for integrating this caching algorithm with a prediction-based prefetching algorithm. Our experiments show that the integrated approach substantially improves system performance.
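As a rough illustration of the idea summarized above, the following Python sketch pairs a simple n-gram next-request predictor with a GDSF-style cache. It is a minimal sketch under stated assumptions, not the authors' implementation. The standard GDSF priority is K = L + F * C / S (inflation value L, access frequency F, fetch cost C, object size S). The way the predicted probability is folded into the frequency term here, along with the names `train_ngram`, `predict_next`, `PredictiveGDSFCache`, and the `predicted` parameter, are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def train_ngram(sessions, n=2):
    """Count which URL follows each run of n consecutive URLs in the logs."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n):
            model[tuple(session[i:i + n])][session[i + n]] += 1
    return model


def predict_next(model, recent, n=2):
    """Return {url: estimated probability} for the request after `recent`."""
    counts = model.get(tuple(recent[-n:]))
    if not counts:
        return {}
    total = sum(counts.values())
    return {url: c / total for url, c in counts.items()}


class PredictiveGDSFCache:
    """GDSF-style cache with priority K = L + (F + W) * C / S.

    W is a predicted future-access weight. Adding it to F is an
    assumption of this sketch, not a formula taken from the abstract.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # L: key of the most recently evicted object
        self.entries = {}     # url -> {"size", "cost", "freq", "key"}

    def _key(self, size, cost, freq, predicted):
        return self.inflation + (freq + predicted) * cost / size

    def access(self, url, size, cost=1.0, predicted=0.0):
        """Record a request for `url`; return True on a cache hit."""
        entry = self.entries.get(url)
        if entry is not None:
            entry["freq"] += 1
            entry["key"] = self._key(entry["size"], entry["cost"],
                                     entry["freq"], predicted)
            return True
        if size > self.capacity:
            return False  # object can never fit; do not cache it
        # Simplification: evict lowest-key residents until the object fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u]["key"])
            self.inflation = self.entries[victim]["key"]
            self.used -= self.entries[victim]["size"]
            del self.entries[victim]
        self.entries[url] = {"size": size, "cost": cost, "freq": 1,
                             "key": self._key(size, cost, 1, predicted)}
        self.used += size
        return False


# Usage: train on toy sessions, then let the prediction bias replacement.
sessions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"]]
model = train_ngram(sessions)
probs = predict_next(model, ["a", "b"])

cache = PredictiveGDSFCache(capacity=10)
cache.access("x", size=5)
cache.access("c", size=5, predicted=probs.get("c", 0.0))
cache.access("z", size=5)  # forces one eviction
```

In this toy run, the object with a high predicted probability of being requested again receives a larger key than an equally sized object with no prediction, so the unpredicted object is evicted first. A linear scan for the minimum key keeps the sketch short; a priority queue would be the usual choice at realistic cache sizes.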
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Q., Zhang, H.H., Li, I.T.Y., Lu, Y. (2001). Mining Web Logs to Improve Web Caching and Prefetching. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_62
DOI: https://doi.org/10.1007/3-540-45490-X_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42730-8
Online ISBN: 978-3-540-45490-8