Exploiting Web Log Mining for Web Cache Enhancement

  • Alexandros Nanopoulos
  • Dimitrios Katsaros
  • Yannis Manolopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2356)

Abstract

Improving the performance of the Web is a crucial requirement, since its popularity has led to a large increase in user-perceived latency. In this paper, we describe a Web caching scheme that capitalizes on prefetching. Prefetching refers to the mechanism of deducing the forthcoming page accesses of a client from access log information. Web log mining methods are exploited to provide effective prediction of Web-user accesses. The proposed scheme coordinates the two techniques (i.e., caching and prefetching). The prefetched documents are accommodated in a dedicated part of the cache, to avoid the incorrect replacement of requested documents. Unlike the existing schemes for buffer management in database and operating systems, the proposed scheme takes the requirements of the Web into account. Experimental results indicate that the proposed method improves cache performance more than previous methods.
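
The idea of keeping prefetched documents in a dedicated part of the cache can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes that both partitions use LRU replacement, that predictions come from simple "page a is followed by page b" rules mined from logged sessions, and that a prefetched document is moved to the main partition once it is actually requested. The names `mine_rules`, `PartitionedCache`, `demand_size`, `prefetch_size`, and `min_conf` are hypothetical.

```python
from collections import Counter, OrderedDict


def mine_rules(sessions, min_conf=0.5):
    """Mine 'a -> b' rules (b directly follows a) from sessions of URLs.

    Illustrative stand-in for Web log mining: a rule is kept when its
    confidence count(a, b) / count(a) reaches min_conf.
    """
    antecedent = Counter()
    pair = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            antecedent[a] += 1
            pair[(a, b)] += 1
    rules = {}
    for (a, b), n in pair.items():
        if n / antecedent[a] >= min_conf:
            rules.setdefault(a, []).append(b)
    return rules


class PartitionedCache:
    """Cache with separate LRU partitions for requested and prefetched pages."""

    def __init__(self, demand_size, prefetch_size, rules):
        self.demand_size = demand_size
        self.prefetch_size = prefetch_size
        self.rules = rules
        self.demand = OrderedDict()    # documents the client asked for
        self.prefetch = OrderedDict()  # documents fetched on prediction

    @staticmethod
    def _put(partition, capacity, url):
        # Insert as most recently used, then evict the least recently used.
        partition[url] = True
        partition.move_to_end(url)
        while len(partition) > capacity:
            partition.popitem(last=False)

    def request(self, url):
        """Serve a request; return True on a cache hit."""
        if url in self.demand:
            self.demand.move_to_end(url)
            hit = True
        elif url in self.prefetch:
            # Correct prediction: promote to the demand partition.
            del self.prefetch[url]
            self._put(self.demand, self.demand_size, url)
            hit = True
        else:
            self._put(self.demand, self.demand_size, url)
            hit = False
        # Prefetch predicted successors into the dedicated partition only,
        # so a wrong prediction never evicts a requested document.
        for nxt in self.rules.get(url, []):
            if nxt not in self.demand and nxt not in self.prefetch:
                self._put(self.prefetch, self.prefetch_size, nxt)
        return hit


sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"]]
rules = mine_rules(sessions, min_conf=0.6)
cache = PartitionedCache(demand_size=2, prefetch_size=2, rules=rules)
first = cache.request("a")   # miss; "b" is prefetched
second = cache.request("b")  # hit in the prefetch partition
```

Because wrong predictions can only displace other prefetched documents, the partition sizes bound how much cache space prediction errors can waste under these assumptions.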

Keywords

Prediction, Web log mining, Web caching, Prefetching, Association rules

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Alexandros Nanopoulos (1)
  • Dimitrios Katsaros (1)
  • Yannis Manolopoulos (1)

  1. Department of Informatics, Aristotle University, Thessaloniki, Greece
