
Distributed Web Log Mining Using Maximal Large Itemsets

Regular Paper, Knowledge and Information Systems

Abstract.

We introduce a partitioning-based distributed document-clustering algorithm that uses user access patterns from multi-server web sites. Our algorithm makes it possible to exploit adaptive document replication and persistent connections simultaneously; these are two of the most effective techniques for reducing the response time observed by web users. The algorithm first distributes the user access data evenly among the servers by using a hash function. Each server then generates a local clustering of its share of the user session records by employing a traditional single-machine document-clustering algorithm. Finally, the local clustering results are combined by a novel procedure that generates maximal large itemsets of web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in the future.
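The abstract outlines a three-step pipeline: hash-partition the sessions across servers, process each partition locally, then merge the local results into maximal large itemsets of documents. The Python sketch below illustrates that flow under stated assumptions and is not the authors' procedure. Assumed here: each session is a hypothetical `(session_id, pages)` pair; each server's local step is approximated by Apriori-style frequent itemset mining rather than the unspecified single-machine clustering algorithm; and the merge relies on the standard partition argument (an itemset that is large over all sessions at relative support `s` is large in at least one partition at the same `s`), recounts the union of local candidates globally, and keeps only itemsets not contained in a larger one. All function names are hypothetical.

```python
from collections import Counter
import zlib

# Hypothetical input: (session_id, pages visited in that session).
EXAMPLE_SESSIONS = [
    ("s1", ["/a", "/b", "/c"]),
    ("s2", ["/a", "/b"]),
    ("s3", ["/a", "/b", "/c"]),
    ("s4", ["/d"]),
]


def partition_sessions(sessions, num_servers):
    """Assign each session to a server by hashing its id (assumed hash: CRC-32)."""
    parts = [[] for _ in range(num_servers)]
    for session_id, pages in sessions:
        # crc32 is deterministic across runs, unlike Python's salted hash() on str.
        server = zlib.crc32(session_id.encode("utf-8")) % num_servers
        parts[server].append(frozenset(pages))
    return parts


def frequent_itemsets(transactions, min_support):
    """Local step (assumed): level-wise search for itemsets with relative support >= min_support."""
    n = len(transactions)
    if n == 0:
        return set()
    result = set()
    candidates = {frozenset([page]) for t in transactions for page in t}
    k = 1
    while candidates:
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate itemset appears in this session
                    counts[c] += 1
        level = {c for c, cnt in counts.items() if cnt / n >= min_support}
        result |= level
        k += 1
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return result


def maximal_large_itemsets(parts, min_support):
    """Merge step (assumed): recount local candidates globally, keep maximal ones."""
    # Union of locally large itemsets is a complete candidate set for the global result.
    candidates = set()
    for part in parts:
        candidates |= frequent_itemsets(part, min_support)
    all_sessions = [t for part in parts for t in part]
    n = len(all_sessions)
    if n == 0:
        return set()
    large = {
        c for c in candidates
        if sum(c <= t for t in all_sessions) / n >= min_support
    }
    # An itemset is maximal if no other large itemset strictly contains it.
    return {c for c in large if not any(c < d for d in large)}


if __name__ == "__main__":
    parts = partition_sessions(EXAMPLE_SESSIONS, num_servers=3)
    print(maximal_large_itemsets(parts, min_support=0.5))
```

In this sketch the final set of maximal itemsets does not depend on how the hash function spreads sessions across servers; partitioning only affects how much local work each server does and how many candidates must be recounted.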


Additional information

Received 30 August 2000 / Revised 30 January 2001 / Accepted in revised form 9 May 2001

About this article

Cite this article

Sayal, M., Scheuermann, P. Distributed Web Log Mining Using Maximal Large Itemsets. Knowledge and Information Systems 3, 389–404 (2001). https://doi.org/10.1007/PL00011675

DOI: https://doi.org/10.1007/PL00011675
