Distributed Web Log Mining Using Maximal Large Itemsets

Sayal, Mehmet; Scheuermann, Peter

doi:10.1007/PL00011675

Regular Paper
Published: November 2001

Volume 3, pages 389–404, (2001)
Knowledge and Information Systems

Mehmet Sayal¹ &
Peter Scheuermann²

Abstract.

We introduce a partitioning-based distributed document-clustering algorithm that uses user access patterns from multi-server web sites. Our algorithm makes it possible to exploit adaptive document replication and persistent connections simultaneously, two techniques that are most effective in decreasing the response time observed by web users. The algorithm first distributes the user access data evenly among the servers by using a hash function. Then, each server generates a local clustering on its fair share of the user session records by employing a traditional single-machine document-clustering algorithm. Finally, the local clustering results are combined by a novel procedure that generates maximal large itemsets of web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in the future.
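
The three steps in the abstract can be illustrated with a small Python sketch. It is not the authors' procedure, which the abstract does not specify. As a stand-in, each server simply counts document subsets in its share of the sessions, the counts are summed, and only the maximal itemsets that meet a support threshold are kept. The function names, the CRC32 hash, the `max_size` cap, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import zlib


def partition_sessions(sessions, num_servers):
    """Assign each session to a server by hashing its ID (assumed scheme)."""
    shards = [[] for _ in range(num_servers)]
    for session_id, docs in sessions.items():
        shard = zlib.crc32(session_id.encode("utf-8")) % num_servers
        shards[shard].append(frozenset(docs))
    return shards


def local_itemset_counts(shard, max_size=3):
    """Count every document subset (up to max_size) in one server's sessions.

    Brute-force stand-in for the local single-machine clustering step.
    """
    counts = Counter()
    for docs in shard:
        for k in range(1, min(max_size, len(docs)) + 1):
            for combo in combinations(sorted(docs), k):
                counts[frozenset(combo)] += 1
    return counts


def maximal_large_itemsets(per_server_counts, min_support):
    """Sum local counts, keep large itemsets, then drop non-maximal ones."""
    total = Counter()
    for counts in per_server_counts:
        total.update(counts)  # Counter.update adds counts together
    large = [s for s, c in total.items() if c >= min_support]
    # An itemset is maximal if no other large itemset strictly contains it.
    return [s for s in large if not any(s < other for other in large)]


# Hypothetical user sessions: session ID -> documents requested.
sessions = {
    "s1": ["a.html", "b.html", "c.html"],
    "s2": ["a.html", "b.html"],
    "s3": ["a.html", "b.html", "d.html"],
    "s4": ["c.html", "d.html"],
}
shards = partition_sessions(sessions, num_servers=2)
counts = [local_itemset_counts(shard) for shard in shards]
clusters = maximal_large_itemsets(counts, min_support=2)
```

Because the local counts are summed before thresholding, the result in this sketch does not depend on how the hash function happens to split the sessions. A real system would replace the brute-force subset counting with a proper frequent-itemset or clustering algorithm.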

Author information

Authors and Affiliations

Hewlett-Packard Labs, Palo Alto, California, USA
Mehmet Sayal
Department of Electrical and Computer Engineering, Northwestern University, Evanston, Illinois, USA
Peter Scheuermann

Additional information

Received 30 August 2000 / Revised 30 January 2001 / Accepted in revised form 9 May 2001

About this article

Cite this article

Sayal, M., Scheuermann, P. Distributed Web Log Mining Using Maximal Large Itemsets. Knowledge and Information Systems 3, 389–404 (2001). https://doi.org/10.1007/PL00011675

Issue Date: November 2001
DOI: https://doi.org/10.1007/PL00011675

Keywords: Maximal large itemsets; User access patterns; Web document clustering
