Web Usage Mining: Sequential Pattern Extraction with a Very Low Support

Masseglia, F.; Tanasa, D.; Trousse, B.

doi:10.1007/978-3-540-24655-8_56

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3007)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

The goal of this work is to increase the relevance and interestingness of the patterns discovered by a Web Usage Mining process. Sequential patterns extracted from web log files, unless they are found under constraints, often lack interest because their content is obvious. We aim to discover coherent behaviors of minority users that we want to be aware of, such as hacking activities on the Web site or a user’s activity limited to a specific part of the Web site. By means of a clustering method applied to the extracted sequential patterns, we propose a recursive division of the problem. The clustering method is based on pattern summaries and neural networks. Our experiments show that we obtain the targeted patterns, whereas extracting them with a classical process is impossible because of their very low support (down to 0.006%). The diversity of users’ behaviors is so large that the minority ones are both numerous and difficult to locate.
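
To make the notion of support concrete, the following minimal sketch computes the support of a sequential pattern over a set of sessions and checks it against a minimum support threshold. It is an illustration under stated assumptions, not the method of the paper: each session is simplified to an ordered list of single page URLs, and the function names, the toy sessions, and the 1% threshold are all hypothetical.

```python
from typing import Sequence

# Hypothetical threshold for illustration only (1% of sessions).
MIN_SUPPORT = 0.01


def contains_subsequence(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if the pages of `pattern` occur in `session` in order (gaps allowed)."""
    remaining = iter(session)
    # `page in remaining` advances the iterator, so order is respected.
    return all(page in remaining for page in pattern)


def support(sessions: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sessions that contain `pattern` as a subsequence."""
    if not sessions:
        return 0.0
    hits = sum(contains_subsequence(s, pattern) for s in sessions)
    return hits / len(sessions)


# Toy data (invented): 9,998 sessions follow a common path, 2 follow a rare one.
common_session = ["/", "/products", "/contact"]
rare_session = ["/login", "/admin", "/admin/config"]
sessions = [common_session] * 9998 + [rare_session] * 2

rare_pattern = ["/login", "/admin/config"]
rare_support = support(sessions, rare_pattern)
is_frequent = rare_support >= MIN_SUPPORT

print(f"support = {rare_support:.4%}, frequent at {MIN_SUPPORT:.0%}: {is_frequent}")
```

In this toy data set the rare pattern is matched by only 2 of 10,000 sessions, so a threshold of 1% discards it, while lowering the threshold far enough to keep it would also admit a very large number of obvious patterns.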

Author information

Authors and Affiliations

INRIA Sophia Antipolis, 2004 route des lucioles, BP 93, 06902, Sophia Antipolis, France
F. Masseglia, D. Tanasa & B. Trousse

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
The University of New South Wales, NSW 2052, Australia
Xuemin Lin
Department of Computer Science, Tsinghua University, 100084, Beijing, P.R. China
Hongjun Lu
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Masseglia, F., Tanasa, D., Trousse, B. (2004). Web Usage Mining: Sequential Pattern Extraction with a Very Low Support. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds) Advanced Web Technologies and Applications. APWeb 2004. Lecture Notes in Computer Science, vol 3007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24655-8_56

DOI: https://doi.org/10.1007/978-3-540-24655-8_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21371-0
Online ISBN: 978-3-540-24655-8
eBook Packages: Springer Book Archive