Web Usage Mining: Sequential Pattern Extraction with a Very Low Support

  • F. Masseglia
  • D. Tanasa
  • B. Trousse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3007)

Abstract

The goal of this work is to increase the relevance and interestingness of the patterns discovered by a Web Usage Mining process. Sequential patterns extracted from web log files, unless they are found under constraints, often lack interest because their content is obvious. We aim to discover coherent behaviors of minority users that are worth being aware of (such as hacking activities on the Web site, or user activity limited to a specific part of the Web site). We propose a recursive division of the problem, driven by a clustering method applied to the extracted sequential patterns. The clustering method is based on pattern summaries and neural networks. Our experiments show that we obtain the targeted patterns, whereas a classical process cannot extract them because their support is very low (down to 0.006%). The diversity of users’ behaviors is so large that the minority ones are both numerous and difficult to locate.
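
To make the notion of support concrete, the following is a minimal sketch, not the authors' implementation. It assumes each user session is an ordered list of page URLs and that a pattern is supported by a session when its pages appear in the same order, possibly with other pages in between. The function names, the sample sessions, and the URLs are all hypothetical. Under this reading, a support of 0.006% corresponds to 6 supporting sessions out of 100,000.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], session: Sequence[str]) -> bool:
    """Return True if the pages of `pattern` occur in `session` in order (gaps allowed)."""
    remaining = iter(session)
    # `page in remaining` consumes the iterator up to the first match,
    # so later pattern pages are only searched for after earlier ones.
    return all(page in remaining for page in pattern)


def support(pattern: Sequence[str], sessions: Sequence[Sequence[str]]) -> float:
    """Fraction of sessions that contain `pattern` as a subsequence."""
    if not sessions:
        return 0.0
    hits = sum(1 for session in sessions if is_subsequence(pattern, session))
    return hits / len(sessions)


# Hypothetical sessions: only one of the four visits the admin pages.
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/admin", "/login", "/admin/config"],
    ["/index", "/products"],
    ["/index", "/about"],
]
pattern = ["/admin", "/admin/config"]
pattern_support = support(pattern, sessions)
print(f"support = {pattern_support:.2%}")
```

With a minimum support threshold set high enough to keep a classical extraction tractable, a pattern such as this one would be discarded long before it is reported, which is the situation the abstract describes for minority behaviors.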

Keywords

Web usage mining, sequential patterns, clustering, patterns summary, neural networks

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • F. Masseglia
    • 1
  • D. Tanasa
    • 1
  • B. Trousse
    • 1
  1. INRIA Sophia Antipolis, Sophia Antipolis, France
