Dissimilarities for Web Usage Mining

  • Fabrice Rossi
  • Francisco De Carvalho
  • Yves Lechevallier
  • Alzennyr Da Silva
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Obtaining a set of homogeneous classes of pages according to the browsing patterns identified in web server log files can be very useful for analyzing the organization of the site and its adequacy to user needs. Such a set of homogeneous classes is often obtained from a dissimilarity measure between the visited pages, defined via the visits extracted from the logs. There are, however, many possible ways to define such a measure. This paper presents an analysis of different dissimilarity measures based on the comparison between the semantic structure of the site identified by experts and the clustering constructed with standard algorithms applied to the dissimilarity matrices generated by the chosen measures.
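
As a rough illustration of what "a dissimilarity measure between the visited pages defined via the visits" can look like, the sketch below computes a Jaccard-style dissimilarity between pages from a toy list of visits. The abstract does not name the measures that were compared, so the choice of Jaccard, the function name `jaccard_dissimilarity`, and the sample visits are all assumptions made for illustration only, not the authors' method.

```python
from itertools import combinations


def jaccard_dissimilarity(visits, pages):
    """Return {(p, q): d} with d = 1 - |V_p & V_q| / |V_p | V_q|,
    where V_p is the set of visits that contain page p.
    Illustrative assumption: the source does not specify this measure."""
    # Index each page by the ids of the visits in which it appears.
    visited_in = {p: set() for p in pages}
    for visit_id, visit in enumerate(visits):
        for page in visit:
            if page in visited_in:
                visited_in[page].add(visit_id)

    dissim = {}
    for p, q in combinations(pages, 2):
        union = visited_in[p] | visited_in[q]
        inter = visited_in[p] & visited_in[q]
        # Assumption: two pages that were never visited are maximally dissimilar.
        dissim[(p, q)] = 1.0 - len(inter) / len(union) if union else 1.0
    return dissim


# Hypothetical visits, each one a list of the pages seen during that visit.
visits = [["home", "papers", "team"], ["home", "papers"], ["team", "contact"]]
pages = ["home", "papers", "team", "contact"]
d = jaccard_dissimilarity(visits, pages)
```

The resulting pairwise values can be arranged into a dissimilarity matrix and passed to a standard clustering algorithm; the clusters obtained could then be compared with an expert-defined partition of the site, which is the kind of evaluation the abstract describes.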

Copyright information

© Springer-Verlag Berlin · Heidelberg 2006

Authors and Affiliations

  • Fabrice Rossi (1)
  • Francisco De Carvalho (2)
  • Yves Lechevallier (1)
  • Alzennyr Da Silva (1, 2)

  1. Domaine de Voluceau, Rocquencourt, Projet AxIS, INRIA Rocquencourt, Le Chesnay cedex, France
  2. Centro de Informatica - CIn/UFPE, Recife (PE), Brazil
