Dissimilarities for Web Usage Mining
Obtaining a set of homogeneous classes of pages from the browsing patterns identified in web server log files can be very useful for analyzing the organization of a site and its adequacy to user needs. Such a set of classes is often obtained from a dissimilarity measure between the visited pages, defined via the visits extracted from the logs. There are, however, many possible ways of defining such a measure. This paper presents an analysis of different dissimilarity measures, based on a comparison between the semantic structure of the site identified by experts and the clusterings constructed with standard algorithms applied to the dissimilarity matrices generated by the chosen measures.
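To make the idea of a page dissimilarity "defined via the visits" concrete, here is a minimal sketch. It is not taken from the paper: the abstract does not name the measures it compares, so the Jaccard-style measure used here is an assumption chosen only for illustration. Two pages count as close when they tend to appear in the same visits. The function name `jaccard_dissimilarity` and the sample visits are hypothetical.

```python
from itertools import combinations


def jaccard_dissimilarity(visits):
    """Return {(page_a, page_b): dissimilarity} for every pair of pages.

    `visits` is a list of visits, each an iterable of page identifiers.
    The Jaccard measure is an assumption made for this sketch; it is only
    one of many possible usage-based dissimilarities.
    """
    # Map each page to the set of visit indices in which it appears.
    visits_by_page = {}
    for index, visit in enumerate(visits):
        for page in set(visit):
            visits_by_page.setdefault(page, set()).add(index)

    dissimilarity = {}
    for a, b in combinations(sorted(visits_by_page), 2):
        shared = len(visits_by_page[a] & visits_by_page[b])
        either = len(visits_by_page[a] | visits_by_page[b])
        # 0.0 when the pages always co-occur, 1.0 when they never do.
        dissimilarity[(a, b)] = 1.0 - shared / either
    return dissimilarity


# Hypothetical visits, as they might be extracted from a server log.
visits = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
d = jaccard_dissimilarity(visits)
```

The resulting pairwise values can be arranged into a dissimilarity matrix and passed to a standard clustering algorithm. The resulting classes can then be compared with an expert-defined structure of the site, which is the kind of evaluation the abstract describes.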