A Divergence-Oriented Approach for Web Users Clustering

  • Sophia G. Petridou
  • Vassiliki A. Koutsonikola
  • Athena I. Vakali
  • Georgios I. Papadimitriou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3981)


Clustering web users based on their access patterns is an important task in Web Usage Mining. Beyond the clustering itself, it is important to evaluate the resulting clusters in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL-divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the value of the objective function and the Davies-Bouldin index. Because it is imperative to assess whether the results of a clustering process are susceptible to noise, especially in noisy environments such as the Web, our approach takes the impact of noise into account. The clusters obtained with the KL approach seem to be superior to those obtained with the other distance measures when the data have been corrupted by noise.
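To make the combination of KL-divergence and k-means concrete, the following is a minimal sketch and not the authors' implementation. It assumes each user is a row of non-negative page-access counts with a positive row sum, normalizes each row to a probability distribution, assigns users to the centroid with the smallest KL(p || c), and updates each centroid as the arithmetic mean of its members. The names `kl_matrix` and `kl_kmeans`, the smoothing constant `eps`, and the random initialization are illustrative assumptions. The returned objective is the sum of divergences between users and their assigned centroids.

```python
import numpy as np


def kl_matrix(P, C, eps=1e-12):
    """Return the (n, k) matrix of KL(P[i] || C[j]) for row distributions.

    eps is an assumed smoothing constant that avoids log(0) and division by 0.
    """
    P = np.clip(P, eps, None)
    C = np.clip(C, eps, None)
    # Broadcast to shape (n, k, d), then sum over the d features.
    return (P[:, None, :] * np.log(P[:, None, :] / C[None, :, :])).sum(axis=2)


def kl_kmeans(X, k, n_iter=100, seed=0):
    """k-means-style clustering that uses KL-divergence as the distance.

    X: (n, d) array of non-negative access counts, each row with a positive sum.
    Returns (labels, centroids, objective).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=1, keepdims=True)  # each user becomes a distribution

    # Assumed initialization: k distinct users picked at random.
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        new_labels = kl_matrix(P, centroids).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = P[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)

    dist = kl_matrix(P, centroids)
    objective = float(dist[np.arange(len(P)), labels].sum())
    return labels, centroids, objective
```

Replacing `kl_matrix` with a Euclidean or Manhattan distance function gives the baselines the abstract compares against, and adding random perturbations to `X` before clustering is one way to probe sensitivity to noise.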


  • Cluster Process
  • External Knowledge
  • Cluster Validity
  • Manhattan Distance
  • Distance Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sophia G. Petridou (1)
  • Vassiliki A. Koutsonikola (1)
  • Athena I. Vakali (1)
  • Georgios I. Papadimitriou (1)
  1. Dept. of Informatics, Aristotle University, Thessaloniki, Greece
