Clustering Web Page Sessions Using Sequence Alignment Method

  • G. Poornalatha
  • S. Raghavendra Prakash
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 250)

Abstract

This paper illustrates the clustering of web page sessions in order to identify users’ navigation patterns. In the approach presented here, user sessions of variable length are compared pairwise, the number of alignments between them is found, and the distances are measured. Web page sessions are then clustered using a modified k-means algorithm. A couple of web access logs, including the well-known NASA data set, are used to illustrate the effectiveness of the clustering. The R-squared measure is applied to determine the optimal number of clusters, and a chi-squared test is carried out to examine the association between the clustered web page sessions. These two measures indicate the goodness of the clusters formed.

Keywords

clustering; sequence alignment; web usage mining; R-squared measure; dynamic programming
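
The pairwise comparison of variable-length sessions described in the abstract can be illustrated with a generic dynamic-programming alignment. The sketch below is not the authors' distance measure: it uses a standard global (Needleman-Wunsch style) alignment, and the scoring values (`match`, `mismatch`, `gap`), the normalisation to the range 0 to 1, and the function names `alignment_score` and `session_distance` are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def alignment_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: int = 2,      # assumed reward for two identical pages
    mismatch: int = -1,  # assumed penalty for two different pages
    gap: int = -1,       # assumed penalty for skipping a page
) -> int:
    """Best global alignment score of two page sequences (dynamic programming)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def session_distance(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: int = 2,
    mismatch: int = -1,
    gap: int = -1,
) -> float:
    """Hypothetical normalised distance: 0.0 for identical sessions, up to 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    score = alignment_score(a, b, match, mismatch, gap)
    # Negative scores are clamped so the distance stays within [0, 1].
    return 1.0 - max(score, 0) / (match * longest)


if __name__ == "__main__":
    s1 = ["A", "B", "C", "D"]  # pages visited in one session
    s2 = ["A", "B", "D"]       # a shorter session sharing three pages
    print(alignment_score(s1, s2), session_distance(s1, s2))
```

A matrix of such pairwise distances could then be passed to a k-means style procedure; how the paper's modified k-means consumes the distances is not stated in the abstract.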

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • G. Poornalatha (1)
  • S. Raghavendra Prakash (1)
  1. Department of Information Technology, National Institute of Technology Karnataka (NITK), Mangalore, India