Social Network Analysis and Mining, Volume 3, Issue 2, pp 257–268

Web sessions clustering using hybrid sequence alignment measure (HSAM)

Original Article


Web usage mining inspects the navigation patterns recorded in web access logs and extracts previously unknown, useful information. This information can inform strategies for web-oriented applications such as website restructuring, recommender systems, and web page prediction. The current work clusters user sessions of uneven length to discover access patterns, and proposes a distance measure for grouping those sessions. The proposed hybrid distance measure uses access path information to compute the distance between any two sessions without altering the order in which the web pages were visited. R² is used to decide the number of clusters to construct. The Jaccard index and the Davies–Bouldin validity index are employed to assess the resulting clustering. The results obtained with these two standard statistical measures are encouraging and indicate the quality of the clusters created.
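To make the idea of an order-preserving distance between sessions of uneven length concrete, the following is a minimal sketch based on plain Needleman-Wunsch global alignment computed by dynamic programming. It is not the hybrid measure (HSAM) proposed in the article, whose details are not given in this abstract. The function names `alignment_score` and `session_distance`, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), and the normalisation to the range [0, 1] are all assumptions chosen for illustration.

```python
from typing import Hashable, Sequence


def alignment_score(a: Sequence[Hashable], b: Sequence[Hashable],
                    match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score of two page sequences.

    The scoring values are illustrative assumptions, not taken from the article.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Either align the two pages, or insert a gap in one session.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


def session_distance(a: Sequence[Hashable], b: Sequence[Hashable],
                     match: int = 2, mismatch: int = -1, gap: int = -1) -> float:
    """Map the alignment score to a distance in [0, 1].

    Assumed normalisation: negative scores are clipped to 0, and the score is
    divided by the best score the longer session could reach.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty sessions are treated as identical
    score = max(alignment_score(a, b, match, mismatch, gap), 0)
    return 1.0 - score / (match * longest)


if __name__ == "__main__":
    s1 = ["home", "products", "cart", "checkout"]
    s2 = ["home", "products", "checkout"]
    print(session_distance(s1, s2))
    print(session_distance(["A", "B"], ["B", "A"]))
```

Because alignment never reorders the pages, two sessions that visit the same pages in a different order are scored as dissimilar, which is the property the abstract emphasises. A pairwise distance matrix built this way could then be passed to any clustering method that accepts precomputed distances.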


Keywords: Clustering, Sequence alignment, Web usage mining, Dynamic programming



The authors wish to thank the anonymous reviewers for their useful and valuable suggestions.



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. National Institute of Technology Karnataka (NITK), Surathkal, Mangalore, India
