Web sessions clustering using hybrid sequence alignment measure (HSAM)

Abstract

Web usage mining inspects the navigation patterns in web access logs and extracts previously unknown, useful information. This information can guide web-oriented applications such as website restructuring, recommender systems, and web page prediction. The current work clusters user sessions of uneven lengths to discover access patterns, and proposes a distance measure for grouping those sessions. The proposed hybrid distance measure uses the access path information to compute the distance between any two sessions without altering the order in which web pages are visited. R² is used to decide the number of clusters to construct. The Jaccard index and the Davies–Bouldin validity index are used to assess the resulting clustering. The results obtained with these two standard measures are encouraging and indicate that the clusters are of good quality.
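To make the idea of an order-preserving distance between sessions of uneven length concrete, the following is a minimal sketch and not the HSAM measure itself, whose exact definition is not given in this abstract. It scores two sessions with a plain Needleman–Wunsch global alignment and converts the score into a distance in [0, 1]. The function names, the scoring values (`match`, `mismatch`, `gap`), and the normalization are all assumptions chosen for illustration.

```python
from typing import Sequence


def alignment_score(a: Sequence[str], b: Sequence[str],
                    match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two page sequences.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning the first i-1 pages of `a`
    # with the first j pages of `b`; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def session_distance(a: Sequence[str], b: Sequence[str],
                     match: int = 2, mismatch: int = -1, gap: int = -1) -> float:
    """Hypothetical normalized distance: 0 for identical sessions, 1 for no similarity."""
    if not a and not b:
        return 0.0
    # Highest score any alignment could reach given the longer session.
    best = match * max(len(a), len(b))
    score = alignment_score(a, b, match, mismatch, gap)
    # Negative scores are treated as "no similarity" so the result stays in [0, 1].
    return 1.0 - max(score, 0) / best


s1 = ["home", "products", "cart"]
s2 = ["home", "products", "cart", "checkout"]
print(session_distance(s1, s2))                 # sessions sharing a common prefix
print(session_distance(s1, list(reversed(s1)))) # same pages, reversed order
```

Because the alignment never reorders pages, two sessions that visit the same pages in a different order are treated as distant, which is the property the abstract emphasizes. A pairwise distance matrix built this way could then be passed to any clustering algorithm that accepts precomputed distances.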

Notes

  1. What is a good value for R²? http://www.duke.edu/~rnau/rsquared.htm. Accessed June 2011.

  2. How high, R²? http://cooldata.wordpress.com/2010/04/19/how-high-r-squared/. Accessed June 2011.

  3. Jaccard Index, http://en.wikipedia.org/wiki/Jaccard_index. Accessed March 2011.

  4. Cluster validity algorithms, http://machaon.karanagai.com/validation_algorithms.html. Accessed August 2011.

Acknowledgments

The authors wish to thank anonymous reviewers for the useful and valuable suggestions.

Author information

Corresponding author

Correspondence to G. Poornalatha.

About this article

Cite this article

Poornalatha, G., Prakash, S.R. Web sessions clustering using hybrid sequence alignment measure (HSAM). Soc. Netw. Anal. Min. 3, 257–268 (2013). https://doi.org/10.1007/s13278-012-0070-z

  • DOI: https://doi.org/10.1007/s13278-012-0070-z

Keywords

  • Clustering
  • Sequence alignment
  • Web usage mining
  • Dynamic programming