Using Glocal Event Alignment for Comparing Sequences of Significantly Different Lengths

  • Vinh-Trung Luu
  • Mathis Ripken
  • Germain Forestier
  • Frédéric Fondement
  • Pierre-Alain Muller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9729)

Abstract

This work addresses conversion rate optimization through enhancing the user experience during navigation on e-commerce web sites. The requirement is to segment visitors into meaningful clusters, which can then be targeted with specific calls to action in order to increase the web site's turnover. This paper presents an original approach that combines global and local alignment techniques (Needleman-Wunsch and Smith-Waterman) in equal measure to automatically segment visitors according to their sequences of visited pages. Experimental results on synthetic datasets show that the approach outperforms other commonly used alignment metrics, such as hybrid approaches and Dynamic Time Warping.
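To make the idea of combining the two alignment families concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not give. It assumes that "equal" combination means averaging a Needleman-Wunsch (global) score and a Smith-Waterman (local) score with weight 0.5. The function names, the `weight` parameter, and the scoring values (match = 1, mismatch = -1, gap = -1) are all hypothetical choices made for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: score of the best global alignment of a and b."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: score of the best local alignment of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0 so an alignment can restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def glocal_score(a, b, weight=0.5, **scoring):
    """Hypothetical equal-weight blend of global and local scores (assumption)."""
    return weight * nw_score(a, b, **scoring) + (1 - weight) * sw_score(a, b, **scoring)


# Two navigation sessions of very different lengths (made-up page names).
short_session = ["home", "cart"]
long_session = ["blog", "home", "cart", "faq", "help", "about"]

print(nw_score(short_session, long_session))      # global score, penalised by length gap
print(sw_score(short_session, long_session))      # local score, rewards the shared run
print(glocal_score(short_session, long_session))  # blend of the two
```

In this toy case the global score is pulled down by the four gaps needed to span the longer session, while the local score only sees the shared "home, cart" run; blending the two is one simple way to let both effects influence the similarity of sessions with significantly different lengths.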

Keywords

Web mining · Sequential pattern mining · Clustering

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Vinh-Trung Luu (1)
  • Mathis Ripken (1)
  • Germain Forestier (1)
  • Frédéric Fondement (1)
  • Pierre-Alain Muller (1)

  1. MIPS, Université de Haute Alsace, Mulhouse Cedex, France
