Modelling Visit Similarity Using Click-Stream Data: A Supervised Approach

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Abstract

Identifying and targeting visitors on e-commerce websites with personalized content in real time is extremely important to marketers. Although such targeting exists today, it is based on demographic attributes of the visitors. We show that dynamic visitor attributes extracted from their click-streams predict visitor intent much better. In this work, we propose a mechanism for identifying similar visitor sessions on a website based on their click-streams. Novel techniques for extracting features from visitor clicks are employed. The large margin nearest neighbour (LMNN) algorithm is used to learn a similarity metric between any two sessions. The sessions are then classified into purchasers and non-purchasers using k-nearest neighbour (kNN) classification. Experimental results on two large real-world data sets show significant improvements over baseline algorithms based on hidden Markov models (HMM), support vector machines (SVM) and random forests.
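
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, the labels and the matrix `L` below are hypothetical, and `L` is set to the identity as a placeholder for the linear transform that LMNN would learn from labelled sessions. The sketch only shows how kNN votes under a metric of the form d(a, b) = ||L(a - b)||^2, which is the family of metrics LMNN optimises.

```python
from collections import Counter

def transform(L, x):
    """Apply the linear map L (a list of rows) to the feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def squared_distance(L, a, b):
    """Squared Euclidean distance after mapping by L (Mahalanobis with M = L^T L)."""
    diff = [a_i - b_i for a_i, b_i in zip(a, b)]
    return sum(v * v for v in transform(L, diff))

def knn_predict(train_X, train_y, query, L, k=3):
    """Label `query` by majority vote among its k nearest training sessions."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: squared_distance(L, train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical session features, e.g. (pages viewed, cart additions).
train_X = [[1.0, 0.0], [2.0, 0.0], [8.0, 3.0], [9.0, 2.0]]
train_y = [0, 0, 1, 1]  # assumed encoding: 0 = non-purchaser, 1 = purchaser

# Placeholder transform; LMNN would replace this with a learned matrix.
L = [[1.0, 0.0], [0.0, 1.0]]

label = knn_predict(train_X, train_y, [8.5, 2.5], L, k=3)
```

With the identity transform this reduces to ordinary Euclidean kNN; a learned `L` would stretch or shrink feature directions so that sessions with the same label end up closer together.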

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pai, D., Sharang, A., Yadagiri, M.M., Agrawal, S. (2014). Modelling Visit Similarity Using Click-Stream Data: A Supervised Approach. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_11

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
