Mining and Analysis of Clickstream Patterns

  • Chapter

Part of the Studies in Computational Intelligence book series (SCI, volume 206)

Abstract

The explosive growth of the web has drastically changed the way in which information is managed and accessed. The large scale of web data sources and the wide availability of services over the internet have increased the need for effective web data mining techniques and mechanisms. A sophisticated method to organize the layout of the information and assist user navigation is therefore particularly important. In this work, we focus on web usage mining, which applies data mining techniques to web server logs. Web usage mining is the non-trivial process of identifying implicit, previously unknown but potentially useful clickstream patterns that may exist in any collection of web access logs. The required abstraction can be generated by clustering the web access logs according to a similarity measure. Clustering is done such that the web access logs within the same cluster are more similar to one another than to logs in other clusters. In this chapter, we propose a partitional algorithm, the Multi Pass Combined Standard Deviation (CSD) Means algorithm, which automatically generates the optimum number of clusters from the web clickstream patterns. The quality of the clusters it produces is compared with that of clusters obtained using the K-Means algorithm, the Rough K-Means algorithm, and the model-based algorithms ANTCLUST and ACCANTCLUST. The experimental analysis of mined clickstream patterns shows the effectiveness of the proposed algorithm.
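To make the idea of clustering web access logs by similarity concrete, the following is a minimal sketch of the plain K-Means baseline named in the abstract. It is not the Multi Pass CSD Means algorithm, whose details are not given here. The session data, the page names in the comments, the Euclidean distance, the choice of the first k sessions as initial centroids, and the fixed k = 2 are all illustrative assumptions; the proposed algorithm is described as choosing the number of clusters automatically, which this sketch does not do.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain K-Means (Lloyd's algorithm) on equal-length numeric vectors."""
    # Assumption: the first k points serve as deterministic initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each session joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical sessions: 1 if the page was visited in that session.
# Columns (assumed): /home, /products, /cart, /blog, /about
sessions = [
    (1, 1, 1, 0, 0),
    (1, 0, 0, 1, 1),
    (0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 0, 0, 1, 0),
]

labels, centroids = kmeans(sessions, k=2)
print(labels)  # shopping-like sessions share one label, reading-like sessions the other
```

In this toy data the sessions that touch the product and cart pages end up in one cluster and those that touch the blog and about pages in the other, which is the kind of within-cluster similarity the abstract describes.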

Keywords

  • Clickstreams
  • Clustering
  • K-Means
  • Ant-Clustering
  • Rough K-Means

Chapter length: 25 pages


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Inbarani, H.H., Thangavel, K. (2009). Mining and Analysis of Clickstream Patterns. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_1

  • DOI: https://doi.org/10.1007/978-3-642-01091-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01090-3

  • Online ISBN: 978-3-642-01091-0

  • eBook Packages: Engineering, Engineering (R0)