Mining and Analysis of Clickstream Patterns

Inbarani, H. Hannah; Thangavel, K.

doi:10.1007/978-3-642-01091-0_1

Part of the book series: Studies in Computational Intelligence ((SCI,volume 206))

Abstract

The explosive growth of the web has drastically changed the way in which information is managed and accessed. The large scale of web data sources and the wide availability of services over the internet have increased the need for effective web data mining techniques and mechanisms. A sophisticated method to organize the layout of the information and assist user navigation is therefore particularly important. In this work, we focus on web usage mining, which applies data mining techniques to web server logs. Web usage mining is the non-trivial process of identifying implicit, previously unknown, but potentially useful clickstream patterns that may exist in any collection of web access logs. The required abstraction can be generated by clustering the web access logs according to a similarity measure, so that logs within the same cluster are more similar to one another than to logs in different clusters. In this chapter, we propose a partitional algorithm, the Multi Pass Combined Standard Deviation (CSD) Means algorithm, which automatically generates the optimum number of clusters from the web clickstream patterns. The quality of the clusters obtained with the proposed algorithm is compared with that of clusters produced by the K-Means algorithm, the Rough K-Means algorithm, and the model-based algorithms ANTCLUST and ACCANTCLUST. The experimental analysis of the mined clickstream patterns shows the effectiveness of the proposed algorithm.
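To make the clustering step concrete, the following minimal sketch groups a handful of user sessions with plain K-Means, one of the baseline algorithms named in the abstract. It is not the chapter's Multi Pass CSD Means algorithm, whose details are not given here. The page names, the sample sessions, the binary page-visit encoding, the squared Euclidean distance, and the helper names `sessions_to_vectors` and `k_means` are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def sessions_to_vectors(sessions: Sequence[Set[str]], pages: Sequence[str]) -> List[List[float]]:
    """Encode each session as a binary vector: 1.0 if the page was visited, else 0.0."""
    return [[1.0 if page in session else 0.0 for page in pages] for session in sessions]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain K-Means with deterministic initialisation from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical pages and sessions, as they might be extracted from a web access log.
pages = ["/home", "/products", "/cart", "/blog", "/about"]
sessions = [
    {"/home", "/products", "/cart"},
    {"/blog", "/about"},
    {"/home", "/products"},
    {"/blog"},
    {"/products", "/cart"},
    {"/blog", "/about", "/home"},
]

vectors = sessions_to_vectors(sessions, pages)
labels, centroids = k_means(vectors, k=2)
print(labels)
```

Unlike the algorithm proposed in the chapter, this baseline needs the number of clusters `k` to be supplied in advance.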

Author information

Authors and Affiliations

Department of Computer Science, Periyar University, Salem, 636 011
H. Hannah Inbarani & K. Thangavel

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, P.O. Box 2259, 98071-2259, Washington, USA
Ajith Abraham
College of Business Administration, Quantitative and Information System Department, Kuwait University, P.O. Box 5486, 13055, Safat, Kuwait
Aboul-Ella Hassanien
Department of Computer Science, University of São Paulo, Caixa Postal 668, 13560-970, Sao Carlos, SP, Brazil
André Ponce de Leon F. de Carvalho
Dept. Computer Science, Technical University Ostrava, Tr. 17. Listopadu 15, 708 33, Ostrava, Czech Republic
Václav Snášel

About this chapter

Cite this chapter

Inbarani, H.H., Thangavel, K. (2009). Mining and Analysis of Clickstream Patterns. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_1

DOI: https://doi.org/10.1007/978-3-642-01091-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering, Engineering (R0)
