Refinement and evaluation of web session cluster quality

Dixit, V. S.; Bhatia, Shveta Kundra

doi:10.1007/s13198-014-0266-x

V. S. Dixit¹ & Shveta Kundra Bhatia²

Abstract

Refinement of web session clusters is an open research area. The motivation for a refinement algorithm is that any clustering algorithm leaves some data items inappropriately clustered, so the clusters never reach 100% quality. Refinement corrects these misplacements and thereby improves cluster quality. In the proposed work, initial clusters are formed using the K-Means clustering algorithm, which suffers from local minima. The clusters are refined using the Modified Knockout Refinement Algorithm (MKRA), which relies on a distance-based dissimilarity measure over access and time features. Refinement is also performed using a Genetic Algorithm (GA), Particle Swarm Optimization (PSO), a combination of GA and PSO, and a combination of MKRA, GA and PSO. GA and PSO both search for a global optimal solution; GA suffers from a costly fitness function and high computational cost, which is addressed by PSO, implemented in a linear fashion, because of its better computational efficiency. Combining GA and PSO helps to overcome the problem of local minima. Results are evaluated on five synthetic datasets and three real datasets. Experiments further show that combining MKRA with evolutionary techniques produces well-separated and cohesive clusters with improved quality. The refined clusters can then be used to provide recommendations to a target user, as an application of web usage clusters. Results show that recommender systems built on refined clusters are more accurate than those built on the original clusters.
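The general idea of refining K-Means output with PSO can be sketched as follows. This is a minimal illustration, not the authors' method: MKRA and the GA step are not reproduced, and the cost function (sum of squared errors), the two-dimensional session vectors standing in for access and time features, and all names and parameter values (`kmeans`, `pso_refine`, `w`, `c1`, `c2`, the jitter of 0.5) are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd iteration; may stop in a local minimum."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[idx].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def pso_refine(points, start, n_particles=10, iters=50,
               w=0.7, c1=1.5, c2=1.5, rng=None):
    """Refine centroids with a basic global-best PSO minimising SSE."""
    rng = rng or random.Random(1)
    dim = len(start[0])
    flat = [x for c in start for x in c]

    def unflat(v):
        return [v[i:i + dim] for i in range(0, len(v), dim)]

    def cost(v):
        return sse(points, unflat(v))

    # Particle 0 is the K-Means solution; the rest are jittered copies,
    # so the best cost found can never be worse than the starting point.
    swarm = [flat[:]] + [[x + rng.gauss(0, 0.5) for x in flat]
                         for _ in range(n_particles - 1)]
    vel = [[0.0] * len(flat) for _ in swarm]
    pbest = [p[:] for p in swarm]
    pbest_cost = [cost(p) for p in swarm]
    g = min(range(len(swarm)), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i, pos in enumerate(swarm):
            for d in range(len(pos)):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[d])
                             + c2 * r2 * (gbest[d] - pos[d]))
                pos[d] += vel[i][d]
            c = cost(pos)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[:], c
    return unflat(gbest), gbest_cost

# Hypothetical data: 60 sessions as (access, time) vectors around 3 centres.
data_rng = random.Random(42)
sessions = [[data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)]
            for cx, cy in [(0, 0), (3, 3), (0, 3)] for _ in range(20)]
initial = kmeans(sessions, k=3)
refined, refined_cost = pso_refine(sessions, initial)
```

Seeding one particle with the K-Means centroids is a design choice for the sketch: it makes the refinement step monotone with respect to the initial clustering, which mirrors the abstract's framing of refinement as an improvement over the original clusters.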

Author information

Authors and Affiliations

Atma Ram Sanatan Dharma College, University of Delhi, Delhi, India
V. S. Dixit
University of Delhi, Delhi, India
Shveta Kundra Bhatia

Corresponding author

Correspondence to V. S. Dixit.

About this article

Cite this article

Dixit, V.S., Bhatia, S.K. Refinement and evaluation of web session cluster quality. Int J Syst Assur Eng Manag 6, 373–389 (2015). https://doi.org/10.1007/s13198-014-0266-x

Received: 20 February 2014
Revised: 02 May 2014
Published: 08 June 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s13198-014-0266-x
