Fuzzy C-means is a very popular data clustering algorithm, used in many system-modeling applications to describe system behavior concisely. However, it requires the number of clusters to be specified in advance, which is not feasible in many modeling settings. Prediction models based on web caching and prefetching are one such case: predicting the number of clusters in advance is practically impossible, so Fuzzy C-means in its original form is not suited to them. In this paper, a new clustering algorithm is proposed that fuses Fuzzy Means with threshold concepts. The paper presents experiments with the proposed algorithm in the context of a web caching and prefetching model, and compares its results with those of the familiar Markov model.
Pattern discovery steps and coding based on Markov Model
Test description To generate an occurrence matrix that counts the occurrences of each web object following the current state.
Result The occurrence matrix is generated (see Table 5.3).
Tools used Microsoft Excel. A macro was written to count the number of occurrences.
Macro code The following code was written for this purpose.
Test description To generate the transition probability matrix based on the current state.
A number of tests are carried out to generate the transition probability matrix.
Test 1 Determine the sum of the occurrences from the current state to all other states.
Tools used Microsoft Excel
Query SUM(X:Y), where X and Y are cell references.
Result The sum of occurrences from the current state to all other states is generated.
Test 2 Generate the transition probability from the current state to all other states.
Tools used Microsoft Excel
Query SUM(X:Y)/N, where N is the sum obtained in Test 1.
Result The transition probability from one state to another is generated for every cell.
Test 3 Determine the maximum transition probability in order to predict the next web object.
Tools used Microsoft Excel.
Query MAX(X:Y).
Result Prediction of the next web object.
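The three tests above can be sketched in Python. This is a minimal illustration, not the Excel macro used in the experiments: the function names, the representation of a session as a list of web objects, and the sample sessions are all assumptions made for the example.

```python
from collections import defaultdict

def occurrence_matrix(sessions):
    """Count how often each web object is immediately followed by another."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts

def transition_probabilities(counts):
    """Divide each count by its row total (the role of the SUM step)."""
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs

def predict_next(probs, current):
    """Return the most probable next object (the role of the MAX step).

    Returns None when the current object has no observed successor.
    """
    row = probs.get(current)
    if not row:
        return None
    return max(row, key=row.get)

# Hypothetical sessions; each letter stands for one web object.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
counts = occurrence_matrix(sessions)
probs = transition_probabilities(counts)
print(predict_next(probs, "A"))
```

Ties between equally probable successors are broken by first occurrence here; the source does not say how ties were handled in Excel.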
Pattern discovery steps and coding based on distance measurement techniques
Test description To determine the distance between web sessions using the Levenshtein distance measure.
Tool used An online tool is used to compute the distance between web sessions: http://asecuritysite.com/forensics/simstring.
Results A matrix of distance values is generated as the result of this test.
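For readers who prefer to compute the distances locally, the Levenshtein distance can be sketched as follows. This is the standard dynamic-programming formulation, not the online tool referenced above; encoding each session as a string with one character per web object, and the function names, are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def distance_matrix(sessions):
    """Pairwise Levenshtein distances between all sessions."""
    return [[levenshtein(s, t) for t in sessions] for s in sessions]

# Hypothetical sessions, one character per web object.
sessions = ["ABCD", "ABCE", "XYZ"]
matrix = distance_matrix(sessions)
for row in matrix:
    print(row)
```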
Test description To determine the proximity of different web sessions using the Levenshtein distance measure.
Tool used Microsoft Excel is used to determine proximity with its conditional formatting option. The matrix generated in the previous test is used as input.
Results The number of sessions in each cluster is determined for a particular threshold value.
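The thresholding step can be sketched as follows. This is an illustration only: this section does not spell out how clusters are formed from the distance values, so the rule used here (each session is grouped with every session whose distance is at most the threshold) is an assumption, as are the function name and the sample matrix.

```python
def sessions_within(matrix, threshold):
    """For each session index, list the sessions at distance <= threshold."""
    n = len(matrix)
    return {i: [j for j in range(n) if matrix[i][j] <= threshold]
            for i in range(n)}

# Hypothetical distance matrix for three sessions.
matrix = [[0, 1, 4],
          [1, 0, 4],
          [4, 4, 0]]

groups = sessions_within(matrix, threshold=1)
sizes = {i: len(members) for i, members in groups.items()}
print(groups)
print(sizes)
```

Under this rule a session may belong to more than one group, and raising the threshold can only enlarge the groups.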
Test description To determine the accuracy of each pattern.
Tool used Microsoft Excel is used to determine the accuracy of each pattern. The accuracy of a pattern is determined by taking the average over each pairwise combination of its web sessions.
Results An accuracy value is generated for each pattern.
Test description To determine the mean and standard deviation in order to take appropriate action.
Tool used Microsoft Excel is used to determine the mean and standard deviation of the patterns generated at a specific threshold value.
Results The mean and standard deviation of the patterns are generated as the result of this test.
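The summary statistics can be reproduced with Python's standard library. The accuracy values below are hypothetical, and using the sample standard deviation rather than the population one is an assumption, since the source does not say which Excel function was applied.

```python
import statistics

# Hypothetical accuracy values for the patterns found at one threshold.
accuracies = [0.5, 0.75, 1.0]

mean_accuracy = statistics.mean(accuracies)
# Sample standard deviation; use statistics.pstdev for the population form.
std_accuracy = statistics.stdev(accuracies)

print(mean_accuracy, std_accuracy)
```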
About this article
Cite this article
Patel, D. Threshold based partial partitioning fuzzy means clustering algorithm (TPPFMCA) for pattern discovery. Int. j. inf. tecnol. (2019) doi:10.1007/s41870-019-00343-5
- Fuzzy C-means
- Web caching
- Web prefetching
- Markov model