Threshold based partial partitioning fuzzy means clustering algorithm (TPPFMCA) for pattern discovery

Abstract

Fuzzy C-means is a very popular data clustering algorithm, used in many system-modeling tasks to describe system behavior concisely. However, it requires the number of clusters to be specified in advance, which is not feasible in many modeling settings. Prediction models based on web caching and prefetching are one such case: predicting the number of clusters in advance is practically impossible, so the Fuzzy C-means algorithm in its original form is not suited to them. In this paper, a new clustering algorithm is proposed that fuses Fuzzy Means with threshold concepts. The paper presents experiments with the proposed algorithm in the context of a web caching and prefetching model, and compares its results with those of the familiar Markov model.

Author information

Correspondence to Dharmendra Patel.

Appendix

  1.

    Pattern discovery steps and coding based on Markov Model

    Markovian Test-1

    • Test description: Generate an occurrence matrix that records how often each web object is requested from the current state.

    • Result: The occurrence matrix is generated (refer to Table 5.3).

    • Tools used: Microsoft Excel. A macro was written to count the occurrences.

    • Macro code: The following code was written for this purpose.

      [Figure a: macro code listing]
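The counting step of Markovian Test-1 can be sketched in Python as follows. This is not the author's Excel macro, which is not available in this text; it is a minimal sketch that assumes each web session is an ordered list of requested web objects. The function name `occurrence_matrix` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def occurrence_matrix(sessions):
    """Count how often each web object is requested immediately after another."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Every adjacent pair (current, nxt) is one observed transition.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Convert to plain dicts so the result is easy to inspect.
    return {state: dict(row) for state, row in counts.items()}


# Hypothetical sessions (assumption: one ordered list of objects per session).
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "A"],
]
matrix = occurrence_matrix(sessions)
print(matrix["A"])  # {'B': 2}
print(matrix["B"])  # {'C': 2, 'D': 1}
```

Each row of the result plays the role of one row of the occurrence matrix: the key is the current state and the values are the counts of the objects requested next.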

    Markovian Test-2

    Test description: Generate the transition probability matrix based on the current state.

    The transition probability matrix is built through the following tests.

    (a)

      Test 1: Determine the sum of the occurrences from the current state to all other states.

      Tools used: Microsoft Excel

      Query: SUM(X:Y), where X and Y are cell references.

      Result: The total number of occurrences from the current state to all other states.

    (b)

      Test 2: Generate the transition probability from the current state to all other states.

      • Tools used: Microsoft Excel

      • Query: SUM(X:Y)/N, where N is the sum produced by Test 1.

      • Result: The transition probability for every cell, that is, from one state to another.

    (c)

      Test 3: Determine the maximum transition probability in order to predict the next web object.

      • Tools used: Microsoft Excel

      • Query: MAX(X:Y)

      • Result: Prediction of the next web object.
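The three tests above can be illustrated with a short Python sketch. It assumes the usual first-order Markov reading, in which each count is divided by its row total and the next object is the one with the highest probability. The function names and the occurrence counts are hypothetical and stand in for the Excel worksheet.

```python
def transition_probabilities(occurrences):
    """Turn occurrence counts into row-normalised transition probabilities."""
    probabilities = {}
    for state, row in occurrences.items():
        total = sum(row.values())  # Test 1: row sum, like SUM over the row
        if total == 0:
            continue  # no observed transitions from this state
        # Test 2: divide each count by the row sum.
        probabilities[state] = {nxt: count / total for nxt, count in row.items()}
    return probabilities


def predict_next(probabilities, current):
    """Test 3: return the most probable next object, or None for unseen states."""
    row = probabilities.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Hypothetical occurrence counts (assumption, not data from the paper).
occurrences = {"A": {"B": 2}, "B": {"C": 2, "D": 1}, "C": {"A": 1}}
probs = transition_probabilities(occurrences)
print(probs["A"])                # {'B': 1.0}
print(predict_next(probs, "B"))  # C
```

Returning `None` for a state with no recorded transitions is a design choice of this sketch; the source does not say how such states are handled.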

  2.

    Pattern discovery steps and coding based on distance measurement techniques

    Test-1

    • Test description: Determine the distance between web sessions using the Levenshtein distance measure.

    • Tool used: An online tool is used to compute the distance between web sessions (http://asecuritysite.com/forensics/simstring).

    • Results: A matrix of distance values is generated as the result of this test.
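For readers without access to the online tool, the Levenshtein distance can be computed directly. The sketch below uses the standard dynamic-programming formulation and assumes each web session is encoded as a string with one character per web object; that encoding and the sample sessions are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def distance_matrix(sessions):
    """Pairwise Levenshtein distances between all sessions."""
    return [[levenshtein(s, t) for t in sessions] for s in sessions]


# Hypothetical sessions, one character per requested web object.
sessions = ["ABCD", "ABCE", "XYZ"]
for row in distance_matrix(sessions):
    print(row)
# [0, 1, 4]
# [1, 0, 4]
# [4, 4, 0]
```

The resulting matrix is symmetric with zeros on the diagonal, so only the upper triangle needs to be inspected in later steps.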

    Test-2

    • Test description: Determine the proximity of different web sessions using the Levenshtein distance measure.

    • Tool used: Microsoft Excel is used to determine proximity through its conditional formatting option. The matrix generated in the previous test is used as input.

    • Results: The number of sessions in each cluster is determined for a particular threshold value.
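The thresholding step can be approximated in code. The source does not spell out the exact grouping rule, so the sketch below makes an assumption: for each session, the sessions whose distance is at most the threshold are treated as belonging to its group, which mirrors highlighting the cells of a row that fall under the threshold. The function name, the distance matrix and the threshold value are hypothetical.

```python
def neighbours_within(distances, threshold):
    """For each session index, list the sessions whose distance is <= threshold."""
    return {
        i: [j for j, d in enumerate(row) if d <= threshold]
        for i, row in enumerate(distances)
    }


# Hypothetical pairwise distance matrix (assumption, not data from the paper).
distances = [
    [0, 1, 4],
    [1, 0, 4],
    [4, 4, 0],
]
groups = neighbours_within(distances, threshold=2)
sizes = {i: len(members) for i, members in groups.items()}
print(groups)  # {0: [0, 1], 1: [0, 1], 2: [2]}
print(sizes)   # {0: 2, 1: 2, 2: 1}
```

Note that groups built this way may overlap, and a smaller threshold yields more and smaller groups, so the number of clusters follows from the threshold rather than being fixed in advance.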

    Test-3

    • Test description: Determine the accuracy of each pattern.

    • Tool used: Microsoft Excel is used to determine the accuracy of a pattern. The accuracy is determined by averaging over every pair of web sessions (all permutations and combinations).

    • Results: An accuracy value is generated for each pattern.

    Test-4

    • Test description: Determine the mean and standard deviation in order to take appropriate action.

    • Tool used: Microsoft Excel is used to determine the mean and standard deviation of the patterns generated at a specific threshold value.

    • Results: The mean and standard deviation of the patterns are generated as the result of this test.
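The summary statistics of Test-4 can be reproduced with Python's standard library. The sketch assumes the values being summarised are per-pattern accuracy values at one threshold; the numbers are hypothetical. The source does not say whether the sample or the population standard deviation was used, so both are shown.

```python
import statistics

# Hypothetical accuracy values of the patterns found at one threshold.
accuracies = [0.80, 0.70, 0.90, 0.60]

mean = statistics.mean(accuracies)
sample_sd = statistics.stdev(accuracies)       # divides by n - 1
population_sd = statistics.pstdev(accuracies)  # divides by n

print(f"mean={mean:.3f} sample_sd={sample_sd:.3f} population_sd={population_sd:.3f}")
```

Comparing the mean and spread across several threshold values is one way to decide which threshold gives the most consistent patterns.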

Cite this article

Patel, D. Threshold based partial partitioning fuzzy means clustering algorithm (TPPFMCA) for pattern discovery. Int. j. inf. tecnol. (2019) doi:10.1007/s41870-019-00343-5

Keywords

  • Fuzzy C-means
  • Clustering
  • Web caching
  • Web prefetching
  • Markov model