Linguistic frequent pattern mining using a compressed structure

Applied Intelligence

Abstract

Traditional association-rule mining (ARM) considers only the frequency of items in a binary database, which provides insufficient knowledge for making effective decisions and strategies. Mining useful information from quantitative databases is a harder task than the one handled by conventional ARM algorithms. Fuzzy-set theory was introduced to represent knowledge in a form closer to human reasoning, and it can also be applied to quantitative databases. Many approaches have adopted fuzzy-set theory to transform each quantitative value into linguistic terms with corresponding membership degrees, based on predefined membership functions, for the discovery of fuzzy frequent itemsets (FFIs). Traditional fuzzy frequent itemset mining considers only the linguistic terms with maximal scalar cardinality, and past approaches do not take the uncertainty factor into account. In this paper, an efficient fuzzy mining (EFM) algorithm is presented to quickly discover multiple FFIs from quantitative databases under type-2 fuzzy-set theory. A compressed fuzzy-list (CFL) structure is developed to maintain the complete information needed for rule generation. Two pruning techniques are developed to reduce the search space and speed up the mining process. Several experiments are carried out to verify the efficiency and effectiveness of the designed approach in terms of runtime, number of examined nodes, memory usage, and scalability under different minimum support thresholds and different linguistic terms used in the membership functions.
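As a rough illustration of the transformation step mentioned in the abstract, the following sketch maps a quantitative value to linguistic terms with membership degrees. It is a simplified type-1 example, not the type-2 procedure used in the paper (which additionally models uncertainty in the membership function itself). The function names `triangular` and `fuzzify`, the term labels, and the breakpoints in `TERMS` are all assumptions made for illustration.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set, in [0.0, 1.0]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical membership functions for a purchased quantity:
# each linguistic term is described by (left, peak, right).
TERMS = {
    "Low": (0, 1, 6),
    "Middle": (1, 6, 11),
    "High": (6, 11, 16),
}


def fuzzify(quantity):
    """Return the linguistic terms with a non-zero degree for one quantity."""
    degrees = {term: triangular(quantity, *abc) for term, abc in TERMS.items()}
    return {term: d for term, d in degrees.items() if d > 0.0}


if __name__ == "__main__":
    # A quantity of 3 belongs partly to "Low" and partly to "Middle".
    print(fuzzify(3))
```

Keeping every term with a non-zero degree, rather than only the term with the largest degree, corresponds to the "multiple" fuzzy frequent itemsets the abstract refers to.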

Notes

  1. https://www.philippe-fournier-viger.com/spmf/

  2. http://www.Almaden.ibm.com/cs/768quest/syndata.html

Author information

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

For a termset X, if Sup(X) or rSup(X) is less than the minimum support threshold, then any superset (extension) of X is not a multiple fuzzy frequent pattern and should be pruned.

Proof

For every transaction \(T\supseteq X^{\prime }\):

  • \(\because \) \(X^{\prime }\) is an extension of X and \((X^{\prime } - X) = (X^{\prime }/X)\), we obtain \(X\subseteq X^{\prime }\subseteq T\Rightarrow (X^{\prime }/X)\subseteq (T/X)\),

  • \(\therefore \) \( fv(X^{\prime }, T) = fv(X, T)\cup fv((X^{\prime } - X), T) = min(fv(X, T), fv(X^{\prime }/X, T))\leq fv(X, T)\) and \(min(fv(X, T), fv(X^{\prime }/X, T))\leq fv(X^{\prime }/X, T) = rmrfv(X, T)\).

Suppose that X.tids denotes the set of tids of X:

  • \(\because \) \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\), and each term satisfies \(fv(X^{\prime }, T)\leq fv(X, T)\),

  • \(\therefore \) \(Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X^{\prime }.tids}fv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}fv(X, T)}{N} = Sup(X) < minSup\).

Furthermore, \(\frac {{\sum }_{id(T)\in X^{\prime }.tids}rmrfv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}rmrfv(X, T)}{N}\), so the same argument applies when \(rSup(X) < minSup\). □
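The downward-closure argument in Lemma 1 can be checked numerically on a toy database. The sketch below assumes that each transaction is a mapping from linguistic terms to membership degrees, that \(fv\) is the minimum degree over the terms of a termset, and that \(Sup\) is the sum of \(fv\) over all transactions divided by \(N\), as in the proof. The database `DB` and the term names are hypothetical.

```python
def fv(termset, transaction):
    """Fuzzy value of a termset in one transaction: the minimum membership
    degree of its terms, or 0.0 if any term is missing."""
    return min(transaction.get(term, 0.0) for term in termset)


def sup(termset, database):
    """Fuzzy support: summed fuzzy values divided by the database size N."""
    return sum(fv(termset, t) for t in database) / len(database)


# Hypothetical fuzzified database (term -> membership degree).
DB = [
    {"A.Low": 0.6, "B.High": 0.8, "C.Middle": 0.4},
    {"A.Low": 0.2, "B.High": 1.0},
    {"B.High": 0.6, "C.Middle": 1.0},
    {"A.Low": 0.8, "C.Middle": 0.6},
]

if __name__ == "__main__":
    x = ["A.Low"]
    x_ext = ["A.Low", "B.High"]
    # The extension can never have a larger support than X itself.
    print(sup(x, DB), sup(x_ext, DB))
```

Because the support of an extension never exceeds the support of the termset it extends, a termset that fails the threshold can be discarded together with all of its extensions.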

Lemma 2

For a termset X, if Sup(X) or the relative remaining support rSup(X) is less than the minimum support threshold, then any superset (extension) of X is not an MFFP and should be discarded.

Proof

  • \(\because \) \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\),

  • \(\therefore \) \( Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X^{\prime }.tids}fv(X^{\prime }, T)}{N} = \frac {{\sum }_{id(T)\in X^{\prime }.tids}min(fv(X, T), fv(X^{\prime }/X, T))}{N} \leq \frac {{\sum }_{id(T)\in X^{\prime }.tids}min(fv(X, T), rmrfv(X, T))}{N} = \frac {{\sum }_{id(T)\in Q^{\prime }}fv(X, T) +{\sum }_{id(T)\in Q^{\prime \prime }}rmrfv(X, T)}{N}= rSup(X) < minSup\).

Here \(Q^{\prime }\cup Q^{\prime \prime } = X^{\prime }.tids\) and \(Q^{\prime }\cap Q^{\prime \prime } = \emptyset \), with \(fv(X, T) < rmrfv(X, T)\) for every \(id(T)\in Q^{\prime }\) and \(fv(X, T)\geq rmrfv(X, T)\) for every \(id(T)\in Q^{\prime \prime }\). □
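The bound used in Lemma 2 can be sketched in the same toy setting. This example assumes that \(rmrfv(X, T)\) is the largest membership degree among the terms of \(T\) that follow \(X\) in a fixed processing order; that reading, the order `ORDER`, the helper names, and the database are assumptions for illustration and are not taken from the paper's definitions.

```python
# Hypothetical processing order of the linguistic terms.
ORDER = ["A.Low", "B.High", "C.Middle"]

DB = [
    {"A.Low": 0.6, "B.High": 0.8, "C.Middle": 0.4},
    {"A.Low": 0.2, "B.High": 1.0},
    {"B.High": 0.6, "C.Middle": 1.0},
    {"A.Low": 0.8, "C.Middle": 0.6},
]


def fv(termset, transaction):
    """Minimum membership degree of the termset's terms (0.0 if one is absent)."""
    return min(transaction.get(term, 0.0) for term in termset)


def rmrfv(termset, transaction):
    """Assumed reading of rmrfv: the largest degree among the terms that
    come after the termset in ORDER (0.0 if none remain)."""
    last = max(ORDER.index(term) for term in termset)
    return max((transaction.get(t, 0.0) for t in ORDER[last + 1:]), default=0.0)


def sup(termset, database):
    """Fuzzy support of a termset."""
    return sum(fv(termset, t) for t in database) / len(database)


def r_sup(termset, database):
    """Upper bound on the support of any extension of the termset."""
    total = sum(min(fv(termset, t), rmrfv(termset, t)) for t in database)
    return total / len(database)


if __name__ == "__main__":
    x = ["A.Low"]
    bound = r_sup(x, DB)
    for ext in (["A.Low", "B.High"], ["A.Low", "C.Middle"], ORDER):
        print(ext, sup(ext, DB), "<=", bound)
```

With a threshold of 0.4 in this toy data, the termset `["A.Low"]` itself reaches the threshold, but its bound falls below it, so none of its extensions needs to be examined.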

About this article

Cite this article

Lin, J.CW., Ahmed, U., Srivastava, G. et al. Linguistic frequent pattern mining using a compressed structure. Appl Intell 51, 4806–4823 (2021). https://doi.org/10.1007/s10489-020-02080-w
