Linguistic frequent pattern mining using a compressed structure

Lin, Jerry Chun-Wei; Ahmed, Usman; Srivastava, Gautam; Wu, Jimmy Ming-Tai; Hong, Tzung-Pei; Djenouri, Youcef

doi:10.1007/s10489-020-02080-w

Published: 06 January 2021

Volume 51, pages 4806–4823, (2021)

Applied Intelligence

Jerry Chun-Wei Lin¹ (ORCID: orcid.org/0000-0001-8768-9709),
Usman Ahmed¹,
Gautam Srivastava²,³,
Jimmy Ming-Tai Wu⁴,
Tzung-Pei Hong⁵ &
Youcef Djenouri⁶

Abstract

Traditional association-rule mining (ARM) considers only the frequency of items in a binary database, which provides insufficient knowledge for making effective decisions and strategies. Mining useful information from quantitative databases is a harder task than the one handled by conventional ARM algorithms. Fuzzy-set theory was introduced to represent knowledge in a form closer to human reasoning, and it can also be applied to quantitative databases. Many approaches have adopted fuzzy-set theory to transform each quantitative value into linguistic terms with corresponding membership degrees, based on predefined membership functions, in order to discover fuzzy frequent itemsets (FFIs). Traditional fuzzy frequent itemset mining considers only the linguistic terms with maximal scalar cardinality, and past approaches do not take the uncertainty factor into account. In this paper, an efficient fuzzy mining (EFM) algorithm is presented to quickly discover multiple FFIs from quantitative databases under type-2 fuzzy-set theory. A compressed fuzzy-list (CFL) structure is developed to maintain complete information for rule generation. Two pruning techniques are developed to reduce the search space and speed up the mining process. Several experiments are carried out to verify the efficiency and effectiveness of the designed approach in terms of runtime, number of examined nodes, memory usage, and scalability under different minimum support thresholds and different linguistic terms used in the membership functions.
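As a rough illustration of the fuzzification step mentioned in the abstract, the following sketch maps a quantity to linguistic terms with membership degrees. It assumes ordinary (type-1) triangular membership functions; the term names, the breakpoints in `TERMS`, and the helper names `triangular` and `fuzzify` are hypothetical and are not taken from the paper, whose type-2 treatment is not reproduced here.

```python
def triangular(x, a, b, c):
    """Degree of x in a triangular fuzzy set with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical membership functions for three linguistic terms (assumption).
TERMS = {
    "Low": (0, 1, 6),
    "Middle": (1, 6, 11),
    "High": (6, 11, 16),
}


def fuzzify(item, quantity):
    """Map one (item, quantity) pair to {"item.Term": degree}, dropping zero degrees."""
    result = {}
    for term, (a, b, c) in TERMS.items():
        degree = triangular(quantity, a, b, c)
        if degree > 0:
            result[f"{item}.{term}"] = degree
    return result


if __name__ == "__main__":
    # A quantity of 3 belongs partly to "Low" and partly to "Middle".
    print(fuzzify("A", 3))
```

Keeping every term with a non-zero degree, rather than only the strongest one, is what allows multiple fuzzy terms per item to take part in later mining.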

Author information

Authors and Affiliations

Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin & Usman Ahmed
Department of Mathematics & Computer Science, Brandon University, Brandon, Canada
Gautam Srivastava
Research Centre for Interneural Computing, China Medical University, Taichung, Taiwan
Gautam Srivastava
College of Computer Science and Engineering, Shandong University of Science & Technology, Shandong, China
Jimmy Ming-Tai Wu
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
SINTEF Digital, Mathematics and Cybernetics, Oslo, Norway
Youcef Djenouri

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

For a termset X, if Sup(X) or rSup(X) is less than the minimum support threshold, then no superset (extension) of X is a multiple fuzzy frequent pattern (MFFP), and all such supersets should be pruned.

Proof

For every transaction \(T\supseteq X^{\prime }\):

\(\because \) \(X^{\prime }\) is an extension of X, so \((X^{\prime } - X) = (X^{\prime }/X)\) and \(X\subseteq X^{\prime }\subseteq T\Rightarrow (X^{\prime }/X)\subseteq (T/X)\),

\(\therefore \) \( fv(X^{\prime }, T) = fv(X, T)\cup fv((X^{\prime } - X), T) = \min (fv(X, T), fv(X^{\prime }/X, T))\leq fv(X, T)\) and \(\min (fv(X, T), fv(X^{\prime }/X, T))\leq fv(X^{\prime }/X, T) = rmrfv(X, T)\).

Let X.tids denote the set of transaction identifiers (tids) of the transactions that contain X.

\(\because \) \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\),

\(\therefore \) \( Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X^{\prime }.tids}fv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}fv(X, T)}{N} = Sup(X) < minSup\).

Furthermore, \(\frac {{\sum }_{id(T)\in X^{\prime }.tids}rmrfv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}rmrfv(X, T)}{N}\), so the remaining support of \(X^{\prime }\) is bounded by that of X, which is below the threshold whenever \(rSup(X) < minSup\). □
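The support part of Lemma 1 can be checked on a toy example. The sketch below assumes that each transaction is already fuzzified into a mapping from linguistic terms to degrees, that fv(X, T) is the minimum degree of the terms of X in T (as in the proof), and that Sup(X) is the sum of fv(X, T) over all transactions divided by N. The database `DB` and the term names are made up for illustration.

```python
def fv(termset, transaction):
    """Fuzzy value of a termset in one transaction: the minimum degree of its
    terms, or 0.0 if any term is absent from the transaction."""
    return min(transaction.get(term, 0.0) for term in termset)


def sup(termset, db):
    """Fuzzy support: summed fuzzy values divided by the number of transactions N."""
    return sum(fv(termset, t) for t in db) / len(db)


# Hypothetical fuzzified database (assumption, not data from the paper).
DB = [
    {"A.Low": 0.6, "B.High": 0.8, "C.Middle": 0.5},
    {"A.Low": 0.2, "B.High": 0.4},
    {"B.High": 1.0, "C.Middle": 0.7},
    {"A.Low": 0.8, "C.Middle": 0.2},
]

if __name__ == "__main__":
    x = ("A.Low",)
    for extension in [("A.Low", "B.High"),
                      ("A.Low", "C.Middle"),
                      ("A.Low", "B.High", "C.Middle")]:
        # Each extension has support no larger than that of X.
        print(extension, sup(extension, DB), "<=", sup(x, DB))
```

Because the minimum can only shrink when a term is added, an infrequent termset cannot have a frequent extension, which is what justifies pruning its whole subtree.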

Lemma 2

For a termset X, if Sup(X) or the relative remaining support rSup(X) is less than the minimum support threshold, then no superset (extension) of X is an MFFP, and all such supersets should be discarded.

Proof

\(\because \) \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\),

\(\therefore \) \( Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X^{\prime }.tids}fv(X^{\prime }, T)}{N} = \frac {{\sum }_{id(T)\in X^{\prime }.tids}\min (fv(X, T), fv(X^{\prime }/X, T))}{N} \leq \frac {{\sum }_{id(T)\in X^{\prime }.tids}\min (fv(X, T), rmrfv(X, T))}{N} = \frac {{\sum }_{id(T)\in Q^{\prime }}fv(X, T) +{\sum }_{id(T)\in Q^{\prime \prime }}rmrfv(X, T)}{N}= rSup(X) < minSup\).

Here \(Q^{\prime }\) and \(Q^{\prime \prime }\) partition \(X^{\prime }.tids\), that is, \(Q^{\prime }\cup Q^{\prime \prime } = X^{\prime }.tids\) and \(Q^{\prime }\cap Q^{\prime \prime } = \emptyset \), with \(fv(X, T) < rmrfv(X, T)\) for \(T\in Q^{\prime }\) and \(fv(X, T)\geq rmrfv(X, T)\) for \(T\in Q^{\prime \prime }\). □
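The bound used in Lemma 2 can be sketched in the same toy setting. The appendix does not define rmrfv, so this sketch makes an explicit assumption: terms are processed in a fixed order (`ORDER`, hypothetical), and rmrfv(X, T) is the largest degree among the terms of T that come after the last term of X in that order. Under that assumption, rSup(X) sums min(fv(X, T), rmrfv(X, T)) over the transactions and bounds the support of every extension of X built from later terms. The function names and the data are illustrative only.

```python
# Hypothetical processing order of the fuzzy terms (assumption).
ORDER = ["A.Low", "B.High", "C.Middle"]


def fv(termset, transaction):
    """Minimum degree of the termset's terms in the transaction (0.0 if one is absent)."""
    return min(transaction.get(term, 0.0) for term in termset)


def rmrfv(termset, transaction):
    """Assumed definition: largest degree among the terms after X in ORDER."""
    last = max(ORDER.index(term) for term in termset)
    rest = [transaction[t] for t in ORDER[last + 1:] if t in transaction]
    return max(rest, default=0.0)


def sup(termset, db):
    return sum(fv(termset, t) for t in db) / len(db)


def r_sup(termset, db):
    """Remaining support: per transaction, the smaller of fv and rmrfv."""
    return sum(min(fv(termset, t), rmrfv(termset, t)) for t in db) / len(db)


def worth_extending(termset, db, min_sup):
    """Extensions are explored only if both Sup(X) and rSup(X) reach min_sup."""
    return sup(termset, db) >= min_sup and r_sup(termset, db) >= min_sup


# Hypothetical fuzzified database (assumption, not data from the paper).
DB = [
    {"A.Low": 0.6, "B.High": 0.8, "C.Middle": 0.5},
    {"A.Low": 0.2, "B.High": 0.4},
    {"B.High": 1.0, "C.Middle": 0.7},
    {"A.Low": 0.8, "C.Middle": 0.2},
]

if __name__ == "__main__":
    x = ("A.Low",)
    print("Sup:", sup(x, DB), "rSup:", r_sup(x, DB))
    print("explore extensions at minSup = 0.3?", worth_extending(x, DB, 0.3))
```

In this toy data the termset itself passes a threshold of 0.3 while its remaining support does not, so its extensions can be skipped without being evaluated, even though the plain support test alone would not have pruned them.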

About this article

Cite this article

Lin, J.CW., Ahmed, U., Srivastava, G. et al. Linguistic frequent pattern mining using a compressed structure. Appl Intell 51, 4806–4823 (2021). https://doi.org/10.1007/s10489-020-02080-w

Accepted: 13 November 2020
Published: 06 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10489-020-02080-w
