A Fast Association Rule Mining Algorithm for Corpus

Yan, Shankai; Zhang, Pingjian

doi:10.1007/978-3-642-54927-4_43


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 279)


Abstract

In this paper, we propose a new algorithm for efficiently mining association rules in a text corpus. Compared with classical transactional association rule mining problems, a corpus contains a large number of items and far more itemsets, and traditional association rule mining algorithms cannot process it efficiently. To address this issue, we design a new algorithm that combines inverted hashing with the FP-Growth structure and takes the characteristics of corpora into account. Experimental results demonstrate that the new algorithm substantially improves performance.
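The abstract describes the approach only at a high level. As a rough illustration of the general idea (each document treated as a transaction of distinct words, an inverted structure used to prune infrequent words, then FP-Growth over what remains), here is a minimal Python sketch. It is not the authors' algorithm: a plain inverted index (word to set of document IDs) stands in for the paper's inverted hashing, whose design is not given here, and the names `build_inverted_index`, `mine_corpus`, and `derive_rules` are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, its parent, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_inverted_index(docs):
    """Hypothetical helper: map each word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for w in words:
            index[w].add(doc_id)
    return index


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (itemset, count) pairs; return (header table, frequent items)."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that carries that item
    for items, count in transactions:
        # Keep frequent items only, in descending support order (ties broken by name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) pairs by recursing on conditional FP-trees."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        yield tuple(sorted(suffix + (item,))), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, suffix + (item,))


def mine_corpus(docs, min_support):
    """Hypothetical driver: treat each document as a transaction of distinct words."""
    index = build_inverted_index(docs)
    # Document frequency = size of a word's posting set; drop rare words up front.
    transactions = [({w for w in doc if len(index[w]) >= min_support}, 1) for doc in docs]
    return dict(fp_growth(transactions, min_support))


def derive_rules(itemsets, min_conf):
    """Rules X -> y with a one-word consequent; confidence = sup(X + y) / sup(X)."""
    out = []
    for itemset, sup in itemsets.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            x = tuple(i for i in itemset if i != y)
            conf = sup / itemsets[x]
            if conf >= min_conf:
                out.append((x, y, conf))
    return out
```

For example, with `docs = [["data", "mining", "rules"], ["data", "mining"], ["data", "text"], ["mining", "rules"]]` and `min_support=2`, the word `text` (document frequency 1) is pruned by the inverted index before any tree is built, and `mine_corpus` returns the frequent itemsets {data}: 3, {mining}: 3, {rules}: 2, {data, mining}: 2, and {mining, rules}: 2. Pruning by document frequency first keeps the FP-tree small, which is the kind of saving that matters when a corpus has a very large vocabulary.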



Author information

Authors and Affiliations

School of Software Engineering, South China University of Technology, Guangzhou, People’s Republic of China
Shankai Yan & Pingjian Zhang

Corresponding author

Correspondence to Shankai Yan.

Editor information

Editors and Affiliations

College of Computer and Software Engineering, Shenzhen University, Shenzhen, China
Zhenkun Wen
School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China
Tianrui Li

About this paper

Cite this paper

Yan, S., Zhang, P. (2014). A Fast Association Rule Mining Algorithm for Corpus. In: Wen, Z., Li, T. (eds) Practical Applications of Intelligent Systems. Advances in Intelligent Systems and Computing, vol 279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54927-4_43

DOI: https://doi.org/10.1007/978-3-642-54927-4_43
Published: 19 July 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54926-7
Online ISBN: 978-3-642-54927-4
eBook Packages: Engineering, Engineering (R0)
