Abstract
In this paper, we propose a new algorithm for efficiently mining association rules in a corpus. Compared with classical transactional association rule mining problems, a corpus contains a large number of items and far more itemsets, so traditional association rule mining algorithms cannot handle it efficiently. To address this issue, we design a new algorithm that combines the inverted hashing technique with the advantages of the FP-Growth structure, taking the characteristics of a corpus into account. Experimental results demonstrate that the new algorithm achieves a substantial improvement in performance.
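The abstract does not describe the algorithm's internals. As a rough illustration of the underlying problem only, the sketch below counts itemset support in a small corpus using an inverted index (word to set of document ids). It is a simplified stand-in under stated assumptions, not the authors' inverted hashing scheme and not an FP-Growth structure; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for word in set(words):
            index[word].add(doc_id)
    return index


def support(itemset, index):
    """Count the documents that contain every word of the itemset."""
    postings = [index.get(word, set()) for word in itemset]
    if not postings:
        return 0
    return len(set.intersection(*postings))


def frequent_pairs(docs, min_support):
    """Return {pair: support} for word pairs that meet min_support."""
    index = build_inverted_index(docs)
    # Only words that are frequent on their own can form a frequent pair.
    frequent_words = sorted(
        word for word, ids in index.items() if len(ids) >= min_support
    )
    result = {}
    for a, b in combinations(frequent_words, 2):
        count = support((a, b), index)
        if count >= min_support:
            result[(a, b)] = count
    return result


# Hypothetical toy corpus: each document is a list of words.
docs = [
    ["data", "mining", "rules"],
    ["data", "mining", "corpus"],
    ["text", "corpus", "mining"],
    ["data", "rules"],
]
pairs = frequent_pairs(docs, min_support=2)
```

Even in this toy form, the number of candidate pairs grows quadratically with the vocabulary, which is the scaling pressure the abstract attributes to corpora.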
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, S., Zhang, P. (2014). A Fast Association Rule Mining Algorithm for Corpus. In: Wen, Z., Li, T. (eds) Practical Applications of Intelligent Systems. Advances in Intelligent Systems and Computing, vol 279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54927-4_43
DOI: https://doi.org/10.1007/978-3-642-54927-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54926-7
Online ISBN: 978-3-642-54927-4
eBook Packages: Engineering, Engineering (R0)