A Fast Association Rule Mining Algorithm for Corpus

  • Conference paper
Practical Applications of Intelligent Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 279)

Abstract

In this paper, we propose a new algorithm for efficiently mining association rules in a corpus. Compared with classical transactional association rule mining, a corpus contains a large number of items and far more item sets, so traditional association rule mining algorithms cannot handle it efficiently. To address this issue, we design a new algorithm that combines inverted hashing with the advantages of the FP-Growth structure and takes the characteristics of a corpus into account. Experimental results demonstrate that the new algorithm achieves a substantial performance improvement.
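
To make the setting concrete, the following is a minimal sketch of association rule mining over a small corpus, where each document is treated as a transaction and each word as an item. It is not the algorithm proposed in the paper: it only illustrates the general idea of using an inverted index (word to document ids) to count support without rescanning the documents. The function names, the toy corpus, and the restriction to word pairs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for word in set(words):
            index[word].add(doc_id)
    return index


def frequent_pairs(index, min_support):
    """Return {(a, b): support} for word pairs found in >= min_support docs."""
    # A pair can only be frequent if both of its words are frequent.
    frequent_words = sorted(w for w, ids in index.items()
                            if len(ids) >= min_support)
    result = {}
    for a, b in combinations(frequent_words, 2):
        # Support of the pair is the size of the posting-list intersection.
        support = len(index[a] & index[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


def confidence(index, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = len(index[antecedent] & index[consequent])
    return both / len(index[antecedent])


# Hypothetical toy corpus: each inner list is one tokenized document.
docs = [
    ["data", "mining", "text"],
    ["data", "mining"],
    ["text", "corpus"],
    ["data", "corpus", "mining"],
]
index = build_inverted_index(docs)
pairs = frequent_pairs(index, min_support=2)
print(pairs)
print(confidence(index, "text", "data"))
```

Even in this toy form, the number of candidate pairs grows quadratically with the vocabulary, which is the scaling problem the abstract attributes to corpus data.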

Author information

Correspondence to Shankai Yan.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yan, S., Zhang, P. (2014). A Fast Association Rule Mining Algorithm for Corpus. In: Wen, Z., Li, T. (eds) Practical Applications of Intelligent Systems. Advances in Intelligent Systems and Computing, vol 279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54927-4_43

  • DOI: https://doi.org/10.1007/978-3-642-54927-4_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54926-7

  • Online ISBN: 978-3-642-54927-4

  • eBook Packages: Engineering (R0)
