Abstract.
In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases differ considerably from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases and compared with the newly proposed algorithms, named Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). The results show that the proposed algorithms perform better on large text databases.
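To make the counting problem concrete, the following is a minimal sketch of the classic Apriori baseline applied to documents treated as sets of words. It is not the M-Apriori or M-DHP algorithm proposed in the paper, whose details are not given in the abstract. The function name `apriori`, the parameter `min_support` (an absolute document count), and the toy documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def apriori(documents, min_support):
    """Return {itemset: count} for every word set found in at least
    min_support documents (classic level-wise Apriori, illustrative only)."""
    transactions = [frozenset(doc) for doc in documents]

    # Pass 1: count individual words.
    counts = Counter(word for t in transactions for word in t)
    frequent = {frozenset([w]): c for w, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Counting pass over the whole database for the surviving candidates.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy "text database": each document is a set of words.
docs = [
    {"data", "mining", "text"},
    {"data", "mining"},
    {"data", "text"},
    {"mining", "rules"},
]
frequent_itemsets = apriori(docs, min_support=2)
```

In a real text database the vocabulary is far larger than a retail item catalogue, so the candidate sets built in the join step grow quickly. That growth is the counting cost the abstract refers to.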
Received 12 November 1999 / Revised 27 September 2000 / Accepted in revised form 25 October 2000
Cite this article
Holt, J., Chung, S. Multipass Algorithms for Mining Association Rules in Text Databases. Knowledge and Information Systems 3, 168–183 (2001). https://doi.org/10.1007/PL00011664