Journal of Computer Science and Technology

, Volume 23, Issue 1, pp 77–102

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Regular Paper

DOI: 10.1007/s11390-008-9107-1

Cite this article as:
Kunkle, D., Zhang, D. & Cooperman, G. J. Comput. Sci. Technol. (2008) 23: 77. doi:10.1007/s11390-008-9107-1

Abstract

This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified.

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four.

In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms.

Keywords

generalized association rules frequent generalized itemsets redundancy avoidance 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Supplementary material

Copyright information

© Science Press, Beijing, China and Springer Science + Business Media, LLC, USA 2008

Authors and Affiliations

  1. 1.College of Computer and Information ScienceNortheastern UniversityBostonU.S.A.

Personalised recommendations