Abstract
Constrained pattern mining extracts patterns based on their individual merit. This usually yields far more patterns than a human expert or a machine learning technique could make use of. Different patterns, or combinations of patterns, often cover a similar subset of the examples and are therefore redundant, carrying no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use within these algorithms and evaluate them on several data sets. The results show that both techniques substantially reduce the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set improves the quality of classification results. Both results indicate that the proposed solutions are well suited to these goals.
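To make the notion of redundancy concrete, the following is a minimal sketch of one way a single-pass selection could discard patterns that add no new information. It is an illustration only and is not the Bouncer or Picker algorithm: the function name `select_patterns`, the `coverage` mapping, and the acceptance criterion (keep a pattern only if it splits a group of examples that the already selected patterns cannot tell apart) are all assumptions introduced here.

```python
from typing import Dict, FrozenSet, List


def select_patterns(coverage: Dict[str, FrozenSet[int]], n_examples: int) -> List[str]:
    """Hypothetical single-pass selection (illustrative, not the paper's method).

    `coverage` maps a pattern name to the set of example indices it covers.
    A pattern is kept only if it refines the partition of the examples
    induced by the patterns kept so far.
    """
    # Initially all examples are indistinguishable: one block.
    blocks = [frozenset(range(n_examples))]
    selected: List[str] = []

    for name, covered in coverage.items():
        new_blocks = []
        refined = False
        for block in blocks:
            inside = block & covered
            outside = block - covered
            if inside and outside:
                # The pattern separates examples within this block.
                refined = True
                new_blocks.extend([inside, outside])
            else:
                new_blocks.append(block)
        if refined:
            selected.append(name)
            blocks = new_blocks
        # Otherwise the pattern is redundant given the current selection.

    return selected


# Toy data: six examples (0..5) and five patterns, assumed for illustration.
coverage = {
    "p1": frozenset({0, 1, 2}),
    "p2": frozenset({0, 1, 2}),      # same coverage as p1
    "p3": frozenset({3, 4, 5}),      # complement of p1
    "p4": frozenset({0, 1}),
    "p5": frozenset({2, 3, 4, 5}),   # complement of p4
}
print(select_patterns(coverage, 6))  # ['p1', 'p4']
```

In this toy case, `p2`, `p3`, and `p5` are dropped because each induces a split of the examples that the selected patterns already provide. The result depends on the order in which patterns are visited, which is typical of single-pass heuristics.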
Bringmann, B., Zimmermann, A. One in a million: picking the right patterns. Knowl Inf Syst 18, 61–81 (2009). https://doi.org/10.1007/s10115-008-0136-4