One in a million: picking the right patterns

Bringmann, Björn; Zimmermann, Albrecht

doi:10.1007/s10115-008-0136-4

Regular Paper
Published: 28 March 2008

Volume 18, pages 61–81, (2009)

Knowledge and Information Systems

Abstract

Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these general algorithms and evaluate them on several data sets. The results show that both techniques succeed in severely reducing the number of patterns, while at the same time apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the developed solutions are very well suited for the goals we aim at.
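
To make the notion of redundancy concrete, the sketch below shows one generic way a pattern could be dropped because it covers nearly the same examples as a pattern already kept. It is not the paper's Bouncer or Picker algorithm, whose details are not given in the abstract. The function names, the Jaccard similarity measure, the 0.5 threshold, and the toy coverage sets are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Overlap of two coverage sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # two empty coverage sets are treated as identical
    return len(a & b) / len(union)


def select_non_redundant(
    coverage: Dict[str, FrozenSet[int]],
    max_similarity: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Single pass over the patterns in insertion order.

    A pattern is kept only if its coverage overlaps every
    already-selected pattern by at most max_similarity.
    """
    selected: List[str] = []
    for name, covered in coverage.items():
        if all(jaccard(covered, coverage[s]) <= max_similarity for s in selected):
            selected.append(name)
    return selected


# Toy data (invented): pattern name -> indices of the examples it covers.
coverage = {
    "p1": frozenset({0, 1, 2, 3}),
    "p2": frozenset({0, 1, 2}),  # overlaps p1 heavily (3/4)
    "p3": frozenset({4, 5}),     # disjoint from p1
    "p4": frozenset({3, 4, 5}),  # overlaps p3 heavily (2/3)
}
selected = select_non_redundant(coverage)
print(selected)
```

With these toy sets, p2 and p4 are filtered out because each covers mostly the same examples as an earlier pattern. The result depends on the order in which patterns are visited and on the chosen threshold, which is one reason a selection method has to specify both.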

Author information

Authors and Affiliations

Departement Computerwetenschappen, Katholieke Universiteit Leuven, Celestijnenlaan 200a, 3001, Heverlee, Belgium
Björn Bringmann & Albrecht Zimmermann

Corresponding author

Correspondence to Albrecht Zimmermann.

About this article

Cite this article

Bringmann, B., Zimmermann, A. One in a million: picking the right patterns. Knowl Inf Syst 18, 61–81 (2009). https://doi.org/10.1007/s10115-008-0136-4

Received: 29 October 2007
Revised: 24 December 2007
Accepted: 29 January 2008
Published: 28 March 2008
Issue Date: January 2009
DOI: https://doi.org/10.1007/s10115-008-0136-4
