One in a million: picking the right patterns

  • Regular Paper
Knowledge and Information Systems

Abstract

Constrained pattern mining extracts patterns based on their individual merit. This usually yields far more patterns than a human expert or a machine learning technique can make use of. Often, different patterns or combinations of patterns cover a similar subset of the examples, making them redundant because they carry no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these general algorithms and evaluate them on several data sets. The results show that both techniques greatly reduce the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set improves the quality of classification results. Together, these results indicate that the developed solutions are well suited to their intended goals.
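The notion of redundancy described in the abstract, where patterns cover a similar subset of the examples, can be illustrated with a small sketch. This is not the paper's Bouncer or Picker algorithm, whose details are not given in the abstract. It is a hypothetical single-pass filter that assumes each pattern is represented by the set of example ids it covers, and it uses Jaccard similarity with an arbitrary threshold (`max_overlap`) as a stand-in redundancy test. All names here are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def select_nonredundant(coverage: Dict[str, FrozenSet[int]],
                        max_overlap: float = 0.8) -> List[str]:
    """Keep a pattern unless its cover is too similar to a kept pattern's cover.

    Illustrative only: the Jaccard test and the threshold are assumptions,
    not the selection criteria used in the paper.
    """
    kept: List[str] = []
    for name, covered in coverage.items():  # patterns are visited in insertion order
        redundant = False
        for other in kept:
            union = covered | coverage[other]
            shared = covered & coverage[other]
            # Jaccard similarity of the two covers; an empty union is never redundant
            if union and len(shared) / len(union) > max_overlap:
                redundant = True
                break
        if not redundant:
            kept.append(name)
    return kept


# Hypothetical toy data: pattern name -> ids of the examples it covers
patterns = {
    "p1": frozenset({1, 2, 3, 4}),
    "p2": frozenset({1, 2, 3, 4}),  # identical cover to p1, so it adds nothing
    "p3": frozenset({5, 6}),
    "p4": frozenset({4, 5, 6, 7}),  # overlaps p3 with Jaccard similarity 0.5
}

print(select_nonredundant(patterns))                   # ['p1', 'p3', 'p4']
print(select_nonredundant(patterns, max_overlap=0.4))  # ['p1', 'p3']
```

Lowering the threshold makes the filter stricter, so fewer patterns survive. The result also depends on the order in which patterns are visited, which is a general property of single-pass filters of this kind.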

Author information

Corresponding author

Correspondence to Albrecht Zimmermann.

About this article

Cite this article

Bringmann, B., Zimmermann, A. One in a million: picking the right patterns. Knowl Inf Syst 18, 61–81 (2009). https://doi.org/10.1007/s10115-008-0136-4

  • DOI: https://doi.org/10.1007/s10115-008-0136-4
