Knowledge and Information Systems, Volume 18, Issue 1, pp 61–81

One in a million: picking the right patterns

  • Björn Bringmann
  • Albrecht Zimmermann
Regular Paper


Constrained pattern mining extracts patterns based on their individual merit. This usually yields far more patterns than a human expert or a machine learning technique can make use of. Often, different patterns or combinations of patterns cover a similar subset of the examples, making them redundant because they carry no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these algorithms and evaluate them on several data sets. The results show that both techniques succeed in severely reducing the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the developed solutions are well suited to the goals we aim at.
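
The notion of redundancy described above can be illustrated with a small sketch. This is not the Bouncer or Picker algorithm, whose details are not given in the abstract; it is a hypothetical greedy filter under the assumption that a pattern is redundant when it does not further split the groups of examples already distinguished by the patterns kept so far. The names `partition_size` and `greedy_select`, and the toy data, are illustrative only.

```python
from typing import Dict, FrozenSet, List, Sequence


def partition_size(selected: Sequence[FrozenSet[int]], n_examples: int) -> int:
    """Count the distinct blocks of examples induced by the selected patterns.

    Two examples share a block when exactly the same selected patterns cover them.
    """
    signatures = {
        tuple(example in cover for cover in selected)
        for example in range(n_examples)
    }
    return len(signatures)


def greedy_select(patterns: Dict[str, FrozenSet[int]], n_examples: int) -> List[str]:
    """Assumed heuristic: keep a pattern only if it splits an existing block."""
    chosen: List[str] = []
    covers: List[FrozenSet[int]] = []
    current = partition_size(covers, n_examples)
    for name, cover in patterns.items():  # patterns are visited in insertion order
        candidate = partition_size(covers + [cover], n_examples)
        if candidate > current:  # the pattern adds new distinctions
            chosen.append(name)
            covers.append(cover)
            current = candidate
    return chosen


# Toy data: each pattern maps to the set of example indices it covers.
toy_patterns = {
    "a": frozenset({0, 1, 2}),
    "b": frozenset({0, 1, 2}),     # same cover as "a"
    "c": frozenset({2, 3}),
    "d": frozenset({3, 4, 5}),     # complement of "a"
    "e": frozenset({0, 1, 2, 3}),  # union of "a" and "c"
}
print(greedy_select(toy_patterns, n_examples=6))  # ['a', 'c']
```

In this toy run, "b" and "d" are dropped because each alone duplicates the split made by "a", and "e" is dropped because the combination of "a" and "c" already separates every example that "e" would. The result depends on the order in which patterns are visited, which is one reason a real method needs an explicit selection strategy.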


Keywords: Data mining · Post processing · Pattern reduction





Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  1. Departement Computerwetenschappen, Katholieke Universiteit Leuven, Heverlee, Belgium
