From Local Patterns to Classification Models

  • Björn Bringmann (email author)
  • Siegfried Nijssen
  • Albrecht Zimmermann
Chapter

Abstract

Using pattern mining techniques to build predictive models is currently a popular research topic. The aim of these techniques is to obtain classifiers with better predictive performance than greedily constructed models, and to allow the construction of predictive models for data that is not represented as attribute-value vectors. In this chapter we provide an overview of recent techniques we developed for integrating pattern mining and classification tasks. These techniques range from approaches that select relevant patterns from a previously mined set for propositionalization of the data, through the induction of pattern-based rule sets, to algorithms that integrate pattern mining and model construction. To put our techniques in context, we also give an overview of the algorithms most closely related to our approaches.

Keywords

Association Rule, Class Label, Greedy Algorithm, Local Pattern, Pattern Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Björn Bringmann (1), email author
  • Siegfried Nijssen (1)
  • Albrecht Zimmermann (1)
  1. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
