From Local Patterns to Classification Models

Inductive Databases and Constraint-Based Data Mining

Abstract

Using pattern mining techniques to build predictive models is currently a popular research topic. These techniques aim to obtain classifiers with better predictive performance than greedily constructed models, and to allow predictive models to be built for data that is not represented as attribute-value vectors. In this chapter we give an overview of recent techniques we developed for integrating pattern mining and classification tasks. They range from approaches that select relevant patterns from a previously mined set to propositionalize the data, through methods that induce pattern-based rule sets, to algorithms that integrate pattern mining and model construction. To put our techniques in context, we also give an overview of the algorithms most closely related to our approaches.
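
As a rough illustration of what propositionalization with patterns can look like, the sketch below turns each example into a binary vector with one entry per pattern. It assumes itemset patterns and set-valued examples; the function name `propositionalize` and the toy data are hypothetical and are not taken from the chapter, whose methods also cover structured data such as trees and graphs.

```python
from typing import FrozenSet, List, Sequence

# Assumption for illustration: examples and patterns are both itemsets.
Transaction = FrozenSet[str]
Pattern = FrozenSet[str]


def propositionalize(
    transactions: Sequence[Transaction],
    patterns: Sequence[Pattern],
) -> List[List[int]]:
    """Return one binary feature vector per transaction.

    Entry j is 1 if pattern j is contained in the transaction, else 0.
    """
    return [[1 if p <= t else 0 for p in patterns] for t in transactions]


# Hypothetical toy data: three examples and three previously mined patterns.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
patterns = [
    frozenset({"a"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

features = propositionalize(transactions, patterns)
```

The resulting vectors could then be passed to any attribute-value learner; which patterns to keep as features is the selection problem the abstract refers to.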

Author information

Corresponding author

Correspondence to Björn Bringmann.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Bringmann, B., Nijssen, S., Zimmermann, A. (2010). From Local Patterns to Classification Models. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_6

  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_6

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science (R0)
