Abstract
Using pattern mining techniques to build predictive models is currently a popular research topic. These techniques aim to obtain classifiers with better predictive performance than greedily constructed models, and to allow the construction of predictive models for data that is not represented as attribute-value vectors. In this chapter we provide an overview of recent techniques we developed for integrating pattern mining and classification tasks. The techniques range from approaches that select relevant patterns from a previously mined set for propositionalization of the data, through the induction of pattern-based rule sets, to algorithms that integrate pattern mining and model construction. To put our techniques in context, we also provide an overview of the algorithms most closely related to our approaches.
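As a rough illustration of what propositionalization with patterns can look like, the sketch below turns each data instance into a binary vector with one feature per pattern. It assumes the simplest setting, in which instances and patterns are both itemsets and a pattern matches an instance when it is a subset of it. The function name `propositionalize` and the toy data are hypothetical and not taken from the chapter, which also covers structured data such as trees and graphs.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def propositionalize(
    transactions: Sequence[Itemset], patterns: Sequence[Itemset]
) -> List[List[int]]:
    """Encode each transaction as a binary vector, one entry per pattern.

    Entry j is 1 if pattern j is a subset of the transaction, else 0.
    """
    return [[1 if p <= t else 0 for p in patterns] for t in transactions]


# Toy data, invented purely for illustration.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
patterns = [frozenset({"a"}), frozenset({"a", "c"}), frozenset({"b", "c"})]

features = propositionalize(transactions, patterns)
for row in features:
    print(row)
```

The resulting vectors can be passed to any learner that expects attribute-value input. Which patterns to keep as features is the selection problem the abstract refers to, and this sketch does not address it.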
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Bringmann, B., Nijssen, S., Zimmermann, A. (2010). From Local Patterns to Classification Models. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_6
DOI: https://doi.org/10.1007/978-1-4419-7738-0_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7737-3
Online ISBN: 978-1-4419-7738-0
eBook Packages: Computer Science, Computer Science (R0)