Abstract
This chapter presents subgroup discovery (SD) and some related supervised descriptive rule induction techniques, including contrast set mining (CSM) and emerging pattern mining (EPM). These techniques are presented in a unifying framework named supervised descriptive rule learning. All of them aim at discovering patterns in the form of rules induced from labeled data. The chapter contributes to the understanding of these techniques by presenting a unified terminology and by explaining the apparent differences between the learning tasks as variants of a single supervised descriptive rule learning task. It also shows that the various rule learning heuristics used in CSM, EPM, and SD algorithms all aim at optimizing a trade-off between rule coverage and precision.
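To make the coverage/precision trade-off concrete, the following minimal sketch computes weighted relative accuracy (WRAcc), one heuristic commonly used in subgroup discovery. The abstract does not name a specific measure, so the choice of WRAcc, the function name `wracc`, and the toy counts are assumptions made purely for illustration.

```python
def wracc(covered_pos, covered_neg, total_pos, total_neg):
    """Weighted relative accuracy of a rule (illustrative helper, not from the chapter).

    WRAcc = coverage * (precision - prior), where
      coverage  = fraction of all examples covered by the rule,
      precision = fraction of covered examples that are positive,
      prior     = fraction of all examples that are positive.
    """
    total = total_pos + total_neg
    covered = covered_pos + covered_neg
    if total == 0 or covered == 0:
        return 0.0  # a rule covering nothing carries no information
    coverage = covered / total
    precision = covered_pos / covered
    prior = total_pos / total
    return coverage * (precision - prior)


# Hypothetical dataset: 50 positive and 50 negative examples.
narrow_rule = wracc(5, 0, 50, 50)     # perfectly precise, but covers few examples
balanced_rule = wracc(30, 10, 50, 50) # fairly precise and fairly general
trivial_rule = wracc(50, 50, 50, 50)  # covers everything, precision equals the prior
```

With these toy counts the balanced rule scores highest: the narrow rule is penalized for low coverage, and the trivial rule gains nothing over the class prior.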
Notes
- 1. Parts of this chapter are based on Kralj Novak, Lavrač, and Webb (2009).
- 2.
- 3. Jumping emerging patterns are emerging patterns with support zero in one dataset and greater than zero in the other dataset.
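The definition of jumping emerging patterns in the note above can be checked mechanically. The sketch below is only an illustration under stated assumptions: patterns are modeled as itemsets, datasets as lists of transactions, and the helper names `support` and `is_jumping_emerging_pattern` as well as the toy data are hypothetical.

```python
def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    items = frozenset(itemset)
    matches = sum(1 for transaction in dataset if items <= set(transaction))
    return matches / len(dataset)


def is_jumping_emerging_pattern(itemset, dataset_a, dataset_b):
    """True if the support is zero in one dataset and greater than zero in the other."""
    s_a = support(itemset, dataset_a)
    s_b = support(itemset, dataset_b)
    return (s_a == 0 and s_b > 0) or (s_b == 0 and s_a > 0)


# Hypothetical toy data: each transaction is a set of attribute values.
dataset_a = [{"x", "y"}, {"x", "z"}, {"y", "z"}]
dataset_b = [{"x", "w"}, {"w", "z"}, {"x", "z"}]

jumping = is_jumping_emerging_pattern({"x", "y"}, dataset_a, dataset_b)
not_jumping = is_jumping_emerging_pattern({"x", "z"}, dataset_a, dataset_b)
```

Here `{"x", "y"}` occurs only in the first dataset, so it satisfies the definition, whereas `{"x", "z"}` occurs in both and does not.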
References
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. (1995). Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 307–328). Menlo Park, CA: AAAI.
Atzmüller, M., & Puppe, F. (2005). Semi-automatic visual subgroup mining using VIKAMINE. Journal of Universal Computer Science, 11(11), 1752–1765. Special Issue on Visual Data Mining.
Atzmüller, M., & Puppe, F. (2006). SD-Map – A fast algorithm for exhaustive subgroup discovery. In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-06), Berlin, Germany (pp. 6–17). Berlin, Germany: Springer.
Atzmüller, M., Puppe, F., & Buscher, H.-P. (2005a). Exploiting background knowledge for knowledge-intensive subgroup discovery. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, UK (pp. 647–652). San Francisco: Morgan Kaufmann.
Atzmüller, M., Puppe, F., & Buscher, H.-P. (2005b). Profiling examiners using intelligent subgroup mining. In Proceedings of the 10th Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP-05) (pp. 46–51). Aberdeen: AIME.
Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego, CA (pp. 261–270). New York: ACM.
Bay, S. D. (2000). Multivariate discretization of continuous variables for set mining. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), Boston (pp. 315–319). New York: ACM.
Bay, S. D., & Pazzani, M. J. (2001). Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.
Bayardo, R. J., Jr. (1998). Efficiently mining long patterns from databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD-98), Seattle, WA (pp. 85–93). New York: ACM.
Boulesteix, A.-L., Tutz, G., & Strimmer, K. (2003). A CART-based approach to discover emerging patterns in microarray data. Bioinformatics, 19(18), 2465–2472.
Daly, O., & Taniar, D. (2005). Exception rules in data mining. In M. Khosrow-Pour (Ed.), Encyclopedia of information science and technology (Vol. II, pp. 1144–1148). Hershey, PA: Idea Group.
del Jesus, M. J., González, P., Herrera, F., & Mesonero, M. (2007). Evolutionary fuzzy rule induction process for subgroup discovery: A case study in marketing. IEEE Transactions on Fuzzy Systems, 15(4), 578–592.
Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego, CA (pp. 43–52). New York: ACM.
Dong, G., Zhang, X., Wong, L., & Li, J. (1999). CAEP: Classification by aggregating emerging patterns. In Proceedings of the 2nd International Conference on Discovery Science (DS-99), Tokyo, Japan (pp. 30–42). Berlin, Germany/New York: Springer.
Fan, H., Fan, M., Ramamohanarao, K., & Liu, M. (2006). Further improving emerging pattern based classifiers via bagging. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-06), Singapore (pp. 91–96). Berlin, Germany/Heidelberg, Germany/New York: Springer.
Fan, H., & Ramamohanarao, K. (2003a). A Bayesian approach to use emerging patterns for classification. In Proceedings of the 14th Australasian Database Conference (ADC-03), Adelaide, SA (pp. 39–48). Darlinghurst, NSW: Australian Computer Society.
Fan, H., & Ramamohanarao, K. (2003b). Efficiently mining interesting emerging patterns. In Proceedings of the 4th International Conference on Web-Age Information Management (WAIM-03), Chengdu, China (pp. 189–201). Berlin, Germany/New York: Springer.
Friedman, J. H., & Fisher, N. I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9(2), 123–143.
Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: Methodology and application. Journal of Artificial Intelligence Research, 17, 501–527.
Gamberger, D., Lavrač, N., & Wettschereck, D. (2002). Subgroup visualization: A method and application in population screening. In Proceedings of the 7th International Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP-02), Lyon, France (pp. 31–35). Lyon, France: ECAI.
Garriga, G. C., Kralj, P., & Lavrač, N. (2006). Closed sets for labeled data. In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-06), Berlin, Germany (pp. 163–174). Berlin, Germany/New York: Springer.
Hilderman, R. J., & Peckham, T. (2005). A statistically sound alternative approach to mining contrast sets. In Proceedings of the 4th Australia Data Mining Conference (AusDM-05), Sydney, NSW (pp. 157–172).
Jenkole, J., Kralj, P., Lavrač, N., & Sluga, A. (2007). A data mining experiment on manufacturing shop floor data. In Proceedings of the 40th CIRP International Seminar on Manufacturing Systems. Liverpool, UK: University of Liverpool.
Kavšek, B., & Lavrač, N. (2006). Apriori-SD: Adapting association rule learning to subgroup discovery. Applied Artificial Intelligence, 20(7), 543–583.
Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 249–271). Menlo Park, CA: AAAI. Chap. 10.
Klösgen, W., & May, M. (2002). Spatial subgroup mining integrated in an object-relational spatial database. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-02) (pp. 275–286). Berlin, Germany/New York: Springer.
Klösgen, W., May, M., & Petch, J. (2003). Mining census data for spatial effects on mortality. Intelligent Data Analysis, 7(6), 521–540.
Kralj, P., Grubešič, A., Toplak, N., Gruden, K., Lavrač, N., & Garriga, G. C. (2006). Application of closed itemset mining for class labeled data in functional genomics. Informatica Medica Slovenica, 11(1), 40–45.
Kralj, P., Lavrač, N., Gamberger, D., & Krstačić, A. (2007a). Contrast set mining for distinguishing between similar diseases. In Proceedings of the 11th Conference on Artificial Intelligence in Medicine (AIME-07), Amsterdam (pp. 109–118). Berlin, Germany: Springer.
Kralj, P., Lavrač, N., Gamberger, D., & Krstačić, A. (2007b). Contrast set mining through subgroup discovery applied to brain ischaemia data. In Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD-07), Nanjing, China (pp. 579–586). Berlin, Germany/New York: Springer.
Kralj, P., Lavrač, N., & Zupan, B. (2005). Subgroup visualization. In Proceedings of the 8th International Multiconference Information Society (IS-05), Ljubljana, Slovenia (pp. 228–231). Ljubljana, Slovenia: Institut Jožef Stefan.
Kralj Novak, P., Lavrač, N., & Webb, G. I. (2009). Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.
Lavrač, N., Cestnik, B., Gamberger, D., & Flach, P. A. (2004). Decision support through subgroup discovery: Three case studies and the lessons learned. Machine Learning, 57(1–2), 115–143. Special issue on Data Mining Lessons Learned.
Lavrač, N., Kavšek, B., Flach, P., & Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.
Lavrač, N., Kralj, P., Gamberger, D., & Krstačić, A. (2007). Supporting factors to improve the explanatory potential of contrast set mining: Analyzing brain ischaemia data. In Proceedings of the 11th Mediterranean Conference on Medical and Biological Engineering and Computing (MEDICON-07), Ljubljana, Slovenia (pp. 157–161). Berlin, Germany: Springer.
Li, J., Dong, G., & Ramamohanarao, K. (2000). Instance-based classification by emerging patterns. In Proceedings of the 14th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), Lyon, France (pp. 191–200). Berlin, Germany/New York: Springer.
Li, J., Dong, G., & Ramamohanarao, K. (2001). Making use of the most expressive jumping emerging patterns for classification. Knowledge and Information Systems, 3(2), 1–29.
Li, J., Liu, H., Downing, J. R., Yeoh, A. E.-J., & Wong, L. (2003). Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients. Bioinformatics, 19(1), 71–78.
Li, J., & Wong, L. (2002b). Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics, 18(10), 1406–1407.
Lin, J., & Keogh, E. (2006). Group SAX: Extending the notion of contrast sets to time series and multimedia data. In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-06), Berlin, Germany (pp. 284–296). Berlin, Germany/New York: Springer.
Liu, B., Hsu, W., Han, H.-S., & Xia, Y. (2000). Mining changes for real-life applications. In Proceedings of the 2nd International Conference on Data Warehousing and Knowledge Discovery (DaWaK-2000), London (pp. 337–346). Berlin, Germany: Springer.
Liu, B., Hsu, W., & Ma, Y. (2001). Discovering the set of fundamental rule changes. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-01), San Francisco (pp. 335–340). New York: ACM.
May, M., & Ragia, L. (2002). Spatial subgroup discovery applied to the analysis of vegetation data. In Proceedings of the 4th International Conference on Practical Aspects of Knowledge Management (PAKM-2002), Vienna (pp. 49–61). Berlin, Germany/New York: Springer.
Simeon, M., & Hilderman, R. J. (2007). Exploratory quantitative contrast set mining: A discretization approach. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-07), Patras, Greece (Vol. 2, pp. 124–131). Los Alamitos, CA: IEEE.
Siu, K., Butler, S., Beveridge, T., Gillam, J., Hall, C., Kaye, A., et al. (2005). Identifying markers of pathology in SAXS data of malignant tissues of the brain. Nuclear Instruments and Methods in Physics Research A, 548, 140–146.
Song, H. S., Kim, J. K., & Kim, S. H. (2001). Mining the change of customer behavior in an internet shopping mall. Expert Systems with Applications, 21(3), 157–168.
Soulet, A., Crémilleux, B., & Rioult, F. (2004). Condensed representation of emerging patterns. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-04), Sydney, NSW (pp. 127–132). Berlin, Germany/New York: Springer.
Suzuki, E. (2006). Data mining methods for discovering interesting exceptions from an unsupervised table. Journal of Universal Computer Science, 12(6), 627–653.
Wang, K., Zhou, S., Fu, A. W.-C., & Yu, J. X. (2003). Mining changes of classification by correspondence tracing. In Proceedings of the 3rd SIAM International Conference on Data Mining (SDM-03) (pp. 95–106). Philadelphia: SIAM.
Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 5, 431–465.
Webb, G. I. (2001). Discovering associations with numeric variables. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-01), San Francisco (pp. 383–388). New York: ACM.
Webb, G. I. (2007). Discovering significant patterns. Machine Learning, 68(1), 1–33.
Webb, G. I., Butler, S. M., & Newlands, D. (2003). On detecting differences between groups. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-03), Washington, DC (pp. 256–265). New York: ACM.
Wettschereck, D. (2002). A KDDSE-independent PMML visualizer. In Proceedings of the 2nd Workshop on Integration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM-02) (pp. 150–155). Helsinki, Finland: Helsinki University.
Wong, T.-T., & Tseng, K.-L. (2005). Mining negative contrast sets from data with discrete attributes. Expert Systems with Applications, 29(2), 401–407.
Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the 1st European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD-97) (pp. 78–87). Berlin, Germany: Springer.
Wrobel, S. (2001). Inductive logic programming for knowledge discovery in databases. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 74–101). Berlin, Germany/New York: Springer.
Železný, F., & Lavrač, N. (2006). Propositionalization-based relational subgroup discovery with RSD. Machine Learning, 62, 33–63.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fürnkranz, J., Gamberger, D., Lavrač, N. (2012). Supervised Descriptive Rule Learning. In: Foundations of Rule Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75197-7_11
DOI: https://doi.org/10.1007/978-3-540-75197-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75196-0
Online ISBN: 978-3-540-75197-7
eBook Packages: Computer Science (R0)