Abstract
Rule learning systems use features as the main building blocks for rules. A feature can be a simple attribute-value test or a test for the validity of a complex domain knowledge relationship. Most existing concept learning systems generate features in the rule construction process. In contrast, this chapter shows that the separation of the feature construction and rule construction process has several theoretical and practical advantages. In particular, explicit usage of features enables a unifying framework of both propositional and relational rule learning. We demonstrate procedures for generating a set of simple features that—in domains with no contradictory examples—enable the construction of complete and consistent rule sets, and do not include obviously irrelevant features.
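To make the notion of a feature concrete, the following minimal Python sketch (purely illustrative; the attribute names and data are assumptions, not taken from the chapter) enumerates simple propositional features as Boolean attribute-value equality tests:

```python
# Minimal, illustrative sketch (not the chapter's algorithm): enumerate
# simple propositional features, i.e., attribute-value equality tests,
# from a small set of training examples. All names and data are made up.

def enumerate_features(examples, attributes):
    """Return (name, test) pairs, one per attribute-value combination
    observed in the examples; each test is a Boolean function."""
    features = []
    for attr in attributes:
        for value in sorted({ex[attr] for ex in examples}):
            features.append((f"{attr} = {value}",
                             lambda ex, a=attr, v=value: ex[a] == v))
    return features

examples = [
    {"shape": "circle", "color": "red"},
    {"shape": "square", "color": "blue"},
]
for name, test in enumerate_features(examples, ["shape", "color"]):
    print(name, [test(ex) for ex in examples])
```

Rule construction can then treat these features as a fixed vocabulary of candidate conditions, independently of how they were generated.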
Notes
- 1.
Parts of this chapter are based on Lavrač, Fürnkranz, and Gamberger (2010).
- 2.
- 3.
Lavrač et al. (1999) used the term coverage of pn-pairs instead of pn-pair discrimination.
- 4.
A similar result was first observed by Fayyad and Irani (1992) for discretization of numerical values.
- 5.
One reason for choosing the mean value between two neighboring points of opposite classes is that this point maximizes the margin, i.e., the buffer towards the decision boundary between the two classes (see the first code sketch following these notes).
- 6.
The Nemenyi test (Nemenyi, 1963) is a post hoc test to the Friedman test for rank differences (Friedman, 1937). It computes the critical distance, i.e., the minimum difference in average rank from which one can conclude that the observed difference between two methods is statistically significant. This is typically visualized in a graph that shows the average ranks of the compared methods, connecting those that lie within one critical distance of each other. For its use in machine learning we refer to Demšar (2006). The computation of the critical distance is sketched in the second example following these notes.
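As a concrete illustration of the midpoint rule mentioned in note 5, the following Python sketch (an assumption-based illustration, not code from the book) generates candidate numerical thresholds between neighboring attribute values of opposite classes:

```python
# Illustrative sketch of the idea in note 5 (not code from the book):
# candidate numerical thresholds are placed midway between neighboring
# attribute values that belong to opposite classes, so each threshold
# keeps the largest possible buffer (margin) to both examples.

def candidate_thresholds(values, labels):
    """values: numeric attribute values; labels: their class labels.
    Returns midpoints between adjacent sorted values of different classes."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            thresholds.append((v1 + v2) / 2.0)  # mean of the two neighbors
    return thresholds

# The boundary between the last positive (2.0) and the first negative (4.0)
# yields the single candidate threshold 3.0.
print(candidate_thresholds([1.0, 2.0, 4.0, 7.0], ["pos", "pos", "neg", "neg"]))
```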
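For note 6, the critical distance of the Nemenyi test can be computed as sketched below; the formula follows Demšar (2006), and the tabulated q-value used in the example is an assumption for alpha = 0.05 and four compared methods:

```python
# Hedged sketch of the critical-distance computation described by
# Demšar (2006): CD = q_alpha * sqrt(k * (k + 1) / (6 * N)), where k is
# the number of compared methods and N the number of datasets. The
# q_alpha constant below (alpha = 0.05, k = 4) is taken from published
# tables and should be treated as an assumption, not verified here.

import math

def critical_distance(q_alpha, k, n_datasets):
    """Minimum difference in average rank needed to call two methods
    significantly different under the Nemenyi post hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Example: 4 methods compared on 20 datasets, q_0.05 = 2.569 for k = 4.
print(round(critical_distance(2.569, 4, 20), 3))  # ~ 1.049
```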
References
Bergadano, F., Matwin, S., Michalski, R. S., & Zhang, J. (1992). Learning two-tiered descriptions of flexible concepts: The POSEIDON system. Machine Learning, 8, 5–43.
Bruha, I., & Franek, F. (1996). Comparison of various routines for unknown attribute value processing: The covering paradigm. International Journal of Pattern Recognition and Artificial Intelligence, 10(8), 939–955.
Cai, Y., Cercone, N., & Han, J. (1991). Attribute-oriented induction in relational databases. In G. Piatetsky-Shapiro & W. J. Frawley (Eds.), Knowledge discovery in databases (pp. 213–228). Menlo Park, CA: MIT.
Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.
Cloete, I., & Van Zyl, J. (2006). Fuzzy rule induction in a set covering framework. IEEE Transactions on Fuzzy Systems, 14(1), 93–110.
Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the 12th International Conference on Machine Learning (ML-95), Lake Tahoe, CA (pp. 115–123). San Francisco: Morgan Kaufmann.
Cohen, W. W. (1996). Learning trees and rules with set-valued features. In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96) (pp. 709–716). Menlo Park, CA: AAAI.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dubois, D., & Prade, H. (1980). Fuzzy sets and systems. New York: Academic.
Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(2), 87–102.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: Methodology and application. Journal of Artificial Intelligence Research, 17, 501–527.
Gamberger, D., Lavrač, N., & Fürnkranz, J. (2008). Handling unknown and imprecise attribute values in propositional rule learning: A feature-based approach. In T.-B. Ho & Z.-H. Zhou (Eds.), Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence (PRICAI-08), Hanoi, Vietnam (pp. 636–645). Berlin, Germany/Heidelberg, Germany: Springer.
Gamberger, D., Lavrač, N., Zelezny, F., & Tolar, J. (2004). Induction of comprehensible models for gene expression datasets by subgroup discovery methodology. Journal of Biomedical Informatics, 37(4), 269–284.
Han, J., Cai, Y., & Cercone, N. (1992). Knowledge discovery in databases: An attribute-oriented approach. In Proceedings of the 18th Conference on Very Large Data Bases (VLDB-92), Vancouver, BC (pp. 547–559). San Mateo, CA: Morgan Kaufmann Publishers.
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann Publishers.
Hühn, J., & Hüllermeier, E. (2009b). FURIA: An algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery, 19(3), 293–319.
Hüllermeier, E. (2011). Fuzzy sets in machine learning and data mining. Applied Soft Computing, 11(2), 1493–1505.
Lavrač, N., & Džeroski, S. (1994a). Inductive logic programming: Techniques and applications. New York: Ellis Horwood.
Lavrač, N., Džeroski, S., & Grobelnik, M. (1991). Learning nonrecursive definitions of relations with LINUS. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 265–281). Berlin, Germany: Springer.
Lavrač, N., & Flach, P. (2001). An extended transformation approach to inductive logic programming. ACM Transactions on Computational Logic, 2(4), 458–494.
Lavrač, N., Fürnkranz, J., & Gamberger, D. (2010). Explicit feature construction and manipulation for covering rule learning algorithms. In J. Koronacki, Z. Ras, S. T. Wierzchon, & J. Kacprzyk (Eds.), Advances in machine learning II: Dedicated to the memory of Professor Ryszard S. Michalski (pp. 121–146). Berlin, Germany/Heidelberg, Germany: Springer.
Lavrač, N., Gamberger, D., & Jovanoski, V. (1999). A study of relevance for learning in deductive databases. The Journal of Logic Programming, 40(2/3), 215–249.
Michalski, R. S. (1973). AQVAL/1—Computer implementation of a variable-valued logic system VL1 and examples of its application to pattern recognition. In Proceedings of the 1st International Joint Conference on Pattern Recognition, Washington, DC (pp. 3–17). Northridge, CA: IEEE.
Michalski, R. S. (1980). Pattern recognition and rule-guided inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361.
Michalski, R. S., Mozetič, I., Hong, J., & Lavrač, N. (1986). The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the 5th National Conference on Artificial Intelligence (AAAI-86), Philadelphia (pp. 1041–1045). Menlo Park, CA: AAAI.
Nemenyi, P. (1963). Distribution-free multiple comparisons. Ph.D. thesis, Princeton University.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.
Theron, H., & Cloete, I. (1996). BEXA: A covering algorithm for learning propositional concept descriptions. Machine Learning, 24, 5–40.
Wohlrab, L., & Fürnkranz, J. (2011). A review and comparison of strategies for handling missing values in separate-and-conquer rule learning. Journal of Intelligent Information Systems, 36(1), 73–98.
Yang, Y., Webb, G. I., & Wu, X. (2005). Discretization methods. In O. Maimon & L. Rokach (Eds.), The data mining and knowledge discovery handbook (pp. 113–130). New York: Springer.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fürnkranz, J., Gamberger, D., Lavrač, N. (2012). Features. In: Foundations of Rule Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75197-7_4
DOI: https://doi.org/10.1007/978-3-540-75197-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75196-0
Online ISBN: 978-3-540-75197-7