Abstract
In knowledge discovery from uncertain data we usually wish to obtain models that have good predictive properties when applied to unseen objects. In several applications, it is also desirable to synthesize models that have good descriptive properties as well. The ultimate goal, therefore, is to maximize both properties, i.e., to obtain models that are amenable to human inspection and that have high predictive performance. Models consisting of decision or classification rules, such as those produced with rough sets [19], can exhibit both properties. In practice, however, the induced models are often too large to be inspected. This paper reports on two basic approaches to obtaining manageable rule-based models that do not sacrifice their predictive qualities: a priori and a posteriori pruning. The methods are discussed in the context of rough sets, but several of the results are applicable to rule-based models in general. Algorithms realizing these approaches have been implemented in the Rosetta system. Predictive performance of the models has been estimated using accuracy and receiver operating characteristic (ROC) analysis. The methods have been tested on real-world data sets, with encouraging results.
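As a rough illustration of a posteriori pruning, the sketch below filters an already induced rule set by a quality threshold and then compares the area under the ROC curve before and after filtering. This is not the algorithm from the paper or from Rosetta: the `Rule` class, the support and accuracy thresholds, and the support-weighted voting scheme are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical decision rule: IF conditions THEN decision."""
    conditions: dict  # attribute name -> required value
    decision: int     # predicted class, 0 or 1
    support: int      # training objects matching conditions and decision
    matched: int      # training objects matching the conditions

    @property
    def accuracy(self):
        return self.support / self.matched

    def fires(self, obj):
        return all(obj.get(a) == v for a, v in self.conditions.items())


def prune(rules, min_support, min_accuracy):
    """A posteriori filter (assumed criterion): keep sufficiently strong rules."""
    return [r for r in rules
            if r.support >= min_support and r.accuracy >= min_accuracy]


def score(rules, obj):
    """Support-weighted share of votes for class 1; 0.5 if no rule fires."""
    firing = [r for r in rules if r.fires(obj)]
    total = sum(r.support for r in firing)
    if total == 0:
        return 0.5
    return sum(r.support for r in firing if r.decision == 1) / total


def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) pairs, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
rules = [
    Rule({"fever": "yes"}, 1, support=8, matched=10),
    Rule({"fever": "no"}, 0, support=9, matched=10),
    Rule({"fever": "yes", "age": "old"}, 0, support=1, matched=1),
]
test_objects = [
    ({"fever": "yes", "age": "old"}, 1),
    ({"fever": "no", "age": "old"}, 0),
    ({"fever": "yes", "age": "young"}, 1),
    ({"fever": "no", "age": "young"}, 0),
]
labels = [y for _, y in test_objects]

pruned = prune(rules, min_support=2, min_accuracy=0.7)
auc_full = auc([score(rules, o) for o, _ in test_objects], labels)
auc_pruned = auc([score(pruned, o) for o, _ in test_objects], labels)
print(len(rules), len(pruned), auc_full, auc_pruned)
```

The pairwise AUC used here is quadratic in the number of test objects, which is acceptable for a sketch; a rank-based computation would be preferable for larger test sets.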
Keywords
- Receiver Operating Characteristic
- Knowledge Discovery
- Encourage Result
- Predictive Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Ågotnes, T.: Filtering large propositional rule sets while retaining classifier performance. Master’s thesis, Department of Computer and Information Science, Norwegian University of Science and Technology (1999)
Bazan, J.G., Skowron, A., Synak, P.: Dynamic reducts as a tool for extracting laws from decision tables. In: Proc. International Symposium on Methodologies for Intelligent Systems. LNCS (LNAI), vol. 869, pp. 346–355. Springer, Heidelberg (1994)
Bruha, I.: Quality of decision rules: Definitions and classification schemes for multiple rules. In: Nakhaeizadeh, G., Taylor, C.C. (eds.) Machine Learning and Statistics, The Interface, ch. 5. John Wiley and Sons, Inc., Chichester (1997)
Carlin, U., Komorowski, J., Øhrn, A.: Rough set analysis of medical datasets in a case of patients with suspected acute appendicitis. In: Proc. ECAI 1998 Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP 1998), pp. 18–28 (1998)
Greco, S., Matarazzo, B., Słowiński, R.: New developments in the rough set approach to multi-attribute decision analysis. Bulletin of International Rough Set Society 2(2/3), 57–87 (1998)
Hallan, S., Åsberg, A., Edna, T.-H.: Estimating the probability of acute appendicitis using clinical criteria of a structured record sheet: The physician against the computer. European Journal of Surgery 163(6), 427–432 (1997)
Hanley, J.A., McNeil, B.J.: A method for comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839–843 (1983)
Holte, R.C.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–91 (1993)
Kohavi, R., Frasca, B.: Useful feature subsets and rough set reducts. In: Lin, T.Y., Wildberger, A.M. (eds.) 3rd International Workshop on Rough Sets and Soft Computing (RSSC 1994), San Jose, USA (1994)
Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough sets: A tutorial. In: Pal, S.K., Skowron, A. (eds.) Rough Fuzzy Hybridization - A New Trend in Decision-Making, pp. 3–98. Springer, Heidelberg (1999)
Komorowski, J., Øhrn, A., Skowron, A.: ROSETTA and other software systems for rough sets. In: Klösgen, W., Żytkow, J. (eds.) Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2000)
Kowalczyk, W.: Rough data modelling: a new technique for analyzing data. In: Rough Sets in Knowledge Discovery 1: Methodology and Applications [21], ch. 20, pp. 400–421. Physica-Verlag, Heidelberg (1998)
Løken, T.: Rough modeling: Extracting compact models from large databases. Master’s thesis, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway (1999)
Murphy, P.M., Aha, D.W.: UCI Repository of Machine Learning Databases. Machine-readable collection, Dept. of Information and Computer Science, University of California, Irvine (1995). Available by anonymous ftp from ics.uci.edu in directory pub/machine-learning-databases
Nguyen, H.S.: Data regularity analysis and applications in data mining. PhD thesis, Warsaw University (1999)
Øhrn, A., Ohno-Machado, L., Rowland, T.: Building manageable rough set classifiers. In: Proc. AMIA Annual Fall Symposium, Orlando, FL, USA, pp. 543–547 (1998)
Øhrn, A., Komorowski, J.: Diagnosing acute appendicitis with very simple classification rules. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 462–467. Springer, Heidelberg (1999)
Øhrn, A., Komorowski, J., Skowron, A., Synak, P.: The design and implementation of a knowledge discovery toolkit based on rough sets: The ROSETTA system. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 1: Methodology and Applications. Studies in Fuzziness and Soft Computing, vol. 18, ch. 19, pp. 376–399. Physica-Verlag, Heidelberg (1998)
Pawlak, Z.: Rough sets. International Journal of Information and Computer Science 11(5), 341–356 (1982)
Piasta, Z., Lenarcik, A.: Rule induction with probabilistic rough classifiers. Technical report, Warsaw University of Technology ICS Research Report 24/96 (1996)
Polkowski, L., Skowron, A. (eds.): Rough Sets in Knowledge Discovery 1: Methodology and Applications. Studies in Fuzziness and Soft Computing, vol. 18. Physica-Verlag, Heidelberg (1998)
Skowron, A., Polkowski, L., Komorowski, J.: Learning tolerance relations by boolean descriptors, automatic feature extraction from data tables. In: Tsumoto, S., Kobayashi, S., Yokomori, T., Tanaka, H., Nakamura, A. (eds.) Proceedings of the Fourth International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, RSFD 1996, Tokyo, Japan, pp. 11–17 (1996)
Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support Systems — Handbook of Applications and advances in Rough Set Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1991)
Słowiński, R., Vanderpooten, D.: Similarity relation as a basis for rough approximations. In: Wang, P.P. (ed.) Advances in Machine Intelligence & Soft Computing, vol. 4, pp. 17–33. Duke University Press (1997)
Swets, J.A.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1293 (1988)
Vinterbo, S.: Finding minimal cost hitting sets: A genetic approach. Technical report, Department of Computer and Information Science, Norwegian University of Science and Technology (1999)
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ågotnes, T., Komorowski, J., Løken, T. (1999). Taming Large Rule Models in Rough Set Approaches. In: Żytkow, J.M., Rauch, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1999. Lecture Notes in Computer Science, vol 1704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48247-5_21
DOI: https://doi.org/10.1007/978-3-540-48247-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66490-1
Online ISBN: 978-3-540-48247-5
eBook Packages: Springer Book Archive