# Data transformation and rough sets

## Abstract

Knowledge discovery and data mining systems face several difficulties, in particular those caused by the huge amount of input data. The problem is especially acute for inductive logic programming (ILP) systems, which employ computationally complex algorithms. Learning time can be reduced by feeding the ILP algorithm only a well-chosen portion of the original input data. Such a transformation of the input data should discard unimportant clauses but keep those that are potentially necessary to obtain proper results. In this paper two approaches to the data reduction problem are proposed, both based on rough set theory. Rough set techniques serve as data reduction tools that shrink the input fed to the more time-expensive (search-intensive) ILP techniques. The first approach transforms input clauses into decision table form, then uses reducts to select only meaningful data. The second approach introduces a special kind of approximation space. When properly used, iterated lower and upper approximations of the target concept tend to select the facts that are more relevant to the problem while discarding the facts that are entirely unimportant.
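
The rough set notions named above (decision table, reduct, lower and upper approximation) can be illustrated with a minimal sketch. It assumes the classical rough set definitions based on an indiscernibility relation, not the special approximation space or the clause-to-table transformation proposed in the paper. The toy table, the attribute names `a`, `b`, `c`, `d`, and all function names are hypothetical, and the brute-force reduct search is only practical for very small tables.

```python
from itertools import combinations


def partition(table, attrs):
    """Group object ids into indiscernibility classes with respect to attrs."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def lower_approximation(table, attrs, target):
    """Union of the classes that lie entirely inside the target concept."""
    return set().union(*(c for c in partition(table, attrs) if c <= target))


def upper_approximation(table, attrs, target):
    """Union of the classes that overlap the target concept."""
    return set().union(*(c for c in partition(table, attrs) if c & target))


def positive_region(table, attrs, decision):
    """Objects whose decision value is determined by attrs alone."""
    pos = set()
    for decision_class in partition(table, [decision]):
        pos |= lower_approximation(table, attrs, decision_class)
    return pos


def reducts(table, attrs, decision):
    """Minimal attribute subsets preserving the positive region (brute force)."""
    full = positive_region(table, attrs, decision)
    found = []
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(table, subset, decision) == full:
                found.append(subset)
    return found


# Hypothetical decision table: condition attributes a, b, c and decision d.
table = {
    1: {"a": 0, "b": 0, "c": 0, "d": "yes"},
    2: {"a": 0, "b": 0, "c": 1, "d": "yes"},
    3: {"a": 0, "b": 1, "c": 0, "d": "no"},
    4: {"a": 1, "b": 1, "c": 1, "d": "no"},
    5: {"a": 1, "b": 0, "c": 0, "d": "yes"},
    6: {"a": 1, "b": 0, "c": 0, "d": "no"},
}
conditions = ["a", "b", "c"]
target = {obj for obj, row in table.items() if row["d"] == "yes"}

print(lower_approximation(table, conditions, target))  # {1, 2}
print(upper_approximation(table, conditions, target))  # {1, 2, 5, 6}
print(reducts(table, conditions, "d"))                 # [('a', 'b')]
```

In this toy table, objects 5 and 6 are indiscernible yet carry different decisions, so they fall in the boundary between the two approximations. Attribute `c` can be dropped without shrinking the positive region, which is the sense in which a reduct keeps only the meaningful data.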
