Principles of Data Mining and Knowledge Discovery

Volume 1510 of the series Lecture Notes in Computer Science pp 441-449


Data transformation and rough sets

  • Jaroslaw Stepaniuk, Institute of Computer Science, Bialystok University of Technology
  • Marcin Maj, Institute of Computer Science, Bialystok University of Technology



Knowledge discovery and data mining systems face several difficulties, in particular those caused by the huge amount of input data. The problem is especially acute for inductive logic programming (ILP) systems, which employ computationally complex algorithms. Learning time can be reduced by feeding the ILP algorithm only a well-chosen portion of the original input data. Such a transformation of the input data should discard unimportant clauses but keep those that are potentially necessary to obtain proper results. In this paper two approaches to the data reduction problem are proposed, both based on rough set theory. Rough set techniques serve as data reduction tools that shrink the input fed to the more time-expensive (search-intensive) ILP techniques. The first approach transforms input clauses into decision table form, then uses reducts to select only meaningful data. The second approach introduces a special kind of approximation space. When properly used, iterated lower and upper approximations of the target concept can preferentially select the facts that are more relevant to the problem while discarding the facts that are totally unimportant.
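
The two rough set notions the abstract relies on, reducts of a decision table and lower and upper approximations of a target concept, can be illustrated with a minimal Python sketch. It uses the standard textbook (Pawlak) definitions only. It does not reproduce the paper's transformation of clauses into a decision table, nor its special approximation space or the iteration of approximations. The toy table, the attribute names a, b, c, d, and all function names are assumptions made for illustration, and the reduct search is brute force, suitable only for very small tables.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into indiscernibility classes with respect to attrs."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(i)
    return list(classes.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set target."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:      # block lies entirely inside the concept
            lower |= block
        if block & target:       # block overlaps the concept
            upper |= block
    return lower, upper

def positive_region(rows, attrs, decision):
    """Union of the lower approximations of all decision classes."""
    pos = set()
    for dec_class in partition(rows, [decision]):
        pos |= approximations(rows, attrs, dec_class)[0]
    return pos

def reducts(rows, attrs, decision):
    """Brute force: minimal attribute subsets that preserve the positive region."""
    full = positive_region(rows, attrs, decision)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of an already found reduct (not minimal).
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(rows, list(subset), decision) == full:
                found.append(subset)
    return found

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
# Rows 0 and 4 agree on a, b, c but differ on d, so the table is inconsistent.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
]
conditions = ["a", "b", "c"]

target = {i for i, r in enumerate(rows) if r["d"] == "yes"}
lower, upper = approximations(rows, conditions, target)
found_reducts = reducts(rows, conditions, "d")

print("lower:", sorted(lower))    # rows that certainly belong to the concept
print("upper:", sorted(upper))    # rows that possibly belong to the concept
print("reducts:", found_reducts)
```

In this toy table the lower approximation of the "yes" concept is row 2 alone, the upper approximation also contains the conflicting rows 0 and 4, and either {a, b} or {b, c} suffices to preserve the positive region, so one condition attribute can be dropped without losing classification ability. That is the sense in which reducts and approximations can act as data reduction tools before a more expensive learner is run.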