Automated Discovery of Rules and Exceptions from Distributed Databases Using Aggregates

* Final gross prices may vary according to local VAT.

Get Access

Abstract

Large amounts of data pose special problems for Knowledge Discovery in Databases. More efficient means are required to ease this problem, and one possibility is the use of sufficient statistics or “aggregates”, rather than low level data. This is especially true for Knowledge Discovery from distributed databases. The data of interest is of a similar type to that found in OLAP data cubes and the Data Warehouse. This data is numerical and is described in terms of a number of categorical attributes (Dimensions). Few algorithms to date carry out knowledge discovery on such data. Using aggregate data and accompanying meta-data returned from a number of distributed databases, we use statistical models to identify and highlight relationships between a single numerical attribute and a number of Dimensions. These are initially presented to the user via a graphical interactive middle-ware, which allows drilling down to a more detailed level. On the basis of these relationships, we induce rules in conjunctive normal form. Finally, exceptions to these rules are discovered.