Technical Papers Optimisation/Performance Issues

Advances in Databases

Volume 1094 of the series Lecture Notes in Computer Science pp 124-133

Speeding up knowledge discovery in large relational databases by means of a new discretization algorithm

  • Alex Alves Freitas, Dept. of Computer Science, University of Essex
  • Simon H. Lavington, Dept. of Computer Science, University of Essex

Abstract

Most of the KDD (Knowledge Discovery in Databases) algorithms proposed in the literature have been applied to relatively small datasets and do not permit any integration with a DBMS. Hence, the application of these algorithms to the huge amounts of data found in current databases and data warehouses faces serious scalability problems, particularly the problem of excessive learning time. This paper investigates a way of improving the scalability of KDD algorithms, via discretization of ordinal or continuous attributes. This work has two novel aspects. First, we map a generic discretization primitive into an SQL query. Second, we propose a new discretization algorithm for classification tasks. We show how the new discretization algorithm can be implemented with good effect via the SQL primitive.
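
The abstract does not spell out the SQL primitive, so the following is only an illustrative sketch of what pushing discretization work into the DBMS might look like. It assumes the primitive amounts to a GROUP BY query that counts tuples per (attribute value, class) pair, which is the kind of summary many discretization methods start from. The function names, the table layout, and the equal-frequency binning step are all hypothetical stand-ins; in particular, the binning step is not the paper's new algorithm.

```python
import sqlite3


def value_class_counts(conn, table, attribute, class_column):
    """Return (attribute value, class, count) rows, sorted by value then class.

    Assumed form of the primitive: one GROUP BY query, so the DBMS does
    the counting and only the summary leaves the database.
    """
    # Identifiers cannot be bound as parameters, so they are quoted here.
    # Only pass trusted table and column names.
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" '
        f'GROUP BY "{attribute}", "{class_column}" '
        f'ORDER BY "{attribute}", "{class_column}"'
    )
    return conn.execute(query).fetchall()


def equal_frequency_cut_points(counts, n_bins):
    """Pick cut points from the counts (a stand-in, not the paper's method).

    A value v is emitted as a cut point when the tuples with value <= v
    reach the next multiple of total / n_bins.
    """
    totals = {}
    for value, _cls, n in counts:
        totals[value] = totals.get(value, 0) + n
    total = sum(totals.values())
    cuts = []
    seen = 0
    # The largest value is never a cut point: nothing lies above it.
    for value in sorted(totals)[:-1]:
        seen += totals[value]
        if seen >= total * (len(cuts) + 1) / n_bins:
            cuts.append(value)
    return cuts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (age INTEGER, buys TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(20, "no"), (20, "no"), (30, "yes"),
         (40, "yes"), (40, "no"), (50, "yes")],
    )
    counts = value_class_counts(conn, "customers", "age", "buys")
    print(counts)
    print(equal_frequency_cut_points(counts, 2))
```

The point of the design is that the learning code never scans the base table: it works on the summary, which has at most one row per distinct (value, class) pair, and the scan and aggregation are left to the DBMS.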