Multivariate Discretization for Set Mining

Bay, Stephen D.

doi:10.1007/PL00011680

Regular Paper
Published: November 2001

Volume 3, pages 491–512 (2001)
Knowledge and Information Systems

Stephen D. Bay

Abstract.

Many data mining algorithms can be formulated as set-mining problems, where the goal is to find conjunctions (or disjunctions) of terms that meet user-specified constraints. Set-mining techniques have largely been designed for categorical or discrete data, where variables can take only a fixed number of values. However, many datasets also contain continuous variables, and a common way of dealing with these is to discretize them by breaking them into ranges. Most discretization methods are univariate and consider only a single feature at a time (sometimes in conjunction with a class variable). We argue that this is a suboptimal approach for knowledge discovery, as univariate discretization can destroy hidden patterns in data. Discretization should consider the effects on all variables in the analysis: two regions X and Y should be in the same interval after discretization only if the instances in those regions have similar multivariate distributions (F_x ∼ F_y) across all variables and combinations of variables. We present a bottom-up merging algorithm that discretizes continuous variables based on this rule. Our experiments indicate that the approach is feasible, that it does not destroy hidden patterns, and that it generates meaningful intervals.
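
The merging rule described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not say how similarity between two regions is tested, so the sketch swaps in a simple stand-in, the total variation distance between the empirical joint distributions of the remaining variables. The names `total_variation` and `merge_intervals`, the equal-frequency starting intervals, and the parameters `n_bins` and `threshold` are all assumptions made for illustration, and the remaining variables are assumed to be categorical.

```python
from collections import Counter


def total_variation(a, b):
    """Total variation distance between two empirical distributions.

    Stand-in similarity measure (an assumption, not the paper's test).
    """
    ca, cb = Counter(a), Counter(b)
    keys = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[k] / len(a) - cb[k] / len(b)) for k in keys)


def context(row, target):
    """Joint value of every variable except the one being discretized."""
    return tuple(sorted((k, v) for k, v in row.items() if k != target))


def merge_intervals(rows, target, n_bins=8, threshold=0.2):
    """Bottom-up merging of adjacent intervals of `target`.

    rows: list of dicts, one per instance.
    Starts from roughly `n_bins` equal-frequency intervals and repeatedly
    merges the most similar adjacent pair until every adjacent pair
    differs by more than `threshold`.
    """
    rows = sorted(rows, key=lambda r: r[target])
    size = max(1, len(rows) // n_bins)
    bins = [rows[i:i + size] for i in range(0, len(rows), size)]

    while len(bins) > 1:
        # Distance between each pair of neighbouring intervals.
        dists = [
            total_variation(
                [context(r, target) for r in bins[i]],
                [context(r, target) for r in bins[i + 1]],
            )
            for i in range(len(bins) - 1)
        ]
        i = min(range(len(dists)), key=dists.__getitem__)
        if dists[i] > threshold:
            break  # all neighbours have dissimilar distributions
        bins[i:i + 2] = [bins[i] + bins[i + 1]]

    # Report each interval by its smallest and largest observed value.
    return [(b[0][target], b[-1][target]) for b in bins]


# Toy data: the label changes exactly where x crosses 0.5.
rows = [{"x": i / 80, "label": "a" if i < 40 else "b"} for i in range(80)]
intervals = merge_intervals(rows, "x")
print(intervals)
```

On this toy data the eight starting intervals collapse into two, with the boundary at x = 0.5, because neighbouring intervals on the same side of 0.5 have identical label distributions while the pair straddling 0.5 does not. Comparing only the full joint tuple of the other variables is a simplification of the "all variables and combinations of variables" condition stated in the abstract.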

Author information

Authors and Affiliations

Department of Information and Computer Science, University of California, Irvine, California, USA
Stephen D. Bay

Additional information

Received 14 November 2000 / Revised 1 February 2001 / Accepted in revised form 1 May 2001

About this article

Cite this article

Bay, S. Multivariate Discretization for Set Mining. Knowledge and Information Systems 3, 491–512 (2001). https://doi.org/10.1007/PL00011680

Issue Date: November 2001
DOI: https://doi.org/10.1007/PL00011680

Keywords: Data mining; Multivariate discretization; Set mining
