The Lorenz Dominance Order as a Measure of Interestingness in KDD

Hilderman, Robert J.

doi:10.1007/3-540-47887-6_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Ranking summaries generated from databases is useful in descriptive data mining tasks, where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph where each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space therefore describes all possible summaries, consistent with the DGGs, that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capacity of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. In most cases the Lorenz dominance order defines a partial order on the summaries; in some cases it defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
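The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: each DGG is reduced to a plain list of node names (edges are ignored), each summary is represented only by its vector of positive tuple counts, and the Lorenz curve is the textbook one (counts sorted in ascending order, cumulative shares, linear interpolation between points). The function names `generalization_space`, `lorenz_points`, `lorenz_value`, and `compare_lorenz`, as well as the attribute and node names in the usage lines, are hypothetical.

```python
from fractions import Fraction
from itertools import product


def generalization_space(dggs):
    """Return every combination that picks one domain (node) per attribute.

    `dggs` maps an attribute name to the list of node names in its DGG
    (an assumed, simplified representation that ignores the edges).
    """
    names = list(dggs)
    return [dict(zip(names, combo))
            for combo in product(*(dggs[name] for name in names))]


def lorenz_points(counts):
    """Breakpoints (share of tuples, cumulative share of the total count)."""
    ordered = sorted(counts)          # ascending order, as for a Lorenz curve
    total = sum(ordered)              # assumed to be positive
    n = len(ordered)
    points = [(Fraction(0), Fraction(0))]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((Fraction(i, n), Fraction(running, total)))
    return points


def lorenz_value(points, p):
    """Evaluate the piecewise-linear Lorenz curve at p in [0, 1]."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= p <= x1:
            return y0 + (y1 - y0) * (p - x0) / (x1 - x0)
    raise ValueError("p must lie in [0, 1]")


def compare_lorenz(a, b):
    """Compare two count vectors by the position of their Lorenz curves.

    Both curves are linear between consecutive points of the merged grid,
    so checking the merged breakpoints is enough to decide the comparison.
    """
    pa, pb = lorenz_points(a), lorenz_points(b)
    grid = sorted({x for x, _ in pa} | {x for x, _ in pb})
    diffs = [lorenz_value(pa, p) - lorenz_value(pb, p) for p in grid]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "a_above"              # a's curve is never below b's
    if all(d <= 0 for d in diffs):
        return "b_above"              # b's curve is never below a's
    return "incomparable"             # the curves cross


# Hypothetical attributes and DGG nodes: 3 x 3 = 9 candidate summaries.
space = generalization_space({
    "age": ["raw", "decade", "ANY"],
    "city": ["raw", "province", "ANY"],
})
print(len(space))
print(compare_lorenz([1, 1, 1, 1], [1, 1, 1, 5]))
print(compare_lorenz([1, 3, 3, 3], [2, 2, 2, 4]))
```

The `"incomparable"` outcome is what makes the order partial: when two Lorenz curves cross, neither summary dominates the other. Which direction of dominance counts as more interesting is defined in the paper and is not assumed here. Exact `Fraction` arithmetic is used so that equal curves are detected without floating-point tolerance.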

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Robert J. Hilderman

Editor information

Editors and Affiliations

EE Department, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan, ROC
Ming-Syan Chen
IBM Thomas J. Watson Research Center, 30 Sawmill River Road, Hawthorne, NY, 10532, USA
Philip S. Yu
School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore, 119260
Bing Liu

About this paper

Cite this paper

Hilderman, R.J. (2002). The Lorenz Dominance Order as a Measure of Interestingness in KDD. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_17

DOI: https://doi.org/10.1007/3-540-47887-6_17
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive
