Abstract
Ranking summaries generated from databases is useful in descriptive data mining tasks, where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph in which each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space thus describes all possible summaries consistent with the DGGs that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capacity of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. In most cases, the Lorenz dominance order defines a partial order on the summaries; in some cases, it defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
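The two ideas in the abstract, the generalization space and a dominance comparison between summaries, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each DGG is reduced to a plain list of node names (edges are ignored), that each summary is represented only by its vector of tuple counts with a positive total, and that dominance is tested with the textbook Lorenz curve (cumulative share of the total, counts sorted in ascending order). The names `generalization_space`, `lorenz_points`, `lorenz_value`, and `compare`, as well as the attribute and node names, are hypothetical.

```python
from fractions import Fraction
from itertools import product


def generalization_space(dggs):
    """List every combination that picks one domain (node) per attribute.

    `dggs` maps an attribute name to the list of node names in its DGG
    (an assumed, simplified representation that ignores the edges).
    """
    names = sorted(dggs)
    return [dict(zip(names, combo))
            for combo in product(*(dggs[name] for name in names))]


def lorenz_points(counts):
    """Return the breakpoints (p, L(p)) of the Lorenz curve of `counts`."""
    ordered = sorted(counts)          # ascending order, smallest counts first
    total = sum(ordered)              # assumed to be positive
    n = len(ordered)
    points = [(Fraction(0), Fraction(0))]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((Fraction(i, n), Fraction(running, total)))
    return points


def lorenz_value(points, p):
    """Linearly interpolate a Lorenz curve at population share p in [0, 1]."""
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            return l0 + (l1 - l0) * (p - p0) / (p1 - p0)
    raise ValueError("p must lie in [0, 1]")


def compare(a, b):
    """Compare two count vectors by their Lorenz curves.

    Returns "equal", "a_above" (curve of a is never below that of b),
    "b_above", or "incomparable" when the curves cross. Both curves are
    piecewise linear, so checking the union of breakpoints is sufficient.
    """
    pa, pb = lorenz_points(a), lorenz_points(b)
    grid = sorted({p for p, _ in pa} | {p for p, _ in pb})
    diffs = [lorenz_value(pa, p) - lorenz_value(pb, p) for p in grid]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "a_above"
    if all(d <= 0 for d in diffs):
        return "b_above"
    return "incomparable"


if __name__ == "__main__":
    # Hypothetical DGG nodes for two attributes: 3 x 2 = 6 candidate summaries.
    dggs = {"Colour": ["Colour", "Shade", "ANY"], "Shape": ["Shape", "ANY"]}
    print(len(generalization_space(dggs)))
    print(compare([5, 5, 5, 5], [1, 1, 1, 17]))
    print(compare([2, 2, 8], [1, 5, 6]))
```

The "incomparable" outcome is what makes the resulting order partial rather than total: two summaries whose Lorenz curves cross cannot be ranked against each other. Which direction of dominance counts as more interesting is left to the paper's own definition; the sketch only reports the relation between the curves.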
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hilderman, R.J. (2002). The Lorenz Dominance Order as a Measure of Interestingness in KDD. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_17
DOI: https://doi.org/10.1007/3-540-47887-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive