Heuristic Measures of Interestingness
The tuples in a generalized relation (i.e., a summary generated from a database) are unique and can therefore be treated as a population whose structure is described by a frequency or probability distribution over the values of the derived Count attribute. In this chapter, we describe sixteen diversity measures that evaluate this distribution of Count values in a summary and assign the summary a single real-valued index representing its interestingness relative to other summaries generated from the same database. These are well-known measures of dispersion, dominance, inequality, and concentration that have been applied successfully and frequently in several areas of the physical, social, ecological, management, information, and computer sciences. Using them to rank summaries generated from databases is a new application area.
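As a rough illustration of the idea, the sketch below treats the Count values of one summary as a frequency distribution and reduces it to a single real-valued index. It computes two measures named in the keywords in common textbook forms. The first is Shannon's index, H = -Σ p_i log2 p_i with p_i = n_i / N. The second is the Gini coefficient, computed from the mean absolute difference. The function names and the base-2 logarithm are assumptions. The chapter's own definitions and normalizations of these measures may differ.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[int]) -> List[float]:
    """Convert the Count values of a summary into proportions p_i = n_i / N."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * log2(p_i)); zero proportions contribute nothing.

    Assumption: base-2 logarithm. Other bases only rescale the value.
    """
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)


def gini_coefficient(counts: Sequence[int]) -> float:
    """Gini coefficient via mean absolute difference over all ordered pairs.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean). It is 0 when all counts
    are equal and approaches 1 as one tuple holds almost all of the total.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("a summary must contain at least one tuple")
    mean = sum(counts) / n
    abs_diff = sum(abs(a - b) for a in counts for b in counts)
    return abs_diff / (2 * n * n * mean)


# Two hypothetical four-tuple summaries with the same total count of 100.
uniform = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(shannon_index(uniform), gini_coefficient(uniform))  # 2.0 0.0
print(gini_coefficient(skewed))  # approximately 0.72
```

Ranking a set of summaries then amounts to computing the chosen index for each summary and sorting by it. Whether higher or lower values count as "more interesting" depends on the measure and on the interpretation adopted, so the sketch leaves the sort direction open.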
Keywords: Diversity Measure; Gini Coefficient; Shannon Index; Lorenz Curve; Proportional Distribution