An Appropriate Abstraction for an Attribute-Oriented Induction
Attribute-oriented induction is a useful data mining method that generalizes databases under an appropriate abstraction hierarchy to extract meaningful knowledge. Such a hierarchy is carefully designed to exclude rules that are meaningless from a particular point of view. However, a database may be generalized in several ways, depending on the user's intention. It is therefore important to provide a multi-layered abstraction hierarchy under which several generalizations are possible and well controlled. Indeed, databases that are too general or too specific prevent mining algorithms from extracting significant rules. From this viewpoint, this paper proposes a generalization method that uses an information-theoretic measure to select an appropriate abstraction hierarchy. Furthermore, we present a system, called ITA (Information Theoretical Abstraction), based on our method and on attribute-oriented induction. We perform practical experiments in which ITA discovers meaningful rules from a US Census Bureau census database, and we discuss the validity of ITA based on the experimental results.
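The abstract does not state which information-theoretic measure ITA uses. The following minimal sketch therefore assumes that the measure is the information gain of a generalized attribute with respect to a class attribute. It also assumes that a candidate abstraction is acceptable when the gain it loses, relative to the raw attribute, stays within a user-chosen threshold. Under those assumptions, the code shows one attribute-oriented induction step: values are replaced by their ancestors and identical tuples are merged with counts. It then picks the coarsest acceptable hierarchy. The function names, the `max_loss` parameter, the toy occupation/income rows, and both candidate hierarchies are hypothetical illustrations, not taken from the paper or the census data.

```python
from collections import Counter
from math import log2

# Assumption: rows are (attribute value, class label) pairs, and a hierarchy
# is a dict mapping each raw value to its ancestor concept (hypothetical).

def entropy(counts):
    """Shannon entropy (bits) of a label distribution given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)

def climb(value, hierarchy):
    """Map a raw value to its ancestor; unmapped values stay as they are."""
    return hierarchy.get(value, value)

def information_gain(rows, hierarchy):
    """Information gain of the generalized attribute about the class label."""
    groups = {}
    for value, label in rows:
        groups.setdefault(climb(value, hierarchy), Counter())[label] += 1
    n = len(rows)
    conditional = sum(sum(g.values()) / n * entropy(g) for g in groups.values())
    return entropy(Counter(label for _, label in rows)) - conditional

def generalize(rows, hierarchy):
    """One attribute-oriented induction step: replace values by ancestors,
    then merge identical generalized tuples while keeping their counts."""
    return Counter((climb(value, hierarchy), label) for value, label in rows)

def select_abstraction(rows, candidates, max_loss):
    """Return the name of the coarsest candidate hierarchy (fewest distinct
    generalized values) whose information-gain loss is at most max_loss,
    or None if no candidate qualifies (a hypothetical selection rule)."""
    base = information_gain(rows, {})
    admissible = []
    for name, hierarchy in candidates.items():
        loss = base - information_gain(rows, hierarchy)
        if loss <= max_loss:
            size = len({climb(v, hierarchy) for v, _ in rows})
            admissible.append((size, loss, name))
    return min(admissible)[2] if admissible else None

# Hypothetical toy data, for illustration only.
rows = [("doctor", ">50K"), ("lawyer", ">50K"), ("engineer", ">50K"),
        ("engineer", "<=50K"), ("clerk", "<=50K"), ("cashier", "<=50K"),
        ("farmer", "<=50K"), ("farmer", ">50K")]
candidates = {
    "by_sector": {"doctor": "professional", "lawyer": "professional",
                  "engineer": "professional", "clerk": "service",
                  "cashier": "service", "farmer": "manual"},
    "top": {v: "any" for v, _ in rows},
}

print(select_abstraction(rows, candidates, max_loss=0.2))  # -> by_sector
print(generalize(rows, candidates["by_sector"]))
```

In this toy example, the raw attribute carries 0.5 bits about the class. The sector-level hierarchy keeps about 0.34 bits, while collapsing everything to a single value keeps none. A tight threshold therefore selects the sector level, which illustrates the trade-off the abstract describes: an abstraction that is too general discards the information needed to extract significant rules.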