
A comparison of attribute selection strategies for attribute-oriented generalization

  • Brock Barber
  • Howard J. Hamilton
Communications Session 1B: Learning and Discovery Systems
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1325)

Abstract

Attribute-oriented generalization (AOG) is a knowledge discovery method that uses generalization to simplify the descriptions of patterns in database data. AOG repeatedly replaces specific values for an attribute with more general concepts according to concept hierarchies defined by domain experts. The degree of generalization is controlled by two user-defined thresholds. As presented by other researchers, the AOG process does not consider how interesting the results will be to the user. Given a relation retrieved from a database, many different relations can be created by generalization, some of which will be more interesting to the user than others. The attribute selection strategy, the method of choosing the next attribute for generalization, determines which of the many possible relations will be generated and thus can be used to direct the user towards the most interesting relations. We evaluate the performance of ten previously proposed and new attribute selection strategies by applying them to a 10,000-tuple public domain database and an 8,000,000-tuple commercial database. The strategies are compared using criteria that consider their ability to efficiently produce interesting results. We use measures of interestingness that consider the structure of the hierarchies that are used to guide generalization. Based on the comparison of the experimental results, a strategy that considers the complexity of the concept hierarchies was found to provide efficient and effective guidance towards interesting results.
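The generalization loop described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy concept hierarchies, the function names (`generalize`, `most_distinct_values`, `aog`), and the selection strategy, which simply picks the attribute with the most distinct values. The abstract does not name the two thresholds, so the sketch uses a single assumed limit on the number of tuples (`max_tuples`).

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
# "ANY" is the root of every hierarchy and has no parent.
HIERARCHIES = {
    "city": {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
             "Toronto": "Ontario", "Ottawa": "Ontario",
             "Saskatchewan": "Canada", "Ontario": "Canada", "Canada": "ANY"},
    "major": {"CS": "Science", "Math": "Science", "History": "Arts",
              "Science": "ANY", "Arts": "ANY"},
}


def generalize(relation, attrs, index):
    """Climb one hierarchy level for attrs[index] and merge identical tuples."""
    parent = HIERARCHIES[attrs[index]]
    merged = Counter()
    for row, count in relation.items():
        value = row[index]
        new_row = row[:index] + (parent.get(value, value),) + row[index + 1:]
        merged[new_row] += count  # counts of merged tuples are summed
    return merged


def most_distinct_values(relation, attrs):
    """Assumed example strategy: choose the attribute with the most distinct
    values among those that can still be generalized."""
    best, best_n = None, 0
    for i, name in enumerate(attrs):
        values = {row[i] for row in relation}
        can_climb = any(v in HIERARCHIES[name] for v in values)
        if can_climb and len(values) > best_n:
            best, best_n = i, len(values)
    return best


def aog(rows, attrs, max_tuples, select=most_distinct_values):
    """Generalize until the relation has at most max_tuples distinct tuples."""
    relation = Counter(rows)
    while len(relation) > max_tuples:
        index = select(relation, attrs)
        if index is None:  # nothing left to generalize
            break
        relation = generalize(relation, attrs, index)
    return relation


rows = [("Regina", "CS"), ("Saskatoon", "Math"),
        ("Toronto", "History"), ("Ottawa", "CS")]
result = aog(rows, ("city", "major"), max_tuples=2)
# result holds two tuples: ("Canada", "Science") with count 3
# and ("Canada", "Arts") with count 1
```

Passing a different function as `select` changes which generalized relation is reached, which is the role the abstract assigns to the attribute selection strategy.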

Keywords

Learning and knowledge discovery applications 



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Brock Barber
    • 1
  • Howard J. Hamilton
    • 1
  1. Department of Computer Science, University of Regina, Regina, Canada
