Machine Learning, Volume 27, Issue 3, pp 287–312

A Multistrategy Approach to Relational Knowledge Discovery in Databases

  • Katharina Morik
  • Peter Brockhausen

Abstract

When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies which are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the better learning can prune away uninteresting or losing hypotheses and the faster it becomes.

We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by the user and in a data-driven way by structures that are induced by three simple learning algorithms.
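
The pruning idea described in the abstract can be illustrated with a small sketch. This is not the authors' ILP algorithm: it assumes, purely for illustration, that hypotheses are conjunctions of attribute-value conditions over a single table, that a hypothesis is "interesting" when it covers at least `min_support` rows, and that the hypothesis space is ordered from general to specific. All names (`covers`, `support`, `search`, the sample rows) are hypothetical.

```python
def covers(hypothesis, row):
    """True if the row satisfies every (attribute, value) condition."""
    return all(row.get(attr) == val for attr, val in hypothesis)


def support(hypothesis, rows):
    """Number of rows covered by the hypothesis."""
    return sum(1 for row in rows if covers(hypothesis, row))


def search(rows, conditions, min_support):
    """Level-wise general-to-specific search.

    A specialization is generated only if every one of its direct
    generalizations passed the support test, so the descendants of a
    failing hypothesis are never evaluated.
    """
    accepted = []
    evaluated = 0
    level = [frozenset()]  # start from the most general hypothesis
    while level:
        passed = set()
        for hyp in level:
            evaluated += 1
            if support(hyp, rows) >= min_support:
                passed.add(hyp)
                accepted.append(hyp)
        candidates = set()
        for hyp in passed:
            used = {attr for attr, _ in hyp}
            for cond in conditions:
                if cond[0] in used:
                    continue  # at most one condition per attribute
                spec = hyp | {cond}
                # Prune: skip if any direct generalization failed.
                if all(spec - {c} in passed for c in spec):
                    candidates.add(spec)
        level = sorted(candidates, key=sorted)  # deterministic order
    return accepted, evaluated


# Illustrative data, invented for this sketch.
rows = [
    {"region": "north", "type": "a"},
    {"region": "north", "type": "a"},
    {"region": "north", "type": "b"},
    {"region": "south", "type": "b"},
]
conditions = [
    ("region", "north"), ("region", "south"),
    ("type", "a"), ("type", "b"),
]
accepted, evaluated = search(rows, conditions, min_support=2)
```

In this toy setting the ordering of the hypothesis space is fixed by set inclusion. The abstract's point is that such structure can instead be induced from the data by simpler learning algorithms and then used to steer a more expressive learner.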

Keywords: knowledge discovery in databases, inductive logic programming, functional dependencies, numerical intervals, background knowledge

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Katharina Morik (1)
  • Peter Brockhausen (1)
  1. Computer Science Department, LS VIII, Univ. Dortmund, Dortmund