Data Mining and Knowledge Discovery

, Volume 1, Issue 3, pp 241–258 | Cite as

Levelwise Search and Borders of Theories in Knowledge Discovery

  • Heikki Mannila
  • Hannu Toivonen
Article

Abstract

One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S \( \subseteq \) L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.

theory of knowledge discovery association rules episodes integrity constraints hypergraph transversals 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93).Washington, D.C., pp. 207–216.Google Scholar
  2. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, (Eds.), Menlo Park, CA: AAAI Press. pp. 307–328.Google Scholar
  3. Bell, S. 1995. Discovery and maintenance of functional dependencies by independencies. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 27–32.Google Scholar
  4. Berge, C. 1973. Hypergraphs. Combinatorics of Finite Sets (third edition). Amsterdam: North-Holland.Google Scholar
  5. Brin, S., Motwani, R., and Silverstein, S. 1997. Beyond market baskets: Generalizing association rules to correlations. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '97). Tucson, AZ., pp. 265–276.Google Scholar
  6. Chang, C.C. and Keisler, H.J. 1973. Model Theory (third edition, 1990). Amsterdam: North-Holland.Google Scholar
  7. De Raedt, L. and Bruynooghe, M. 1993. A theory of clausal discovery. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambéry, France, pp. 1058–1053.Google Scholar
  8. De Raedt, L. and Džeroski, S. 1994. First-order jk-clausal theories are PAC-learnable. Artificial Intelligence, 70:375–392.Google Scholar
  9. Eiter, T. and Gottlob, G. 1995. Identifying the minimal transversals of a hypergraph and related problems. SIAM Journal on Computing, 24(6):1278–1304.Google Scholar
  10. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (Eds.), 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.Google Scholar
  11. Fredman, M.L. and Khachiyan, L. 1996. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21(3):618–628.Google Scholar
  12. Fukuda, T., Morimoto, Y., Morishita, S., and Tokuyama, T. 1996. Mining optimized association rules for numeric attributes. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 182–191.Google Scholar
  13. Gunopulos, D., Khardon, R., Mannila, H., and Toivonen, H. 1997. Data mining, hypergraph transversals, and machine learning. In Proceedings of the SixteenthACMSIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97). Tucson, AZ., pp. 209–216.Google Scholar
  14. Gurvich, V. and Khachiyan, L. 1995. On generating the irredundant conjunctive and disjunctive normal forms of monotone boolean functions. Technical Report LCSR-TR-251, Rutgers University.Google Scholar
  15. Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.Google Scholar
  16. Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.Google Scholar
  17. Houtsma, M. and Swami, A. 1993. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, CA.Google Scholar
  18. Imielinski, T. and Mannila, H. 1996. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58–64.Google Scholar
  19. Kantola, M., Mannila, H., Räihä, K.-J., and Siirtola, H. 1992. Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems, 7(7):591–607.Google Scholar
  20. Kietz, J.-U. and Wrobel, S. 1992. Controlling the complexity of learning in logic through syntactic and task-oriented models. In S. Muggleton ( Ed.), Inductive Logic Programming. London: Academic Press, pp. 335–359.Google Scholar
  21. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A.I. 1994. Finding interesting rules from large sets of discovered association rules. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM '94). Gaithersburg, MD, pp. 401–407.Google Scholar
  22. Kloesgen, W. 1995. Efficient discovery of interesting statements in databases. Journal of Intelligent Information Systems, 4(1):53–69.Google Scholar
  23. Knobbe, A.J. and Adriaans, P.W. 1996. Discovering foreign key relations in relational databases. In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research. Vienna, Austria, pp. 961–966.Google Scholar
  24. Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.Google Scholar
  25. Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. Journal of Computer and System Sciences, 33(2):126–141.Google Scholar
  26. Mannila, H. and Räihä, K.-J. 1992a. Design of Relational Databases. Wokingham, UK: Addison-Wesley.Google Scholar
  27. Mannila, H. and Räihä, K.-J. 1992b. On the complexity of dependency inference. Discrete Applied Mathematics, 40:237–243.Google Scholar
  28. Mannila, H. and Räihä, K.-J. 1994. Algorithms for inferring functional dependencies. Data and Knowledge Engineering, 12(1):83–99.Google Scholar
  29. Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.Google Scholar
  30. Mannila, H., Toivonen, H., and Verkamo, A.I. 1997. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289.Google Scholar
  31. Mishra, N. and Pitt, L. 1995. On bounded-degree hypergraph transversal. Manuscript.Google Scholar
  32. Mitchell, T.M. 1982. Generalization as search. Artificial Intelligence, 18:203–226.Google Scholar
  33. Park, J.S., Chen, M.-S., and Yu, P.S. 1995. An effective hash-based algorithm for mining association rules. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '95). San Jose, CA, pp. 175–186.Google Scholar
  34. Pfahringer, B. and Kramer, S. 1995. Compression-based evaluation of partial determinations. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 234–239.Google Scholar
  35. Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Menlo Park, CA: AAAI Press, pp. 229–248.Google Scholar
  36. Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), 1991. Knowledge Discovery in Databases. Menlo Park, CA: AAAI Press.Google Scholar
  37. Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 432–444.Google Scholar
  38. Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.Google Scholar
  39. Srikant, R. and Agrawal, R. 1996. Mining quantitative association rules in large relational tables. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '96). Montréal, Canada, pp. 1–12.Google Scholar
  40. Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 134–145.Google Scholar

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Heikki Mannila
    • 1
    • 1
  • Hannu Toivonen
    • 1
    • 1
  1. 1.Department of Computer ScienceUniversity of HelsinkiFinland

Personalised recommendations