# Levelwise Search and Borders of Theories in Knowledge Discovery

- 480 Downloads
- 369 Citations

## Abstract

One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S \( \subseteq \) L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.

## Preview

Unable to display preview. Download preview PDF.

### References

- Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93).Washington, D.C., pp. 207–216.Google Scholar
- Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, (Eds.), Menlo Park, CA: AAAI Press. pp. 307–328.Google Scholar
- Bell, S. 1995. Discovery and maintenance of functional dependencies by independencies. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 27–32.Google Scholar
- Berge, C. 1973. Hypergraphs. Combinatorics of Finite Sets (third edition). Amsterdam: North-Holland.Google Scholar
- Brin, S., Motwani, R., and Silverstein, S. 1997. Beyond market baskets: Generalizing association rules to correlations. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '97). Tucson, AZ., pp. 265–276.Google Scholar
- Chang, C.C. and Keisler, H.J. 1973. Model Theory (third edition, 1990). Amsterdam: North-Holland.Google Scholar
- De Raedt, L. and Bruynooghe, M. 1993. A theory of clausal discovery. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambéry, France, pp. 1058–1053.Google Scholar
- De Raedt, L. and Džeroski, S. 1994. First-order
*jk*-clausal theories are PAC-learnable. Artificial Intelligence, 70:375–392.Google Scholar - Eiter, T. and Gottlob, G. 1995. Identifying the minimal transversals of a hypergraph and related problems. SIAM Journal on Computing, 24(6):1278–1304.Google Scholar
- Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (Eds.), 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.Google Scholar
- Fredman, M.L. and Khachiyan, L. 1996. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21(3):618–628.Google Scholar
- Fukuda, T., Morimoto, Y., Morishita, S., and Tokuyama, T. 1996. Mining optimized association rules for numeric attributes. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 182–191.Google Scholar
- Gunopulos, D., Khardon, R., Mannila, H., and Toivonen, H. 1997. Data mining, hypergraph transversals, and machine learning. In Proceedings of the SixteenthACMSIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97). Tucson, AZ., pp. 209–216.Google Scholar
- Gurvich, V. and Khachiyan, L. 1995. On generating the irredundant conjunctive and disjunctive normal forms of monotone boolean functions. Technical Report LCSR-TR-251, Rutgers University.Google Scholar
- Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.Google Scholar
- Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.Google Scholar
- Houtsma, M. and Swami, A. 1993. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, CA.Google Scholar
- Imielinski, T. and Mannila, H. 1996. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58–64.Google Scholar
- Kantola, M., Mannila, H., Räihä, K.-J., and Siirtola, H. 1992. Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems, 7(7):591–607.Google Scholar
- Kietz, J.-U. and Wrobel, S. 1992. Controlling the complexity of learning in logic through syntactic and task-oriented models. In S. Muggleton ( Ed.), Inductive Logic Programming. London: Academic Press, pp. 335–359.Google Scholar
- Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A.I. 1994. Finding interesting rules from large sets of discovered association rules. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM '94). Gaithersburg, MD, pp. 401–407.Google Scholar
- Kloesgen, W. 1995. Efficient discovery of interesting statements in databases. Journal of Intelligent Information Systems, 4(1):53–69.Google Scholar
- Knobbe, A.J. and Adriaans, P.W. 1996. Discovering foreign key relations in relational databases. In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research. Vienna, Austria, pp. 961–966.Google Scholar
- Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.Google Scholar
- Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. Journal of Computer and System Sciences, 33(2):126–141.Google Scholar
- Mannila, H. and Räihä, K.-J. 1992a. Design of Relational Databases. Wokingham, UK: Addison-Wesley.Google Scholar
- Mannila, H. and Räihä, K.-J. 1992b. On the complexity of dependency inference. Discrete Applied Mathematics, 40:237–243.Google Scholar
- Mannila, H. and Räihä, K.-J. 1994. Algorithms for inferring functional dependencies. Data and Knowledge Engineering, 12(1):83–99.Google Scholar
- Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.Google Scholar
- Mannila, H., Toivonen, H., and Verkamo, A.I. 1997. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289.Google Scholar
- Mishra, N. and Pitt, L. 1995. On bounded-degree hypergraph transversal. Manuscript.Google Scholar
- Mitchell, T.M. 1982. Generalization as search. Artificial Intelligence, 18:203–226.Google Scholar
- Park, J.S., Chen, M.-S., and Yu, P.S. 1995. An effective hash-based algorithm for mining association rules. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '95). San Jose, CA, pp. 175–186.Google Scholar
- Pfahringer, B. and Kramer, S. 1995. Compression-based evaluation of partial determinations. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 234–239.Google Scholar
- Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Menlo Park, CA: AAAI Press, pp. 229–248.Google Scholar
- Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), 1991. Knowledge Discovery in Databases. Menlo Park, CA: AAAI Press.Google Scholar
- Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 432–444.Google Scholar
- Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.Google Scholar
- Srikant, R. and Agrawal, R. 1996. Mining quantitative association rules in large relational tables. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '96). Montréal, Canada, pp. 1–12.Google Scholar
- Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 134–145.Google Scholar