Discovering all most specific sentences by randomized algorithms extended abstract

Gunopulos, Dimitrios; Mannila, Heikki; Saluja, Sanjeev

doi:10.1007/3-540-62222-5_47

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1186))

Included in the following conference series:

International Conference on Database Theory

Abstract

Data mining can in many instances be viewed as the task of computing a representation of a theory of a model or a database. In this paper we present a randomized algorithm that can be used to compute the representation of a theory in terms of the most specific sentences of that theory. In addition to randomization, the algorithm uses a generalization of the concept of hypergraph transversal. We apply the general algorithm to discovering maximal frequent sets in 0/1 data and to computing minimal keys in relations. We present some empirical results on the performance of these methods on real data. We also show some complexity-theoretic evidence of the hardness of these problems.
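As an illustration of one of the problems named in the abstract, the following Python sketch shows what "maximal frequent sets in 0/1 data" means on a toy dataset. It is not the algorithm of the paper, whose details the abstract does not give. The function names, the toy rows, and the absolute count threshold `min_count` are all assumptions made for this example. The randomized step shown here simply adds items in random order while the set stays frequent; the brute-force function is only a reference for checking the result on tiny inputs.

```python
import random
from itertools import combinations

# Toy 0/1 data (hypothetical): each row is the set of attributes equal to 1.
ROWS = [
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BC"),
    frozenset("ABC"),
]
ITEMS = frozenset("ABC")


def count(rows, itemset):
    """Number of rows that contain every item of itemset."""
    return sum(1 for row in rows if itemset <= row)


def random_maximal_frequent_set(rows, items, min_count, rng):
    """Grow a frequent set greedily in random item order.

    Frequency can only drop when items are added, so an item rejected
    once can never be added later; one pass therefore yields a set
    that is maximal with respect to inclusion.
    """
    order = list(items)
    rng.shuffle(order)
    current = frozenset()
    for item in order:
        candidate = current | {item}
        if count(rows, candidate) >= min_count:
            current = candidate
    return current


def all_maximal_frequent_sets(rows, items, min_count):
    """Brute-force reference: exponential, only for tiny examples."""
    frequent = [
        frozenset(c)
        for k in range(len(items) + 1)
        for c in combinations(sorted(items), k)
        if count(rows, frozenset(c)) >= min_count
    ]
    # Keep only frequent sets with no frequent proper superset.
    return {s for s in frequent if not any(s < t for t in frequent)}


if __name__ == "__main__":
    print(all_maximal_frequent_sets(ROWS, ITEMS, min_count=3))
    print(random_maximal_frequent_set(ROWS, ITEMS, 3, random.Random(0)))
```

With a threshold of 3 rows, every pair of attributes is frequent but the full set {A, B, C} is not, so the maximal frequent sets are the three pairs. A single randomized run returns one of them, and which one depends on the random order.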

Work supported by Alexander von Humboldt-Stiftung and the Academy of Finland.

Author information

Authors and Affiliations

Max-Planck-Institut für Informatik, Im Stadtwald, 66123, Saarbrücken, Germany
Dimitrios Gunopulos & Sanjeev Saluja
Dept. of Computer Science, University of Helsinki, FIN-00014, Helsinki, Finland
Heikki Mannila

Editor information

Foto Afrati, Phokion Kolaitis

About this paper

Cite this paper

Gunopulos, D., Mannila, H., Saluja, S. (1996). Discovering all most specific sentences by randomized algorithms extended abstract. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_47

DOI: https://doi.org/10.1007/3-540-62222-5_47
Published: 03 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62222-2
Online ISBN: 978-3-540-49682-3
eBook Packages: Springer Book Archive