Discovering all most specific sentences by randomized algorithms (extended abstract)

Database Theory — ICDT '97 (ICDT 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1186)

Abstract

Data mining can in many instances be viewed as the task of computing a representation of a theory of a model or a database. In this paper we present a randomized algorithm that computes a representation of a theory in terms of the most specific sentences of that theory. In addition to randomization, the algorithm uses a generalization of the concept of hypergraph transversals. We apply the general algorithm to two problems: discovering maximal frequent sets in 0/1 data and computing minimal keys in relations. We present empirical results on the performance of these methods on real data, and we give complexity-theoretic evidence of the hardness of these problems.

Work supported by the Alexander von Humboldt-Stiftung and the Academy of Finland.
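
To make the ideas in the abstract concrete, here is a minimal Python sketch for one of the two applications, maximal frequent sets in 0/1 data. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the transversal step is a brute-force stand-in that only suits tiny item universes, and the overall loop is one plausible way to combine random greedy extension with hypergraph transversals. The loop relies on one observation: a set avoids being a subset of every maximal set found so far exactly when it hits the complement of each of them, so the minimal transversals of those complements are the smallest candidates that could reveal a missing maximal set.

```python
import random
from itertools import combinations


def support(itemset, rows):
    """Count the rows (sets of items) that contain every item of itemset."""
    return sum(1 for row in rows if itemset <= row)


def random_maximal(seed_set, items, rows, min_support, rng):
    """Extend a frequent set, trying items in random order, until it is maximal.

    One pass suffices: frequency is monotone, so an item rejected once
    can never become addable after the set has grown.
    """
    current = set(seed_set)
    order = [i for i in items if i not in current]
    rng.shuffle(order)
    for item in order:
        if support(current | {item}, rows) >= min_support:
            current.add(item)
    return frozenset(current)


def minimal_transversals(edges, items):
    """Brute-force minimal hitting sets of edges (assumption: tiny universes only)."""
    found = []
    for size in range(len(items) + 1):
        for cand in combinations(sorted(items), size):
            c = frozenset(cand)
            if any(t <= c for t in found):
                continue  # contains a smaller transversal, so not minimal
            if all(c & e for e in edges):
                found.append(c)
    return found


def maximal_frequent_sets(rows, min_support, seed=0):
    """Collect maximal frequent sets until no minimal transversal is frequent."""
    rng = random.Random(seed)
    items = frozenset().union(*rows)
    maximal = set()
    while True:
        complements = [items - m for m in maximal]
        new = None
        for t in minimal_transversals(complements, items):
            if support(t, rows) >= min_support:
                # t lies in no known maximal set, so it grows into a new one.
                new = random_maximal(t, items, rows, min_support, rng)
                break
        if new is None:
            return maximal  # every candidate is infrequent: the collection is complete
        maximal.add(new)


rows = [frozenset(r) for r in ("abc", "ab", "ac", "bd", "abd")]
result = maximal_frequent_sets(rows, min_support=2)
print(sorted("".join(sorted(m)) for m in result))  # ['ab', 'ac', 'bd']
```

The random seed changes only the order in which the maximal sets are discovered, not the final collection. The same loop would apply to minimal keys if `support` were replaced by a test of whether an attribute set fails to be a key, though that variant is not shown here.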

Editor information

Foto Afrati, Phokion Kolaitis

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gunopulos, D., Mannila, H., Saluja, S. (1996). Discovering all most specific sentences by randomized algorithms extended abstract. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_47

  • DOI: https://doi.org/10.1007/3-540-62222-5_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62222-2

  • Online ISBN: 978-3-540-49682-3

  • eBook Packages: Springer Book Archive
