Discovering all most specific sentences by randomized algorithms (extended abstract)

  • Dimitrios Gunopulos
  • Heikki Mannila
  • Sanjeev Saluja
Contributed Papers Session 4: New Applications
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1186)

Abstract

In many instances, data mining can be viewed as the task of computing a representation of a theory of a model or a database. In this paper we present a randomized algorithm that computes the representation of a theory in terms of its most specific sentences. In addition to randomization, the algorithm uses a generalization of the concept of hypergraph transversal. We apply the general algorithm to discovering maximal frequent sets in 0/1 data and to computing minimal keys in relations. We present empirical results on the performance of these methods on real data. We also give complexity-theoretic evidence of the hardness of these problems.
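The connection between most specific sentences and hypergraph transversals can be illustrated on toy inputs. The Python sketch below is not the authors' algorithm. It assumes the standard correspondence that a set is contained in none of a family of maximal sets exactly when it intersects each of their complements, so the minimal sets not yet covered are the minimal transversals of those complements. Randomization enters only through a random-order greedy extension to a maximal frequent set, and minimal transversals are found by brute force rather than with the generalized transversal notion the paper uses. All function names (`minimal_transversals`, `extend_to_maximal`, `all_maximal_frequent`, `minimal_keys`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def minimal_transversals(edges, universe):
    """Brute-force minimal transversals (minimal hitting sets) of `edges`.

    Enumerates candidates by increasing size and keeps those that hit every
    edge and contain no smaller transversal already found. Exponential in
    len(universe); for illustration on tiny inputs only.
    """
    edges = [frozenset(e) for e in edges]
    found = []
    for k in range(len(universe) + 1):
        for cand in combinations(sorted(universe), k):
            c = frozenset(cand)
            if all(c & e for e in edges) and not any(f <= c for f in found):
                found.append(c)
    return found


def is_frequent(itemset, rows, minsup):
    """True if at least `minsup` rows contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row) >= minsup


def extend_to_maximal(start, items, rows, minsup, rng):
    """Random-order greedy extension of a frequent set to a maximal one.

    One pass suffices: an item rejected earlier stays rejected, because any
    superset of an infrequent set is infrequent.
    """
    current = set(start)
    order = sorted(items - current)
    rng.shuffle(order)
    for item in order:
        if is_frequent(current | {item}, rows, minsup):
            current.add(item)
    return frozenset(current)


def all_maximal_frequent(items, rows, minsup, seed=0):
    """Find every maximal frequent set by alternating random extension with a
    transversal-based completeness check."""
    rng = random.Random(seed)
    items = frozenset(items)
    if not is_frequent(frozenset(), rows, minsup):
        return set()
    maximal = {extend_to_maximal(frozenset(), items, rows, minsup, rng)}
    while True:
        # A set lies inside no known maximal set exactly when it intersects
        # every complement, i.e. when it is a transversal of the complements.
        complements = [items - m for m in maximal]
        witness = next(
            (t for t in minimal_transversals(complements, items)
             if is_frequent(t, rows, minsup)),
            None,
        )
        if witness is None:
            return maximal  # every frequent set is covered by `maximal`
        maximal.add(extend_to_maximal(witness, items, rows, minsup, rng))


def minimal_keys(relation, attributes):
    """Minimal keys of a relation given as a list of dicts.

    X is a key iff no two rows agree on all of X, i.e. X lies inside no agree
    set; so the minimal keys are the minimal transversals of the complements
    of the maximal agree sets.
    """
    attrs = frozenset(attributes)
    agree = {frozenset(a for a in attrs if t[a] == u[a])
             for t, u in combinations(relation, 2)}
    maximal_agree = [s for s in agree if not any(s < o for o in agree)]
    return minimal_transversals([attrs - s for s in maximal_agree], attrs)


if __name__ == "__main__":
    rows = [frozenset(r) for r in ("abc", "ab", "ac", "bc", "abd")]
    print(sorted(sorted(m) for m in all_maximal_frequent("abcd", rows, 2)))
    # [['a', 'b'], ['a', 'c'], ['b', 'c']]
    relation = [
        {"A": 1, "B": 1, "C": 1},
        {"A": 1, "B": 2, "C": 1},
        {"A": 2, "B": 1, "C": 2},
    ]
    print(sorted(sorted(k) for k in minimal_keys(relation, "ABC")))
    # [['A', 'B'], ['B', 'C']]
```

Each round either certifies that the collected sets are complete (no minimal transversal of the complements is frequent) or adds a new maximal frequent set, so the loop terminates. The brute-force transversal routine is exponential and would need to be replaced for anything beyond toy inputs.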

Copyright information

© Springer-Verlag 1997

Authors and Affiliations

  • Dimitrios Gunopulos (1)
  • Heikki Mannila (2)
  • Sanjeev Saluja (1)
  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
  2. Dept. of Computer Science, University of Helsinki, Helsinki, Finland
