Methods and problems in data mining

  • Heikki Mannila
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1186)

Abstract

Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
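As an illustration of the levelwise search mentioned in the abstract, the following is a minimal sketch for the special case of frequent itemsets. It is not taken from the paper: the function name `levelwise_frequent_sets`, the `min_support` parameter (an absolute count), and the set-based representation of transactions are all assumptions made for this example. It counts candidates of one size per pass and builds the next level only from sets whose smaller subsets were all frequent.

```python
from itertools import combinations


def levelwise_frequent_sets(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in at least
    min_support transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1 candidates: every single item that appears in the data.
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]

    frequent = {}
    size = 1
    while candidates:
        # One pass over the data counts the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Next level: unions of two frequent sets that are exactly one item
        # larger, kept only if all their subsets of the current size are
        # frequent (a superset of an infrequent set cannot be frequent).
        size += 1
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)

    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in levelwise_frequent_sets(data, 3).items():
        print(sorted(itemset), count)
```

The subset check is what keeps the candidate collection small: a set is counted against the data only when everything known so far is consistent with it being frequent.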

Copyright information

© Springer-Verlag 1997

Authors and Affiliations

  • Heikki Mannila (1)
  1. Department of Computer Science, University of Helsinki, Helsinki, Finland
