Methods and problems in data mining

  • Conference paper
Database Theory — ICDT '97 (ICDT 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1186)

Abstract

Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
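As a rough illustration of levelwise search, the sketch below finds all frequent itemsets in a small collection of transactions. It is not taken from the paper: the function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the representation of transactions as Python sets are all assumptions made for this example. Each level generates candidates of size k only from frequent sets of size k - 1, then counts their support in the data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}

    result = dict(level)
    k = 1
    while level:
        k += 1
        prev = set(level)

        # Candidate generation: a size-k set is a candidate only if every
        # one of its (k-1)-subsets was frequent on the previous level.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Evaluation: count each candidate against the data and keep
        # those that reach the support threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)

    return result
```

For example, calling `frequent_itemsets([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 3)` reports the three single items and the three pairs as frequent, while the triple is discarded because it occurs in only two transactions. The pruning step relies on the fact that a set cannot be frequent unless all of its subsets are.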

Part of this work was done while the author was visiting the Max-Planck-Institut für Informatik in Saarbrücken, Germany. Work supported by the Academy of Finland and by the Alexander von Humboldt Stiftung.

Editor information

Foto Afrati, Phokion Kolaitis

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannila, H. (1996). Methods and problems in data mining. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_35

  • DOI: https://doi.org/10.1007/3-540-62222-5_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62222-2

  • Online ISBN: 978-3-540-49682-3

  • eBook Packages: Springer Book Archive
