Inductive Databases and Multiple Uses of Frequent Itemsets: The cInQ Approach

Database Support for Data Mining Applications

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

Inductive databases (IDBs) have been proposed to address the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language that is powerful enough to express all the required processing, such as data preprocessing, pattern discovery and pattern post-processing. We present a synthetic view of important concepts that have been studied within the cInQ European project when considering the pattern domain of itemsets. Mining itemsets has proved useful not only for association rule mining but also for feature construction, classification, clustering, etc. We introduce the concepts of pattern domain, evaluation functions, primitive constraints, inductive queries and solvers for itemsets. We focus on simple high-level definitions that let the reader set aside technical details, which the interested reader will find in cInQ publications, among other sources.
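To make the vocabulary of the abstract concrete, the following is a minimal Python sketch of how an evaluation function, primitive constraints, an inductive query and a solver might fit together for itemsets. It is an illustration under stated assumptions, not the chapter's formal definitions: the names `frequency`, `min_freq`, `contains` and `solve`, the toy transactions, and the choice of a levelwise search that prunes with an anti-monotone minimum-frequency constraint are all assumptions made for this example.

```python
from itertools import combinations

def frequency(itemset, db):
    """Evaluation function (assumed): number of transactions containing itemset."""
    return sum(1 for t in db if itemset <= t)

def min_freq(threshold):
    """Primitive constraint (assumed): frequency >= threshold. Anti-monotone."""
    return lambda s, db: frequency(s, db) >= threshold

def contains(item):
    """Primitive constraint (assumed): the itemset includes a given item."""
    return lambda s, db: item in s

def solve(db, anti_monotone, others=()):
    """Levelwise solver sketch for the query: anti_monotone AND all(others).

    The anti-monotone constraint prunes the search; the remaining
    constraints are checked afterwards as a post-processing filter.
    """
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items if anti_monotone(frozenset([i]), db)]
    kept = []
    while level:
        kept.extend(level)
        survivors = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Keep a candidate only if it is one item larger and all of its
            # immediate subsets satisfied the anti-monotone constraint.
            if len(c) == len(a) + 1 and all(c - {i} in survivors for i in c):
                candidates.add(c)
        level = [c for c in candidates if anti_monotone(c, db)]
    answer = [s for s in kept if all(q(s, db) for q in others)]
    return sorted(answer, key=lambda s: (len(s), sorted(s)))

# Toy transactional data (hypothetical).
db = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
      frozenset("bc"), frozenset("abc")]

# Inductive query: itemsets with frequency >= 3 that contain item 'a'.
answer = solve(db, min_freq(3), others=[contains("a")])
print([sorted(s) for s in answer])
```

In this sketch the query is a conjunction of primitive constraints; only the minimum-frequency constraint is pushed into the search, which is one simple design choice among several that a solver could make.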

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Boulicaut, JF. (2004). Inductive Databases and Multiple Uses of Frequent Itemsets: The cInQ Approach. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_1

  • DOI: https://doi.org/10.1007/978-3-540-44497-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22479-2

  • Online ISBN: 978-3-540-44497-8

  • eBook Packages: Springer Book Archive
