Inductive Databases and Multiple Uses of Frequent Itemsets: The cInQ Approach

Boulicaut, Jean-François

doi:10.1007/978-3-540-44497-8_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

Inductive databases (IDBs) have been proposed to address the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language powerful enough to express all the required processing steps, such as data preprocessing, pattern discovery and pattern post-processing. We present a synthetic view of important concepts that have been studied within the cInQ European project when considering the pattern domain of itemsets. Mining itemsets has proved useful not only for association rule mining but also for feature construction, classification, clustering, etc. We introduce the concepts of pattern domain, evaluation functions, primitive constraints, inductive queries and solvers for itemsets. We focus on simple high-level definitions that let the reader set aside technical details, which the interested reader will find, among other places, in cInQ publications.
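To make the vocabulary of the abstract concrete, the following is a minimal sketch and not the chapter's own formalization. It assumes the usual reading of the terms: an evaluation function maps an itemset and a database to a value (here, its frequency), a primitive constraint is a predicate built on such functions, an inductive query is a conjunction of constraints, and a solver returns the itemsets that satisfy it. The toy database and the names `frequency`, `min_freq`, `excludes` and `solve` are hypothetical, and the solver is a naive enumeration rather than any algorithm from the cInQ publications.

```python
from itertools import combinations

# Hypothetical toy transactional database: each transaction is a set of items.
DB = [
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BCD"),
]


def frequency(itemset, db):
    """Evaluation function: number of transactions that contain the itemset."""
    return sum(1 for transaction in db if itemset <= transaction)


def min_freq(threshold):
    """Primitive constraint: the itemset occurs in at least `threshold` transactions."""
    return lambda itemset, db: frequency(itemset, db) >= threshold


def excludes(item):
    """Primitive constraint: the itemset does not contain `item`."""
    return lambda itemset, db: item not in itemset


def solve(db, constraints):
    """Naive solver: enumerate every non-empty itemset over the items of `db`
    and keep those that satisfy the conjunction of all constraints."""
    items = sorted(set().union(*db))
    answer = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if all(constraint(itemset, db) for constraint in constraints):
                answer.append(itemset)
    return answer


# Inductive query: itemsets occurring at least twice that do not contain item D.
answer = solve(DB, [min_freq(2), excludes("D")])
for itemset in answer:
    print(sorted(itemset), frequency(itemset, DB))
```

The enumeration is exponential in the number of items, so it only serves to show how the concepts fit together; practical solvers exploit properties of the constraints (for example, anti-monotonicity of the minimal frequency constraint) to prune the search.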

Author information

Authors and Affiliations

Institut National des Sciences Appliquées de Lyon, LIRIS CNRS FRE 2672, Bâtiment Blaise Pascal, F-69621 cedex, Villeurbanne, France
Jean-François Boulicaut

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen

About this chapter

Cite this chapter

Boulicaut, J.-F. (2004). Inductive Databases and Multiple Uses of Frequent Itemsets: The cInQ Approach. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_1

DOI: https://doi.org/10.1007/978-3-540-44497-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive