Constraint-Based Data Mining

Boulicaut, Jean-Francois; Jeudy, Baptiste

doi:10.1007/0-387-25465-X_18

Jean-Francois Boulicaut² &
Baptiste Jeudy³

20k Accesses
37 Citations

Abstract

Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this is essentially a querying process. It is enabled by a query language which can deal either with raw data or patterns which hold in the data. Mining patterns turns to be the so-called inductive query evaluation process for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declara-tively the desired constraints and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints and it points out the current directions of research as well.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 229.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, 1996.
Google Scholar
H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular expressions: a highly adaptative strategy for pushing constraints. In Proc. SIAMDM’03, pages 316–320, 2003.
Google Scholar
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proc. CL 2000, volume 1861 of LNCS, pages 972–986. Springer-Verlag, 2000.
Google Scholar
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, 2000.
Google Scholar
R. J. Bayardo. Efficiently mining long patterns from databases. In Proc. ACM SIGMOD’98, pages 85–93, 1998.
Google Scholar
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Adaptive constraint pushing in frequent pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 47–58. Springer-Verlag, 2003A.
Google Scholar
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Examiner: Optimized level-wise frequent pattern mining with monotone constraints. In Proc. IEEEICDMW, pages 11–18, 2003B.
Google Scholar
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Exante: Anticipated data reduction in constrained pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 59–70. Springer-Verlag, 2003C.
Google Scholar
F. Bonchi and C. Lucchese. On closed constrained frequent pattern mining. In Proc. IEEE ICDM’04 (In Press), 2004.
Google Scholar
J.-F. Boulicaut. Inductive databases and multiple uses of frequent itemsets: the clnQ approach. In Database Technologies for Data Mining-Discovering Knowledge with Inductive Queries, volume 2682 of LNCS, pages 1–23. Springer-Verlag, 2004.
MATH Google Scholar
J.-F. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary Data Mining. In Proc. PAKDDW, volume 1805 of LNAI, pages 62–73. Springer-Verlag, 2000.
Google Scholar
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queries by mean of free-sets. In Proc. PKDD’00, volume 1910 of LNAI, pages 75–85. Springer-Verlag, 2000.
Google Scholar
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1):5–22, 2003.
Article MathSciNet Google Scholar
J.-F. Boulicaut and B. Jeudy. Using constraint for itemset mining: should we prune or not? In Proc. BDA’00, pages 221–237, 2000.
Google Scholar
J.-F. Boulicaut and B. Jeudy. Mining free-sets under constraints. In Proc. IEEE IDEAS’01, pages 322–329, 2001.
Google Scholar
C. Bucila, J. E. Gehrke, D. Kifer, and W. White. Dualminer: A dual-pruning algorithm for itemsets with constraints. Data Mining and Knowledge Discovery, 7(4):241–272, 2003.
Article MathSciNet Google Scholar
D. Burdickj M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. IEEE ICDE’01, pages 443–452, 2001.
Google Scholar
A. Bykowski and C. Rigotti. DBC: a condensed representation of frequent patterns for efficient mining. Information Systems, 28(8):949–977, 2003.
Article Google Scholar
T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In Proc. PKDD’02, volume 2431 of LNAI, pages 74–85. Springer-Verlag, 2002.
Google Scholar
B. Crémilleux and J.-F. Boulicaut. Simplest rules characterizing classes generated by delta-free sets. In Proc. ES 2002, pages 33–46. Springer-Verlag, 2002.
Google Scholar
L. De Raedt. A perspective on inductive databases. SIGKDD Explorations, 4(2):69–77, 2003.
Google Scholar
L. De Raedt, M. Jaeger, S. Lee, and H. Mannila. A theory of inductive query answering. In Proc. IEEE ICDM’02, pages 123–130, 2002.
Google Scholar
L. De Raedt and S. Kramer. The levelwise version space algorithm and its application to molecular fragment finding. In Proc. IJCA’01, pages 853–862, 2001.
Google Scholar
M. M. Garofalakis and R. Rastogi. Scalable Data Mining with model constraints. SIGKDD Explorations, 2(2):39–48, 2000.
Google Scholar
M. M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. VLDB’99, pages 223–234, 1999.
Google Scholar
B. Goethals and M. J. Zaki, editors. Proc. of the IEEE ICDM 2003 Workshop on Frequent Itemset Mining Implementations, volume 90 of CEUR Workshop Proceedings, 2003.
Google Scholar
D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharm. Discovering all most specific sentences. ACM Transactions on Database Systems, 28(2): 140–174, 2003.
Article Google Scholar
T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39(ll):58–64, 1996.
Article Google Scholar
B. Jeudy and J.-F. Boulicaut. Optimization of association rule mining queries. Intelligent Data Analysis, 64:341–357, 2002.
Google Scholar
D. Kifer, J. E. Gehrke, C. Bucila, and W. White. How to quickly find a witness. In Proc. ACMPODS’03, pages 272–283, 2003.
Google Scholar
S. Kramer, L. De Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. ACMSIGKDD’O1, pages 136–143, 2001.
Google Scholar
L. V. Lakshmanan, R. Ng, J. Han, and A. Pang. Optimization of constrained frequent set queries with 2-variable constraints. In Proc. ACM SIGMOD’99, pages 157–168, 1999.
Google Scholar
D.-I. Lin and Z. M. Kedem. Pincer search: An efficient algorithm for discovering the maximum frequent sets. IEEE Transactions on Knowledge and Data Engineering, 14(3):553–566, 2002.
Article Google Scholar
H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In Proc. KDD’96, pages 189–194. AAAI Press, 1996.
Google Scholar
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3): 241–258, 1997.
Article Google Scholar
C. Mellish. The description identification problem. Artificial Intelligence, 52(2): 151–168, 1992.
Article Google Scholar
R. Meo. Optimization of a language for Data Mining. In Proc. ACM SAC’03-Data Mining Track, pages 437–444, 2003.
Google Scholar
R. Meo, G. Psaila, and S. Ceri. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, 2(2): 195–224, 1998.
Article Google Scholar
T. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1980.
Article Google Scholar
R. Ng, L. V. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. ACM SIGMOD’98, pages 13–24, 1998.
Google Scholar
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–26, 1999.
Article MathSciNet Google Scholar
J. Pei, G. Dong, W. Zou, and J. Han. On computing condensed frequent pattern bases. In Proc. IEEE ICDM’02, pages 378–385, 2002.
Google Scholar
J. Pei, J. Han, and L. V. S. Lakshmanan. Mining frequent itemsets with convertible constraints. In Proc. IEEEICDE’O1, pages 433–442, 2001.
Google Scholar
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. itc. ACMSIGKDD’97, pages 67–73, 1997.
Google Scholar
M. J. Zaki. Sequence mining in categorical domains: incorporating constraints. In Proc. ACM CIKM’00, pages 422–429, 2000.
Google Scholar

Download references

Author information

Authors and Affiliations

INSA Lyon, LIR1S CNRS FRE 2672 69621, Villeurbanne cedex, France
Jean-Francois Boulicaut
University of Saint-Etienne, EURISE, 42023, Saint-Etienne Cedex 2, France
Baptiste Jeudy

Authors

Jean-Francois Boulicaut
View author publications
You can also search for this author in PubMed Google Scholar
Baptiste Jeudy
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Dept. of Industrial Engineering, Tel-Aviv University, 69978, Ramat-Aviv, Israel
Oded Maimon & Lior Rokach &

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Boulicaut, JF., Jeudy, B. (2005). Constraint-Based Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_18

Download citation

DOI: https://doi.org/10.1007/0-387-25465-X_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics