Constrained pattern mining in the new era

Silva, Andreia; Antunes, Cláudia

doi:10.1007/s10115-015-0860-5

Constrained pattern mining in the new era

Survey Paper
Published: 23 July 2015

Volume 47, pages 489–516, (2016)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Andreia Silva¹ &
Cláudia Antunes¹

590 Accesses
9 Citations
Explore all metrics

Abstract

Twenty years of research on frequent itemset mining, or pattern mining, has led to the existence of a set of efficient algorithms for identifying different types of patterns, from transactional to sequential. Despite the great advances in this field, big data brought a completely new context to operate, with new challenges arising from the growth in data size, dynamics and complexity. These challenges include the shift not only from static to dynamic data, but also from tabular to complex data sources, such as social networks (expressed as graphs) and data warehouses (expressed as multi-relational models). In this new context, and more than ever, users need effective ways to control the large number of discovered patterns, and to be able to choose what patterns to consider at each time. The most accepted and common approach to minimize these drawbacks has been to capture and represent the semantics of the domain through constraints, and use them not only to reduce the number of results, but also to focus the algorithms in areas where it is more likely to gain information and return more interesting results. The use of constraints in pattern mining has been widely studied, and there are a lot of proposed types of constraints and pushing strategies. In this paper, we present a new global view of the work done on the incorporation of constraints in the pattern mining process. In particular, we propose a new framework for constrained pattern mining, that allows us to organize and analyze existing algorithms and strategies, based on the different types and properties of constraints, and on the data sources they are able to handle.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Trends and Future Perspective Challenges in Big Data

Big Data Analytics: A Literature Review Paper

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Notes

In this work we adopt and extend the notation presented by Ng et al. [53].
Prefix-monotone constraints were first proposed with the name of convertible constraints [58]. Since we can convert other constraints using several approaches (like using relaxations), we use the term prefix-monotone to designate the constraints that are convertible due to the order of items.
A first draft of this algorithm was proposed in [57], with the name CFG (Constrained Frequent pattern-Growth).

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB 94). Morgan Kaufmann, San Francisco, pp 487–499
Ahmed C, Tanbeer S, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Article Google Scholar
Albert-Lorincz H, Boulicaut JF (2003) Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In: Proceedings of the 3rd SIAM international conference on data mining (SDM 03). Springer, San Francisco, pp 316–320
Antunes C (2007) Onto4ar: a framework for mining association rules. In: Workshop on constraint-based mining and learning in the international conference on principles and practice of knowledge discovery in databases (PKDDW-CMILE 07). Springer, Warsaw, p 37
Antunes C (2008) An ontology-based framework for mining patterns in the presence of background knowledge. In: Proceedings of international conference on advanced intelligence (ICAI 08). Post and Telecom Press, Beijing, pp 163–168
Antunes C (2009) Mining patterns in the presence of domain knowledge. In: Proceedings of the 11th international conference on enterprise information systems (ICEIS 09). Springer, Milan, pp 188–193
Antunes C (2009) Pattern mining over star schemas in the onto4ar framework. In: Proceedings of the 2009 international workshop on semantic aspects in data mining (SADM 09). IEEE Computer Society, Washington, pp 453–458
Antunes C, Oliveira A (2002) Inference of sequential association rules guided by context-free grammars. In: Proceedings of 6th international conference on grammatical inference (ICGI 2002). Springer, Amsterdam, pp 289–293
Antunes C, Oliveira A (2003) Generalization of pattern-growth methods for sequential pattern mining with gap constraints. In: Proceedings of the 3rd international conference on machine learning and data mining in pattern recognition (MLDM 03). Springer, Leipzig, pp 239–251
Antunes C, Oliveira A (2005) Constraint relaxations for discovering unknown sequential patterns. In: Knowledge discovery in inductive databases: 3rd international workshop, KDID 2004 (Revised Selected and Invited Papers), pp 11–32
Antunes C, Oliveira AL (2004) Sequential pattern mining with approximated constraints. In: Proceedings of IADIS international applied computing conference (AC 04). IADIS Press, Lisbon, pp 131–138
Bayardo RJ (2005) The hows, whys, and whens of constraints in itemset and rule discovery. In: Proceedings of the 2004 European conference on constraint-based mining and inductive databases. Springer, Hinterzarten, pp 1–13
Bayardo RJ, Agrawal R (1999) Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 145–154
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2003) Adaptive constraint pushing in frequent pattern mining. In: Proceedings of the 7th conference on principles and practice of knowledge discovery in databases (PKDD 03). Springer, Berlin, pp 47–58
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Exante: a preprocessing method for frequent-pattern mining. IEEE Intell Syst 20(3):25–31
Article Google Scholar
Boulicaut JF (2004) Inductive databases and multiple uses of frequent itemsets: the cinq approach. In: Database support for data mining applications. Springer, Berlin, pp 1–23
Boulicaut JF, Jeudy B (2000) Using constraints for itemset mining: Should we prune or not? In: Actes des 16èmes Journées Bases de Données Avancées (BDA 00). Blois, France
Boulicaut JF, Jeudy B (2005) Constraint-based data mining. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook. Springer, Berlin, pp 399–416
Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. SIGMOD Rec 26(2):265–276
Article Google Scholar
Bucila C, Gehrke J, Kifer D, White WM (2003) Dualminer: a dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272
Article MathSciNet Google Scholar
Cao L, Luo D, Zhang C (2007) Knowledge actionability: satisfying technical and business interestingness. Int J Bus Intell Data Min 2(4):496–514
Article Google Scholar
Capelle M, Masson C, Boulicaut JF (2002) Mining frequent sequential patterns under a similarity constraint. In: Proceedings of the third international conference on intelligent data engineering and automated learning (IDEAL 02). Springer, London, pp 1–6
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Third IEEE international conference on data mining (ICDM 03). IEEE, pp 19–26
De Raedt L, Guns T, Nijssen S (2008) Constraint programming for itemset mining. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 08). ACM, New York, pp 204–212
De Raedt L, Jaeger M, Lee S, Mannila H (2010) A theory of inductive query answering. In: Džeroski S, Goethals B, Panov P (eds) Inductive databases and constraint-based data mining. Springer, New York, pp 79–103
Chapter Google Scholar
De Raedt L, Kramer S (2001) The levelwise version space algorithm and its application to molecular fragment finding. In: Proceedings of the 17th international joint conference on artificial intelligence—Volume 2 (IJCAI 01). Morgan Kaufmann Publishers Inc., Seattle, pp 853–859
Dong G, Li, J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 43–52
Džeroski S (2003) Multi-relational data mining: an introduction. SIGKDD Explor Newsl 5(1):1–16
Article Google Scholar
Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1992) Knowledge discovery in databases: an overview. AI Mag 13(3):57–70
Google Scholar
Garofalakis MN, Rastogi R, Shim K (1999) Spirit: sequential pattern mining with regular expression constraints. In: Proceedings of the 25th international conference on very large data bases (VLDB 99). Morgan Kaufmann Publishers Inc., San Francisco, pp 223–234
Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press
Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of 16th international conference on data engineering, pp 512–521
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86
Article MathSciNet Google Scholar
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier, Amsterdam
Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD. ACM, New York, pp 1–12
Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a bayesian network. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining (KDD 05). ACM, Chicago, pp 118–127
Jaroszewicz S, Simovici DA (2004) Interestingness of frequent itemsets using bayesian networks as background knowledge. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 04). ACM, Seattle, pp 178–186
Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of the 13th international conference on data engineering (ICDE 97). IEEE Computer Society, Birmingham, pp 220–231
Leung CKS, Brajczuk DA (2009) Efficient algorithms for mining constrained frequent patterns from uncertain data. In: Proceedings of the 1st ACM SIGKDD workshop on knowledge discovery from uncertain data (U 09). ACM, Paris, pp 9–18
Leung CKS, Hao B, Brajczuk D (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. In: Proceedings of the 2010 ACM symposium on applied computing (SAC 10). ACM, Sierre, pp 1034–1038
Leung CKS, Khan Q (2006) Efficient mining of constrained frequent patterns from streams. In: Proceedings of the 10th international database engineering and applications symposium (IDEAS 06), vol 0. IEEE Computer Society, Delhi, pp 61–68
Leung CKS, Lakshmanan L, Ng R (2002) Exploiting succinct constraints using fp-trees. SIGKDD Explor Newsl 4(1):40–49
Article Google Scholar
Leung CKS, Sun L (2012) A new class of constraints for constrained frequent pattern mining. In: Proceedings of the 27th annual ACM symposium on applied computing (SAC 12). ACM, Trento, pp 199–204
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217
Article Google Scholar
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the 1998 international conference on knowledge discovery and data mining (KDD 98). AAAI Press, New York, pp 80–86
Liu H, Lin Y, Han J (2011) Methods for mining frequent items in data streams: an overview. Knowl Inf Syst 26(1):1–30
Article Google Scholar
Liu Y, Keng Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD 05). Springer, Berlin, pp 689–695
Mabroukeh N, Ezeife C (2009) Semantic-rich markov models for web prefetching. In: Proceedings of the IEEE international conference on data mining workshops (ICDMW 09). Miami, pp 465–470
Mabroukeh N, Ezeife C (2009) Using domain ontology for semantic web usage mining and next page prediction. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM 09). ACM, Hong Kong, pp 1677–1680
Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases (VLDB 02). Morgan Kaufman, Hong Kong, pp 346–357
Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3):241–258
Article Google Scholar
Mannila H, Toivonen H, Inkeri Verkamo A (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289
Article Google Scholar
Ng R, Lakshmanan L, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data. ACM, Seattle, pp 13–24
Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops. IEEE Computer Society, Vancouver, pp 1120–1127
Özden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceedings of the 14th international conference on data engineering (ICDE 98). IEEE Computer Society, Washington, pp 412–421
Padmanabhan B, Tuzhilin A (1998) A belief-driven method for discovering unexpected patterns. In: Proceedings of the 4th international conference on knowledge discovery in data mining (KDD 98). AAAI Press, pp 94–100
Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 00). ACM, Boston, pp 350–354
Pei J, Han J (2002) Constrained frequent pattern mining: a pattern-growth view. SIGKDD Explor Newsl 4(1):31–39
Article Google Scholar
Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 433–442
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: mining sequential patterns by prefix-projected growth. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 215–224
Pei J, Han J, Wang W (2002) Mining sequential patterns with constraints in large databases. In: Proceedings of the 2002 ACM international conference on information and knowledge management (CIKM 02). McLean, pp 18–25
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160
Article Google Scholar
Silva A, Antunes C (2010) Pattern mining on stars with fp-growth. In: Proceedings of the 7th international conference on modeling decisions for artificial intelligence (MDAI 10). Springer, Perpignan, pp 175–186
Silva A, Antunes C (2013) Pushing constraints into a pattern tree. In: Proceedings of the 10th international conference on modeling decisions for artificial intelligence (MDAI 13). Springer, Barcelona
Silva A, Antunes C (2013) Pushing constraints into data streams. In: 2nd international workshop on big data, streams and heterogeneous source mining (BigMine 13). ACM, London, pp 79–86
Silva A, Antunes C (2013) Towards the integration of constrained mining with star schemas. In: 13th IEEE international conference on data mining workshops—domain driven data mining (DDDM 13). IEEE Computer Society, pp 413–420
Soulet A, Crmilleux B (2005) An efficient framework for mining flexible constraints. In: Ho T, Cheung D, Liu H (eds) Advances in knowledge discovery and data mining, Lecture Notes in Computer Science, vol 3518. Springer, Berlin, pp 661–671
Chapter Google Scholar
Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21th international conference on very large data bases (VLDB 95). Morgan Kaufmann Publishers Inc., San Francisco, pp 407–419
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology: advances in database technology (EDBT 96). Springer, London, pp 3–17
Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD 97). AAAI Press, California, pp 67–73
Tseng VS, Wu CW, Shie BE, Yu PS (2010) Up-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 10). ACM, London, pp 253–262
Wang K, Jiang Y, Lakshmanan LVS (2003) Mining unexpected rules by pushing user dynamics. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 03). ACM, Washington, pp 246–255
Wang K, Jiang Y, Yu JX, Dong G, Han J (2005) Divide-and-approximate: a novel constraint push strategy for iceberg cube mining. IEEE Trans Knowl Data Eng 17(3):354–368
Article Google Scholar
Wu CW, Lin YF, Yu PS, Tseng VS (2013) Mining high utility episodes in complex event sequences. In: Proceedings of 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 13). ACM, London, pp 536–544
Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Mak 5(4):597–604
Article Google Scholar
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the fourth SIAM international conference on data mining (ICDM 04), pp 482–486
Yin J, Zheng Z, Cao L (2012) Uspan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 12). ACM, London, pp 660–668
Yun U, Leggett JJ (2005) Wfim: Weighted frequent itemset mining with a weight range and a minimum weight. In: SDM
Zaki M (2000) Sequence mining in categorical domains: Incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management (CIKM 00). ACM, McLean, pp 422–429
Zhang X, Chou PL, Dong G (2007) Efficient computation of iceberg cubes by bounding aggregate functions. IEEE Trans Knowl Data Eng 19(7):903–918
Article Google Scholar
Zhu F, Yan X, Han J, Yu PS (2007) gprune: a constraint pushing framework for graph pattern mining. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD 07). Springer, Nanjing, pp 388–400

Download references

Author information

Authors and Affiliations

Department of Computer Science, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
Andreia Silva & Cláudia Antunes

Authors

Andreia Silva
View author publications
You can also search for this author in PubMed Google Scholar
Cláudia Antunes
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Andreia Silva.

Additional information

This work was partially supported by FCT—Fundação para a Ciência e a Tecnologia, under Project D2PM (PTDC/EIA-EIA/110074/2009) and Ph.D. Grant SFRH/BD/64108/2009.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Silva, A., Antunes, C. Constrained pattern mining in the new era. Knowl Inf Syst 47, 489–516 (2016). https://doi.org/10.1007/s10115-015-0860-5

Download citation

Received: 11 April 2014
Revised: 14 March 2015
Accepted: 06 July 2015
Published: 23 July 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10115-015-0860-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Constrained pattern mining in the new era

Abstract

Access this article

Similar content being viewed by others

Trends and Future Perspective Challenges in Big Data

Big Data Analytics: A Literature Review Paper

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Constrained pattern mining in the new era

Abstract

Access this article

Similar content being viewed by others

Trends and Future Perspective Challenges in Big Data

Big Data Analytics: A Literature Review Paper

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation