Constrained pattern mining in the new era

  • Survey Paper
Knowledge and Information Systems

Abstract

Twenty years of research on frequent itemset mining, or pattern mining, have produced a set of efficient algorithms for identifying different types of patterns, from transactional to sequential. Despite the great advances in this field, big data has brought a completely new context in which to operate, with new challenges arising from the growth in data size, dynamics and complexity. These challenges include the shift not only from static to dynamic data, but also from tabular to complex data sources, such as social networks (expressed as graphs) and data warehouses (expressed as multi-relational models). In this new context, more than ever, users need effective ways to control the large number of discovered patterns and to choose which patterns to consider at each time. The most common and widely accepted approach to mitigating these drawbacks has been to capture and represent the semantics of the domain through constraints, and to use them not only to reduce the number of results, but also to focus the algorithms on areas where they are more likely to gain information and return more interesting results. The use of constraints in pattern mining has been widely studied, and many types of constraints and pushing strategies have been proposed. In this paper, we present a new global view of the work done on the incorporation of constraints into the pattern mining process. In particular, we propose a new framework for constrained pattern mining that allows us to organize and analyze existing algorithms and strategies, based on the different types and properties of constraints and on the data sources they are able to handle.
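
To make the idea of pushing a constraint into the mining process concrete, the following is a minimal sketch and not an algorithm taken from the paper. It assumes a toy levelwise (Apriori-style) miner and a hypothetical anti-monotone constraint, "the total price of the itemset is at most max_total", with non-negative prices. The function name mine and the parameters price and max_total are illustrative assumptions. Because any superset of a violating itemset also violates the constraint, candidates can be discarded before their support is counted, which both reduces the output and narrows the search.

```python
from itertools import combinations


def mine(transactions, min_support, price, max_total):
    """Toy levelwise miner with an anti-monotone constraint pushed in.

    Hypothetical constraint: sum of item prices <= max_total.
    With non-negative prices, a violating itemset cannot be extended
    into a satisfying one, so it is pruned like an infrequent itemset.
    """
    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        return sum(price[i] for i in itemset) <= max_total

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if satisfies(s) and support(s) >= min_support]

    result = {}
    while level:
        for s in level:
            result[s] = support(s)
        current = set(level)
        # Join step: merge pairs of k-itemsets that differ in one item.
        candidates = {
            a | b for a, b in combinations(level, 2)
            if len(a | b) == len(a) + 1
        }
        level = [
            c for c in candidates
            # Every k-subset must have survived both checks.
            if all(frozenset(sub) in current
                   for sub in combinations(c, len(c) - 1))
            # Cheap constraint check first, support counting last.
            and satisfies(c)
            and support(c) >= min_support
        ]
    return result


if __name__ == "__main__":
    tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    found = mine(tx, min_support=3, price={"a": 1, "b": 2, "c": 4}, max_total=5)
    for itemset, count in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

In this toy data the itemset {b, c} is frequent but exceeds the assumed budget, so it is dropped before its support is ever counted, and no superset of it is generated. Constraints that are not anti-monotone need other pushing strategies, which is the kind of distinction the framework in the paper is meant to organize.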

Notes

  1. In this work we adopt and extend the notation presented by Ng et al. [53].

  2. Prefix-monotone constraints were first proposed with the name of convertible constraints [58]. Since we can convert other constraints using several approaches (like using relaxations), we use the term prefix-monotone to designate the constraints that are convertible due to the order of items.

  3. A first draft of this algorithm was proposed in [57], with the name CFG (Constrained Frequent pattern-Growth).

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB 94). Morgan Kaufmann, San Francisco, pp 487–499

  2. Ahmed C, Tanbeer S, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721

  3. Albert-Lorincz H, Boulicaut JF (2003) Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In: Proceedings of the 3rd SIAM international conference on data mining (SDM 03). Springer, San Francisco, pp 316–320

  4. Antunes C (2007) Onto4ar: a framework for mining association rules. In: Workshop on constraint-based mining and learning in the international conference on principles and practice of knowledge discovery in databases (PKDDW-CMILE 07). Springer, Warsaw, p 37

  5. Antunes C (2008) An ontology-based framework for mining patterns in the presence of background knowledge. In: Proceedings of international conference on advanced intelligence (ICAI 08). Post and Telecom Press, Beijing, pp 163–168

  6. Antunes C (2009) Mining patterns in the presence of domain knowledge. In: Proceedings of the 11th international conference on enterprise information systems (ICEIS 09). Springer, Milan, pp 188–193

  7. Antunes C (2009) Pattern mining over star schemas in the onto4ar framework. In: Proceedings of the 2009 international workshop on semantic aspects in data mining (SADM 09). IEEE Computer Society, Washington, pp 453–458

  8. Antunes C, Oliveira A (2002) Inference of sequential association rules guided by context-free grammars. In: Proceedings of 6th international conference on grammatical inference (ICGI 2002). Springer, Amsterdam, pp 289–293

  9. Antunes C, Oliveira A (2003) Generalization of pattern-growth methods for sequential pattern mining with gap constraints. In: Proceedings of the 3rd international conference on machine learning and data mining in pattern recognition (MLDM 03). Springer, Leipzig, pp 239–251

  10. Antunes C, Oliveira A (2005) Constraint relaxations for discovering unknown sequential patterns. In: Knowledge discovery in inductive databases: 3rd international workshop, KDID 2004 (Revised Selected and Invited Papers), pp 11–32

  11. Antunes C, Oliveira AL (2004) Sequential pattern mining with approximated constraints. In: Proceedings of IADIS international applied computing conference (AC 04). IADIS Press, Lisbon, pp 131–138

  12. Bayardo RJ (2005) The hows, whys, and whens of constraints in itemset and rule discovery. In: Proceedings of the 2004 European conference on constraint-based mining and inductive databases. Springer, Hinterzarten, pp 1–13

  13. Bayardo RJ, Agrawal R (1999) Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 145–154

  14. Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2003) Adaptive constraint pushing in frequent pattern mining. In: Proceedings of the 7th conference on principles and practice of knowledge discovery in databases (PKDD 03). Springer, Berlin, pp 47–58

  15. Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Exante: a preprocessing method for frequent-pattern mining. IEEE Intell Syst 20(3):25–31

  16. Boulicaut JF (2004) Inductive databases and multiple uses of frequent itemsets: the cinq approach. In: Database support for data mining applications. Springer, Berlin, pp 1–23

  17. Boulicaut JF, Jeudy B (2000) Using constraints for itemset mining: Should we prune or not? In: Actes des 16èmes Journées Bases de Données Avancées (BDA 00). Blois, France

  18. Boulicaut JF, Jeudy B (2005) Constraint-based data mining. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook. Springer, Berlin, pp 399–416

  19. Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. SIGMOD Rec 26(2):265–276

  20. Bucila C, Gehrke J, Kifer D, White WM (2003) Dualminer: a dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272

  21. Cao L, Luo D, Zhang C (2007) Knowledge actionability: satisfying technical and business interestingness. Int J Bus Intell Data Min 2(4):496–514

  22. Capelle M, Masson C, Boulicaut JF (2002) Mining frequent sequential patterns under a similarity constraint. In: Proceedings of the third international conference on intelligent data engineering and automated learning (IDEAL 02). Springer, London, pp 1–6

  23. Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Third IEEE international conference on data mining (ICDM 03). IEEE, pp 19–26

  24. De Raedt L, Guns T, Nijssen S (2008) Constraint programming for itemset mining. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 08). ACM, New York, pp 204–212

  25. De Raedt L, Jaeger M, Lee S, Mannila H (2010) A theory of inductive query answering. In: Džeroski S, Goethals B, Panov P (eds) Inductive databases and constraint-based data mining. Springer, New York, pp 79–103

  26. De Raedt L, Kramer S (2001) The levelwise version space algorithm and its application to molecular fragment finding. In: Proceedings of the 17th international joint conference on artificial intelligence—Volume 2 (IJCAI 01). Morgan Kaufmann Publishers Inc., Seattle, pp 853–859

  27. Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 43–52

  28. Džeroski S (2003) Multi-relational data mining: an introduction. SIGKDD Explor Newsl 5(1):1–16

  29. Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1992) Knowledge discovery in databases: an overview. AI Mag 13(3):57–70

  30. Garofalakis MN, Rastogi R, Shim K (1999) Spirit: sequential pattern mining with regular expression constraints. In: Proceedings of the 25th international conference on very large data bases (VLDB 99). Morgan Kaufmann Publishers Inc., San Francisco, pp 223–234

  31. Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press

  32. Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of 16th international conference on data engineering, pp 512–521

  33. Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86

  34. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier, Amsterdam

  35. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD. ACM, New York, pp 1–12

  36. Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a bayesian network. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining (KDD 05). ACM, Chicago, pp 118–127

  37. Jaroszewicz S, Simovici DA (2004) Interestingness of frequent itemsets using bayesian networks as background knowledge. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 04). ACM, Seattle, pp 178–186

  38. Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of the 13th international conference on data engineering (ICDE 97). IEEE Computer Society, Birmingham, pp 220–231

  39. Leung CKS, Brajczuk DA (2009) Efficient algorithms for mining constrained frequent patterns from uncertain data. In: Proceedings of the 1st ACM SIGKDD workshop on knowledge discovery from uncertain data (U 09). ACM, Paris, pp 9–18

  40. Leung CKS, Hao B, Brajczuk D (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. In: Proceedings of the 2010 ACM symposium on applied computing (SAC 10). ACM, Sierre, pp 1034–1038

  41. Leung CKS, Khan Q (2006) Efficient mining of constrained frequent patterns from streams. In: Proceedings of the 10th international database engineering and applications symposium (IDEAS 06), vol 0. IEEE Computer Society, Delhi, pp 61–68

  42. Leung CKS, Lakshmanan L, Ng R (2002) Exploiting succinct constraints using fp-trees. SIGKDD Explor Newsl 4(1):40–49

  43. Leung CKS, Sun L (2012) A new class of constraints for constrained frequent pattern mining. In: Proceedings of the 27th annual ACM symposium on applied computing (SAC 12). ACM, Trento, pp 199–204

  44. Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217

  45. Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the 1998 international conference on knowledge discovery and data mining (KDD 98). AAAI Press, New York, pp 80–86

  46. Liu H, Lin Y, Han J (2011) Methods for mining frequent items in data streams: an overview. Knowl Inf Syst 26(1):1–30

  47. Liu Y, Keng Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD 05). Springer, Berlin, pp 689–695

  48. Mabroukeh N, Ezeife C (2009) Semantic-rich markov models for web prefetching. In: Proceedings of the IEEE international conference on data mining workshops (ICDMW 09). Miami, pp 465–470

  49. Mabroukeh N, Ezeife C (2009) Using domain ontology for semantic web usage mining and next page prediction. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM 09). ACM, Hong Kong, pp 1677–1680

  50. Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases (VLDB 02). Morgan Kaufmann, Hong Kong, pp 346–357

  51. Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3):241–258

  52. Mannila H, Toivonen H, Inkeri Verkamo A (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289

  53. Ng R, Lakshmanan L, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data. ACM, Seattle, pp 13–24

  54. Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops. IEEE Computer Society, Vancouver, pp 1120–1127

  55. Özden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceedings of the 14th international conference on data engineering (ICDE 98). IEEE Computer Society, Washington, pp 412–421

  56. Padmanabhan B, Tuzhilin A (1998) A belief-driven method for discovering unexpected patterns. In: Proceedings of the 4th international conference on knowledge discovery in data mining (KDD 98). AAAI Press, pp 94–100

  57. Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 00). ACM, Boston, pp 350–354

  58. Pei J, Han J (2002) Constrained frequent pattern mining: a pattern-growth view. SIGKDD Explor Newsl 4(1):31–39

  59. Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 433–442

  60. Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: mining sequential patterns by prefix-projected growth. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 215–224

  61. Pei J, Han J, Wang W (2002) Mining sequential patterns with constraints in large databases. In: Proceedings of the 2002 ACM international conference on information and knowledge management (CIKM 02). McLean, pp 18–25

  62. Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160

  63. Silva A, Antunes C (2010) Pattern mining on stars with fp-growth. In: Proceedings of the 7th international conference on modeling decisions for artificial intelligence (MDAI 10). Springer, Perpignan, pp 175–186

  64. Silva A, Antunes C (2013) Pushing constraints into a pattern tree. In: Proceedings of the 10th international conference on modeling decisions for artificial intelligence (MDAI 13). Springer, Barcelona

  65. Silva A, Antunes C (2013) Pushing constraints into data streams. In: 2nd international workshop on big data, streams and heterogeneous source mining (BigMine 13). ACM, London, pp 79–86

  66. Silva A, Antunes C (2013) Towards the integration of constrained mining with star schemas. In: 13th IEEE international conference on data mining workshops—domain driven data mining (DDDM 13). IEEE Computer Society, pp 413–420

  67. Soulet A, Crémilleux B (2005) An efficient framework for mining flexible constraints. In: Ho T, Cheung D, Liu H (eds) Advances in knowledge discovery and data mining, Lecture Notes in Computer Science, vol 3518. Springer, Berlin, pp 661–671

  68. Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21th international conference on very large data bases (VLDB 95). Morgan Kaufmann Publishers Inc., San Francisco, pp 407–419

  69. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology: advances in database technology (EDBT 96). Springer, London, pp 3–17

  70. Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD 97). AAAI Press, California, pp 67–73

  71. Tseng VS, Wu CW, Shie BE, Yu PS (2010) Up-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 10). ACM, London, pp 253–262

  72. Wang K, Jiang Y, Lakshmanan LVS (2003) Mining unexpected rules by pushing user dynamics. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 03). ACM, Washington, pp 246–255

  73. Wang K, Jiang Y, Yu JX, Dong G, Han J (2005) Divide-and-approximate: a novel constraint push strategy for iceberg cube mining. IEEE Trans Knowl Data Eng 17(3):354–368

  74. Wu CW, Lin YF, Yu PS, Tseng VS (2013) Mining high utility episodes in complex event sequences. In: Proceedings of 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 13). ACM, London, pp 536–544

  75. Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Mak 5(4):597–604

  76. Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the fourth SIAM international conference on data mining (SDM 04), pp 482–486

  77. Yin J, Zheng Z, Cao L (2012) Uspan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 12). ACM, London, pp 660–668

  78. Yun U, Leggett JJ (2005) Wfim: Weighted frequent itemset mining with a weight range and a minimum weight. In: SDM

  79. Zaki M (2000) Sequence mining in categorical domains: Incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management (CIKM 00). ACM, McLean, pp 422–429

  80. Zhang X, Chou PL, Dong G (2007) Efficient computation of iceberg cubes by bounding aggregate functions. IEEE Trans Knowl Data Eng 19(7):903–918

  81. Zhu F, Yan X, Han J, Yu PS (2007) gprune: a constraint pushing framework for graph pattern mining. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD 07). Springer, Nanjing, pp 388–400

Author information

Corresponding author

Correspondence to Andreia Silva.

Additional information

This work was partially supported by FCT—Fundação para a Ciência e a Tecnologia, under Project D2PM (PTDC/EIA-EIA/110074/2009) and Ph.D. Grant SFRH/BD/64108/2009.

About this article

Cite this article

Silva, A., Antunes, C. Constrained pattern mining in the new era. Knowl Inf Syst 47, 489–516 (2016). https://doi.org/10.1007/s10115-015-0860-5

  • DOI: https://doi.org/10.1007/s10115-015-0860-5
