Survey on using constraints in data mining

Data Mining and Knowledge Discovery

Abstract

This paper provides an overview of the current state of the art in using constraints in knowledge discovery and data mining. The use of constraints in a data mining task requires specific tools for defining and satisfying them during knowledge extraction. This survey organizes the studies into three groups, covering classification, clustering and pattern mining, and considers whether the constraints are on the data, the models or the measures. We consider the distinctions between hard and soft constraint satisfaction, and between the knowledge extraction phases where constraints are considered. In addition to discussing how constraints can be used in data mining, we show how constraint-based languages can be used throughout the data mining process.

Notes

  1. The basic idea of a support vector machine is to represent the decision boundary using a subset of the training examples, known as support vectors. Given a set of candidate hyperplanes, the classifier selects one to represent its decision boundary, based on how well each is expected to perform on the examples to be classified.

  2. The Euclidean distance, i.e. the straight-line distance, is an ineffective measure in the presence of obstacles and facilitators. An obstacle is a physical object that obstructs reachability among individuals. Conversely, a facilitator is a physical object that enhances reachability among individuals.

  3. MiniZinc is a constraint-modeling language. It is sufficiently high-level to express most constraint problems easily, but low-level enough to be mapped onto existing solvers easily and consistently (Nethercote et al. 2007). The MiningZinc language (Guns et al. 2013b) cited in this section is an extension of MiniZinc for data mining.

  4. Inductive databases extend the closure principle to the knowledge discovery field. The principle simply states that the output of a query for knowledge extraction can be the input of another query of a compatible type (Imielinski and Mannila 1996).

  5. The system automatically captures the properties of such constraints (e.g. monotonicity and anti-monotonicity), to be used directly during the extraction of the mining model.
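As an illustration of the anti-monotonicity property mentioned in note 5, the following is a minimal Python sketch, not the system described in the survey. It assumes a toy level-wise itemset miner; the function name `mine_itemsets`, the `price` table and the budget parameter `max_total_price` are hypothetical names introduced only for this example. Both constraints used here (minimum support, and a maximum total price with non-negative prices) are anti-monotone: once an itemset violates one of them, every superset violates it as well, so the itemset is never extended.

```python
from itertools import combinations


def mine_itemsets(transactions, min_support, price, max_total_price):
    """Level-wise itemset mining that prunes with anti-monotone constraints.

    Assumption for this sketch: all prices are non-negative, so the
    total-price constraint is anti-monotone, just like minimum support.
    """

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        total = sum(price[i] for i in itemset)
        return total <= max_total_price and support(itemset) >= min_support

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result = []
    while level:
        result.extend(level)
        survivors = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            # Anti-monotone pruning: a candidate is generated only if
            # every subset obtained by dropping one item survived.
            if all(union - {x} in survivors for x in union):
                candidates.add(union)
        level = [c for c in candidates if satisfies(c)]
    return result
```

With five small transactions over the items a, b and c, a minimum support of 3 and a budget of 5, the itemset {b, c} exceeds the budget, so {a, b, c} is never generated or counted.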

References

  • Agrawal R, Srikant R, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, pp 207–216

  • Ahmed CF, Tanbeer SK, Jeong BS, Lee YK, Choi HJ (2012) Single-pass incremental and interactive mining for weighted frequent patterns. Expert Syst Appl 39(9):7976–7994

  • An A, Stefanowski J, Ramanna S, Butz CJ, Pedrycz W, Wang G (eds) (2007) Rough sets, fuzzy sets, data mining and granular computing. In: Proceedings of the 11th international conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, (Lecture Notes in Computer Science), vol 4482. Springer

  • Antunes C (2009) Pattern mining over star schemas in the Onto4AR framework. In: Proceedings of the IEEE international conference on data mining (ICDM) workshops, pp 453–458

  • Antunes C, Oliveira AL (2003) Sequence mining in categorical domains: incorporating constraints. In: Proceedings of the 3rd international conference on machine learning and data mining in pattern recognition (MLDM), pp 239–251

  • Antunes C, Oliveira A (2004) Constraint relaxations for discovering unknown sequential patterns. In: Proceedings of the third international workshop on knowledge discovery in inductive databases (KDID), pp 11–32

  • Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. In: Simonis H (ed) Integration of AI and OR techniques in constraint programming: proceedings of the 11th international conference, CPAIOR 2014, Cork, Ireland, May 19–23, 2014. Lecture Notes in Computer Science, vol 8451, pp 438–454. Springer. doi:10.1007/978-3-319-07046-9_31

  • Bade K, Nürnberger A (2006) Personalized hierarchical clustering. In: IEEE/ACM international conference on web intelligence (WIC), pp 181–187

  • Bade K, Nürnberger A (2008) Creating a cluster hierarchy under constraints of a partially known hierarchy. In: Proceedings of the SIAM international conference on data mining (SDM), pp 13–24

  • Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395

  • Banerjee A, Ghosh J (2008) Clustering with balancing constraints. Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton, pp 171–200

  • Baralis E, Garza P, Quintarelli E, Tanca L (2007) Answering XML queries by means of data summaries. ACM Trans Inf Syst J 25(3):10–16

  • Baralis E, Cagliero L, Cerquitelli T, Garza P (2012) Generalized association rule mining with constraints. Inf Sci 194:68–84

  • Baralis E, Cerquitelli T, Chiusano S (2005) Index support for frequent itemset mining in a relational DBMS. In: Proceedings of the 21st international conference on data engineering (ICDE), pp 754–765

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. In: Proceedings of the twentieth international conference on machine learning (ICML), pp 11–18

  • Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton

  • Basu S, Banerjee A, Mooney RJ (2004a) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the Fourth SIAM international conference on data mining (SDM)

  • Basu S, Bilenko M, Mooney RJ (2004b) A probabilistic framework for semi-supervised clustering. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 59–68

  • Bellandi A, Furletti B, Grossi V, Romei A (2007) Ontology-driven association rules extraction: a case study. In: Proceedings of the international workshop on contexts and ontologies: representation and reasoning (C&O:RR), pp 1–10

  • Bellandi A, Furletti B, Grossi V, Romei A (2008) Ontological support for association rule mining. In: Proceedings of the 26th IASTED international conference on artificial intelligence and applications (AIA), AIA ’08. ACTA Press, Anaheim, pp 110–115. http://dl.acm.org/citation.cfm?id=1712759.1712781

  • Bentayeb F, Darmont J (2002) Decision tree modeling with relational views. In: Proceedings of the 13th international symposium on foundations of intelligent systems (ISMIS), pp 423–431

  • Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517

  • Bernhardt J, Chaudhuri S, Fayyad U, Netz A (2001) Integrating data mining with SQL databases: OLE DB for data mining. In: Proceedings of the 17th international conference on data engineering (ICDE), pp 379–387

  • Bernstein A, Mannor S, Shimkin N (2010) Online classification with specificity constraints. In: Proceedings of the 24th annual conference on neural information processing systems (NIPS), pp 190–198

  • Bertsekas DP (1991) Linear network optimization: algorithms and codes. MIT Press, Cambridge. http://opac.inria.fr/record=b1089011

  • Besson J, Pensa RG, Robardet C, Boulicaut JF (2006) Knowledge discovery in inductive databases: 4th international workshop, KDID 2005, Porto, Portugal, October 3, 2005, Revised selected and invited papers, chap. Constraint-based mining of fault-tolerant patterns from boolean data. Springer, Berlin Heidelberg, pp 55–71. doi:10.1007/11733492_4

  • Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on machine learning (ICML), ICML ’04. ACM, New York, pp 11–18. doi:10.1145/1015330.1015360

  • Bistarelli S, Montanari U, Rossi F (1997) Semiring-based constraint satisfaction and optimization. J ACM 44(2):201–236. doi:10.1145/256303.256306

  • Bistarelli S, Bonchi F (2007) Soft constraint based pattern mining. Data Knowl Eng 62(1):118–137

  • Blaszczynski J, Deng W, Hu F, Slowinski R, Szelag M, Wang G (2012) On different ways of handling inconsistencies in ordinal classification with monotonicity constraints. In: Greco S, Bouchon-Meunier B, Coletti G, Fedrizzi M, Matarazzo B, Yager RR (eds) Advances on computational intelligence: 14th international conference on information processing and management of uncertainty in knowledge-based systems, IPMU 2012, Catania, Italy, July 9–13, 2012. Proceedings, Part I, communications in computer and information science, vol 297. Springer, pp 300–309. doi:10.1007/978-3-642-31709-5_31

  • Blaszczynski J, Slowinski R, Szelag M (2010) Probabilistic rough set approaches to ordinal classification with monotonicity constraints. In: Computational intelligence for knowledge-based systems design, 13th international conference on information processing and management of uncertainty, IPMU 2010, pp 99–108

  • Blockeel H, Calders T, Fromont É, Goethals B, Prado A, Robardet C (2012) An inductive database system based on virtual mining views. Data Min Knowl Discov 24(1):247–287

  • Blockeel H, Calders T, Fromont É, Goethals B, Prado A (2008a) Mining views: database views for data mining. In: Alonso G, Blakeley JA, Chen ALP (eds) Proceedings of the 24th international conference on data engineering, ICDE 2008, April 7–12, 2008, Cancún, México. IEEE Computer Society, pp 1608–1611. doi:10.1109/ICDE.2008.4497633

  • Blockeel H, Calders T, Fromont É, Goethals B, Prado A, Robardet C (2008b) An inductive database prototype based on virtual mining views. In: Li Y, Liu B, Sarawagi S (eds) Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, NV, August 24–27, 2008. ACM, pp 1061–1064. doi:10.1145/1401890.1402019

  • Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) ExAnte: a preprocessing method for frequent-pattern mining. IEEE Intell Syst 20(3):25–31

  • Bonchi F, Giannotti F, Lucchese C, Orlando S, Perego R, Trasarti R (2009) A constraint-based querying system for exploratory pattern discovery. Inf Syst 34(1):3–27

  • Bonchi F, Lucchese C (2007) Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng 60(2):377–399

  • Boulicaut J, Jeudy B (2010) Constraint-based data mining. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, 2nd edn. Springer, New York. doi:10.1007/978-0-387-09823-4_17

  • Boulicaut JF, Masson C (2005) Data mining query languages. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook. Springer, New York, pp 715–727

  • Bradley PS, Bennett KP, Demiriz A (2000) Constrained k-means clustering. Technical report MSR-TR-2000-65, Microsoft Research

  • Brunner C, Fischer A, Luig K, Thies T (2012) Pairwise support vector machines and their application to large scale problems. J Mach Learn Res 13(1):2279–2292. http://dl.acm.org/citation.cfm?id=2503308.2503316

  • Bucilă C, Gehrke J, Kifer D, White W (2003) DualMiner: a dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272

  • Bult JR, Wansbeek TJ (1995) Optimal selection for direct mail. Market Sci 14(4):378–394

  • Capelle M, Masson C, Boulicaut J (2003) Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In: Proceedings of the third SIAM international conference on data mining (SDM), pp 316–320

  • Cerf L, Besson J, Robardet C, Boulicaut J (2009) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data (TKDD). doi:10.1145/1497577.1497580

  • Cerf L, Besson J, Nguyen K, Boulicaut J (2013) Closed and noise-tolerant patterns in n-ary relations. Data Min Knowl Discov 26(3):574–619. doi:10.1007/s10618-012-0284-8

  • Ceri S, Meo R, Psaila G (1998) An extension to SQL for mining association rules. Data Min Knowl Discov 2(2):195–224. doi:10.1023/A:1009774406717

  • Chand C, Thakkar A, Ganatra A (2012a) Sequential pattern mining: survey and current research challenges. Int J Soft Comput Eng (IJSCE) 2(1):2231–2307

  • Chand C, Thakkar A, Ganatra A (2012b) Target oriented sequential pattern mining using recency and monetary constraints. Int J Comput Appl 45(10):12–18

  • Chang JH (2011) Mining weighted sequential patterns in a sequence database with a time-interval weight. Knowl Based Syst 24(1):1–9

  • Chen E, Cao H, Li Q, Qian T (2008) Efficient strategies for tough aggregate constraint-based sequential pattern mining. Inf Sci 178(6):1498–1518

  • Chen YL, Kuo MH, Wu SY, Tang K (2009) Discovering recency, frequency, and monetary (RFM) sequential patterns from customers’ purchasing data. Electron Commer Res Appl 8(5):241–251

  • Coleman T, Saunderson J, Wirth A (2008) Spectral clustering with inconsistent advice. In: Proceedings of the twenty-fifth international conference on machine learning (ICML), pp 152–159

  • Costa JA, Hero AO III (2005) Classification constrained dimensionality reduction. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 1077–1080

  • Dao TBH, Duong KC, Vrain C (2013) A declarative framework for constrained clustering. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) ECML/PKDD (3), Lecture Notes in Computer Science, vol 8190. Springer, pp 419–434. doi:10.1007/978-3-642-40994-3

  • Dao TBH, Duong KC, Vrain C (2015) Constrained minimum sum of squares clustering by constraint programming. In: Proceedings of the 21st international conference on principles and practice of constraint programming (CP 2015). Cork, Ireland, pp 557–573. https://hal.archives-ouvertes.fr/hal-01168193

  • Davidson I, Ravi SS (2005a) Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Knowledge discovery in databases: PKDD 2005, 9th European conference on principles and practice of knowledge discovery in databases (PKDD), pp 59–70

  • Davidson I, Ravi SS (2005b) Clustering with constraints: feasibility issues and the k-means algorithm. In: Kargupta H, et al. (eds) Proceedings of the 2005 SIAM international conference on data mining, pp 138–149. doi:10.1137/1.9781611972757.13

  • Davidson I, Ravi SS (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the twenty-first national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference (AAAI), pp 336–341

  • Davidson I, Ravi SS (2007) The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Min Knowl Discov 14(1):25–61

  • Davidson I, Ravi SS (2009) Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results. Data Min Knowl Discov 18(2):257–282

  • Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Knowledge discovery in databases: PKDD 2006, 10th European conference on principles and practice of knowledge discovery in databases (PKDD), pp 115–126

  • Dawson S, di Vimercati SDC, Samarati P (1999) Specification and enforcement of classification and inference constraints. In: IEEE symposium on security and privacy, pp 181–195

  • De Raedt L, Guns T, Nijssen S (2010) Constraint programming for data mining and machine learning. In: Fox M, Poole D (eds) Proceedings of the twenty-fourth AAAI conference on artificial intelligence, AAAI 2010, Atlanta, July 11–15, 2010. AAAI Press, pp 1671–1675. http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1837

  • De Raedt L (2002) A perspective on inductive databases. SIGKDD Explor 4(2):69–77. doi:10.1145/772862.772871

  • Demiriz A, Bennett KP, Bradley PS (2008) Using assignment constraints to avoid empty clusters in k-means clustering. Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton, pp 201–220

  • Druck G, Mann GS, McCallum A (2008) Learning from labeled features using generalized expectation criteria. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 595–602

  • Duivesteijn W, Feelders A (2008) Nearest neighbour classification with monotonicity constraints. Mach Learn Knowl Discov Databases Eur Conf ECML/PKDD 2008:301–316

  • Dzeroski S, Goethals B, Panov P (2010) Inductive databases and constraint-based data mining. Springer, New York

  • Euler T, Klinkenberg R, Mierswa I, Scholz M, Wurst M (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 935–940

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874

  • Fiot C, Laurent A, Teisseire M (2009) Softening the blow of frequent sequence analysis: soft constraints and temporal accuracy. Int J Web Eng Technol 5(1):24–47

  • Fromont É, Blockeel H, Struyf J (2006) Integrating decision tree learning into inductive databases. In: Proceedings of the 5th international workshop on knowledge discovery in inductive databases (KDID), pp 81–96

  • Fu Y, Han J (1995) Meta-rule-guided mining of association rules in relational databases. In: Proceedings of the post-conference workshops on integration of knowledge discovery in databases with deductive and object-oriented databases (KDOOD/TDOOD), pp 39–46

  • Fu Y, Han J, Koperski K, Wang W, Zaiane O (1996) DMQL: a data mining query language for relational databases. In: Proceedings of the first workshop on research issues in data mining and knowledge discovery (DMKD), pp 122–133

  • Garofalakis MN, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. In: Proceedings of 25th international conference on very large data bases (VLDB), pp 223–234

  • Garofalakis MN, Hyun D, Rastogi R, Shim K (2003) Building decision trees with constraints. Data Min Knowl Discov 7(2):187–214

  • Giannotti F, Nanni M, Pedreschi D (2000) Logic-based knowledge discovery in databases. In: Proceedings of tenth European–Japanese conference on information modelling and knowledge bases (EJC), pp 279–283

  • Gilpin S, Davidson I (2011) Incorporating SAT solvers into hierarchical clustering algorithms: an efficient and flexible approach. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1136–1144

  • Grossi V, Monreale A, Nanni M, Pedreschi D, Turini F (2015) Software engineering and formal methods: SEFM 2015 collocated workshops: ATSE, HOFM, MoKMaSD, and VERY*SCART, York, UK, September 7–8, 2015. Revised selected papers, chap. clustering formulation using constraint optimization. Springer, Berlin, Heidelberg, pp 93–107. doi:10.1007/978-3-662-49224-6_9

  • Grossi V, Romei A (2012) XQuake as a constraint-based mining language. In: Proceedings of the ECAI 2012 workshop on combining constraint solving with mining and learning (CoCoMile), pp 90–91

  • Gu W, Chen B, Hu J (2010) Combining binary-SVM and pairwise label constraints for multi-label classification. In: Proceedings of the IEEE international conference on systems, man and cybernetics (SMC), pp 4176–4181

  • Guns T, Nijssen S, De Raedt L (2011) Itemset mining: a constraint programming perspective. Artif Intell 175(12–13):1951–1983

  • Guns T, Nijssen S, De Raedt L (2013) k-Pattern set mining under constraints. IEEE Trans Knowl Data Eng 25(2):402–418

  • Guns T, Dries A, Tack G, Nijssen S, De Raedt L (2013a) MiningZinc: a modeling language for constraint-based mining. In: Rossi F (ed) IJCAI 2013, proceedings of the 23rd international joint conference on artificial intelligence, Beijing, China, August 3–9, 2013. IJCAI/AAAI. http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6947

  • Guns T, Dries A, Tack G, Nijssen S, De Raedt L (2013b) The MiningZinc framework for constraint-based itemset mining. In: Ding W, Washio T, Xiong H, Karypis G, Thuraisingham BM, Cook DJ, Wu X (eds) 13th IEEE international conference on data mining workshops, ICDM workshops, TX, December 7–10, 2013. IEEE Computer Society, pp 1081–1084. doi:10.1109/ICDMW.2013.38

  • Han J, Lakshmanan LVS, Ng RT (1999) Constraint-based multidimensional data mining. IEEE Comput 32(8):46–50

  • Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86

  • Han J, Fu Y (1999) Mining multiple-level association rules in large databases. IEEE Trans Knowl Data Eng 11(5):798–805

  • Han J, Kamber M (2012) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco

  • Hansen P, Aloise D (2009) A survey on exact methods for minimum sum-of-squares clustering, pp 1–2. http://www.math.iit.edu/Buck65files/msscStLouis.pdf

  • Har-Peled S, Roth D, Zimak D (2002) Constraint classification: a new approach to multiclass classification. In: Proceedings of the 13th international conference algorithmic learning theory (ALT), pp 365–379

  • Hirate Y, Yamana H (2006) Generalized sequential pattern mining with item intervals. J Comput 1(3):51–60

  • Hu YH, Kao YH (2011) Mining sequential patterns with consideration to recency, frequency, and monetary. In: Proceedings of the Pacific Asia conference on information systems (PACIS), pp 78–91

  • Hu YH, Yen TW (2010) Considering RFM-values of frequent patterns in transactional databases. In: Proceedings of the 2nd international conference on software engineering and data mining (SEDM), pp 422–427

  • Hwang JH, Gu MS (2014) Ontology based service frequent pattern mining. Future Inf Technol 309:809–814. doi:10.1007/978-3-642-55038-6-123

  • Imielinski T, Mannila H (1996) A database perspective on knowledge discovery. Commun ACM 39(11):58–64

  • Imielinski T, Virmani A (1999) MSQL: a query language for database mining. Data Min Knowl Discov 2(4):373–408

  • Jeudy B, Boulicaut JF (2002) Optimization of association rule mining queries. Intell Data Anal 6(4):341–357

  • Kestler H, Kraus J, Palm G, Schwenker F (2006) On the effects of constraints in semi-supervised hierarchical clustering. In: Schwenker F, Marinai S (eds) Artificial neural networks in pattern recognition. Lecture Notes in Computer Science, vol 4087. Springer, Berlin, Heidelberg, pp 57–66

  • Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the nineteenth international conference on machine learning, ICML ’02. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 307–314. http://dl.acm.org/citation.cfm?id=645531.655989

  • Kumar N, Kummamuru K (2008) Semisupervised clustering with metric learning using relative comparisons. IEEE Trans Knowl Data Eng 20(4):496–503

  • Kummamuru K, Krishnapuram R, Agrawal R (2004) Learning spatially variant dissimilarity (SVaD) measures. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 611–616

  • Lakshmanan LVS, Ng R, Han J, Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. ACM SIGMOD Rec 28(2):157–168. doi:10.1145/304181.304196

  • Lange T, Law MHC, Jain AK, Buhmann JM (2005) Learning with constrained and unlabeled data. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 731–738

  • Law MHC, Topchy AP, Jain AK (2005) Model-based clustering with probabilistic constraints. In: Kargupta et al., pp 641–645. doi:10.1137/1.9781611972757.77

  • Law MHC, Topchy A, Jain AK (2004) Structural, syntactic, and statistical pattern recognition: joint IAPR international workshops, SSPR 2004 and SPR 2004, Lisbon, Portugal, August 18–20, 2004. Proceedings, chap. Clustering with soft and group constraints. Springer, Berlin, Heidelberg, pp 662–670. doi:10.1007/978-3-540-27868-9_72

  • Law Y, Wang H, Zaniolo C (2004) Query languages and data models for database sequences and data streams. In: Proceedings of the 30th international conference on very large data bases (VLDB), pp 492–503

  • Leung CKS, Hao B, Brajczuk DA (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. In: Proceedings of the 2010 ACM symposium on applied computing (SAC), pp 1034–1038

  • Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217

  • Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 421–428

  • Lin MY, Hsueh SC, Chang CW (2008) Fast discovery of sequential patterns in large databases using effective time-indexing. Inf Sci 178(22):4228–4245

  • Lin MY, Lee SY (2005) Efficient mining of sequential patterns with time constraints by delimited pattern growth. Knowl Inf Syst 7(4):499–514

  • Liu EY, Zhang Z, Wang W (2011) Clustering with relative constraints. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 947–955

  • Lu Z, Carreira-Perpiñán MÁ (2008) Constrained spectral clustering through affinity propagation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 1–8

  • Lucey S, Ashraf AB (2013) Nearest neighbor classifier generalization through spatially constrained filters. Pattern Recognit 46(1):325–331. doi:10.1016/j.patcog.2012.06.009

  • Lu Z, Leen TK (2007) Penalized probabilistic clustering. Neural Comput 19(6):1528–1567. doi:10.1162/neco.2007.19.6.1528

  • Mabroukeh NR, Ezeife CI (2010) A taxonomy of sequential pattern mining algorithms. ACM Comput Surv 43(1):1–3

  • Mansingh G, Osei-Bryson KM, Reichgelt H (2011) Using ontologies to facilitate post-processing of association rules by domain experts. Inf Sci 181(3):419–434

  • Marinica C, Guillet F (2010a) Knowledge-based interactive postmining of association rules using ontologies. IEEE Trans Knowl Data Eng 22(6):784–797

  • Marinica C, Guillet F (2010) Knowledge-based interactive postmining of association rules using ontologies. IEEE Trans Knowl Data Eng 22(6):784–797. doi:10.1109/TKDE.2010.29

  • Marriott K, Nethercote N, Rafeh R, Stuckey PJ, de la Banda MG, Wallace M (2008) The design of the Zinc modelling language. Constraints 13(3):229–267. doi:10.1007/s10601-008-9041-4

  • Masseglia F, Poncelet P, Teisseire M (2009) Efficient mining of sequential patterns with time constraints: reducing the combinations. Expert Syst Appl 36(2):2677–2690

  • Masson C, Robardet C, Boulicaut J (2004) Optimizing subset queries: a step towards SQL-based inductive databases for itemsets. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM (eds) Proceedings of the 2004 ACM symposium on applied computing (SAC), Nicosia, Cyprus, March 14–17, 2004. ACM, pp 535–539. doi:10.1145/967900.968013

  • Meo R, Psaila G, Ceri S (1998) An extension to SQL for mining association rules. Data Min Knowl Disc 2(2):195–224. doi:10.1023/A:1009774406717

  • Meo R, Psaila G (2006) An XML-based database for knowledge discovery. In: Proceedings of the 10th international conference on extending database technology (EDBT), pp 814–828

  • Morzy T, Zakrzewicz M (1997) SQL-like language for database mining. In: Proceedings of the first east-European symposium on advances in databases and information systems (ADBIS), pp 331–317

  • Nethercote N, Stuckey PJ, Becket R, Brand S, Duck GJ, Tack G (2007) MiniZinc: towards a standard CP modelling language. In: Proceedings of the 13th international conference on principles and practice of constraint programming, CP’07. Springer, Berlin, Heidelberg, pp 529–543. http://dl.acm.org/citation.cfm?id=1771668.1771709

  • Nguyen N, Caruana R (2008) Improving classification with pairwise constraints: a margin-based approach. In: Daelemans W, Goethals B, Morik K (eds) ECML/PKDD (2), Lecture Notes in Computer Science, vol 5212. Springer, pp 113–124. http://dblp.uni-trier.de/db/conf/pkdd/pkdd2008-2.html#NguyenC08

  • Nijssen S, Fromont E (2007) Mining optimal decision trees from itemset lattices. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 530–539

  • Nijssen S, Fromont E (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Min Knowl Disc 21(1):9–51. doi:10.1007/s10618-010-0174-x

  • Niyogi P, Pierrot JB, Siohan O (2000) Multiple classifiers by constrained minimization. In: Proceedings of the acoustics, speech, and signal processing, 2000. On IEEE international conference, vol 06, ICASSP ’00. IEEE Computer Society, Washington, DC, pp 3462–3465. doi:10.1109/ICASSP.2000.860146

  • Okabe M, Yamada S (2012) Clustering by learning constraints priorities. In: Proceedings of the 12th international conference on data mining (ICDM), pp 1050–1055

  • Park SH, Fürnkranz J (2008) Multi-label classification with label constraints. Technical report, Knowledge Engineering Group, TU Darmstadt

  • Pei J, Han J, Lakshmanan LVS (2004) Pushing convertible constraints in frequent itemset mining. Data Min Knowl Disc 8(3):227–252

  • Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern growth methods. Inf Sci 28(2):133–160

  • Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 350–354

  • Pinto H, Han J, Pei J, Wang K, Chen Q, Dayal U (2001) Multi-dimensional sequential pattern mining. In: Proceedings of the 2001 ACM CIKM international conference on information and knowledge management (CIKM), pp 81–88

  • Plantevit M, Laurent A, Laurent D, Teisseire M, Choong YW (2010) Mining multidimensional and multilevel sequential patterns. Trans Knowl Discov Data 4(1):4

  • Pyle D (1999) Data preparation for data mining. Morgan Kaufmann Publishers Inc., San Francisco

  • Richter L, Wicker J, Kessler K, Kramer S (2008) An inductive database and query language in the relational model. In: Proceedings of the 11th international conference on extending database technology (EDBT), pp 740–744

  • Rigollet P, Tong X (2011a) Neyman-Pearson classification, convexity and stochastic constraints. J Mach Learn Res 12:2831–2855

  • Rigollet P, Tong X (2011b) Neyman-Pearson classification under a strict constraint. Proc Track J Mach Learn Res 19:595–614

  • Romei A, Ruggieri S, Turini F (2006) KDDML: a middleware language and system for knowledge discovery in databases. Data Knowl Eng 57(2):179–220. doi:10.1016/j.datak.2005.04.007

  • Romei A, Turini F (2011) Programming the KDD process using XQuery. In: Proceedings of the international conference on knowledge discovery and information retrieval (KDIR), pp 131–139

  • Romei A, Turini F (2010) XML data mining. Softw Pract Exp 40(2):101–130. doi:10.1002/spe.944

  • Romei A, Turini F (2011) Inductive database languages: requirements and examples. Knowl Inf Syst 26(3):351–384

  • Ruiz C, Spiliopoulou M, Ruiz EM (2010) Density-based semi-supervised clustering. Data Min Knowl Disc 21(3):345–370

  • Sarawagi S, Thomas S, Agrawal R (2000) Integrating association rule mining with relational database systems: alternatives and implications. Data Min Knowl Disc 4(2/3):89–125

  • Schultz M, Joachims T (2003) Learning a distance metric from relative comparisons. In: Thrun S, Saul LK, Schölkopf B (eds) Proceedings of advances in neural information processing systems (NIPS), December 8–13, 2003, Vancouver and Whistler, British Columbia. MIT Press, pp 41–48. http://papers.nips.cc/paper/2366-learning-a-distance-metric-from-relative-comparisons

  • Shankar S (2009) Utility sentient frequent itemset mining and association rule mining: a literature survey and comparative study. Int J Soft Comput Appl 4:81–95

  • Small K, Wallace BC, Brodley CE, Trikalinos TA (2011) The constrained weight space SVM: learning with ranked features. In: Proceedings of the 28th international conference on machine learning (ICML), pp 865–872

  • Soulet A, Crémilleux B (2005) Optimizing constraint-based mining by automatically relaxing constraints. In: Proceedings of the 5th IEEE international conference on data mining (ICDM), 27–30 November 2005, Houston. IEEE Computer Society, pp 777–780. doi:10.1109/ICDM.2005.112

  • Soulet A, Crémilleux B (2009) Mining constraint-based patterns using automatic relaxation. Intell Data Anal 13(1):109–133

  • Soulet A, Crémilleux B, Plantevit M (2011) Summarizing contrasts by recursive pattern mining. In: Spiliopoulou M, Wang H, Cook DJ, Pei J, Wang W, Zaïane OR, Wu X (eds) Data mining workshops (ICDMW), 2011 IEEE 11th international conference on, Vancouver, December 11, 2011. IEEE Computer Society, pp 1155–1162. doi:10.1109/ICDMW.2011.161

  • Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21st conference on very large data bases (VLDB), pp 407–419

  • Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology (EDBT), pp 3–17

  • Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the third international conference on knowledge discovery and data mining (KDD), pp 67–73

  • Sriphaew K, Theeramunkong T (2002) A new method for finding generalized frequent itemsets in generalized association rule mining. In: Proceedings of the 7th IEEE symposium on computers and communications (ISCC), pp 1040–1045

  • Strehl A, Ghosh J (2003) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J Comput 15(2):208–230

  • Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Addison Wesley, Boston

  • Tao F, Murtagh F (2003) Weighted association rule mining using weighted support and significance framework. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 661–666

  • Trasarti R, Bonchi F, Goethals B (2008) Sequence mining automata: a new technique for mining frequent sequences under regular expressions. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), pp 1061–1066

  • Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786. doi:10.1109/TKDE.2012.59

  • Tsochantaridis I, Joachims T, Hofmann T, Altun Y (2005) Large margin methods for structured and interdependent output variables. J Mach Learn Res 6:1453–1484

  • Vanderlooy S, Sprinkhuizen-Kuyper IG, Smirnov EN, van den Herik HJ (2009) The ROC isometrics approach to construct reliable classifiers. Intell Data Anal 13(1):3–37. http://dl.acm.org/citation.cfm?id=1551758.1551760

  • Vens C, Struyf J, Schietgat L, Dzeroski S, Blockeel H (2008) Decision trees for hierarchical multi-label classification. Mach Learn 73(2):185–214

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. doi:10.1007/s11222-007-9033-z

  • Vu V, Labroche N, Bouchon-Meunier B (2010) An efficient active constraint selection algorithm for clustering. In: 20th international conference on pattern recognition, ICPR 2010, Istanbul, Turkey, 23–26 August 2010. IEEE Computer Society, pp 2969–2972. doi:10.1109/ICPR.2010.727

  • Wagstaff K, Basu S, Davidson I (2006) When is constrained clustering beneficial, and why? In: Proceedings, the twenty-first national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference (AAAI)

  • Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on Innovative applications of artificial intelligence (AAAI/IAAI), pp 1103–1110

  • Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc., San Francisco, pp 577–584. http://dl.acm.org/citation.cfm?id=645530.655669

  • Wang K, Jiang Y, Yu JX, Dong G, Han J (2005) Divide-and-approximate: a novel constraint push strategy for iceberg cube mining. IEEE Trans Knowl Data Eng 17(3):354–368

  • Wang X, Rostoker C, Hamilton HJ (2012) A density-based spatial clustering for physical constraints. J Intell Inf Syst 38(1):269–297

  • Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Disc 28(1):1–30. doi:10.1007/s10618-012-0291-9

  • Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 563–572

  • Wang F, Ding CHQ, Li T (2009) Integrated KL (K-means-Laplacian) clustering: a new clustering approach by combining attribute data and pairwise relations. In: Proceedings of the SIAM international conference on data mining (SDM), pp 38–48

  • Wang W, Yang J, Yu PS (2000) Efficient mining of weighted association rules (WAR). In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 270–274

  • Wei JT, Lin SY, Wu HH (2010) A review of the application of RFM model. Afr J Bus Manag 4(19):4199–4206

  • Witten IH, Frank E, Hall M (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, San Francisco

  • Wu CM, Huang YF (2011) Generalized association rule mining using an efficient data structure. Expert Syst Appl 38(6):7277–7290

  • Xing EP, Ng AY, Jordan MI, Russell SJ (2002) Distance metric learning with application to clustering with side-information. In: Advances in neural information processing systems (NIPS), pp 505–512

  • Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning, with application to clustering with side-information. Advances in neural information processing systems 15. MIT Press, Cambridge

  • Yan R, Zhang J, Yang J, Hauptmann AG (2006) A discriminative learning framework with pairwise constraints for video object classification. IEEE Trans Pattern Anal Mach Intell 28(4):578–593. doi:10.1109/TPAMI.2006.65

  • Yan W, Goebel KF (2004) Designing classifier ensembles with constrained performance requirements. In: Proceedings of the SPIE defense security symposium, multisensor multisource information fusion: architectures, algorithms, and applications, pp 78–87

  • Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the fourth SIAM international conference on data mining (SDM), pp 482–486

  • Yun U (2008) A new framework for detecting weighted sequential patterns in large sequence databases. Knowl Based Syst 21(2):110–122

  • Yun U, Shin H, Ryu KH, Yoon E (2012) An efficient mining algorithm for maximal weighted frequent patterns in transactional databases. Knowl Based Syst 33:53–64

  • Yun U, Leggett JJ (2005) WFIM: weighted frequent itemset mining with a weight range and a minimum weight. In: Kargupta et al., pp 636–640. doi:10.1137/1.9781611972757.76

  • Yun U, Ryu KH (2010) Discovering important sequential patterns with length-decreasing weighted support constraints. Int J Inf Technol Decis Mak 9(4):575–599

  • Yun U, Ryu KH (2011) Approximate weighted frequent pattern mining with/without noisy environments. Knowl Based Syst 24(1):73–82

  • Zaidan O, Eisner J (2008) Modeling annotators: a generative approach to learning from annotator rationales. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP), pp 31–40

  • Zaki MJ (2000) Sequence mining in categorical domains: incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management (CIKM), pp 422–429

  • Zhang J, Yan R (2007) On the value of pairwise constraints in classification and consistency. In: Proceedings of the 24th international conference on machine learning, ICML ’07. ACM, New York, pp 1111–1118. doi:10.1145/1273496.1273636

  • Zhang C, Zhang S (2002) Association rule mining, models and algorithms, lecture notes in computer science. Springer, New York

  • Zhang Y, Zhang L, Nie G, Shi Y (2009) A survey of interestingness measures for association rules. In: Proceedings of the second international conference on business intelligence and financial engineering, (BIFE), pp 460–463

  • Zhong S, Ghosh J (2003) Scalable, balanced model-based clustering. In: Proceedings of the third SIAM international conference on data mining (SDM), San Francisco, pp 71–82

Acknowledgments

This work was supported by the European Commission under the project “Inductive Constraint Programming (ICON)” contract number FP7-284715, and by a Grant for “Big Data Social Mining” of the University of Pisa. We warmly thank the anonymous referees for their very valuable suggestions.

Author information

Corresponding author

Correspondence to Valerio Grossi.

Additional information

Responsible editor: Kristian Kersting.

About this article

Cite this article

Grossi, V., Romei, A. & Turini, F. Survey on using constraints in data mining. Data Min Knowl Disc 31, 424–464 (2017). https://doi.org/10.1007/s10618-016-0480-z

  • DOI: https://doi.org/10.1007/s10618-016-0480-z
