
Journal of Intelligent Information Systems, Volume 44, Issue 2, pp 193–221

Soft constraints for pattern mining

  • Willy Ugarte
  • Patrice Boizumault
  • Samir Loudni
  • Bruno Crémilleux
  • Alban Lepailleur

Abstract

Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In practice, many constraints require threshold values whose choice is often arbitrary. The difficulty increases when several thresholds are required and must be combined. Moreover, patterns that barely miss a threshold are not extracted, even though they may be relevant. The paper advocates introducing softness into the pattern discovery process. Using Constraint Programming, we propose efficient methods to relax threshold constraints as well as the constraints involved in patterns such as top-k patterns and skypatterns. We show the relevance and the efficiency of our approach through a case study in chemoinformatics for discovering toxicophores.
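To make the idea of a relaxed threshold concrete, the following is a minimal Python sketch and not the Constraint Programming method of the paper. It assumes one simple way to measure violation (the relative shortfall below a minimum frequency) and accepts a pattern when that violation stays within a tolerance. The toy data, the function names, and the `tolerance` parameter are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def frequency(pattern, transactions):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def violation(freq, min_freq):
    """Assumed violation measure: relative shortfall below the threshold.

    Returns 0.0 when the threshold is met.
    """
    return max(0.0, (min_freq - freq) / min_freq)


def mine(transactions, min_freq, tolerance=0.0):
    """Brute-force enumeration of itemsets whose violation is <= tolerance.

    tolerance=0.0 reproduces the usual hard frequency threshold.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            freq = frequency(set(combo), transactions)
            if violation(freq, min_freq) <= tolerance:
                result[combo] = freq
    return result


hard = mine(TRANSACTIONS, min_freq=3)
soft = mine(TRANSACTIONS, min_freq=3, tolerance=0.4)
```

With the hard threshold, the itemset `("a", "b", "c")` is rejected because its frequency is 2; with a tolerance of 0.4 it is kept, since it falls only one transaction short of the threshold. The exhaustive enumeration is only suitable for toy data.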

Keywords

Constraint-based pattern mining; Soft constraints; Soft skypatterns; Constraint Programming; Disjunctive relaxation; Chemoinformatics

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Willy Ugarte (1)
  • Patrice Boizumault (1)
  • Samir Loudni (1)
  • Bruno Crémilleux (1)
  • Alban Lepailleur (2)

  1. GREYC (CNRS UMR 6072), University of Caen, Caen, France
  2. CERMN (UPRES EA 4258 - FR CNRS 3038 INC3M), University of Caen, Caen Cedex, France
