Extending the Soft Constraint Based Mining Paradigm
The paradigm of pattern discovery based on constraints has been recognized as a core technique in inductive querying: constraints provide to the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. So far the research on this paradigm has mainly focussed on the latter aspect: the development of efficient algorithms for the evaluation of constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems which limit its practical relevance. In our previous work , we analyzed such limitations and showed how they flow out from the same source: the fact that in the classical constraint-based mining, a constraint is a rigid boolean function which returns either true or false. To overcome such limitations we introduced the new paradigm of pattern discovery based on Soft Constraints, and instantiated our idea to the fuzzy soft constraints. In this paper we extend the framework to deal with probabilistic and weighted soft constraints: we provide theoretical basis and detailed experimental analysis. We also discuss a straightforward solution to deal with top-k queries. Finally we show how the ideas presented in this paper have been implemented in a real Inductive Database system.
KeywordsAssociation Rule Frequent Itemsets Regular Language Soft Constraint Pattern Discovery
Unable to display preview. Download preview PDF.
- 1.Antunes, C., Oliveira, A.L.: Constraint relaxations for discovering unknown sequential patterns. In: Goethals, B., Siebes, A. (eds.) KDID 2004. LNCS, vol. 3377, pp. 11–32. Springer, Heidelberg (2005)Google Scholar
- 4.Besson, J., Robardet, C., Boulicaut, J.F., Rome, S.: Constraint-based concept mining and its application to microarray data analysis. Intelligent Data Analysis journal, 59–82 (2005)Google Scholar
- 8.Bonchi, F., Giannotti, F., Lucchese, C., Orlando, S., Perego, R., Trasarti, R.: ConQueSt: a constraint-based querying system for exploratory pattern discovery. In: Proceedings of The 22nd IEEE International Conference on Data Engineering, pp. 22–33. IEEE Computer Society Press, Los Alamitos (2006) Google Scholar
- 9.Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern discovery. Data and Knowledge Engineering (DKE) (to appear, 2006)Google Scholar
- 10.Bonchi, F., Giannotti, F., Lucchese, C., Orlando, S., Perego, R., Trasarti, R.: On interactive pattern mining from relational databases. In: KDID 2006. LNCS, vol. 4747, pp. 42–62. Springer, Heidelberg (2007)Google Scholar
- 12.Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. In: Proceedings ACM SIGMOD International Conference on Management of Data, pp. 256–276. ACM Press, New York (1997)Google Scholar
- 15.Hilderman, R.J., Hamilton, H.J.: Knowledge Discovery and Measures of Interest. Kluwer Academic Publishers, Boston (2002)Google Scholar
- 18.Lau, A., Ong, S., Mahidadia, A., Hoffmann, A., Westbrook, J., Zrimec, T.: Mining patterns of dyspepsia symptoms across time points using constraint association rules. In: Whang, k-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) PAKDD 2003. LNCS (LNAI), vol. 2637, pp. 124–135. Springer, Heidelberg (2003)CrossRefGoogle Scholar
- 19.Mitasiunaite, I., Boulicaut, J.-F.: About softness for inductive querying on sequence databases. In: Proceedings 7th International Baltic Conference on Databases and Information Systems DB IS 2006, July 3-6 2006, Vilnius (Lithuania) (2006)Google Scholar
- 20.Silberschatz, A., Tuzhilin, A.: On subjective measures of interestingness. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pp. 275–281 (1995)Google Scholar
- 21.Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’2002), ACM Press, New York (2002)Google Scholar
- 22.Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley, Reading (2005)Google Scholar