Abstract
We study the problem of mining substring patterns from string databases. Patterns are selected by a conjunction of monotonic and anti-monotonic predicates. Building on the previously introduced version space tree data structure, we present a novel algorithm for discovering substring patterns. It requires only one database scan, which makes it highly scalable and applicable in distributed environments, where the data are not necessarily stored in local memory or on a local disk. The algorithm is compared experimentally with a previously introduced algorithm in the same setting.
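To make the constraint setting concrete, the following is a minimal brute-force sketch. It is not the version space tree algorithm of the paper. It assumes two illustrative predicates that the abstract does not name: a minimum frequency on one database (anti-monotonic: if a pattern satisfies it, so does every substring of that pattern) and a maximum frequency on a second database (monotonic: if a pattern satisfies it, so does every superstring). The function names, the toy databases `pos` and `neg`, and the thresholds are all hypothetical.

```python
def frequency(pattern, database):
    """Count the strings in `database` that contain `pattern` as a substring."""
    return sum(1 for s in database if pattern in s)


def substrings(database):
    """Return every distinct non-empty substring of any string in `database`."""
    found = set()
    for s in database:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                found.add(s[i:j])
    return found


def mine(pos, neg, min_pos, max_neg):
    """Naive evaluation of the conjunction
    frequency(p, pos) >= min_pos  (anti-monotonic, assumed example)
    AND frequency(p, neg) <= max_neg  (monotonic, assumed example).
    Every candidate is tested directly, so each database is scanned many times."""
    return sorted(
        p for p in substrings(pos)
        if frequency(p, pos) >= min_pos and frequency(p, neg) <= max_neg
    )


# Hypothetical toy data, chosen only for illustration.
pos = ["abcd", "bcd", "abc"]
neg = ["bc", "ab"]
result = mine(pos, neg, min_pos=2, max_neg=0)
print(result)  # ['abc', 'bcd', 'cd', 'd']
```

This sketch rescans both databases for every candidate pattern, which is the cost a single-scan algorithm avoids.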
This work was supported by the EU IST FET project cInQ, contract number IST-2000-26469.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, S.D., De Raedt, L. (2005). An Efficient Algorithm for Mining String Databases Under Constraints. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_7
DOI: https://doi.org/10.1007/978-3-540-31841-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)