An Efficient Algorithm for Mining String Databases Under Constraints

Lee, Sau Dan; De Raedt, Luc

doi:10.1007/978-3-540-31841-5_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)

Included in the following conference series:

International Workshop on Knowledge Discovery in Inductive Databases

Abstract

We study the problem of mining substring patterns from string databases. Patterns are selected using a conjunction of monotonic and anti-monotonic predicates. Building on the previously introduced version space tree data structure, we present a novel algorithm for discovering substring patterns. The algorithm requires only one database scan, which makes it highly scalable and applicable in distributed environments where the data are not necessarily stored in local memory or on disk. It is experimentally compared with a previously introduced algorithm in the same setting.
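To make the setting concrete, the following is a naive sketch, not the authors' version space tree algorithm. It assumes patterns are substrings ordered by generality (a substring is more general than any string containing it), uses minimum frequency as the anti-monotonic predicate and maximum frequency as the monotonic one, and counts document frequencies in a single pass over the database by inserting every suffix of each string into a plain suffix trie. The names `TrieNode`, `build_suffix_trie`, and `mine`, and the threshold parameters `min_freq`/`max_freq`, are hypothetical and chosen only for illustration.

```python
class TrieNode:
    """Suffix-trie node. `count` is the number of database strings that
    contain the substring spelled by the path from the root to this node."""
    __slots__ = ("children", "count", "last_seen")

    def __init__(self):
        self.children = {}
        self.count = 0
        self.last_seen = -1  # id of the last string that incremented count


def build_suffix_trie(database):
    """Read each database string exactly once and insert all its suffixes.

    A node is counted at most once per string (tracked via last_seen),
    so counts are document frequencies rather than occurrence counts.
    Memory is quadratic in string length; a compact suffix tree avoids this.
    """
    root = TrieNode()
    for sid, s in enumerate(database):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, TrieNode())
                if node.last_seen != sid:
                    node.last_seen = sid
                    node.count += 1
    return root


def mine(root, min_freq, max_freq):
    """Return sorted (pattern, freq) pairs with min_freq <= freq <= max_freq.

    min_freq is anti-monotonic: if a pattern is too rare, every extension
    of it is at least as rare, so the whole subtree is pruned.
    max_freq is monotonic: here it is simply tested at each node.
    """
    results = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if child.count < min_freq:
                continue  # prune: extensions can only be less frequent
            pattern = prefix + ch
            if child.count <= max_freq:
                results.append((pattern, child.count))
            stack.append((child, pattern))
    return sorted(results)
```

For the toy database `["abcab", "abd", "bcd"]`, `mine(build_suffix_trie(db), 2, 2)` returns `[("a", 2), ("ab", 2), ("bc", 2), ("c", 2), ("d", 2)]`: `b` is excluded by the maximum-frequency bound (it occurs in all three strings), while its rare extension `bd` is pruned by the minimum-frequency bound.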

This work was supported by the EU IST FET project cInQ, contract number IST-2000-26469.

Author information

Authors and Affiliations

Institute for Computer Science, University of Freiburg, Germany
Sau Dan Lee & Luc De Raedt

Editor information

Editors and Affiliations

Mathematics and Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
Bart Goethals
Department of Computer Science, Universiteit Utrecht
Arno Siebes

About this paper

Cite this paper

Lee, S.D., De Raedt, L. (2005). An Efficient Algorithm for Mining String Databases Under Constraints. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_7

DOI: https://doi.org/10.1007/978-3-540-31841-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science, Computer Science (R0)