
An Efficient Algorithm for Mining String Databases Under Constraints

  • Conference paper
Knowledge Discovery in Inductive Databases (KDID 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3377)


Abstract

We study the problem of mining substring patterns from string databases, where patterns are selected using a conjunction of monotonic and anti-monotonic predicates. Building on the previously introduced version space tree data structure, we present a novel algorithm for discovering such substring patterns. A key property of the algorithm is that it requires only a single database scan. This makes it highly scalable and applicable in distributed environments, where the data are not necessarily stored in local memory or on disk. The algorithm is experimentally compared to a previously introduced algorithm for the same setting.
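The setting can be made concrete with a small sketch. Order patterns by generality, where p is more general than q when p is a substring of q. A minimum-frequency constraint on one dataset is then anti-monotonic (if q is frequent, so is every substring of q), and a maximum-frequency constraint on a second dataset is monotonic (if p is rare enough, so is every superstring of p). The Python below is a minimal, hypothetical illustration that assumes two labelled datasets, `pos` and `neg`, and these two particular constraints; it is not the version space tree algorithm described in the paper. It builds a naive suffix trie in one pass over the strings and then prunes the search using only the anti-monotonic constraint. All names (`TrieNode`, `build_suffix_trie`, `mine`) are invented for this example.

```python
from collections import defaultdict


class TrieNode:
    """Node of a naive suffix trie; the path from the root spells one substring."""

    __slots__ = ("children", "counts", "last_seen")

    def __init__(self):
        self.children = {}
        # dataset label -> number of strings in that dataset containing this substring
        self.counts = defaultdict(int)
        # dataset label -> index of the last string that incremented counts,
        # so a substring occurring twice in one string is counted only once
        self.last_seen = {}


def build_suffix_trie(databases):
    """Scan the database once: insert every suffix of each string into the trie."""
    root = TrieNode()
    for label, strings in databases.items():
        for idx, s in enumerate(strings):
            for start in range(len(s)):
                node = root
                for ch in s[start:]:
                    node = node.children.setdefault(ch, TrieNode())
                    if node.last_seen.get(label) != idx:
                        node.last_seen[label] = idx
                        node.counts[label] += 1
    return root


def mine(root, min_freq, max_freq, pos="pos", neg="neg"):
    """Return substrings p with freq(p, pos) >= min_freq (anti-monotonic)
    and freq(p, neg) <= max_freq (monotonic), in sorted order."""
    results = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            pattern = prefix + ch
            if child.counts[pos] < min_freq:
                # Every extension contains `pattern`, so it cannot be more
                # frequent: prune the whole subtree (anti-monotonic constraint).
                continue
            if child.counts[neg] <= max_freq:
                results.append(pattern)
            # No pruning on the monotonic constraint: a longer pattern may
            # still fall to max_freq or below in the negative dataset.
            stack.append((child, pattern))
    return sorted(results)


if __name__ == "__main__":
    db = {"pos": ["abcab", "abd", "cab"], "neg": ["ab", "xyz"]}
    trie = build_suffix_trie(db)
    print(mine(trie, min_freq=2, max_freq=0))  # ['c', 'ca', 'cab']
```

Because this naive trie stores every substring explicitly, its size grows quadratically with string length. A suffix tree, which can be built in linear time, avoids that blow-up at the cost of a more involved construction.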

This work was supported by the EU IST FET project cInQ, contract number IST-2000-26469.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, S.D., De Raedt, L. (2005). An Efficient Algorithm for Mining String Databases Under Constraints. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_7


  • DOI: https://doi.org/10.1007/978-3-540-31841-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25082-1

  • Online ISBN: 978-3-540-31841-5

  • eBook Packages: Computer Science, Computer Science (R0)
