A Practical Algorithm to Find the Best Subsequence Patterns

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1967)

Abstract

Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other. The problem is known to be NP-complete. We generalize it to an optimization problem and give a practical algorithm that solves it exactly. Our algorithm uses a pruning heuristic and subsequence automata, and can find the best subsequence. We report some experiments, which convinced us that the approach is quite promising.
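
To make the optimization problem concrete, here is a minimal Python sketch of an exact search with pruning. It is not the authors' algorithm: the abstract does not specify the score function or the pruning bound, so the score used here (positive strings matched minus negative strings matched), the length limit `max_len`, and the names `is_subsequence`, `count_matches`, and `best_subsequence` are all assumptions made for illustration. It also tests each string by a direct scan rather than with a subsequence automaton.

```python
from typing import List, Tuple


def is_subsequence(pattern: str, text: str) -> bool:
    """True if pattern can be obtained from text by deleting characters."""
    it = iter(text)
    # `ch in it` advances the iterator, so matches are found left to right.
    return all(ch in it for ch in pattern)


def count_matches(pattern: str, strings: List[str]) -> int:
    """Number of strings that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in strings)


def best_subsequence(pos: List[str], neg: List[str], max_len: int) -> Tuple[int, str]:
    """Exhaustive search for the pattern maximizing an ASSUMED score:
    (#positive strings matched) - (#negative strings matched)."""
    alphabet = sorted(set("".join(pos)))
    # The empty pattern matches every string.
    best = (len(pos) - len(neg), "")

    def search(pattern: str) -> None:
        nonlocal best
        if len(pattern) == max_len:
            return
        for ch in alphabet:
            cand = pattern + ch
            p = count_matches(cand, pos)
            # Extending a pattern can only shrink the set of strings it
            # matches, so p bounds the score of cand and of every
            # extension of cand. Prune if that bound cannot beat the best.
            if p <= best[0]:
                continue
            score = p - count_matches(cand, neg)
            if score > best[0]:
                best = (score, cand)
            search(cand)

    search("")
    return best


if __name__ == "__main__":
    positives = ["xaybz", "aab", "acb"]
    negatives = ["ba", "bbb", "caa"]
    print(best_subsequence(positives, negatives, max_len=3))
```

The pruning step is sound because every pattern reachable by appending characters to `cand` contains `cand` as a subsequence, so it can match no more positive strings than `cand` does. With ties, the pattern returned depends on the search order.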

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S. (2000). A Practical Algorithm to Find the Best Subsequence Patterns. In: Arikawa, S., Morishita, S. (eds) Discovery Science. DS 2000. Lecture Notes in Computer Science, vol 1967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44418-1_12

  • DOI: https://doi.org/10.1007/3-540-44418-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41352-3

  • Online ISBN: 978-3-540-44418-3

  • eBook Packages: Springer Book Archive
