A Practical Algorithm to Find the Best Subsequence Patterns

Hirao, Masahiro; Hoshino, Hiromasa; Shinohara, Ayumi; Takeda, Masayuki; Arikawa, Setsuo

doi:10.1007/3-540-44418-1_12

Conference paper
First Online: 19 October 2001

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1967)

Abstract

Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other set. The problem is known to be NP-complete. We generalize it to an optimization problem and give a practical algorithm that solves it exactly. Our algorithm uses a pruning heuristic and subsequence automata, and can find the best subsequence. We report experiments that convinced us the approach is quite promising.
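
As a rough illustration of the optimization problem described in the abstract, the following Python sketch searches for a subsequence pattern that separates a positive set from a negative set. It is not the authors' algorithm. The score (matched positives minus matched negatives), the length limit `max_len`, and the names `is_subsequence`, `count_matches`, and `best_subsequence` are assumptions made for this example, and a plain linear scan stands in for the subsequence automata mentioned in the abstract. The pruning rule relies only on the fact that extending a pattern can never increase the number of strings it matches.

```python
from typing import List, Tuple


def is_subsequence(pattern: str, text: str) -> bool:
    """Return True if pattern can be obtained from text by deleting characters."""
    it = iter(text)
    # `ch in it` advances the iterator, so characters are matched left to right.
    return all(ch in it for ch in pattern)


def count_matches(pattern: str, strings: List[str]) -> int:
    """Count the strings that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in strings)


def best_subsequence(pos: List[str], neg: List[str], max_len: int) -> Tuple[str, int]:
    """Exact search with pruning for a pattern of length <= max_len.

    Assumed score (hypothetical, chosen for illustration):
    matched positives minus matched negatives.
    """
    alphabet = sorted(set("".join(pos) + "".join(neg)))
    # The empty pattern is a subsequence of every string.
    best_pattern, best_score = "", len(pos) - len(neg)

    def search(pattern: str) -> None:
        nonlocal best_pattern, best_score
        if len(pattern) == max_len:
            return
        for ch in alphabet:
            cand = pattern + ch
            x = count_matches(cand, pos)
            y = count_matches(cand, neg)
            if x - y > best_score:
                best_pattern, best_score = cand, x - y
            # Any extension of cand matches at most x positives and at least
            # 0 negatives, so its score is at most x. Explore further only if
            # that bound can still beat the best score found so far.
            if x > best_score:
                search(cand)

    search("")
    return best_pattern, best_score


if __name__ == "__main__":
    positives = ["acb", "aab", "abb"]
    negatives = ["ba", "bca", "b"]
    print(best_subsequence(positives, negatives, max_len=3))
```

On this toy input the pattern `ab` is a subsequence of all three positive strings and of none of the negative ones, so the search returns `('ab', 3)`. The bound used here is deliberately simple; a different score function would need its own upper bound for the pruning to stay exact.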

Author information

Authors and Affiliations

Department of Informatics, Kyushu University 33, 812-8581, Fukuoka, JAPAN
Masahiro Hirao, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda & Setsuo Arikawa

Editor information

Editors and Affiliations

Faculty of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, 812-8581, Fukuoka, Japan
Setsuo Arikawa
Faculty of Science Department of Information Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-0033, Tokyo, Japan
Shinichi Morishita

About this paper

Cite this paper

Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S. (2000). A Practical Algorithm to Find the Best Subsequence Patterns. In: Arikawa, S., Morishita, S. (eds) Discovery Science. DS 2000. Lecture Notes in Computer Science, vol 1967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44418-1_12

DOI: https://doi.org/10.1007/3-540-44418-1_12
Published: 19 October 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41352-3
Online ISBN: 978-3-540-44418-3
eBook Packages: Springer Book Archive
