Finding Best Patterns Practically

Progress in Discovery Science

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

Finding a pattern that separates two sets is a critical task in discovery. Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other. This problem is known to be NP-complete. An episode pattern is a generalization of a subsequence pattern in which the length of the substring containing the subsequence is bounded. We generalize these problems to optimization problems and give practical algorithms that solve them exactly. Our algorithms use pruning heuristics based on the combinatorial properties of strings, together with efficient data structures that recognize subsequence and episode patterns.
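
To make the two pattern classes concrete, the following is a minimal Python sketch, not the chapter's algorithm. It shows what it means for a string to match a subsequence pattern and an episode pattern with window bound k, and then searches for the best separating pattern by plain exhaustive enumeration. The function names, the score (matches in the first set minus matches in the second set), and the sample strings are all assumptions made for illustration. The pruning heuristics and the specialized data structures mentioned in the abstract are not reproduced here.

```python
from itertools import product


def is_subsequence(pattern, text):
    """True if the characters of pattern occur in text in order (gaps allowed)."""
    it = iter(text)
    # 'ch in it' advances the iterator, so each match must follow the previous one.
    return all(ch in it for ch in pattern)


def matches_episode(pattern, text, k):
    """True if pattern is a subsequence of some substring of text of length <= k."""
    if not pattern:
        return True
    for start, ch in enumerate(text):
        # A shortest matching window must begin with the first pattern character.
        if ch == pattern[0] and is_subsequence(pattern, text[start:start + k]):
            return True
    return False


def count_matches(pattern, strings, match):
    """Number of strings in the collection that the pattern matches."""
    return sum(1 for s in strings if match(pattern, s))


def best_pattern(pos, neg, alphabet, max_len, match=is_subsequence):
    """Exhaustively enumerate patterns up to max_len and return (pattern, score).

    The score used here (positive matches minus negative matches) is an
    illustrative assumption; ties keep the first pattern found.
    """
    best, best_score = None, None
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            p = "".join(chars)
            score = count_matches(p, pos, match) - count_matches(p, neg, match)
            if best_score is None or score > best_score:
                best, best_score = p, score
    return best, best_score


if __name__ == "__main__":
    pos = ["abcab", "acb", "aab"]   # hypothetical strings the pattern should match
    neg = ["ba", "bca", "cc"]       # hypothetical strings the pattern should avoid
    print(best_pattern(pos, neg, "abc", 2))
    # Episode variant: the same search, with matching restricted to windows of length 3.
    print(best_pattern(pos, neg, "abc", 2, match=lambda p, s: matches_episode(p, s, 3)))
```

The enumeration visits every pattern over the alphabet up to the length limit, so its cost grows exponentially with that limit. This is the cost that pruning is meant to reduce in practice.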

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Shinohara, A., Takeda, M., Arikawa, S., Hirao, M., Hoshino, H., Inenaga, S. (2002). Finding Best Patterns Practically. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_21

  • DOI: https://doi.org/10.1007/3-540-45884-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5

  • eBook Packages: Springer Book Archive
