# Polynomial-time algorithms for computing characteristic strings

## Abstract

The difference between two strings is the minimum number of editing steps (insertions, deletions, changes) that convert one string into the other. Let *S* be a finite set of strings, let *T* be a subset of *S*, and let *δ* be a positive integer. A *δ*-characteristic string of *T* under *S* is a string that is a common substring of *T* and that has at least *δ* differences from any substring of any string in *S − T*. In this paper, the following result is presented. It can be decided in *O*(∥*T*∥ + *l*² · |*S − T*| + *l* · *δ* · ∥*S − T*∥) time whether or not there exists a *δ*-characteristic string of *T* under *S*, where *l* denotes the length of a shortest string in *T*, |*S − T*| the cardinality of *S − T*, and ∥*T*∥ the size of *T*. If such a string exists, then all the shortest *δ*-characteristic strings of *T* under *S* can also be obtained in that time.
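To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper and does not achieve the stated time bound; it only checks the definition directly. The function names `min_substring_distance` and `shortest_characteristic_strings` are hypothetical, and the inner routine uses the classical dynamic-programming recurrence for approximate pattern matching, with unit cost for each insertion, deletion, and change.

```python
def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest number of differences between `pattern` and any substring of `text`."""
    # Row 0 is all zeros, so an approximate match may start anywhere in `text`.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j] + 1,                # delete pc from the pattern
                cur[j - 1] + 1,             # insert tc into the pattern
                prev[j - 1] + (pc != tc),   # match, or change pc into tc
            )
        prev = cur
    # Taking the minimum over the last row lets the match end anywhere in `text`.
    return min(prev)


def shortest_characteristic_strings(T, S_minus_T, delta):
    """All shortest delta-characteristic strings of T under S (naive enumeration)."""
    base = min(T, key=len)  # every common substring of T occurs in a shortest string
    for length in range(1, len(base) + 1):
        candidates = {base[i:i + length] for i in range(len(base) - length + 1)}
        found = sorted(
            c for c in candidates
            if all(c in t for t in T)
            and all(min_substring_distance(c, s) >= delta for s in S_minus_T)
        )
        if found:
            return found
    return []  # no delta-characteristic string exists


if __name__ == "__main__":
    # Toy input chosen for illustration only.
    print(shortest_characteristic_strings(["ACGTAC", "TACGTT"], ["AAAA"], 2))
```

The sketch enumerates candidate substrings by increasing length and stops at the first length that yields a result, so it returns exactly the shortest qualifying strings, at a cost far above the bound stated in the abstract.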

## Keywords

characteristic string, approximate pattern matching, DNA probe
