Approximate Matching in the L1 Metric

  • Amihood Amir
  • Ohad Lipsky
  • Ely Porat
  • Julia Umanski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

Approximate matching is one of the fundamental problems in pattern matching, and a ubiquitous problem in real applications. The Hamming distance is a simple and well-studied example of approximate matching, motivated by typing errors or noisy channels. Biological and image processing applications assign a different value to mismatches of different symbols.

We consider the problem of approximate matching in the \(L_1\) metric: the \(k\)-\(L_1\)-distance problem. Given a text \(T = t_0, \ldots, t_{n-1}\) and a pattern \(P = p_0, \ldots, p_{m-1}\), both strings of natural numbers, and a natural number \(k\), we seek all text locations \(i\) where the \(L_1\) distance of the pattern from the length-\(m\) substring of the text starting at \(i\) is not greater than \(k\), i.e. \(\sum_{j=0}^{m-1} |t_{i+j} - p_j| \leq k\).
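To make the problem statement concrete, here is a minimal brute-force sketch that checks the definition directly at every text location. It is only an \(O(nm)\) baseline for illustration, not the algorithm of the paper; the function name `k_l1_matches` and the sample inputs are assumptions introduced for this example.

```python
from typing import List


def k_l1_matches(text: List[int], pattern: List[int], k: int) -> List[int]:
    """Return every start index i with sum_j |text[i+j] - pattern[j]| <= k.

    Hypothetical helper: a direct check of the definition, O(n*m) time.
    """
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        dist = 0
        for j in range(m):
            dist += abs(text[i + j] - pattern[j])
            if dist > k:
                # The distance can only grow, so this location is ruled out.
                break
        else:
            # Loop finished without exceeding k: location i is a match.
            matches.append(i)
    return matches


if __name__ == "__main__":
    # Windows of length 2 in the text: [1,5], [5,2], [2,4], [4,3]
    # L1 distances to the pattern [2,4]: 2, 5, 0, 3
    print(k_l1_matches([1, 5, 2, 4, 3], [2, 4], 2))  # [0, 2]
```

The early exit on `dist > k` relies only on the fact that each term \(|t_{i+j} - p_j|\) is non-negative; it does not change the worst-case running time.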

We provide an algorithm that solves the \(k\)-\(L_1\)-distance problem in time \(O(n\sqrt{k\log k})\). The algorithm applies a bounded divide-and-conquer approach and makes novel uses of non-boolean convolutions.

Keywords

Edit Distance, String Match, Text Element, Pattern Occurrence, Text Location
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Amihood Amir (1)
  • Ohad Lipsky (2)
  • Ely Porat (2)
  • Julia Umanski (2)
  1. Department of Computer Science, Bar-Ilan University, and Georgia Tech, Ramat-Gan, Israel
  2. Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel