Abstract
Edit distance with moves (EDM) is a string-to-string distance measure that allows substring moves in addition to the ordinary editing operations for turning one string into the other. Although computing EDM exactly is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on the parsing discrepancies between different occurrences of the same substring in a string. ESP can be used to compute an approximate EDM as the L1 distance between characteristic vectors built from node labels in parse trees. However, ESP is not applicable to streaming text data, where the whole text is not known in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP's ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.
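The idea of comparing strings through the L1 distance between characteristic vectors of parse-tree node labels can be illustrated with a minimal sketch. It is not ESP or OESP: it assumes a naive left-to-right pairing of adjacent symbols in place of ESP's parsing rules, so it carries none of the approximation guarantees, and the names `parse_labels`, `l1_distance`, and `rules` are hypothetical.

```python
from collections import Counter


def parse_labels(text, rules):
    """Count the node labels of a toy parse tree built over `text`.

    `rules` is a dictionary shared across strings. It maps a pair of
    child labels to a nonterminal label, so identical subtrees receive
    identical labels in every string parsed with the same dictionary.
    NOTE: adjacent symbols are paired naively from left to right; this
    is an assumption made for illustration and is not ESP's rule.
    """
    level = list(text)
    counts = Counter(level)  # leaf symbols are counted as labels too
    while len(level) > 1:
        next_level = []
        i = 0
        while i < len(level):
            if i + 1 < len(level):
                pair = (level[i], level[i + 1])
                # Reuse the label for a known pair, else create a new one.
                label = rules.setdefault(pair, ("N", len(rules)))
                counts[label] += 1
                next_level.append(label)
                i += 2
            else:
                # An unpaired last symbol moves up without a new node.
                next_level.append(level[i])
                i += 1
        level = next_level
    return counts


def l1_distance(u, v):
    """L1 distance between two characteristic vectors (label counts)."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


rules = {}
u = parse_labels("abab", rules)
v = parse_labels("abba", rules)
print(l1_distance(u, v))
```

Sharing one `rules` dictionary between both strings is what makes the two vectors comparable: a label counts toward the distance only when the two trees use it a different number of times.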
This work was supported by JSPS KAKENHI (24700140, 26280088) and the JST PRESTO program.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Takabatake, Y., Tabei, Y., Sakamoto, H. (2014). Online Pattern Matching for String Edit Distance with Moves. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_20
DOI: https://doi.org/10.1007/978-3-319-11918-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer Science, Computer Science (R0)