Alignment Distance of Regular Tree Languages

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10329)

Abstract

We consider the tree alignment distance problem between a tree and a regular tree language. The tree alignment distance is an alternative to the tree edit-distance: instead of directly computing the minimum cost of tree edits, we construct an optimal alignment between two trees and compute its cost. The alignment distance is crucial for understanding the structural similarity between trees.
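To make the notion of an alignment concrete, the following is a minimal sketch of the alignment distance between two ordered trees. It is a plain memoized recursion over the rightmost root of an alignment, not the algorithm proposed in this paper. The tuple encoding of trees, the unit cost function, and the names `gamma`, `align`, and `alignment_distance` are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees. None stands for the blank label.

def gamma(a, b):
    """Unit cost (an assumption): 0 for equal labels, 1 otherwise."""
    return 0 if a == b else 1

@lru_cache(maxsize=None)
def align(F, G):
    """Minimum cost of an alignment of the ordered forests F and G."""
    if not F and not G:
        return 0
    best = float("inf")
    # Case 1: the rightmost roots of F and G are aligned with each other.
    if F and G:
        (a, fa), (b, gb) = F[-1], G[-1]
        best = min(best, gamma(a, b) + align(fa, gb) + align(F[:-1], G[:-1]))
    # Case 2: the rightmost root of F is aligned with a blank; its children
    # are aligned with some (possibly empty) suffix G[k:] of G.
    if F:
        a, fa = F[-1]
        for k in range(len(G) + 1):
            cost = gamma(a, None) + align(fa, G[k:]) + align(F[:-1], G[:k])
            best = min(best, cost)
    # Case 3: symmetric, the rightmost root of G is aligned with a blank.
    if G:
        b, gb = G[-1]
        for k in range(len(F) + 1):
            cost = gamma(None, b) + align(F[k:], gb) + align(F[:k], G[:-1])
            best = min(best, cost)
    return best

def alignment_distance(t1, t2):
    """Alignment distance between two single trees."""
    return align((t1,), (t2,))
```

For example, under these assumptions `alignment_distance(("a", (("b", ()), ("c", ()))), ("a", (("b", ()),)))` evaluates to 1, since only the leaf `c` has to be aligned with a blank.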

In particular, we consider the following problem: given a tree t and a tree automaton recognizing a regular tree language L, find the tree in L most similar to t under the tree alignment metric. Regular tree languages are commonly used in practice, for example in XML schemas and bioinformatics. We propose an \(O(mn)\)-time algorithm for computing the (ordered) alignment distance between t and L when the maximum degree of t and of the trees in L is bounded by a constant, and an \(O(mn^2)\)-time algorithm when the maximum degree of the trees in L is not bounded, where m is the size of t and n is the size of the finite tree automaton for L. We also study the case where a tree is not necessarily ordered, and show that the time complexity remains \(O(mn)\) if the maximum degree is bounded, whereas the problem is MAX SNP-hard otherwise.
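As an illustration of how a tree automaton specifies a regular tree language, here is a minimal sketch of a bottom-up membership test for trees of bounded degree. It only decides whether a tree belongs to L; it does not compute the distance between t and L. The encoding, the example automaton `DELTA` and `FINAL`, and the function names are hypothetical and chosen for this example only.

```python
from itertools import product

# Hypothetical encoding: a tree is (label, children) with children a tuple of
# trees. Transitions map (label, tuple_of_child_states) to a set of states.
# Example automaton (an assumption): Boolean expression trees, where state
# "t" means "evaluates to true" and "f" means "evaluates to false".
DELTA = {
    ("1", ()): {"t"},
    ("0", ()): {"f"},
    ("and", ("t", "t")): {"t"},
    ("and", ("t", "f")): {"f"},
    ("and", ("f", "t")): {"f"},
    ("and", ("f", "f")): {"f"},
    ("or", ("t", "t")): {"t"},
    ("or", ("t", "f")): {"t"},
    ("or", ("f", "t")): {"t"},
    ("or", ("f", "f")): {"f"},
}
FINAL = {"t"}

def reachable_states(tree, delta):
    """Set of states the automaton can assign to the root of the tree."""
    label, children = tree
    child_sets = [reachable_states(child, delta) for child in children]
    states = set()
    # Try every combination of states reachable at the children; for a leaf,
    # product() yields the single empty tuple.
    for combo in product(*child_sets):
        states |= delta.get((label, combo), set())
    return states

def accepts(tree, delta, final):
    """True if some run of the automaton ends in a final state at the root."""
    return bool(reachable_states(tree, delta) & final)
```

With this example automaton, `accepts(("or", (("0", ()), ("1", ()))), DELTA, FINAL)` is true, while a tree using a label or arity without a transition is rejected because no state is reachable at its root.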

Keywords

Tree alignment, Alignment edit-distance, Regular tree languages, Tree automata

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, Yonsei University, Seoul, Republic of Korea
  2. Department of Computer Science, University of Liverpool, Liverpool, UK
