On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings

Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 30)

Abstract

Both ‘distance’ and ‘similarity’ measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings. For a given alphabet of node-labels, the measures are parameterised by a table giving label-dependent values for swaps, deletions and insertions. The paper addresses the question of whether an ordering by a ‘distance’ measure, with some parameter setting, can also be expressed by a ‘similarity’ measure, with some other parameter setting, and vice versa. Orderings of three kinds are considered: alignment-orderings, for a fixed source S and target T; neighbour-orderings, where for a fixed S, varying candidate neighbours \(T_i\) are ranked; and pair-orderings, where for varying \(S_i\) and varying \(T_j\), the pairings \(\langle {S}_{i},{T}_{j}\rangle\) are ranked. We show that (1) any alignment-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting, and vice versa; (2) any neighbour-ordering and pair-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting; (3) there are neighbour-orderings and pair-orderings expressed by a ‘similarity’ setting which cannot be expressed by a ‘distance’ setting. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via
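
To make the setting concrete, the following is a minimal Python sketch for the sequence case only. It is not taken from the paper: the function name `align_score`, the two example parameter tables, and the convention that deletions and insertions contribute negative values to a similarity score are all illustrative assumptions. A distance is obtained by minimising the summed contributions of swaps, deletions and insertions over alignments, and a similarity by maximising them; a neighbour-ordering then ranks candidates \(T_i\) for a fixed S.

```python
def align_score(s, t, swap, delete, insert, best):
    """Best total score over global alignments of s and t.

    swap(x, y), delete(x) and insert(y) play the role of the parameter
    table; best is min for a distance and max for a similarity.
    (Hypothetical helper, sequence case only.)
    """
    m, n = len(s), len(t)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # delete a prefix of s
        score[i][0] = score[i - 1][0] + delete(s[i - 1])
    for j in range(1, n + 1):          # insert a prefix of t
        score[0][j] = score[0][j - 1] + insert(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = best(
                score[i - 1][j - 1] + swap(s[i - 1], t[j - 1]),
                score[i - 1][j] + delete(s[i - 1]),
                score[i][j - 1] + insert(t[j - 1]),
            )
    return score[m][n]


def distance(s, t):
    # Assumed 'distance' setting: unit costs, identical labels cost nothing.
    return align_score(
        s, t,
        swap=lambda x, y: 0 if x == y else 1,
        delete=lambda x: 1,
        insert=lambda y: 1,
        best=min,
    )


def similarity(s, t):
    # Assumed 'similarity' setting: matches are rewarded, everything else
    # is penalised, and the total is maximised.
    return align_score(
        s, t,
        swap=lambda x, y: 2 if x == y else -1,
        delete=lambda x: -1,
        insert=lambda y: -1,
        best=max,
    )


def neighbour_ordering_by_distance(s, candidates):
    # Closest neighbour first: ascending distance.
    return sorted(candidates, key=lambda t: distance(s, t))


def neighbour_ordering_by_similarity(s, candidates):
    # Most similar neighbour first: descending similarity.
    return sorted(candidates, key=lambda t: similarity(s, t), reverse=True)
```

With these particular tables, ranking the candidates `["xyz", "abd", "abc"]` against the source `"abc"` gives the same neighbour-ordering under both measures. The abstract's third result concerns the converse situation, where some similarity-induced orderings have no distance setting that reproduces them; this sketch does not attempt to exhibit such a case.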

Keywords

Similarity, distance, tree, sequence

Notes

Acknowledgements

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. School of Computer Science and Statistics, Trinity College, Dublin, Ireland
