Customizable HMM-based measures to accurately compare tree sets

Iloga, Sylvain

doi:10.1007/s10044-021-00971-3

Theoretical advances
Published: 31 March 2021

Volume 24, pages 1149–1171 (2021)
Pattern Analysis and Applications

Sylvain Iloga^1,2,3 (ORCID: orcid.org/0000-0002-4603-7744)

Abstract

Trees have attracted much interest for many decades owing to various emerging applications that use data represented as trees. Several techniques have been developed to compare two trees, but there is a serious lack of metrics for comparing weighted trees. Existing approaches also do not allow the targeted node properties on which the comparison should be performed to be specified explicitly. Furthermore, the problem of comparing two tree sets is not specifically addressed by existing techniques. This paper attempts to solve these problems by first proposing a distance and a similarity for comparing two finite sets of rooted ordered trees, which may be labeled or unlabeled and weighted or unweighted. To achieve this goal, a hidden Markov model (HMM) is associated with each tree set for each targeted node property. The model associated with a tree set T for the targeted node property p learns how much the nodes of the trees in T satisfy property p. The resulting models are finally compared to derive a distance and a similarity between the two sets of trees. These measures are then generalized to the comparison of unrooted and unordered trees. Flat classification experiments were carried out on two synthetic databases named FirstLast-L and FirstLast-LW, both available online. Both contain four classes of 100 rooted ordered trees whose specific and non-trivial node properties are clearly defined. When the distance proposed in this paper is selected as the metric for the Nearest Neighbor classifier, a perfect accuracy of \(100\%\) is obtained on both databases. This performance is \(41\%\) higher than the accuracy obtained when the widespread tree edit distance is selected for FirstLast-L.
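The evaluation setup described above, a Nearest Neighbor classifier whose metric is a distance between tree sets, can be sketched as follows. This is only an illustration under stated assumptions: the tree encoding `(label, children)`, the function names, and in particular `placeholder_distance` are hypothetical and are not taken from the paper. The placeholder compares mean tree sizes and merely occupies the slot where the HMM-based distance would be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

# A rooted ordered tree encoded as (label, [children]).
# This encoding is an assumption made for the sketch.
Tree = Tuple[str, list]
TreeSet = Sequence[Tree]
Distance = Callable[[TreeSet, TreeSet], float]


def count_nodes(tree: Tree) -> int:
    """Return the number of nodes of a rooted ordered tree."""
    _label, children = tree
    return 1 + sum(count_nodes(child) for child in children)


def placeholder_distance(a: TreeSet, b: TreeSet) -> float:
    """Stand-in distance: gap between the mean tree sizes of two sets.

    NOT the HMM-based distance of the paper; it only fills the same slot.
    """
    mean_a = sum(count_nodes(t) for t in a) / len(a)
    mean_b = sum(count_nodes(t) for t in b) / len(b)
    return abs(mean_a - mean_b)


def nearest_neighbor(
    query: TreeSet,
    training: List[Tuple[TreeSet, str]],
    distance: Distance,
) -> str:
    """Return the class label of the training tree set closest to the query."""
    best_label, _best_dist = min(
        ((label, distance(query, tree_set)) for tree_set, label in training),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy data: a single tree is treated as a singleton tree set (an assumption).
leaf = ("a", [])
small = ("r", [leaf])                        # 2 nodes
large = ("r", [leaf, leaf, ("b", [leaf])])   # 5 nodes
training = [([small], "small"), ([large], "large")]
query = [("r", [leaf, leaf])]                # 3 nodes

print(nearest_neighbor(query, training, placeholder_distance))  # prints "small"
```

Because the classifier only sees the distance through a function argument, any set-to-set measure (the tree edit distance or an HMM-based one) could be substituted without changing `nearest_neighbor`.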

Notes

See page 15, Section F
http://www.simotime.com/asc2ebc1.htm.
http://perso-etis.ensea.fr/sylvain.iloga/FirstLast/index.html.
http://tree-edit-distance.dbresearch.uni-salzburg.at/.

Author information

Authors and Affiliations

Department of Computer Science, Higher Teachers’ Training College, University of Maroua, P.O.box 55, Maroua, Cameroon
Sylvain Iloga
CY Cergy Paris University, ENSEA, CNRS, ETIS UMR 8051, 95000, Cergy, France
Sylvain Iloga
University of Sorbonne, IRD, UMMISCO, 93143, Bondy, France
Sylvain Iloga

Corresponding author

Correspondence to Sylvain Iloga.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Iloga, S. Customizable HMM-based measures to accurately compare tree sets. Pattern Anal Applic 24, 1149–1171 (2021). https://doi.org/10.1007/s10044-021-00971-3

Received: 14 August 2020
Accepted: 01 March 2021
Published: 31 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10044-021-00971-3
