Efficient Similarity Search for Tree-Structured Data

Li, Guoliang; Liu, Xuhui; Feng, Jianhua; Zhou, Lizhu

doi:10.1007/978-3-540-69497-7_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5069)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees remains an open problem because computing the similarity between trees is expensive, especially over large numbers of trees. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance between the corresponding strings bounds the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on this theoretical analysis, any existing approximate string search algorithm can be employed for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework to facilitate similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and significantly outperforms state-of-the-art methods. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets.
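The filter-and-refine idea can be illustrated with a minimal sketch. The abstract does not specify the paper's own tree-to-string mapping, so this example swaps in a different, simpler encoding: the preorder label sequence of an ordered labeled tree, whose string edit distance is a standard lower bound on the unit-cost tree edit distance. The tree representation (nested `(label, children)` tuples), the function names, and the `exact_distance` callback are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical tree representation: (label, (child, child, ...)).
Tree = Tuple[str, tuple]


def preorder_labels(tree: Tree) -> List[str]:
    """Flatten an ordered labeled tree into its preorder label sequence."""
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(preorder_labels(child))
    return labels


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def filter_and_refine(query: Tree,
                      trees: Iterable[Tree],
                      tau: int,
                      exact_distance: Callable[[Tree, Tree], int]) -> List[Tree]:
    """Return the trees whose exact distance to `query` is at most `tau`.

    `exact_distance` is a caller-supplied (assumed) tree edit distance.
    It is invoked only for trees that survive the cheap string filter.
    """
    query_labels = preorder_labels(query)
    results = []
    for tree in trees:
        # Filter: the string distance is a lower bound, so a value above
        # tau proves the exact distance is above tau as well.
        if edit_distance(query_labels, preorder_labels(tree)) > tau:
            continue
        # Refine: verify the surviving candidate with the exact measure.
        if exact_distance(query, tree) <= tau:
            results.append(tree)
    return results
```

Because the filter relies only on a lower bound, it never discards a true answer; it only reduces how often the expensive exact computation runs. In practice the per-tree scan in the filter step could be replaced by any approximate string search index, which is the substitution the abstract describes.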

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Guoliang Li, Xuhui Liu, Jianhua Feng & Lizhu Zhou

Editor information

Bertram Ludäscher, Nikos Mamoulis

About this paper

Cite this paper

Li, G., Liu, X., Feng, J., Zhou, L. (2008). Efficient Similarity Search for Tree-Structured Data. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_11

DOI: https://doi.org/10.1007/978-3-540-69497-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer Science, Computer Science (R0)
