Abstract
Location-based services have become widely available on a variety of devices. Due to the errors in user input as well as geo-textual databases, supporting error-tolerant spatio-textual search becomes an important problem in the field of spatial keyword search. Edit distance is the most widely used metrics to capture typographical errors. However, existing techniques for spatio-textual similarity query mainly focused on the set based textual relevance, but they cannot work well for edit distance due to the lack of filter power, which would involve larger overhead of computing edit distance. In this paper, we propose a novel framework to solve the region aware top-\(k\) similarity search problem with edit distance constraint. We first propose a hierarchical index structure to capture signatures of both spatial and textual relevance. We then utilize the prefix filter techniques to support top-\(k\) similarity search on the index. We further propose an estimation based method and a greedy search algorithm to make full use of the filter power of the hierarchical index. Experimental results on real world POI datasets show that our method outperforms state-of-the-art methods by up to two orders of magnitude.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
The uniformly distribution used in the \(i^{th}\) time is \(U^i\) where \(i \in [1, K]\).
- 2.
References
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations (extended abstract). In: STOC, pp. 327–336 (1998)
Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE, p. 5 (2006)
Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD, pp. 749–760 (2013)
Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. PVLDB 6(3), 217–228 (2013)
Cong, G., Jensen, C.S.: Querying geo-textual data: spatial keyword queries and beyond. In: SIGMOD, pp. 2207–2212 (2016)
Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)
Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: ICDE, pp. 656–665 (2008)
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. In: VLDB, pp. 491–500 (2001)
Li, C., Lu, J., Lu, Y.: Efficient merging and filtering algorithms for approximate string searches. In: ICDE, pp. 257–266 (2008)
Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: KDD, pp. 802–810 (2013)
Li, Z., Lee, K.C.K., Zheng, B., Lee, W., Lee, D.L., Wang, X.: IR-tree: an efficient index for geographic document search. IEEE Trans. Knowl. Data Eng. 23(4), 585–599 (2011)
Mazeika, A., Böhlen, M.H., Koudas, N., Srivastava, D.: Estimating the selectivity of approximate string queries. ACM Trans. Database Syst. 32(2), 12 (2007)
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: SSTD, pp. 205–222 (2011)
Wang, J., Li, G., Deng, D., Zhang, Y., Feng, J.: Two birds with one stone: an efficient hierarchical framework for top-k and threshold-based string similarity search. In: ICDE, pp. 519–530 (2015)
Wang, J., Lin, C., Li, M., Zaniolo, C.: An efficient sliding window approach for approximate entity extraction with synonyms. In: EDBT (2019)
Wang, X., Ding, X., Tung, A.K.H., Zhang, Z.: Efficient and effective KNN sequence search with approximate n-grams. PVLDB 7(1), 1–12 (2013)
Wu, J., Zhang, Y., Wang, J., Lin, C., Fu, Y., Xing, C.: A scalable framework for metric similarity join using mapreduce. In: ICDE (2019)
Yang, Z., Yu, J., Kitsuregawa, M.: Fast algorithms for top-k approximate string matching. In: AAAI (2010)
Yao, B., Li, F., Hadjieleftheriou, M., Hou, K.: Approximate string search in spatial databases. In: ICDE, pp. 545–556 (2010)
Zhang, C., Zhang, Y., Zhang, W., Lin, X.: Inverted linear quadtree: efficient top K spatial keyword search. In: ICDE, pp. 901–912 (2013)
Zhang, D., Tan, K.L., Tung, A.K.H.: Scalable top-k spatial keyword search. In: EDBT, pp. 359–370 (2013)
Zhang, Y., Li, X., Wang, J., Zhang, Y., Xing, C., Yuan, X.: An efficient framework for exact set similarity search using tree structure indexes. In: ICDE, pp. 759–770 (2017)
Zhang, Y., Wu, J., Wang, J., Xing, C.: A transformation-based framework for KNN set similarity search. IEEE Trans. Knowl. Data Eng. (2019)
Acknowledgement
This work was supported by NSFC (91646202), National Key R&D Program of China (SQ2018YFB140235), and the 1000-Talent program.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, J., Zhang, Y., Hu, H., Xing, C. (2019). A Hierarchical Index Structure for Region-Aware Spatial Keyword Search with Edit Distance Constraint. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science(), vol 11447. Springer, Cham. https://doi.org/10.1007/978-3-030-18579-4_35
Download citation
DOI: https://doi.org/10.1007/978-3-030-18579-4_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18578-7
Online ISBN: 978-3-030-18579-4
eBook Packages: Computer ScienceComputer Science (R0)