Abstract
A large number of Location-Based Services are widely available on a variety of portable electronic devices. It is critical for these services to efficiently support top-k queries that consider both spatial and textual relevance. Because errors can occur both in user input and in spatial databases, it is necessary to support error-tolerant spatio-textual search for end-users. Previous research has mainly focused on set-based textual relevance, which makes it difficult to find reasonable results when the input tokens do not exactly match those of the records in the spatial database. We design a novel framework to support top-k spatio-textual search with fuzzy token matching. We propose a hierarchical index that captures signatures of both spatial and textual relevance. Based on it, we devise two algorithms that preferentially access nodes containing more similar objects while pruning nodes containing dissimilar ones. We further propose a clustering-based approach that constructs the index by leveraging textual information. We conduct extensive experiments on real-world POI datasets, and the results show that our framework outperforms state-of-the-art methods by a significant margin.
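The abstract does not spell out the scoring function, so the following is only a minimal sketch of what top-k spatio-textual search with fuzzy token matching can look like. It assumes a linear combination `alpha * spatial + (1 - alpha) * textual`, a spatial proximity of `1 - dist / max_dist`, and, for textual relevance, the fuzzy Jaccard similarity of Fast-Join (Wang, Li, and Feng, ICDE 2011), in which tokens whose normalized edit similarity reaches a threshold `delta` can be matched one-to-one. The names `score`, `top_k`, `fuzzy_jaccard`, `alpha`, `max_dist`, and `delta` are hypothetical. The matching is found by brute force over permutations, which is practical only for short token sets, and the search is a plain linear scan rather than the hierarchical index with pruning that the article proposes.

```python
import heapq
import math
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def token_sim(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def fuzzy_jaccard(r, s, delta=0.8):
    """Fuzzy Jaccard: fuzzy overlap / (|r| + |s| - fuzzy overlap).

    The fuzzy overlap is the best one-to-one matching of token pairs whose
    similarity is at least delta, found here by brute force (small sets only).
    """
    r, s = list(set(r)), list(set(s))
    if len(r) > len(s):
        r, s = s, r
    if not r:
        return 0.0
    overlap = 0.0
    for perm in permutations(s, len(r)):
        # Only pairs that pass the delta threshold contribute to the overlap.
        total = sum(sim for sim in (token_sim(x, y) for x, y in zip(r, perm))
                    if sim >= delta)
        overlap = max(overlap, total)
    return overlap / (len(r) + len(s) - overlap)


def score(query, obj, alpha=0.5, max_dist=1.0, delta=0.8):
    """Hypothetical linear mix of spatial proximity and fuzzy textual relevance."""
    (qx, qy), q_tokens = query
    _name, (ox, oy), o_tokens = obj
    spatial = max(0.0, 1.0 - math.hypot(qx - ox, qy - oy) / max_dist)
    return alpha * spatial + (1 - alpha) * fuzzy_jaccard(q_tokens, o_tokens, delta)


def top_k(query, objects, k, **params):
    """Linear-scan baseline: score every object and keep the k best."""
    return heapq.nlargest(k, objects, key=lambda o: score(query, o, **params))


query = ((0.0, 0.0), ["starbuck", "coffee"])  # "starbuck" is a typo
objects = [
    ("A", (0.1, 0.0), ["starbucks", "coffee"]),
    ("B", (0.05, 0.0), ["pizza", "hut"]),
    ("C", (0.9, 0.0), ["starbucks", "coffee"]),
]
print([name for name, _, _ in top_k(query, objects, 2)])  # ['A', 'C']
```

Here the misspelled query token `starbuck` still matches `starbucks` (similarity 8/9), so the distant object `C` outranks the nearby but textually unrelated `B`. With a set-based Jaccard similarity, `starbuck` would earn no credit and `B` would rank second instead.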
Acknowledgments
This work was supported by the National Key R&D Program of China (2020AAA0109603) and the State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202008.
About this article
Cite this article
Zhang, Y., Chen, Y., Yang, J. et al. Clustering Enhanced Error-tolerant Top-k Spatio-textual Search. World Wide Web 24, 1185–1214 (2021). https://doi.org/10.1007/s11280-021-00883-6
DOI: https://doi.org/10.1007/s11280-021-00883-6