Clustering Enhanced Error-tolerant Top-k Spatio-textual Search

World Wide Web

Abstract

A large number of Location-Based Services are now available on a variety of portable electronic devices. It is critical for them to efficiently support top-k queries that consider both spatial and textual relevance. Because errors occur both in user input and in spatial databases, it is also necessary to support error-tolerant spatio-textual search for end users. Previous studies mainly focused on set-based textual relevance, which makes it difficult to find reasonable results when the input tokens do not exactly match those of the records in the spatial database. We design a novel framework to support top-k spatio-textual search with fuzzy token matching. A hierarchical index is proposed to capture signatures of both spatial and textual relevance. Based on it, we devise two algorithms that preferentially access the nodes containing more similar objects, while nodes containing dissimilar ones are pruned. We further propose a clustering-based approach that constructs the index by leveraging textual information. We conduct extensive experiments on real-world POI datasets, and the results show that our framework outperforms state-of-the-art methods by a significant margin.
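
To make the problem setting concrete, the following is a minimal brute-force sketch of top-k spatio-textual search with fuzzy token matching. It is not the hierarchical index, the pruning algorithms, or the clustering-based construction described in the abstract. The scoring formula, the weight `alpha`, the token threshold `min_sim`, the normalizing distance `max_dist`, and all function names are assumptions made purely for illustration.

```python
import heapq
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Edit similarity in [0, 1]; 1.0 means the tokens are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_text_score(query_tokens, object_tokens, min_sim=0.7):
    """Average best-match similarity over the query tokens.

    Matches below min_sim (an assumed threshold) contribute zero.
    """
    if not query_tokens or not object_tokens:
        return 0.0
    total = 0.0
    for q in query_tokens:
        best = max(token_similarity(q, t) for t in object_tokens)
        total += best if best >= min_sim else 0.0
    return total / len(query_tokens)


def spatial_score(query_loc, object_loc, max_dist):
    """Euclidean proximity scaled to [0, 1]; 0 beyond max_dist (assumed)."""
    return max(0.0, 1.0 - math.dist(query_loc, object_loc) / max_dist)


def top_k(query_loc, query_tokens, objects, k, alpha=0.5, max_dist=10.0):
    """Scan every object and return the k highest (score, name) pairs."""
    scored = []
    for name, loc, tokens in objects:
        score = (alpha * spatial_score(query_loc, loc, max_dist)
                 + (1 - alpha) * fuzzy_text_score(query_tokens, tokens))
        scored.append((score, name))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical POIs: (name, (x, y), tokens).
    pois = [
        ("Starbucks Coffee", (1.0, 1.0), ["starbucks", "coffee"]),
        ("City Library", (0.5, 0.5), ["city", "library"]),
        ("Starbuck Cafe", (8.0, 8.0), ["starbuck", "cafe"]),
    ]
    # The misspelled query shares no exact token with any POI.
    for score, name in top_k((0.0, 0.0), ["starbcks", "cofee"], pois, k=2):
        print(f"{name}: {score:.3f}")
```

In this sketch the misspelled query tokens have no exact match in any record, so a purely set-based textual score would be zero for every object; the fuzzy score still ranks the intended record first. The linear scan over all objects is exactly the cost that an index with pruning is meant to avoid.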

Notes

  1. http://www.openstreetmap.org/

Acknowledgments

This work was supported by the National Key R&D Program of China (2020AAA0109603) and the State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202008.

Author information

Corresponding author

Correspondence to Yong Zhang.

Cite this article

Zhang, Y., Chen, Y., Yang, J. et al. Clustering Enhanced Error-tolerant Top-k Spatio-textual Search. World Wide Web 24, 1185–1214 (2021). https://doi.org/10.1007/s11280-021-00883-6

  • DOI: https://doi.org/10.1007/s11280-021-00883-6
