The VLDB Journal, Volume 27, Issue 2, pp 225–244

Augmented keyword search on spatial entity databases

  • Dongxiang Zhang
  • Yuchen Li
  • Xin Cao
  • Jie Shao
  • Heng Tao Shen
Regular Paper


In this paper, we propose a new type of query that augments spatial keyword search with an additional boolean expression constraint. The query is issued against a corpus of structured or semi-structured spatial entities and is useful in applications such as mobile search and targeted location-aware advertising. We devise three indexing and filtering strategies. First, we utilize the hybrid \(\hbox {IR}^2\)-tree and propose a novel hashing scheme for efficient pruning. Second, we propose an inverted index-based solution, named BE-Inv, that is more cache-conscious and exhibits strong pruning power for boolean expression matching. Our third method, named SKB-Inv, adopts a novel two-level partitioning scheme that organizes the spatial entities into inverted lists and facilitates pruning in the spatial, textual, and boolean expression dimensions. In addition, we propose an adaptive query processing strategy that takes into account the selectivity of query keywords and predicates for early termination. We conduct experiments on two real datasets, with 3.5 million Foursquare venues and 50 million Twitter geo-profiles. The results show that the inverted index-based methods outperform the hybrid \(\hbox {IR}^2\)-tree, and that SKB-Inv achieves the best performance.
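To make the query type concrete, the following is a minimal sketch of one plausible reading of it. The abstract does not give a formal definition, so the semantics here are assumptions: an entity matches if it contains every query keyword and satisfies every predicate (a conjunction), and the k matches nearest to the query location are returned. The names `Entity`, `satisfies`, `build_inverted_index`, and `search`, as well as the `(attribute, operator, value)` predicate form, are hypothetical. This is not BE-Inv or SKB-Inv; it only illustrates a plain keyword inverted index in which the shortest posting list is intersected first, a simple stand-in for selectivity-aware early termination.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entity:
    eid: int
    loc: tuple              # (x, y) in arbitrary planar units (assumption)
    keywords: frozenset     # textual description of the entity
    attrs: dict = field(default_factory=dict)  # structured attributes


# Hypothetical predicate form: (attribute, operator, value).
OPS = {
    "=": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
}


def satisfies(entity, predicates):
    """Return True if the entity meets every predicate (a conjunction)."""
    for attr, op, value in predicates:
        if attr not in entity.attrs:
            return False
        if not OPS[op](entity.attrs[attr], value):
            return False
    return True


def build_inverted_index(entities):
    """Map each keyword to the set of entity ids that contain it."""
    index = {}
    for e in entities:
        for kw in e.keywords:
            index.setdefault(kw, set()).add(e.eid)
    return index


def search(entities, index, q_loc, q_keywords, predicates, k):
    """Return ids of the k nearest entities matching all keywords and predicates."""
    by_id = {e.eid: e for e in entities}
    candidates = set(by_id)
    # Intersect the shortest (most selective) posting list first,
    # and stop as soon as no candidate can survive.
    for postings in sorted((index.get(kw, set()) for kw in q_keywords), key=len):
        candidates &= postings
        if not candidates:
            return []
    matches = [by_id[i] for i in candidates if satisfies(by_id[i], predicates)]
    # Break distance ties by id so the result order is deterministic.
    matches.sort(key=lambda e: (math.dist(e.loc, q_loc), e.eid))
    return [e.eid for e in matches[:k]]
```

A query such as "coffee and wifi near (0, 0) with price at most 3" would be issued as `search(entities, index, (0.0, 0.0), {"coffee", "wifi"}, [("price", "<=", 3)], k=2)`. The indexes proposed in the paper additionally prune in the spatial and boolean expression dimensions, which this sketch handles only by post-filtering and sorting.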


Spatial keyword search, Boolean expression matching, IR-tree, Inverted index, Two-level partitioning



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Dongxiang Zhang (1)
  • Yuchen Li (2)
  • Xin Cao (3)
  • Jie Shao (1)
  • Heng Tao Shen (1)

  1. Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  2. School of Information Systems, Singapore Management University, Singapore, Singapore
  3. School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
