A framework for efficient spatial web object retrieval

  • Regular Paper
The VLDB Journal

Abstract

The conventional Internet is acquiring a geospatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables new kinds of queries that take into account both location proximity and text relevancy. This paper proposes a new indexing framework for top-k spatial text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within this framework. The framework encompasses algorithms that use the proposed indexes to compute location-aware as well as region-aware top-k text retrieval queries, taking into account both text relevancy and spatial proximity to prune the search space. Empirical studies with an implementation of the framework indicate that the proposal achieves excellent performance.
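To make the query type concrete, the following is a minimal Python sketch of a location-aware top-k text retrieval query, evaluated by brute force. It is not the paper's indexing framework: there is no R-tree and no search-space pruning, and the abstract does not state the ranking function. The toy data, the names `build_inverted_file` and `top_k`, the parameters `alpha` and `max_dist`, the weighted-sum score, and the choice to rank only objects that contain at least one query term are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy data: object id -> ((x, y) location, description text).
OBJECTS = {
    "cafe_a": ((0.0, 0.0), "espresso coffee pastry"),
    "cafe_b": ((5.0, 5.0), "coffee coffee roastery"),
    "pizzeria": ((0.5, 0.5), "pizza pasta"),
}


def build_inverted_file(objects):
    """Map each term to a posting list {object id: term frequency}."""
    index = defaultdict(dict)
    for oid, (_, text) in objects.items():
        for term in text.split():
            index[term][oid] = index[term].get(oid, 0) + 1
    return index


def top_k(objects, index, query_loc, query_terms, k, alpha=0.5, max_dist=10.0):
    """Return the ids of the k best objects for a location plus keywords.

    Assumed score (not taken from the paper):
        alpha * proximity + (1 - alpha) * normalized text relevance
    where proximity = max(0, 1 - distance / max_dist) and text relevance
    is the summed term frequency of the query terms.
    """
    # Text side: accumulate relevance by scanning the posting lists.
    relevance = defaultdict(int)
    for term in query_terms:
        for oid, tf in index.get(term, {}).items():
            relevance[oid] += tf
    if not relevance:
        return []

    max_rel = max(relevance.values())
    scored = []
    for oid, rel in relevance.items():
        # Spatial side: plain Euclidean distance, computed per candidate.
        dist = math.dist(query_loc, objects[oid][0])
        proximity = max(0.0, 1.0 - dist / max_dist)
        score = alpha * proximity + (1 - alpha) * rel / max_rel
        scored.append((score, oid))

    # Keep only the k highest-scoring candidates.
    return [oid for _, oid in heapq.nlargest(k, scored)]


index = build_inverted_file(OBJECTS)
nearby_coffee = top_k(OBJECTS, index, (0.0, 0.0), ["coffee"], k=2)
```

With `alpha=0.5`, the nearby `cafe_a` outranks the more text-relevant but distant `cafe_b`; setting `alpha=0.0` ranks by text relevance alone and reverses the order. A hybrid index of the kind the abstract describes would aim to avoid scoring every candidate, which this sketch does not attempt.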

Author information

Corresponding author

Correspondence to Dingming Wu.

About this article

Cite this article

Wu, D., Cong, G. & Jensen, C.S. A framework for efficient spatial web object retrieval. The VLDB Journal 21, 797–822 (2012). https://doi.org/10.1007/s00778-012-0271-0

  • DOI: https://doi.org/10.1007/s00778-012-0271-0
