GeoInformatica, Volume 16, Issue 3, pp 563–596

SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search

Article

Abstract

There is significant commercial and research interest in location-based web search engines. Given a number of search keywords and one or more locations (geographical points) that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index structure exists that can handle both the spatial and textual aspects of data simultaneously and accurately. Existing approaches either index space and text separately or use inefficient hybrid index structures with poor performance and inaccurate results. Moreover, most of these approaches cannot accurately rank web pages based on a combination of space and text and are not easy to integrate into existing search engines. In this paper, we propose a new index structure called Spatial-Keyword Inverted File for Points (SKIF-P) to handle point-based indexing of web documents in an integrated and efficient manner. To seamlessly find and rank relevant documents, we develop a new distance measure called spatial tf-idf. We propose four variants of spatial-keyword relevance scores and two algorithms to perform top-k searches. As verified by experiments, our proposed techniques outperform existing index structures in terms of search performance and accuracy.
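To make the search task concrete, the following is a minimal sketch of ranking documents by a combination of textual and spatial relevance and returning the top k. It is not the paper's SKIF-P index or its spatial tf-idf measure, whose definitions are not given in the abstract. The function names, the log-scaled tf-idf weighting, the exponential distance decay, the `scale` parameter, and the mixing weight `alpha` are all assumptions chosen for illustration, and the textual score is left unnormalized for brevity.

```python
import heapq
import math
from collections import Counter


def textual_score(query_terms, doc_terms, doc_freq, n_docs):
    """Sum of a simple log-scaled tf-idf weight over the query terms (assumed weighting)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if tf[term] and doc_freq.get(term):
            idf = math.log(1 + n_docs / doc_freq[term])
            score += (1 + math.log(tf[term])) * idf
    return score


def spatial_score(query_points, doc_point, scale=1.0):
    """Exponential decay of the Euclidean distance to the nearest query point (assumed)."""
    nearest = min(math.dist(q, doc_point) for q in query_points)
    return math.exp(-nearest / scale)


def top_k(docs, query_terms, query_points, k=2, alpha=0.5):
    """docs maps a document id to (list of terms, (x, y)).

    Returns up to k (score, id) pairs, best first. alpha is a hypothetical
    weight between textual and spatial relevance.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms, _ in docs.values():
        doc_freq.update(set(terms))

    scored = []
    for doc_id, (terms, point) in docs.items():
        t = textual_score(query_terms, terms, doc_freq, n_docs)
        if t == 0.0:
            continue  # documents containing no query keyword are skipped
        s = spatial_score(query_points, point)
        scored.append((alpha * t + (1 - alpha) * s, doc_id))
    return heapq.nlargest(k, scored)


docs = {
    "a": (["pizza", "cafe"], (0.0, 0.0)),
    "b": (["pizza"], (10.0, 10.0)),
    "c": (["museum"], (0.0, 0.0)),
}
result = top_k(docs, ["pizza"], [(0.0, 0.0)], k=2)
```

In this toy collection, documents "a" and "b" match the keyword equally well, so the nearer document "a" ranks first, and "c" is excluded because it contains no query keyword. This sketch scans every document; an integrated index of the kind the abstract describes would avoid that exhaustive scan.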

Keywords

Geographical search Spatial databases Indexing Ranking Query processing Information retrieval 

Notes

Acknowledgements

Ali Khodaei and Cyrus Shahabi’s research has been funded in part by NSF grants CNS-0831505 (CyberTrust) and IS-1115153, the USC Integrated Media Systems Center (IMSC), and unrestricted cash and equipment gifts from Google, Microsoft and Qualcomm. Chen Li is partially supported by the US NSF IIS 1030002 award and the National Natural Science Foundation of China (No. 61129002). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer Science, University of Southern California, Los Angeles, USA
  2. Department of Computer Science, University of California-Irvine, Irvine, USA
