Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Abstract

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average time of search query execution is 44–45 times less than that required when using ordinary inverted indexes.
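The idea of storing information about nearby words can be illustrated with a small sketch. It is not the paper's actual index structure, which the abstract does not specify; it only assumes that, for each frequently occurring word, an additional index keyed by (frequent word, nearby word) records where the pair occurs within a fixed distance. All names here (`build_ordinary_index`, `build_pair_index`, `max_distance`, the choice of frequent words) are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Simplistic tokenizer, assumed for illustration only.
    return text.lower().split()


def build_ordinary_index(docs):
    """Ordinary inverted index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, frequent_words, max_distance):
    """Additional index: (frequent word, nearby word) ->
    list of (doc_id, position of frequent word, offset of nearby word)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for pos, word in enumerate(words):
            if word not in frequent_words:
                continue
            lo = max(0, pos - max_distance)
            hi = min(len(words), pos + max_distance + 1)
            for other in range(lo, hi):
                if other != pos:
                    index[(word, words[other])].append((doc_id, pos, other - pos))
    return index


def search_ordinary(index, w, v, max_distance):
    """Proximity search that scans both full posting lists."""
    hits = set()
    for doc_w, pos_w in index.get(w, []):
        for doc_v, pos_v in index.get(v, []):
            if doc_w == doc_v and pos_w != pos_v and abs(pos_w - pos_v) <= max_distance:
                hits.add(doc_w)
    return hits


def search_pair(pair_index, w, v):
    """Proximity search that reads a single, much shorter pair list."""
    return {doc_id for doc_id, _, _ in pair_index.get((w, v), [])}
```

In this sketch the ordinary search must traverse the long posting lists of both frequent words, while the pair lookup reads only the postings where the two words already occur close together. The cost is a larger index, since every frequent word is stored once per neighbour.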

The work was supported by Act 211 Government of the Russian Federation, contract № 02.A03.21.0006.

Author information

Corresponding author

Correspondence to Alexander B. Veretennikov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Veretennikov, A.B. (2019). Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_66
