Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

  • Alexander B. Veretennikov
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 868)

Abstract

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average time of search query execution is 44–45 times less than that required when using ordinary inverted indexes.
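
The abstract does not describe the layout of the additional indexes, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a hypothetical word-pair index keyed by (word, nearby word) and compares it with an ordinary positional inverted index for a two-word proximity query. All function names, the `max_distance` parameter, and the sample documents are illustrative assumptions.

```python
from collections import defaultdict


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Ordinary positional inverted index: word -> [(doc_id, position), ...]."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, max_distance):
    """Hypothetical additional index:
    (word, nearby_word) -> [(doc_id, position, nearby_position), ...]
    for every nearby_word at most max_distance positions away from word."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for pos, word in enumerate(words):
            lo = max(0, pos - max_distance)
            hi = min(len(words), pos + max_distance + 1)
            for other_pos in range(lo, hi):
                if other_pos != pos:
                    key = (word, words[other_pos])
                    index[key].append((doc_id, pos, other_pos))
    return index


def search_ordinary(index, w1, w2, max_distance):
    """Proximity query on the ordinary index: compare every occurrence of w1
    with every occurrence of w2. The cost grows with both posting lists,
    which is large when both words occur frequently."""
    hits = set()
    for doc1, pos1 in index.get(w1, []):
        for doc2, pos2 in index.get(w2, []):
            if doc1 == doc2 and pos1 != pos2 and abs(pos1 - pos2) <= max_distance:
                hits.add((doc1, pos1, pos2))
    return sorted(hits)


def search_with_pair_index(pair_index, w1, w2):
    """Proximity query on the pair index: a single lookup, because the
    nearby-word information was stored at indexing time."""
    return sorted(pair_index.get((w1, w2), []))


if __name__ == "__main__":
    docs = [
        "to be or not to be",
        "who are you to say",
        "be that as it may to",
    ]
    inverted = build_inverted_index(docs)
    pairs = build_pair_index(docs, max_distance=2)
    print(search_ordinary(inverted, "to", "be", 2))
    print(search_with_pair_index(pairs, "to", "be"))
```

Both queries return the same occurrences. The sketch moves the work to indexing time: the pair index is larger than the ordinary index, but a query for two frequently occurring words reads only the postings where those words actually appear near each other.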

Keywords

Full-text search; Search engines; Inverted indexes; Additional indexes; Proximity search; Term proximity

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Chair of Calculation Mathematics and Computer Science, INSM, Ural Federal University, Yekaterinburg, Russia
