Abstract
Full-text search engines are important tools for information retrieval, and term proximity is an important factor in relevance scoring. In a proximity full-text search, we assume that a relevant document contains the query terms near each other, especially if the query terms are frequently occurring words. We discuss a methodology for high-performance execution of such queries. To achieve better efficiency, we build additional indexes: for a word that occurs in the text, we include in the indexes some information about nearby words. This work discusses which types of additional indexes we use and how we use them. We present experimental results showing that the average search query execution time is 44–45 times shorter than with ordinary inverted indexes.
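As a rough illustration of the idea of storing information about nearby words, the sketch below contrasts an ordinary positional inverted index with a simple word-pair index built only for frequently occurring words. It is not the index structure used in the paper: the function names, the `frequent_words` set, and the `max_distance` parameter are all assumptions chosen for this example.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Ordinary positional inverted index: word -> [(doc_id, position)]."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, frequent_words, max_distance):
    """Hypothetical additional index.

    For every occurrence of a frequent word, record each word found
    within max_distance positions of it:
    (frequent word, nearby word) -> [(doc_id, pos_frequent, pos_nearby)].
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for i, word in enumerate(words):
            if word not in frequent_words:
                continue
            lo = max(0, i - max_distance)
            hi = min(len(words), i + max_distance + 1)
            for j in range(lo, hi):
                if j != i:
                    index[(word, words[j])].append((doc_id, i, j))
    return index


def search_with_inverted_index(inverted, a, b, max_distance):
    """Baseline: read both posting lists and compare positions."""
    b_positions = defaultdict(list)
    for doc_id, pos in inverted.get(b, []):
        b_positions[doc_id].append(pos)
    hits = set()
    for doc_id, pos in inverted.get(a, []):
        for other in b_positions.get(doc_id, []):
            if other != pos and abs(other - pos) <= max_distance:
                hits.add(doc_id)
                break
    return sorted(hits)


def search_with_pair_index(pairs, frequent_word, other_word):
    """Read one short posting list; proximity was checked at build time."""
    return sorted({doc_id for doc_id, _, _ in pairs.get((frequent_word, other_word), [])})


docs = [
    "to be or not to be that is the question",
    "the cat sat on the mat",
    "who is to say what the answer is",
]
frequent = {"the", "to", "is"}  # assumed list of frequently occurring words
inverted = build_inverted_index(docs)
pairs = build_pair_index(docs, frequent, max_distance=2)

print(search_with_inverted_index(inverted, "the", "is", 2))
print(search_with_pair_index(pairs, "the", "is"))
```

In this sketch both searches return the same documents, but the pair lookup reads only the entries for one specific word pair instead of the full posting list of a frequent word such as "the". The trade-off is a larger index, since every occurrence of a frequent word is stored once per neighbour.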
The work was supported by Act 211 of the Government of the Russian Federation, contract № 02.A03.21.0006.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Veretennikov, A.B. (2019). Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_66
DOI: https://doi.org/10.1007/978-3-030-01054-6_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics (R0)