Abstract
Full-text search engines are important tools for information retrieval. In proximity full-text search, a document is considered relevant if it contains the query terms near each other, a criterion that matters especially when the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words whose distance from the given word is less than or equal to the MaxDistance parameter. We have previously shown that additional indexes with three-component keys can reduce the average query execution time by a factor of up to 94.7 when queries consist of frequently occurring words. In this paper, we present a new search algorithm that achieves even greater performance gains. We consider several strategies for selecting the multi-component key indexes to use for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches than two-component key indexes.
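The abstract describes the additional indexes only at a high level. As a minimal sketch, under assumptions of our own rather than the paper's, the Python code below shows how a three-component key index can turn a three-word proximity query into a single lookup. The assumptions: the "frequently occurring words" are simply the top_n most common words in the collection, a key is the alphabetically sorted word triple, and a posting records only a document ID and the span of the match. The function names frequent_words, build_triple_index, and search, and the max_distance argument (standing in for MaxDistance), are hypothetical, and the sketch does not model the key-selection strategies the paper compares.

```python
# Illustrative sketch only: names and index layout are hypothetical, not the paper's.
from collections import Counter, defaultdict
from itertools import combinations


def frequent_words(docs, top_n):
    """Assumed selection rule: the top_n most common words are 'frequent'."""
    counts = Counter(w for words in docs for w in words)
    return {w for w, _ in counts.most_common(top_n)}


def build_triple_index(docs, frequent, max_distance):
    """Build a hypothetical three-component key index.

    Key:   the sorted triple of frequent words.
    Value: set of (doc_id, start, end) spans in which the three words occur at
           positions start < mid < end with end - start <= max_distance, so
           every pairwise distance is also <= max_distance.
    """
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for i, w in enumerate(words):
            if w not in frequent:
                continue
            # Positions of frequent words 1..max_distance to the right of i.
            window = [j for j in range(i + 1, min(len(words), i + max_distance + 1))
                      if words[j] in frequent]
            # Every pair j < k from the window completes a triple starting at i.
            for j, k in combinations(window, 2):
                key = tuple(sorted((w, words[j], words[k])))
                index[key].add((doc_id, i, k))
    return index


def search(index, query):
    """Answer a three-word proximity query with one key lookup
    instead of merging three long per-word posting lists."""
    key = tuple(sorted(query))
    return sorted({doc_id for doc_id, _, _ in index.get(key, ())})


if __name__ == "__main__":
    docs = [
        "to be or not to be".split(),
        "who is to be the one".split(),
        "be quick or be dead or be late".split(),
    ]
    freq = frequent_words(docs, top_n=3)           # {"be", "to", "or"}
    idx = build_triple_index(docs, freq, max_distance=3)
    print(search(idx, ["to", "be", "or"]))         # [0]
    print(search(idx, ["be", "or", "be"]))         # [2]
```

The trade-off is visible even at this scale: a query over frequent words costs one dictionary lookup instead of a merge of three long posting lists, but the number of stored triples grows quickly with max_distance, which is one reason a sketch like this restricts keys to frequent words. A query containing a word outside the frequent set, such as ["to", "be", "the"] here, finds no key and would need a different index.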
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Veretennikov, A.B. (2019). Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance. In: Manolopoulos, Y., Stupnikov, S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2018. Communications in Computer and Information Science, vol 1003. Springer, Cham. https://doi.org/10.1007/978-3-030-23584-0_7
DOI: https://doi.org/10.1007/978-3-030-23584-0_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23583-3
Online ISBN: 978-3-030-23584-0
eBook Packages: Computer Science, Computer Science (R0)