An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-component Key Indexes

Veretennikov, Alexander B.

doi:10.1007/978-3-030-55187-2_37

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task is time-consuming when the query consists of frequently occurring words. If we cannot avoid the task by declaring frequently occurring words stop words and excluding them from consideration, we can instead speed up query execution by introducing additional indexes. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We showed that additional indexes can improve the average query execution time by up to 130 times for queries consisting of frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides an even greater performance gain.
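To make the problem concrete, the following Python sketch contrasts a proximity search over an ordinary positional inverted index with a lookup in an additional index keyed by word pairs. It is only an illustration under stated assumptions: the abstract does not describe the layout of the multi-component key indexes, so the two-word key, the function names (`build_position_index`, `build_pair_index`, `proximity_search`, `pair_search`), the whitespace tokenizer, and the `max_distance` parameter are all hypothetical and are not the algorithm of the paper.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no lemmatization.
    return text.lower().split()


def build_position_index(docs):
    """Ordinary inverted index: word -> doc_id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def build_pair_index(docs, max_distance):
    """Hypothetical two-component key index.

    Key (w1, w2) with w1 <= w2 maps to doc_id -> list of (pos1, pos2)
    for every occurrence of the two words within max_distance positions.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i, w1 in enumerate(words):
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                w2 = words[j]
                if w1 <= w2:
                    index[(w1, w2)][doc_id].append((i, j))
                else:
                    index[(w2, w1)][doc_id].append((j, i))
    return index


def has_window(events, k, max_distance):
    """Sliding window over sorted (position, word) events.

    Returns True if some window of width <= max_distance holds k distinct words.
    """
    counts = defaultdict(int)
    distinct = 0
    left = 0
    for pos, word in events:
        counts[word] += 1
        if counts[word] == 1:
            distinct += 1
        # Shrink the window from the left until it fits max_distance.
        while pos - events[left][0] > max_distance:
            left_word = events[left][1]
            counts[left_word] -= 1
            if counts[left_word] == 0:
                distinct -= 1
            left += 1
        if distinct == k:
            return True
    return False


def proximity_search(index, query, max_distance):
    """Baseline: reads every posting of every query word."""
    words = set(tokenize(query))  # simplification: repeated words are merged
    if not words or any(w not in index for w in words):
        return []
    candidates = set.intersection(*(set(index[w]) for w in words))
    hits = []
    for doc_id in sorted(candidates):
        events = sorted((pos, w) for w in words for pos in index[w][doc_id])
        if has_window(events, len(words), max_distance):
            hits.append(doc_id)
    return hits


def pair_search(pair_index, w1, w2):
    """Two-word query answered from the pair index without scanning postings."""
    key = (w1, w2) if w1 <= w2 else (w2, w1)
    return sorted(pair_index.get(key, {}))
```

For frequently occurring words, the posting lists that `proximity_search` must read and merge are long, which is where the query time goes. In this sketch the pair index moves the proximity check to indexing time, at the cost of a larger index, so a two-word query becomes a single key lookup.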

Acknowledgement

The work was supported by Act 211 of the Government of the Russian Federation, contract no. 02.A03.21.0006.

Author information

Authors and Affiliations

Chair of Calculation Mathematics and Computer Science, Ural Federal University, Yekaterinburg, Russia
Alexander B. Veretennikov

Corresponding author

Correspondence to Alexander B. Veretennikov.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

About this paper

Cite this paper

Veretennikov, A.B. (2021). An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-component Key Indexes. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_37

DOI: https://doi.org/10.1007/978-3-030-55187-2_37
Published: 25 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
