An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-component Key Indexes

  • Alexander B. Veretennikov
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1251)


A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task is time-consuming when the query consists of frequently occurring words. If we cannot avoid the task by declaring frequently occurring words to be stop words and excluding them from consideration, we can instead speed up execution by introducing additional indexes. In previous work, we discussed how to decrease the search time with multi-component key indexes. We showed that additional indexes can improve the average query execution time by up to 130 times for queries consisting of frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides an even greater performance gain.
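To make the task concrete, the following is a minimal sketch of a naive proximity search over an ordinary positional inverted index. It is not the algorithm of this paper and does not use multi-component key indexes; the function names (`build_index`, `min_window`, `proximity_search`), the whitespace tokenization, and the `max_distance` parameter are illustrative assumptions. It only shows why the baseline is expensive: every position of every query word in a candidate document is scanned, and frequently occurring words have very long position lists.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: sorted list of word positions}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokenization, no stop words removed.
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def min_window(position_lists):
    """Smallest span (last - first position) covering one position per list.

    Assumes the query words are distinct; repeated query words are not handled.
    """
    # Merge all positions, tagged with the query word they belong to.
    merged = sorted((p, i) for i, plist in enumerate(position_lists) for p in plist)
    counts = [0] * len(position_lists)
    covered = 0
    best = None
    left = 0
    for pos, word_i in merged:
        if counts[word_i] == 0:
            covered += 1
        counts[word_i] += 1
        # Shrink from the left while every query word is still covered.
        while covered == len(position_lists):
            span = pos - merged[left][0]
            if best is None or span < best:
                best = span
            left_word = merged[left][1]
            counts[left_word] -= 1
            if counts[left_word] == 0:
                covered -= 1
            left += 1
    return best


def proximity_search(index, query, max_distance):
    """Return ids of documents where all query words fit in a span <= max_distance."""
    words = query.lower().split()
    postings = [index.get(w, {}) for w in words]
    # Only documents that contain every query word can match.
    candidates = set(postings[0]).intersection(*postings[1:]) if postings else set()
    result = []
    for doc_id in sorted(candidates):
        span = min_window([p[doc_id] for p in postings])
        if span is not None and span <= max_distance:
            result.append(doc_id)
    return result
```

For example, with the documents `{1: "who are you and who am i", 2: "you are here but nobody knows who else is"}`, the query `"who you"` with `max_distance=2` matches only document 1, while `max_distance=6` matches both.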


Full-text search, Search engines, Inverted indexes, Additional indexes, Proximity search, Term proximity, Information retrieval, Query processing, Document-At-A-Time, DAAT



The work was supported by Act 211 of the Government of the Russian Federation, contract no. 02.A03.21.0006.



Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Chair of Calculation Mathematics and Computer Science, Ural Federal University, Yekaterinburg, Russia
