
Proximity Full-Text Searches of Frequently Occurring Words with a Response Time Guarantee

Conference paper, Mathematical Analysis With Applications (CONCORD-90 2018)

Abstract

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is considered relevant if it contains the query terms near each other, which matters especially when the query terms are frequently occurring words. For each word in the text, we use additional indexes that store information about the words located at a distance of at most MaxDistance from it, where MaxDistance is a parameter. We discuss a search algorithm for queries that consist of highly frequent words. We also present the results of experiments with different values of MaxDistance, which evaluate how search speed depends on MaxDistance. These results show that, for queries containing highly frequent words, the average query execution time with our indexes is 45.9 to 94.7 times shorter (depending on the value of MaxDistance) than with standard inverted files.
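The abstract does not give the layout of the additional indexes, so what follows is only a minimal Python sketch under stated assumptions, not the author's structure. It assumes a hypothetical pair index keyed by two frequent words (w, v) that records every position of w having v within MaxDistance words (`max_distance` here). It compares a lookup in that index with a scan of standard inverted-file posting lists that applies the same proximity rule. The names `build_indexes`, `search_with_pairs`, `search_baseline`, the word list `frequent`, and the toy corpus are all illustrative.

```python
# Hypothetical sketch: a pair index for frequent words versus a plain
# inverted-file scan. This is NOT the paper's actual index layout.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Posting = Tuple[int, int]  # (document id, word position)


def build_indexes(
    docs: List[List[str]], frequent: Set[str], max_distance: int
) -> Tuple[Dict[str, List[Posting]], Dict[Tuple[str, str], Set[Posting]]]:
    """Build a standard inverted file plus a hypothetical pair index.

    pairs[(w, v)] holds every (doc, pos) where the frequent word w occurs at
    pos and the frequent word v occurs at some other position q with
    |q - pos| <= max_distance.
    """
    inverted: Dict[str, List[Posting]] = defaultdict(list)
    pairs: Dict[Tuple[str, str], Set[Posting]] = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for pos, w in enumerate(words):
            inverted[w].append((doc_id, pos))
            if w not in frequent:
                continue
            # Scan the window [pos - max_distance, pos + max_distance].
            lo = max(0, pos - max_distance)
            hi = min(len(words), pos + max_distance + 1)
            for q in range(lo, hi):
                if q != pos and words[q] in frequent:
                    pairs[(w, words[q])].add((doc_id, pos))
    return inverted, pairs


def search_with_pairs(query, inverted, pairs):
    """Anchor postings of query[0] that have every other term nearby.

    Assumes all query terms are frequent words, so each check is one pair key.
    """
    anchor, others = query[0], query[1:]
    if not others:
        return set(inverted.get(anchor, []))
    result = set(pairs.get((anchor, others[0]), set()))
    for term in others[1:]:
        result &= pairs.get((anchor, term), set())
    return result


def search_baseline(query, inverted, max_distance):
    """Same semantics, but scans the full posting lists of the inverted file."""
    anchor, others = query[0], query[1:]
    result = set()
    for doc_id, pos in inverted.get(anchor, []):
        if all(
            any(d == doc_id and q != pos and abs(q - pos) <= max_distance
                for d, q in inverted.get(term, []))
            for term in others
        ):
            result.add((doc_id, pos))
    return result


docs = [
    "to be or not to be".split(),
    "what is to come is not known".split(),
]
frequent = {"to", "be", "or", "not", "is"}
inverted, pairs = build_indexes(docs, frequent, max_distance=2)
print(sorted(search_with_pairs(["to", "be"], inverted, pairs)))  # [(0, 0), (0, 4)]
print(search_with_pairs(["to", "be"], inverted, pairs)
      == search_baseline(["to", "be"], inverted, max_distance=2))  # True
```

In this sketch the pair lookup touches only the positions where the two frequent words actually co-occur, while the baseline walks the full posting lists of every query word. The price is index size: a larger `max_distance` lets each anchor occurrence pair with more neighbours, so the pair index grows.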


Acknowledgements

This work was supported by Act 211 of the Government of the Russian Federation, contract no. 02.A03.21.0006.

Author information

Correspondence to A. B. Veretennikov.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Veretennikov, A.B. (2020). Proximity Full-Text Searches of Frequently Occurring Words with a Response Time Guarantee. In: Pinelas, S., Kim, A., Vlasov, V. (eds) Mathematical Analysis With Applications. CONCORD-90 2018. Springer Proceedings in Mathematics & Statistics, vol 318. Springer, Cham. https://doi.org/10.1007/978-3-030-42176-2_37
