Skip to main content
Log in

trie-nlg: trie context augmentation to improve personalized query auto-completion for short and unseen prefixes

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Query auto-completion (QAC) aims at suggesting plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest most popular completions. In this context, there are two specific scenarios that are difficult to handle for any QAC system: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personalized Natural Language Generation (NLG) models have been proposed to leverage previous session queries as context for addressing these two challenges. However, such NLG models suffer from two drawbacks: (1) some of the previous session queries could be noisy and irrelevant to the user intent for the current prefix, and (2) NLG models cannot directly incorporate historical query popularity. This motivates us to propose a novel NLG model for QAC, Trie-NLG, which jointly leverages popularity signals from trie and personalization signals from previous session queries. We train the Trie-NLG model by augmenting the prefix with rich context comprising of recent session queries and top trie completions. This simple modeling approach overcomes the limitations of trie-based and NLG-based approaches, and leads to state-of-the-art performance. We evaluate the Trie-NLG model using two large QAC datasets. On average, our model achieves huge \(\sim\)57% and \(\sim\)14% boost in MRR over the popular trie-based lookup and the strong BART-based baseline methods, respectively. We make our code publicly available at https://github.com/kaushal0494/Trie-NLG.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Availability of data and materials

We make our code available as supplementary information. It is in line with our claims in the paper and can be trusted to reproduce the results. We use one public dataset (AOL) and a Bing dataset. Since Bing data is proprietary, it cannot be made publicly available.

Code availability

We make our code publicly available and it is attached as supplementary data.

Notes

  1. A small percent of suggestions are not prefix preserving; in this work, we focus on prefix-preserving suggestions only.

  2. https://github.com/jma127/pyltr.

  3. https://github.com/asyml/texar.

  4. https://huggingface.co/.

  5. https://huggingface.co/blog/how-to-generate#appendix.

  6. https://github.com/kaushal0494/Trie-NLG.

References

  • Bar-Yossef Z, Kraus N (2011) Context-sensitive query auto-completion. In: Proceedings of the 20th international conference on world wide web (pp. 107–116)

  • Bhatia S,Majumdar D, Mitra P (2011) Query suggestions in the absence of query logs. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval (pp. 795–804)

  • Bonchi F, Perego R, Silvestri F, Vahabi H, Venturini R (2012) Efficient query recommendations in the long tail via center-piece subgraphs. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (pp. 345–354)

  • Burges CJ (2010) From ranknet to lambdarank to lambdamart: an overview. Learning 11:23–581

    Google Scholar 

  • Cai F, De Rijke M (2016) Query auto completion in information retrieval. Universiteit van Amsterdam [Host]

  • Cai F , Liang S , De Rijke, M (2014) Time-sensitive personalized query auto-completion. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 1599–1608)

  • Dehghani M , Rothe S , Alfonseca E , Fleury P (2017) Learning to attend, copy, and generate for session-based query suggestion. Proceedings of the 2017 ACM on conference on information and knowledge management (pp. 1747–1756)

  • Gehman S , Gururangan S , Sap M , Choi Y , Smith NA (2020) Real Toxicity Prompts: evaluating neural toxic degeneration in language models. Findings of the association for computational linguistics: Emnlp 2020 (pp. 3356–3369). Online Association for Computational Linguistics

  • Hofmann K, Mitra B, Radlinski F, Shokouhi M (2014) An eye-tracking study of user interactions with query auto completion. Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 549–558)

  • Hsu B-J, Ottaviano G (2013) Space-efficient data structures for top-k completion. In: Proceedings of the 22nd international conference on world wide web (pp. 583–594)

  • Huang C-K, Chien L-F, Oyang Y-J (2003) Relevant term suggestion in interactive web search based on contextual information in query session logs. J Am Soc Inf Sci Technol 54(7):638–649

    Article  Google Scholar 

  • Jiang J-Y , Ke Y-Y , Chien P-Y , Cheng P-J (2014) Learning user reformulation behavior for query auto-completion. In: Proceedings of the 37th international ACM SIGIIR conference on research & development in information retrieval (pp. 445–454)

  • Lewis M , Liu Y , Goyal N , Ghazvininejad M , Mohamed A , Levy O, Zettlemoyer, L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880)

  • Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N et al (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459–9474

    Google Scholar 

  • Maxwell D , Bailey P , Hawking D (2017) Large-scale generative query autocompletion. In: Proceedings of the 22nd australasian document computing symposium (pp. 1–8)

  • Mei Q , Zhou D , Church K (2008) Query suggestion using hitting time. In:: Proceedings of the 17th acm conference on information and knowledge management (pp. 469–478)

  • Mitra B, Craswell N (2015) Query auto-completion for rare prefixes. In: Proceedings of the 24th acm international on conference on information and knowledge management (pp. 1755–1758)

  • Mitra B , Shokouhi M , Radlinski F, Hofmann K (2014) On user interactions with query auto-completion. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval (pp. 1055–1058)

  • Mustar A, Lamprier S, Piwowarski B (2020) Using bert and bart for query suggestion. Circle

  • Park DH, Chiba R(2017) A neural language model for query auto-completion. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval (pp. 1189–1192)

  • Pass G , Chowdhury A , Torgeson C (2006) A picture of search. In: Proceedings of the 1st international conference on scalable information systems (pp. 1–es)

  • Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M et al (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67

    MathSciNet  Google Scholar 

  • Rosset C , Jose D , Ghosh G , Mitra B Tiwary S (2018) Optimizing query evaluations using reinforcement learning for web search. In: The 41st international ACM SIGIR conference on research & development in information retrieval (pp. 1193–1196)

  • Sadikov E , Madhavan J , Wang L , Halevy A (2010) Clustering query refinements by user intent. In: Proceedings of the 19th international conference on world wide web (pp. 841–850)

  • Shokouhi M (2013) Learning to personalize query auto-completion. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval (pp. 103–112)

  • Shokouhi M, Radinsky K (2012) Time-sensitive query auto-completion. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (pp. 601–610)

  • Sordoni A , Bengio Y , Vahabi H , Lioma C , Grue Simonsen J , Nie J-Y (2015) A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In: Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 553–562)

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al (2017) Attention is all you need. Adv Neural Inf Process Syst. Vol 30

  • Wang P-W, Zhang H, Mohan V, Dhillon IS, Kolter JZ (2018) Realtime query completion via deep language models. ecom@ sigir

  • Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retrieval 13(3):254–270

    Article  Google Scholar 

  • Yadav N, Sen R, Hill DN, Mazumdar A, Dhillon IS (2021) Session-aware query auto-completion using extreme multi-label ranking. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 3835–3844)

  • Yin D, Tan J, Zhang Z, Deng H, Huang S, Chen J (2020) Learning to generate personalized query auto-completions via a multi-view multi-task attentive approach. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2998–3007)

  • Zhou G, Zhu X, Song C, Fan Y, Zhu H , Ma X, et al (2018) Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1059–1068)

Download references

Funding

This research was partially funded by Microsoft Academic Partnership Grant 2022.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the design of the project. Specifically, Conceptualization, Methodology, Data preparation: [KKM]; Writing—original draft preparation: [KKM, MG, MSD]; Writing—review and editing: [everyone]; Funding acquisition, Resources, Supervision: [MG, MSD, PA].

Corresponding author

Correspondence to Kaushal Kumar Maurya.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare relevant to this article’s content.

Consent for publication

All the authors of this work have been informed of the submission, and everyone supports it.

Additional information

Responsible editor: Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Maurya, K.K., Desarkar, M.S., Gupta, M. et al. trie-nlg: trie context augmentation to improve personalized query auto-completion for short and unseen prefixes. Data Min Knowl Disc 37, 2306–2329 (2023). https://doi.org/10.1007/s10618-023-00966-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-023-00966-0

Keywords

Navigation