
Efficient query autocompletion with edit distance-based error tolerance

  • Jianbin Qin (Email author)
  • Chuan Xiao
  • Sheng Hu
  • Jie Zhang
  • Wei Wang
  • Yoshiharu Ishikawa
  • Koji Tsuda
  • Kunihiko Sadakane
Regular Paper

Abstract

Query autocompletion is an important feature that saves users from typing the entire query. In this paper, we study the problem of query autocompletion that tolerates errors in users’ input using edit distance constraints. Previous approaches index data strings in a trie and continuously maintain all the prefixes of data strings whose edit distances from the query string are within the given threshold. The major inherent drawback of these approaches is that the number of such prefixes is huge for the first few characters of the query string and is exponential in the alphabet size. This results in slow query response even if the entire query approximately matches only a few prefixes. We propose a novel neighborhood generation-based method to process error-tolerant query autocompletion. Our proposed method maintains only a small set of active nodes, thus saving both space and time when processing the query. We also study efficient duplicate removal, a core problem in fetching query answers, and extend our method to support top-k queries. Optimization techniques are proposed to reduce the index size. The efficiency of our method is demonstrated through extensive experiments on real datasets.
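
To make the problem setting concrete, the following Python sketch illustrates the trie-based baseline described above, not the neighborhood generation-based method proposed in the paper. It walks a trie, keeps one row of the edit distance table per visited prefix, and reports every data string that has a prefix within edit distance tau of the query. All names (TrieNode, build_trie, complete, tau) are hypothetical and chosen for illustration only. For simplicity the sketch recomputes everything per query rather than maintaining the qualifying prefixes incrementally after each keystroke.

```python
from typing import Dict, List


class TrieNode:
    """One trie node; is_end marks the end of a data string."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False


def build_trie(strings: List[str]) -> TrieNode:
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def collect(node: TrieNode, prefix: str, out: List[str]) -> None:
    """Append every data string stored in the subtree of node."""
    if node.is_end:
        out.append(prefix)
    for ch, child in node.children.items():
        collect(child, prefix + ch, out)


def complete(root: TrieNode, query: str, tau: int) -> List[str]:
    """Return data strings having a prefix within edit distance tau of query."""
    results: List[str] = []

    def visit(node: TrieNode, prefix: str, row: List[int]) -> None:
        # row[j] is the edit distance between prefix and query[:j].
        if row[-1] <= tau:
            # This prefix matches the whole query, so the entire subtree
            # qualifies. Stopping here also avoids reporting duplicates.
            collect(node, prefix, results)
            return
        if min(row) > tau:
            # Row minima never decrease when the prefix is extended,
            # so no descendant can match: prune this branch.
            return
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # insertion
                                   row[j] + 1,           # deletion
                                   row[j - 1] + cost))   # match or substitution
            visit(child, prefix + ch, new_row)

    visit(root, "", list(range(len(query) + 1)))
    return sorted(results)


trie = build_trie(["apple", "apply", "angle", "banana"])
print(complete(trie, "appl", 1))
```

Even in this small sketch, many trie branches stay alive while the query is short (every row minimum is at most the query length), which is the cost the abstract attributes to prefix-maintaining approaches.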

Keywords

Query autocompletion · Similarity Search · Database · Neighbourhood generation tree

Notes

Acknowledgements

Chuan Xiao was supported by JSPS Kakenhi 16H01722, 17H06099, 18H04093, and NSFC 61702409. Sheng Hu and Yoshiharu Ishikawa were supported by JSPS Kakenhi 16H01722. Jie Zhang was supported by NSFC 61702409. Wei Wang was supported by ARC DPs 170103710 and 180103411, and D2DCRC DC25002 and DC25003. We thank the authors of [23] for kindly providing their source code.

References

  1. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Boston (1974)
  2. Aoe, J.-I.: An efficient digital search algorithm by using a double-array structure. IEEE Trans. Softw. Eng. 15(9), 1066–1077 (1989)
  3. Baeza-Yates, R.A., Hurtado, C.A., Mendoza, M.: Improving search engines by query clustering. JASIST 58(12), 1793–1804 (2007)
  4. Bar-Yossef, Z., Kraus, N.: Context-sensitive query auto-completion. In: WWW, pp. 107–116 (2011)
  5. Bast, H., Weber, I.: Type less, find more: fast autocompletion search with a succinct index. In: SIGIR, pp. 364–371 (2006)
  6. Bhatia, S., Majumdar, D., Mitra, P.: Query suggestions in the absence of query logs. In: SIGIR, pp. 795–804 (2011)
  7. Bocek, T., Hunt, E., Stiller, B.: Fast similarity search in large dictionaries. Technical Report ifi-2007.02. Department of Informatics, University of Zurich (2007)
  8. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
  9. Boytsov, L.: Indexing methods for approximate dictionary searching: comparative analysis. ACM J. Exp. Algorithm. 16(1), 1 (2011)
  10. Cai, F., Chen, H.: Term-level semantic similarity helps time-aware term popularity based query completion. J. Intell. Fuzzy Syst. 32(6), 3999–4008 (2017)
  11. Cai, F., Chen, W., Ou, X.: Learning search popularity for personalized query completion in information retrieval. J. Intell. Fuzzy Syst. 33(4), 2427–2435 (2017)
  12. Cai, F., de Rijke, M.: Selectively personalizing query auto-completion. In: SIGIR, pp. 993–996 (2016)
  13. Cai, F., Liang, S., de Rijke, M.: Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Trans. Knowl. Data Eng. 28(9), 2452–2466 (2016)
  14. Cao, H., Jiang, D., Pei, J., Chen, E., Li, H.: Towards context-aware search by learning a very large variable length hidden Markov model from search logs. In: WWW, pp. 191–200 (2009)
  15. Cao, H., Jiang, D., Pei, J., He, Q., Liao, Z., Chen, E., Li, H.: Context-aware query suggestion by mining click-through and session data. In: KDD, pp. 875–883 (2008)
  16. Cetindil, I., Esmaelnezhad, J., Kim, T., Li, C.: Efficient instant-fuzzy search with proximity ranking. In: ICDE, pp. 328–339 (2014)
  17. Chaudhuri, S., Kaushik, R.: Extending autocompletion to tolerate errors. In: SIGMOD, pp. 707–718 (2009)
  18. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
  19. Daciuk, J.: Comparison of construction algorithms for minimal, acyclic, deterministic, finite-state automata from sets of strings. In: CIAA, pp. 255–261 (2002)
  20. Darragh, J.J., Witten, I.H., James, M.L.: The reactive keyboard: a predictive typing aid. IEEE Comput. 23(11), 41–49 (1990)
  21. Deng, D., Li, G., Feng, J.: A pivotal prefix based filtering algorithm for string similarity search. In: SIGMOD, pp. 673–684 (2014)
  22. Deng, D., Li, G., Feng, J., Duan, Y., Gong, Z.: A unified framework for approximate dictionary-based entity extraction. VLDB J. 24(1), 143–167 (2015)
  23. Deng, D., Li, G., Wen, H., Jagadish, H.V., Feng, J.: META: an efficient matching-based method for error-tolerant autocompletion. PVLDB 9(10), 828–839 (2016)
  24. Duan, H., Hsu, B.-J.P.: Online spelling correction for query completion. In: WWW, pp. 117–126 (2011)
  25. Duan, H., Li, Y., Zhai, C., Roth, D.: A discriminative model for query spelling correction with latent structural SVM. In: EMNLP-CoNLL, pp. 1511–1521 (2012)
  26. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
  27. Fan, J., Wu, H., Li, G., Zhou, L.: Suggesting topic-based query terms as you type. In: APWeb, pp. 61–67 (2010)
  28. Feng, J., Wang, J., Li, G.: Trie-join: a trie-based method for efficient string similarity joins. VLDB J. 21(4), 437–461 (2012)
  29. Gao, J., Li, X., Micol, D., Quirk, C., Sun, X.: A large scale ranker-based system for search query spelling correction. In: COLING, pp. 358–366 (2010)
  30. Grabski, K., Scheffer, T.: Sentence completion. In: SIGIR, pp. 433–439 (2004)
  31. Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. In: VLDB, pp. 491–500 (2001)
  32. He, Q., Jiang, D., Liao, Z., Hoi, S.C.H., Chang, K., Lim, E.-P., Li, H.: Web query recommendation via sequential query prediction. In: ICDE, pp. 1443–1454 (2009)
  33. Hofmann, K., Mitra, B., Radlinski, F., Shokouhi, M.: An eye-tracking study of user interactions with query auto completion. In: CIKM, pp. 549–558 (2014)
  34. Hsu, B.P., Ottaviano, G.: Space-efficient data structures for top-k completion. In: WWW, pp. 583–594 (2013)
  35. Hu, S., Xiao, C., Ishikawa, Y.: An efficient algorithm for location-aware query autocompletion. IEICE Trans. 101–D(1), 181–192 (2018)
  36. Ji, S., Li, C.: Location-based instant search. In: SSDBM, pp. 17–36 (2011)
  37. Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 371–380 (2009)
  38. Jiang, J., Ke, Y., Chien, P., Cheng, P.: Learning user reformulation behavior for query auto-completion. In: SIGIR, pp. 445–454 (2014)
  39. Krishnan, U., Moffat, A., Zobel, J.: A taxonomy of query auto completion modes. In: ADCS, pp. 6:1–6:8 (2017)
  40. Li, C., Wang, B., Yang, X.: VGRAM: improving performance of approximate queries on string collections using variable-length grams. In: VLDB, pp. 303–314 (2007)
  41. Li, G., Deng, D., Feng, J.: A partition-based method for string similarity joins with edit-distance constraints. ACM Trans. Database Syst. 38(2), 9:1–9:33 (2013)
  42. Li, G., Ji, S., Li, C., Feng, J.: Efficient type-ahead search on relational data: a tastier approach. In: SIGMOD, pp. 695–706 (2009)
  43. Li, G., Ji, S., Li, C., Feng, J.: Efficient fuzzy full-text type-ahead search. VLDB J. 20(4), 617–640 (2011)
  44. Li, G., Wang, J., Li, C., Feng, J.: Supporting efficient top-k queries in type-ahead search. In: SIGIR, pp. 355–364 (2012)
  45. Li, L., Deng, H., Dong, A., Chang, Y., Baeza-Yates, R.A., Zha, H.: Exploring query auto-completion and click logs for contextual-aware web search and query suggestion. In: WWW, pp. 539–548 (2017)
  46. Li, L., Deng, H., Dong, A., Chang, Y., Zha, H., Baeza-Yates, R.A.: Analyzing user’s sequential behavior in query auto-completion via Markov processes. In: SIGIR, pp. 123–132 (2015)
  47. Li, Y., Dong, A., Wang, H., Deng, H., Chang, Y., Zhai, C.: A two-dimensional click model for query auto-completion. In: SIGIR, pp. 455–464 (2014)
  48. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
  49. Mitra, B., Shokouhi, M., Radlinski, F., Hofmann, K.: On user interactions with query auto-completion. In: SIGIR, pp. 1055–1058 (2014)
  50. Mor, M., Fraenkel, A.S.: A hash code method for detecting and correcting spelling errors. Commun. ACM 25(12), 935–938 (1982)
  51. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: SODA, pp. 657–666 (2002)
  52. Myers, E.W.: A sublinear algorithm for approximate keyword searching. Algorithmica 12(4/5), 345–374 (1994)
  53. Nandi, A., Jagadish, H.V.: Effective phrase prediction. In: VLDB, pp. 219–230 (2007)
  54. Qin, J., Wang, W., Xiao, C., Lu, Y., Lin, X., Wang, H.: Asymmetric signature schemes for efficient exact edit similarity query processing. ACM Trans. Database Syst. 38(3), 16 (2013)
  55. Roy, S.B., Chakrabarti, K.: Location-aware type ahead search on spatial databases: semantics and efficiency. In: SIGMOD, pp. 361–372 (2011)
  56. Sadikov, E., Madhavan, J., Wang, L., Halevy, A.Y.: Clustering query refinements by user intent. In: WWW, pp. 841–850 (2010)
  57. Shokouhi, M.: Learning to personalize query auto-completion. In: SIGIR, pp. 103–112 (2013)
  58. Shokouhi, M., Radinsky, K.: Time-sensitive query auto-completion. In: SIGIR, pp. 601–610 (2012)
  59. Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Simonsen, J.G., Nie, J.: A hierarchical recurrent encoder–decoder for generative context-aware query suggestion. In: CIKM, pp. 553–562 (2015)
  60. Tsur, D.: Fast index for approximate string matching. J. Discrete Algorithms 8(4), 339–345 (2010)
  61. Tyler, S.K., Teevan, J.: Large scale query log analysis of re-finding. In: WSDM, pp. 191–200 (2010)
  62. Ukkonen, E.: Algorithms for approximate string matching. Inf. Control 64(1–3), 100–118 (1985)
  63. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
  64. Wang, W., Qin, J., Xiao, C., Lin, X., Shen, H.T.: Vchunkjoin: an efficient algorithm for edit similarity joins. IEEE Trans. Knowl. Data Eng. 25(8), 1916–1929 (2013)
  65. Wang, W., Xiao, C., Lin, X., Zhang, C.: Efficient approximate entity extraction with edit constraints. In: SIGMOD, pp. 759–770 (2009)
  66. Wang, Y., Ouyang, H., Deng, H., Chang, Y.: Learning online trends for interactive query auto-completion. IEEE Trans. Knowl. Data Eng. 29(11), 2442–2454 (2017)
  67. Wei, H., Yu, J.X., Lu, C.: String similarity search: a hash-based approach. IEEE Trans. Knowl. Data Eng. 30(1), 170–184 (2018)
  68. Wen, J., Zhang, H., Nie, J.: Query clustering using content words and user feedback. In: SIGIR, pp. 442–443 (2001)
  69. Whiting, S., Jose, J.M.: Recent and robust query auto-completion. In: WWW, pp. 971–982 (2014)
  70. Xiao, C., Qin, J., Wang, W., Ishikawa, Y., Tsuda, K., Sadakane, K.: Efficient error-tolerant query autocompletion. PVLDB 6(6), 373–384 (2013)
  71. Xiao, C., Wang, W., Lin, X.: Ed-Join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB 1(1), 933–944 (2008)
  72. Yu, M., Wang, J., Li, G., Zhang, Y., Deng, D., Feng, J.: A unified framework for string similarity search with edit-distance constraint. VLDB J. 26(2), 249–274 (2017)
  73. Zhang, A., Goyal, A., Kong, W., Deng, H., Dong, A., Chang, Y., Gunter, C.A., Han, J.: adaqac: adaptive query auto-completion via implicit negative feedback. In: SIGIR, pp. 143–152 (2015)
  74. Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On supporting containment queries in relational database management systems. In: SIGMOD, pp. 425–436 (2001)
  75. Zheng, Y., Bao, Z., Shou, L., Tung, A.K.H.: INSPIRE: a framework for incremental spatial prefix query relaxation. IEEE Trans. Knowl. Data Eng. 27(7), 1949–1963 (2015)
  76. Zhong, R., Fan, J., Li, G., Tan, K., Zhou, L.: Location-aware instant search. In: CIKM, pp. 385–394 (2012)
  77. Zhou, X., Qin, J., Xiao, C., Wang, W., Lin, X., Ishikawa, Y.: BEVA: an efficient query processing algorithm for error-tolerant autocompletion. ACM Trans. Database Syst. 41(1), 5:1–5:44 (2016)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Jianbin Qin (1), Email author
  • Chuan Xiao (2, 3)
  • Sheng Hu (3, 4)
  • Jie Zhang (5)
  • Wei Wang (6)
  • Yoshiharu Ishikawa (3)
  • Koji Tsuda (7)
  • Kunihiko Sadakane (7)

  1. Shenzhen Institute of Computing Sciences, Shenzhen University, Shenzhen, China
  2. Osaka University, Osaka, Japan
  3. Nagoya University, Nagoya, Japan
  4. Kyoto University, Kyoto, Japan
  5. Xi’an University of Technology, Xi’an, China
  6. The University of New South Wales, Sydney, Australia
  7. The University of Tokyo, Tokyo, Japan
