Keyword Query Cleaning with Query Logs

  • Conference paper
Web-Age Information Management (WAIM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

Keyword queries over databases are often dirty: they contain irrelevant or incorrect words, which hurts both the efficiency and the accuracy of keyword query processing. In addition, the keywords in a given query often form natural segments. For example, the query “Tom Hanks Green Mile” can be viewed as two segments, “Tom Hanks” and “Green Mile”. The goal of keyword query cleaning is to identify the optimal segmentation of the query while also taking semantic linkage and spelling corrections into account. Query cleaning not only helps obtain queries of higher quality, but also improves the efficiency of query processing by reducing the search space. The seminal work in this direction, by Pu and Yu, does not consider the role of query logs in query cleaning. Query logs contain user-issued queries together with the segmentations chosen by the user, and thus convey important information about user preferences. In this paper, we explore the use of query logs to improve the quality of keyword query cleaning. We propose two methods that adapt the scoring functions of segmentations to account for information gathered from the logs. The effectiveness of our approach is verified with extensive experiments conducted on real data sets.
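
To make the idea of log-aware segmentation concrete, the following is a minimal sketch and not the scoring functions proposed in the paper, which the abstract does not specify. It assumes a toy set of database phrases (`DB_PHRASES`), toy log counts (`LOG_COUNTS`), and a hypothetical linear blend controlled by `alpha`; all of these names and values are illustrative. It finds the highest-scoring segmentation by dynamic programming and leaves spelling correction out.

```python
# Illustrative sketch only: the data and the scoring blend are assumptions,
# not the method from the paper.

# Hypothetical phrases that occur in the database.
DB_PHRASES = {"tom hanks", "green mile", "tom", "hanks", "green", "mile"}
# Hypothetical counts of segments chosen by users in a query log.
LOG_COUNTS = {"tom hanks": 40, "green mile": 25, "hanks green": 1}


def segment_score(segment, alpha=0.5):
    """Blend a database-match score with a query-log frequency."""
    n_words = len(segment.split())
    # Longer database matches are rewarded quadratically (an assumption).
    db_score = float(n_words ** 2) if segment in DB_PHRASES else 0.0
    log_score = float(LOG_COUNTS.get(segment, 0))
    return (1 - alpha) * db_score + alpha * log_score


def best_segmentation(query, max_len=3, alpha=0.5):
    """Return the segmentation with the highest total segment score."""
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, segments) for the first i words.
    best = [(0.0, [])] + [(float("-inf"), None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            segment = " ".join(words[start:end])
            candidate = best[start][0] + segment_score(segment, alpha)
            if candidate > best[end][0]:
                best[end] = (candidate, best[start][1] + [segment])
    return best[n][1]


print(best_segmentation("Tom Hanks Green Mile"))
```

With the toy data above, the query from the abstract splits into “tom hanks” and “green mile”. Raising `alpha` gives the log more weight relative to the database match.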

References

  1. Agrawal, S., Chaudhuri, S., Das, G.: DBXplorer: enabling keyword search over relational databases. In: SIGMOD (2002)

  2. Ding, B., Yu, J.X., Wang, S., Qin, L., Zhang, X., Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE (2007)

  3. Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient IR-style keyword search over relational databases. In: VLDB (2003)

  4. Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB (2002)

  5. Liu, F., Yu, C., Meng, W., Chowdhury, A.: Effective keyword search in relational databases. In: SIGMOD (2006)

  6. Luo, Y., Lin, X., Wang, W., Zhou, X.: SPARK: top-k keyword query in relational databases. In: SIGMOD (2007)

  7. Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD (2007)

  8. Pu, K.Q., Yu, X.: Keyword query cleaning. In: Proc. VLDB Endow. (2008)

  9. Tan, B., Peng, F.: Unsupervised query segmentation using generative language models and Wikipedia. In: WWW (2008)

  10. Sayyadian, M., LeKhac, H., Doan, A., Gravano, L.: Efficient keyword search across heterogeneous relational databases. In: ICDE (2007)

  11. Vu, Q.H., Ooi, B.C., Papadias, D., Tung, A.K.H.: A graph method for keyword-based selection of the top-k databases. In: SIGMOD (2008)

  12. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD (2003)

  13. Yu, B., Li, G., Sollins, K., Tung, A.K.H.: Effective keyword-based selection of relational databases. In: SIGMOD (2007)

  14. Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in XML trees. TKDE (2006)

  15. Markowetz, A., Yang, Y., Papadias, D.: Keyword search on relational data streams. In: SIGMOD (2007)

  16. Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword Search in Spatial Databases: Towards Searching by Document. In: ICDE (2009)

  17. Yu, X., Shi, H.: Query Segmentation Using Conditional Random Fields. In: KEYS (2009)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, L., Yu, X., Liu, Y. (2011). Keyword Query Cleaning with Query Logs. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_5

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
