Keyword Query Cleaning with Query Logs

Gao, Lei; Yu, Xiaohui; Liu, Yang

doi:10.1007/978-3-642-23535-1_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Keyword queries over databases are often dirty, containing irrelevant or incorrect words, which hurts both the efficiency and the accuracy of keyword query processing. In addition, the keywords in a given query often form natural segments. For example, the query “Tom Hanks Green Mile” can be regarded as consisting of two segments, “Tom Hanks” and “Green Mile”. The goal of keyword query cleaning is to identify the optimal segmentation of the query, with semantic linkage and spelling corrections also taken into account. Query cleaning not only helps obtain queries of higher quality, but also improves the efficiency of query processing by reducing the search space. The seminal work in this direction by Pu and Yu does not consider the role of query logs in query cleaning. Query logs contain user-issued queries together with the segmentations chosen by the user, and thus convey important information about user preferences. In this paper, we explore the use of query logs to improve the quality of keyword query cleaning. We propose two methods that adapt the scoring functions of segmentations to account for information gathered from the logs. The effectiveness of our approach is verified with extensive experiments conducted on real data sets.
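
To make the idea of scoring segmentations concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a toy scoring function in which a segment that matches a database phrase scores the square of its word count, plus a bonus proportional to how often that segment appears in a query log. The phrase set, the log counts, the `log_weight` parameter, and all function names are hypothetical. Spelling correction and semantic linkage are left out.

```python
# Hypothetical toy data: phrases that occur in the database, and how often
# each segment was chosen by users according to a query log.
DB_PHRASES = {"tom hanks", "green mile", "tom", "hanks", "green", "mile"}
LOG_SEGMENT_COUNTS = {"tom hanks": 40, "green mile": 25, "hanks green": 1}


def segment_score(segment, log_weight=0.5):
    """Assumed score: database match (longer is better) plus a log bonus."""
    words = segment.split()
    db_score = float(len(words) ** 2) if segment in DB_PHRASES else 0.0
    total = sum(LOG_SEGMENT_COUNTS.values())
    log_score = LOG_SEGMENT_COUNTS.get(segment, 0) / total
    return db_score + log_weight * log_score


def best_segmentation(query, max_len=3):
    """Dynamic programming over split points; returns the top-scoring segments."""
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, segments) for the first i words of the query.
    best = [(0.0, [])] + [(float("-inf"), [])] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = " ".join(words[j:i])
            cand = best[j][0] + segment_score(seg)
            if cand > best[i][0]:
                best[i] = (cand, best[j][1] + [seg])
    return best[n][1]


print(best_segmentation("Tom Hanks Green Mile"))
```

With these toy numbers the sketch splits the example query into “tom hanks” and “green mile”. Setting `log_weight` to zero reduces the score to the database-only component, which is one simple way to see what the log information contributes under this assumed scoring.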

References

Agrawal, S., Chaudhuri, S., Das, G.: DBXplorer: enabling keyword search over relational databases. In: SIGMOD (2002)
Ding, B., Yu, J.X., Wang, S., Qin, L., Zhang, X., Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE (2007)
Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient IR-style keyword search over relational databases. In: VLDB (2003)
Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB (2002)
Liu, F., Yu, C., Meng, W., Chowdhury, A.: Effective keyword search in relational databases. In: SIGMOD (2006)
Luo, Y., Lin, X., Wang, W., Zhou, X.: SPARK: top-k keyword query in relational databases. In: SIGMOD (2007)
Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD (2007)
Pu, K.Q., Yu, X.: Keyword query cleaning. In: Proc. VLDB Endow. (2008)
Tan, B., Peng, F.: Unsupervised query segmentation using generative language models and Wikipedia. In: WWW (2008)
Sayyadian, M., LeKhac, H., Doan, A., Gravano, L.: Efficient keyword search across heterogeneous relational databases. In: ICDE (2007)
Vu, Q.H., Ooi, B.C., Papadias, D., Tung, A.K.H.: A graph method for keyword-based selection of the top-k databases. In: SIGMOD (2008)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD (2003)
Yu, B., Li, G., Sollins, K., Tung, A.K.H.: Effective keyword-based selection of relational databases. In: SIGMOD (2007)
Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in XML trees. TKDE (2006)
Markowetz, A., Yang, Y., Papadias, D.: Keyword search on relational data streams. In: SIGMOD (2007)
Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword Search in Spatial Databases: Towards Searching by Document. In: ICDE (2009)
Yu, X., Shi, H.: Query Segmentation Using Conditional Random Fields. In: KEYS (2009)

Author information

Authors and Affiliations

School of Computer Science & Technology, Shandong University, Jinan, China
Lei Gao, Xiaohui Yu & Yang Liu
School of Information Technology, York University, Toronto, Canada
Xiaohui Yu

Editor information

Editors and Affiliations

Microsoft Research Asia, 5 Danling Rd., Haidian District, 100190, Beijing, China
Haixun Wang
Computer School, Wuhan University, 16 Luojiashan Road, 430072, Hubei, China
Shijun Li
Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, 060-0814, Hokkaido, Sapporo, Japan
Satoshi Oyama
College of Information Science and Technology, Drexel University, 19104, Philadelphia, PA, USA
Xiaohua Hu
State Key Laboratory of Software Engineering, Wuhan University, 16 Luojiashan Road, 430072, Wuhan, Hubei, China
Tieyun Qian

Cite this paper

Gao, L., Yu, X., Liu, Y. (2011). Keyword Query Cleaning with Query Logs. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_5

DOI: https://doi.org/10.1007/978-3-642-23535-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)
