Abstract
Keyword queries over databases are often dirty, containing irrelevant or incorrect words, which harms both the efficiency and the accuracy of keyword query processing. In addition, the keywords in a given query often form natural segments. For example, the query “Tom Hanks Green Mile” can be considered as consisting of two segments, “Tom Hanks” and “Green Mile”. The goal of keyword query cleaning is to identify the optimal segmentation of the query, with semantic linkage and spelling corrections also considered. Query cleaning not only helps obtain queries of higher quality, but also improves the efficiency of query processing by reducing the search space. The seminal work in this direction by Pu and Yu does not consider the role of query logs in performing query cleaning. Query logs contain user-issued queries together with the segmentations chosen by the user, and thus convey important information that reflects user preferences. In this paper, we explore the use of query logs to improve the quality of keyword query cleaning. We propose two methods to adapt the scoring functions of segmentations to account for information gathered from the logs. The effectiveness of our approach is verified with extensive experiments conducted on real data sets.
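To make the notion of a log-aware segmentation score concrete, the following is a minimal sketch and not the scoring function proposed in the paper. It assumes a toy set of database phrases (DB_PHRASES), a toy table of segment frequencies taken from a query log (LOG_COUNTS), and an illustrative score that adds a weighted log frequency to a database-match term; all of these names, values, and the formula itself are hypothetical. The best segmentation is found by dynamic programming over split points.

```python
from functools import lru_cache

# Hypothetical toy data: phrases found in the database, and how often
# each segment was chosen by users according to a query log.
DB_PHRASES = {"tom hanks", "green mile", "tom", "hanks", "green", "mile"}
LOG_COUNTS = {"tom hanks": 40, "green mile": 25, "hanks green": 1}


def segment_score(segment, log_weight=0.5):
    """Score one candidate segment (illustrative formula, an assumption)."""
    words = segment.split()
    # Database term: reward longer phrases that actually occur in the data.
    db_score = float(len(words) ** 2) if segment in DB_PHRASES else 0.0
    # Log term: smoothed relative frequency of the segment in the log.
    log_score = LOG_COUNTS.get(segment, 0) / (1 + sum(LOG_COUNTS.values()))
    return db_score + log_weight * log_score


def best_segmentation(query, max_len=3):
    """Return (score, segments) maximising the sum of segment scores."""
    words = query.lower().split()

    @lru_cache(maxsize=None)
    def solve(start):
        # Best segmentation of words[start:].
        if start == len(words):
            return 0.0, ()
        best = (float("-inf"), ())
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            seg = " ".join(words[start:end])
            rest_score, rest = solve(end)
            cand = (segment_score(seg) + rest_score, (seg,) + rest)
            if cand[0] > best[0]:
                best = cand
        return best

    return solve(0)
```

On this toy data, best_segmentation("Tom Hanks Green Mile") selects the segments "tom hanks" and "green mile". Raising log_weight lets user preferences recorded in the log outweigh the database term, which is the kind of adaptation the abstract describes in general terms.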
References
Agrawal, S., Chaudhuri, S., Das, G.: DBXplorer: enabling keyword search over relational databases. In: SIGMOD (2002)
Ding, B., Yu, J.X., Wang, S., Qin, L., Zhang, X., Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE (2007)
Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient IR-style keyword search over relational databases. In: VLDB (2003)
Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB (2002)
Liu, F., Yu, C., Meng, W., Chowdhury, A.: Effective keyword search in relational databases. In: SIGMOD (2006)
Luo, Y., Lin, X., Wang, W., Zhou, X.: SPARK: top-k keyword query in relational databases. In: SIGMOD (2007)
Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD (2007)
Pu, K.Q., Yu, X.: Keyword query cleaning. In: Proc. VLDB Endow. (2008)
Tan, B., Peng, F.: Unsupervised query segmentation using generative language models and Wikipedia. In: WWW (2008)
Sayyadian, M., LeKhac, H., Doan, A., Gravano, L.: Efficient keyword search across heterogeneous relational databases. In: ICDE (2007)
Vu, Q.H., Ooi, B.C., Papadias, D., Tung, A.K.H.: A graph method for keyword-based selection of the top-k databases. In: SIGMOD (2008)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD (2003)
Yu, B., Li, G., Sollins, K., Tung, A.K.H.: Effective keyword-based selection of relational databases. In: SIGMOD (2007)
Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in XML trees. TKDE (2006)
Markowetz, A., Yang, Y., Papadias, D.: Keyword search on relational data streams. In: SIGMOD (2007)
Zhang, D., Chee, Y.M., Mondal, A., Tung, A.K.H., Kitsuregawa, M.: Keyword search in spatial databases: towards searching by document. In: ICDE (2009)
Yu, X., Shi, H.: Query segmentation using conditional random fields. In: KEYS (2009)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, L., Yu, X., Liu, Y. (2011). Keyword Query Cleaning with Query Logs. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_5
DOI: https://doi.org/10.1007/978-3-642-23535-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science (R0)