Key Concepts Identification and Weighting in Search Engine Queries

Liu, Jiawang; Ren, Peng

doi:10.1007/978-3-642-20291-9_37

Jiawang Liu & Peng Ren

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Abstract

It has been widely observed that search engine queries are becoming longer and closer to natural language. However, current search engines do not perform well on natural language queries. Accurately discovering the key concepts of these queries can dramatically improve the effectiveness of search engines. It has been shown that users compose queries in much the same way that they summarize documents, which makes queries closely resemble anchor texts. In this paper, we present a technique for automatically extracting key concepts from queries through anchor text analysis. Rather than relying on web counts of documents, we propose a supervised machine learning model that classifies the concepts of a query into three sets according to their importance and type. Finally, we demonstrate that our method achieves a remarkable improvement over the baseline.

Author information

Authors and Affiliations

Department of Computer Science, Shanghai Jiao Tong University, China
Jiawang Liu & Peng Ren

Editor information

Editors and Affiliations

School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du
LFCS, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland, UK
Wenfei Fan
School of Software, Tsinghua University, Room 819, Main Building, 100084, Beijing, China
Jianmin Wang
Computer School, Wuhan University, Luojiashan Road, 430072, Wuhan, Hubei, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St. Lucia, Australia
Mohamed A. Sharaf

About this paper

Cite this paper

Liu, J., Ren, P. (2011). Key Concepts Identification and Weighting in Search Engine Queries. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_37

DOI: https://doi.org/10.1007/978-3-642-20291-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)
