Learning Query Ambiguity Models by Using Search Logs
Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading when judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that using click-based features alone or session-based features alone performs worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves on the click-based method by 5.6% and on the session-based method by 4.6%.
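The feature-extraction step can be illustrated with a minimal sketch. It assumes that each clicked document and each follow-up query has already been mapped to a category label by a text classifier. The function names (`category_entropy`, `ambiguity_features`), the category labels, and the choice of entropy and distinct-category counts as features are illustrative assumptions, not the feature set used in the paper. The intuition is that a query whose clicks and follow-up queries spread over many categories is more likely to be ambiguous.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of the empirical category distribution."""
    if not categories:
        return 0.0
    counts = Counter(categories)
    total = len(categories)
    # Sum of p * log2(1/p) over the observed categories.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def ambiguity_features(click_categories, session_categories):
    """Build a small, hypothetical feature vector for one query.

    click_categories: category labels of documents clicked for the query.
    session_categories: category labels of follow-up queries in sessions.
    """
    return [
        category_entropy(click_categories),      # click-based spread
        category_entropy(session_categories),    # session-based spread
        float(len(set(click_categories))),       # distinct click categories
        float(len(set(session_categories))),     # distinct session categories
    ]


# Toy data (invented for illustration): one query whose clicks spread over
# several categories, and one whose clicks stay in a single category.
jaguar = ambiguity_features(
    ["Autos", "Animals", "Autos", "Animals"],
    ["Autos", "Animals", "Sports"],
)
weather = ambiguity_features(
    ["Weather", "Weather", "Weather"],
    ["Weather"],
)
```

Vectors of this kind, concatenating click-based and session-based features, could then be passed with ambiguity labels to any SVM implementation (for example, scikit-learn's `SVC`) to train the classifier.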
Journal of Computer Science and Technology
Volume 25, Issue 4 , pp 728-738
- Springer US
- ambiguous query
- log mining
- query classification