Journal of Intelligent Information Systems, Volume 50, Issue 3, pp 455–478

Query expansion using pseudo relevance feedback on wikipedia

  • Andisheh Keikha
  • Faezeh Ensan
  • Ebrahim Bagheri
Article

Abstract

One of the major challenges in Web search is correctly interpreting users’ intent. Query expansion is a well-known approach for determining the user’s intent by addressing the vocabulary mismatch problem. A limitation of current query expansion approaches is that the relations between the query terms and the expanded terms are limited. In this paper, we capture users’ intent through query expansion. We build on earlier work in the area by adopting a pseudo-relevance feedback approach; however, we advance the state of the art by proposing an approach for feature learning within the process of query expansion. Specifically, we consider the Wikipedia corpus as the feedback collection space and identify the best features within this context for term selection in two models, supervised and unsupervised. We compare our work with state-of-the-art query expansion techniques; the results show promising robustness and improved precision.
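
As a rough illustration of the general pseudo-relevance feedback idea mentioned in the abstract, the sketch below retrieves the top-ranked documents for a query, treats them as relevant, and appends their highest-scoring terms to the query. It is not the method proposed in the paper: the TF-IDF term score stands in for the learned features, a small in-memory list of strings stands in for the Wikipedia feedback collection, and the names `tokenize`, `expand_query`, `n_feedback`, and `n_terms` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer that keeps alphabetic tokens only.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def expand_query(query, corpus, n_feedback=2, n_terms=2):
    """Generic pseudo-relevance feedback (illustrative, not the paper's model)."""
    query_terms = tokenize(query)
    docs = [tokenize(doc) for doc in corpus]

    # Document frequency of each term, used for the IDF weight.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    def idf(term):
        return math.log(len(docs) / doc_freq[term])

    # Step 1: first-pass retrieval, scoring documents by the summed
    # TF-IDF weight of the query terms they contain.
    def retrieval_score(doc):
        counts = Counter(doc)
        return sum(counts[t] * idf(t) for t in query_terms if t in counts)

    ranked = sorted(docs, key=retrieval_score, reverse=True)
    feedback = ranked[:n_feedback]  # assumed relevant without user input

    # Step 2: score every non-query term in the feedback documents.
    term_freq = Counter()
    for doc in feedback:
        term_freq.update(doc)
    candidates = {
        t: term_freq[t] * idf(t) for t in term_freq if t not in query_terms
    }

    # Step 3: append the best candidates (ties broken alphabetically).
    best = sorted(candidates, key=lambda t: (-candidates[t], t))[:n_terms]
    return query_terms + best


# Toy corpus standing in for a feedback collection (an assumption).
corpus = [
    "jaguar car engine engine speed",
    "jaguar car engine dealer",
    "jaguar cat jungle habitat",
    "python snake jungle habitat",
]
print(expand_query("jaguar car", corpus))
```

In this toy run the expansion terms come only from the two car-related documents, which is the intended effect of feedback: the ambiguous term "jaguar" is pulled toward the sense that dominates the top-ranked results.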

Keywords

Query suggestion, Query expansion, Wikipedia, Web search

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Andisheh Keikha (1)
  • Faezeh Ensan (2)
  • Ebrahim Bagheri (1)

  1. Ryerson University, Toronto, Canada
  2. Ferdowsi University of Mashhad, Mashhad, Iran
