SMERA: Semantic Mixed Approach for Web Query Expansion and Reformulation

Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence (SCI, volume 665)

Abstract

Matching users’ information needs with relevant documents is the basic goal of information retrieval systems. However, relevant documents do not necessarily contain the same terms as users’ queries. In this paper, we use semantics to better express users’ queries. Furthermore, we distinguish between two types of concepts: those extracted from a set of pseudo-relevant documents, and those extracted from a semantic resource such as an ontology. With this distinction in mind, we propose a Semantic Mixed query Expansion and Reformulation Approach (SMERA) that uses these two types of concepts to improve web queries. This approach addresses several challenges, such as the selective choice of expansion terms, the treatment of named entities, and the reformulation of the query in a user-friendly way. We evaluate SMERA on four standard web collections from the INEX and TREC evaluation campaigns. Our experiments show that SMERA improves the performance of an information retrieval system compared to the unmodified original queries. In addition, our approach provides a statistically significant improvement in precision over a competitive query expansion method while generating concept-based queries that are more comprehensive and easier to interpret.

Notes

  1. In this paper, we define an effective query as one that obtains good results on the standard measures used in evaluation campaigns, in particular precision measures in the case of web queries.

  2. LSI: Latent Semantic Indexing (Deerwester et al. 1990).

  3. Our experiments showed no significant difference between using Euclidean and cosine distances. In this paper we used the Euclidean distance because it is clearer for our graphical demonstration in Figs. 3 and 4.

  4. http://sourceforge.net/p/lemur/wiki/The%20Indri%20Query%20Language.

  5. http://sourceforge.net/p/lemur/wiki/Belief%20Operations/.

References

  • Audeh, B., Beaune, P., & Beigbeder, M. (2013). Recall-oriented evaluation for information retrieval systems. In Information Retrieval Facility Conference (IRFC), Limassol, Cyprus.

  • Barr, C., Jones, R., & Regelson, M. (2008). The linguistic structure of English web-search queries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1021–1030). Association for Computational Linguistics.

  • Bendersky, M., & Croft, W. B. (2008). Discovering key concepts in verbose queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 491–498). ACM.

  • Bendersky, M., Metzler, D., & Croft, W. B. (2012). Effective query formulation with multiple information sources. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 443–452). ACM.

  • Bendersky, M., Rey, M., & Croft, W. B. (2011). Parameterized concept weighting in verbose queries. In SIGIR. ACM Press.

  • Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.

  • Brandao, W., Silva, A., Moura, E., & Ziviani, N. (2011). Exploiting entity semantics for query expansion. In IADIS International Conference WWW/Internet, Rio de Janeiro.

  • Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 299).

  • Deerwester, S., Dumais, S. T., Furnas, G. W., & Landauer, T. K. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391–407.

  • Deveaud, R., Bonnefoy, L., & Bellot, P. (2013). Quantification et identification des concepts implicites d’une requête. In CORIA 2013, La dixième édition de la COnférence en Recherche d’Information et Applications, Neuchâtel.

  • Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI.

  • Hoffart, J., Yosef, M. A., Bordino, I., Furstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., & Weikum, G. (2011). Robust disambiguation of named entities in text. In EMNLP 2011 Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782–792).

  • Huston, S., & Croft, W. B. (2010). Evaluating verbose query processing techniques. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 291–298). ACM.

  • Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, 36, 207–227.

  • Kumaran, G., & Carvalho, V. R. (2009). Reducing long queries using query quality predictors. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 564). NY, USA: ACM Press.

  • Lavrenko, V., & Croft, W. B. (2001). Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 120–127). NY, USA: ACM Press.

  • Maxwell, K. T., & Croft, W. B. (2013). Compact query term selection using topically related text. In Proceedings of the 36th International ACM SIGIR (pp. 583–592).

  • Metzler, D., & Croft, W. B. (2004). Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40, 735–750.

  • Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 472). NY, USA: ACM Press.

  • Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244.

  • Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 275–281). ACM.

  • Qiu, Y., & Frei, H. (1993). Concept based query expansion. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (Vol. 11, p. 212). NY: ACM.

  • Rocchio, J. J., & Salton, G. (1965). Information search optimization and iterative retrieval techniques. In Fall Joint Computer Conference (pp. 293–305).

  • Shah, C., & Croft, W. B. (2004). Evaluating high accuracy retrieval techniques. In SIGIR. ACM Press.

  • Strohman, T., Metzler, D., Turtle, H., & Croft, W. (2004). Indri: A language-model based search engine for complex queries. In Proceedings of the International Conference on Intelligence Analysis.

  • Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (pp. 697–706). ACM.

  • Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In SIGIR 1994. ACM Press.

  • Xu, Y., Ding, F., & Wang, B. (2008). Entity-based query reformulation using Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 (p. 1441). NY, USA: ACM Press.

  • Zhao, L., & Callan, J. (2010). Term necessity prediction. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (pp. 259–268). ACM.

  • Zobel, J. (2004). Questioning query expansion: An examination of behaviour and parameters. In SIGIR. ACM Press.

Author information

Corresponding author

Correspondence to Bissan Audeh.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Audeh, B., Beaune, P., Beigbeder, M. (2017). SMERA: Semantic Mixed Approach for Web Query Expansion and Reformulation. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_9

  • DOI: https://doi.org/10.1007/978-3-319-45763-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45762-8

  • Online ISBN: 978-3-319-45763-5

  • eBook Packages: Engineering, Engineering (R0)