Overcoming low-utility facets for complex answer retrieval

  • Sean MacAvaney
  • Andrew Yates
  • Arman Cohan
  • Luca Soldaini
  • Kai Hui
  • Nazli Goharian
  • Ophir Frieder
Knowledge Graphs and Semantics in Text Analysis and Retrieval

Abstract

Many questions cannot be answered simply; their answers must include numerous nuanced details and context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. These questions can be constructed from a topic entity (e.g., ‘cheese’) and a facet (e.g., ‘health effects’). While topic matching has been thoroughly explored, we observe that some facets use general language that is unlikely to appear verbatim in answers, exhibiting low utility. In this work, we present an approach to CAR that identifies and addresses low-utility facets. First, we propose two estimators of facet utility: the hierarchical structure of CAR queries, and facet frequency information from training data. Then, to improve the retrieval performance on low-utility headings, we include entity similarity scores using embeddings trained from a CAR knowledge graph, which captures the context of facets. We show that our methods are effective by applying them to two leading neural ranking techniques, and evaluating them on the TREC CAR dataset. We find that our approach performs significantly better than the unmodified neural ranker and other leading CAR techniques, yielding state-of-the-art results. We also provide a detailed analysis of our results, and verify that low-utility facets are indeed difficult to match and that our approach improves the performance for these difficult queries.
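
As a rough illustration of the second estimator mentioned above (facet frequency information from training data), the following Python sketch counts how often each facet heading occurs across a set of (topic, facet) training queries and flags the frequent ones. It is not the authors' implementation: the function names, the toy training pairs, the threshold value, and the assumption that frequently repeated facets are the low-utility ones are all hypothetical choices made for this example.

```python
from collections import Counter


def facet_frequencies(training_queries):
    """Count occurrences of each facet heading in (topic, facet) pairs.

    Headings are lowercased and stripped so that "Health effects" and
    "health effects" are counted together (a normalization assumed here).
    """
    return Counter(facet.strip().lower() for _topic, facet in training_queries)


def is_low_utility(facet, frequencies, threshold=2):
    """Flag a facet as low-utility when it is frequent in training data.

    Assumption for this sketch: a facet that recurs under many topics uses
    general language, so it is unlikely to appear verbatim in answers.
    The threshold is an arbitrary illustrative value.
    """
    return frequencies[facet.strip().lower()] >= threshold


# Hypothetical training queries built from a topic entity and a facet.
training = [
    ("cheese", "health effects"),
    ("coffee", "health effects"),
    ("cheese", "pasteurization"),
]
freqs = facet_frequencies(training)
```

In this toy data, 'health effects' recurs across topics and would be flagged, while 'pasteurization' and any unseen facet would not. A ranker could use such a flag to decide when to lean on additional signals, such as the entity similarity scores described in the abstract.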

Keywords

Complex answer retrieval; Knowledge graphs; Neural information retrieval; Reranking

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Information Retrieval Laboratory, Computer Science Department, Georgetown University, Washington, USA
  2. Max Planck Institute for Informatics, Saarbrücken, Germany
  3. Allen Institute for Artificial Intelligence, Seattle, USA
  4. SAP SE Machine Learning R&D, Berlin, Germany
