Skip to main content

Open Domain Question Answering Based on Retriever-Reader Architecture

  • Conference paper
  • First Online:
Business Intelligence and Information Technology (BIIT 2021)

Abstract

Open-domain Question Answering (OpenQA) is a trendy research hotspot in the field of Natural Language Processing (NLP). In the past few years, with the in-depth research on Deep Learning (DL) and Pre-training Language Modules (PLM), the research on OpenQA has made rapid progress. OpenQA aims to find relevant documents from a large-scale text corpus and then extract or generate correct answers. In this paper, we introduce the development and research status of OpenQA based on “Retriever-Reader.” Firstly, we briefly expound on the relevant theories and traditional architecture. Then the function and category of the Retriever and Reader and the related existing systems are analyzed. Retriever can be regarded as an Information retrieval (IR) system. The reader can be viewed as a machine reading comprehension system (MRC). The fourth section gives a relatively simple introduction to the latest end-to-end training methods and their related types. Finally, we summarize the content of the paper and look forward to the future development trend.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Voorhees, E.M.: The TREC-8 question answering track report. In: Trec, Vol. 99, pp 77–82. Citeseer (1999)

    Google Scholar 

  2. Chen, D.Q., Fisch, A., Weston, J., Bordes, A.: Reading wikipedia to answer open-domain questions. In: 55th ACL, pp. 1870–1879. ACL, Stroudsburg (2017)

    Google Scholar 

  3. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on EMNLP, pp. 2383–2392. ACL, Texas (2016)

    Google Scholar 

  4. Wang, S., et al.: R3: reinforced ranker-reader for open-domain question answering. In: AAAI-18. Louisiana (2018)

    Google Scholar 

  5. Wang, S., et al.: Evidence aggregation for answer re-ranking in open-domain question answering. In: 6th ICLR. ICLR, Vancouver (2018)

    Google Scholar 

  6. Lee, J., Yun, S., Kim, H., Ko, M., Kang, J.: Ranking paragraphs for improving answer recall in open-domain question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 565–569. ACL, Brussels (2018)

    Google Scholar 

  7. Kratzwald, B., Eigenmann, A., Feuerriegel, S.: RankQA: neural question answering with answer re-ranking. In: 57th ACL, pp. 6076–6085. ACL, Florence (2019)

    Google Scholar 

  8. Kratzwald, B., Feuerriegel, S.: Adaptive document retrieval for deep question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 576–581. ACL, Brussels (2018)

    Google Scholar 

  9. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pretraining of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the NAACL, pp 4171–4186. ACL, Minnesota (2019)

    Google Scholar 

  10. Yang, W., et al.: End-to-end open-domain question answering with BERTserini. In: Proceedings of the 2019 conference of the NAACL, pp. 72–77. ACL, Minnesota (2019)

    Google Scholar 

  11. Yang, P., Fang, H., Lin, J.J.: Anserini: enabling the use of lucene for information retrieval research. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 1253–1256. ACM, Tokyo (2017)

    Google Scholar 

  12. Wang, Z., Ng, P., Ma, X., Nallapati, R., Xiang, B.: Multi-passage BERT: a globally normalized BERT Model for open-domain question answering. In: Proceedings of the 2019 Conference on EMNLP and 9th IJCNLP), pp 5878–5882. ACL, Hong Kong (2019

    Google Scholar 

  13. Lee, K., Chang, M., Toutanova, K.: Latent retrieval for weakly supervised open domain question answering. In: 57th ACL, pp. 6086–6096. ACL, Florence (2019)

    Google Scholar 

  14. Karpukhin, V., et al.: Dense passage retrieval for open-domain question answering. In: Proceedings of the 2020 conference on EMNLP, pp. 6769–6781. ACL, Online (2020)

    Google Scholar 

  15. Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: Retrieval augmented language model pre-training. In: Hal, D., III, Aarti, S. (eds.) Proceedings of the 37th international conference on machine learning, vol. 119, pp. 3929-3938. PMLR, Proceedings of Machine Learning Research (2020)

    Google Scholar 

  16. Nishida, K., Saito, I., Otsuka, A., Asano, H., Tomita, J.: Retrieve-and-read: multi-task learning of information retrieval and reading comprehension. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp. 647–656. ACM, Torino (2018)

    Google Scholar 

  17. Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603 (2016)

  18. Nie, Y., Wang, S., Bansal, M.: Revealing the importance of semantic retrieval for machine reading at scale. In: Proceedings of the 2019 conference on EMNLP and 9th IJCNLP, pp. 2553–2566. ACL, Hong Kong (2019)

    Google Scholar 

  19. Zhang, Y., et al.: DC-BERT: decoupling question and document for efficient contextual encoding. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp. 1829–1832. Association for Computing Machinery (2020)

    Google Scholar 

  20. Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd international acm sigir conference on research and development in information retrieval, pp. 39–48. Association for Computing Machinery (2020)

    Google Scholar 

  21. Zhao, T., Lu, X., Lee, K.: SPARTA: efficient open-domain question answering via sparse transformer matching retrieval. In: Proceedings of the 2021 conference of the NAACL, pp. 565–575. ACL, Online (2021)

    Google Scholar 

  22. Yang, Z., et al.: HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 2369–2380. ACL, Brussels (2018)

    Google Scholar 

  23. Qi, P,. Lin, X., Mehr, L., Wang, Z., Manning, C.D.: Answering complex open-domain questions through iterative query generation. In: Proceedings of the 2019 conference on EMNLP and 9th IJCNLP, pp. 2590–2602. ACL, Hong Kong (2019)

    Google Scholar 

  24. Das, R., Dhuliawala, S., Zaheer, M., McCallum, A.: Multi-step retriever-reader interaction for scalable open-domain question answering. In: 7th ICLR. ICLR, Louisiana (2019)

    Google Scholar 

  25. Feldman, Y., El-Yaniv, R.: Multi-Hop paragraph retrieval for open-domain question answering. In: 57th ACL, pp. 2296–2309. ACL, Florence (2019)

    Google Scholar 

  26. Asai, A., Hashimoto, K., Hajishirzi, H., Socher, R., Xiong, C.: Learning to retrieve reasoning paths over wikipedia graph for question answering. In: 7th ICLR. ICLR, Louisiana (2019)

    Google Scholar 

  27. Lin, Y., Ji, H., Liu, Z., Sun, M.: Denoising distantly supervised open-domain question answering. In: 56th ACL, pp. 1736–1745. ACL, Melbourne (2018)

    Google Scholar 

  28. Min, S., Chen, D., Zettlemoyer, L., Hajishirzi, H.: Knowledge guided text retrieval and reading for open domain question answering. arXiv preprint arXiv:1911.03868 (2019)

  29. Liu, Y., et al.: Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692(2019)

  30. Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., Levy, O.: SpanBERT: improving pretraining by representing and predicting spans. Trans. Assoc Comput. Linguistics 8, 64–77 (2019)

    Article  Google Scholar 

  31. Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)

  32. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)

  33. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: 58th ACL, pp. 7871–7880. ACL, Online (2020)

    Google Scholar 

  34. Lewis, P., et al.: Retrieval-augmented generation for knowledge-intensive nlp tasks. arXiv preprint arXiv:2005.11401 (2020)

  35. Izacard, G., Grave, E.: Leveraging passage retrieval with generative models for open domain question answering. In: 16th EACL, pp. 874–880. ACL, Online (2021)

    Google Scholar 

  36. Qu, Y., et al.: RocketQA: an optimized training approach to dense passage retrieval for open-domain question answering. In: Proceedings of the 2021 conference of the NAACL, pp. 5835–5847. ACL, Online (2021)

    Google Scholar 

  37. Seo, M., et al.: Real-time open-domain question answering with dense-sparse phrase index. In: 57th ACL, pp. 4430–4441. ACL, Florence (2019)

    Google Scholar 

  38. Seo, M., Kwiatkowski, T., Parikh, A.P., Farhadi, A., Hajishirzi, H.: Phrase-indexed question answering: a new challenge for scalable document comprehension. In: Proceedings of the 2018 conference on EMNLP, pp. 559–564. ACL, Brussels (2018)

    Google Scholar 

  39. Lee, J., Seo, M., Hajishirzi, H., Kang, J.: Contextualized sparse representations for real-time open-domain question answering. In: 58th ACL, pp. 912–919. ACL, Online (2020)

    Google Scholar 

  40. Lee, J., Sung, M., Kang, J., Chen, D.: Learning dense representations of phrases at scale. In: 59th ACL and 11th IJCNLP, pp. 6634–6647. ACL, Online (2021)

    Google Scholar 

  41. Yamada, I., Asai, A., Hajishirzi, H.: Efficient passage retrieval with hashing for open-domain question answering. In: 59th ACL and 11th IJCNLP, pp. 979–986. ACL, Online (2021)

    Google Scholar 

  42. Baudiš, P., Šedivý, J.: Modeling of the question answering task in the yodaqa system. In: International conference of the cross-language evaluation forum for European languages, pp. 222–228. Springer (2015)

    Google Scholar 

  43. Berant, J., Chou, A.K., Frostig, R., Liang, P.: Semantic parsing on freebase from question-answer pairs. In: Proceedings of the 2013 Conference on EMNLP, pp. 1533–1544. ACL, Washington (2013)

    Google Scholar 

  44. Kwiatkowski, T., et al.: Natural questions: a benchmark for question answering research, vol. 7, pp. 452–466 (2019)

    Google Scholar 

  45. Dhingra, B., Mazaitis, K., Cohen, W.W.: Quasar: datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904 (2017)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dequan Zheng .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zheng, D., Yang, J., Yong, B. (2022). Open Domain Question Answering Based on Retriever-Reader Architecture. In: Hassanien, A.E., Xu, Y., Zhao, Z., Mohammed, S., Fan, Z. (eds) Business Intelligence and Information Technology. BIIT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 107. Springer, Cham. https://doi.org/10.1007/978-3-030-92632-8_68

Download citation

Publish with us

Policies and ethics