Open Domain Question Answering Based on Retriever-Reader Architecture

Zheng, Dequan; Yang, Jing; Yong, Baishuo

doi:10.1007/978-3-030-92632-8_68

Dequan Zheng^7,8,
Jing Yang⁷ &
Baishuo Yong⁷

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 107))

Included in the following conference series:

International Conference on Business Intelligence and Information Technology

1185 Accesses

Abstract

Open-domain Question Answering (OpenQA) is a trendy research hotspot in the field of Natural Language Processing (NLP). In the past few years, with the in-depth research on Deep Learning (DL) and Pre-training Language Modules (PLM), the research on OpenQA has made rapid progress. OpenQA aims to find relevant documents from a large-scale text corpus and then extract or generate correct answers. In this paper, we introduce the development and research status of OpenQA based on “Retriever-Reader.” Firstly, we briefly expound on the relevant theories and traditional architecture. Then the function and category of the Retriever and Reader and the related existing systems are analyzed. Retriever can be regarded as an Information retrieval (IR) system. The reader can be viewed as a machine reading comprehension system (MRC). The fourth section gives a relatively simple introduction to the latest end-to-end training methods and their related types. Finally, we summarize the content of the paper and look forward to the future development trend.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-Based Textual Knowledge Source

Open-Domain Textual Question Answering Systems

Open-Domain Long-Form Question–Answering Using Transformer-Based Pipeline

Article 03 August 2023

References

Voorhees, E.M.: The TREC-8 question answering track report. In: Trec, Vol. 99, pp 77–82. Citeseer (1999)
Google Scholar
Chen, D.Q., Fisch, A., Weston, J., Bordes, A.: Reading wikipedia to answer open-domain questions. In: 55th ACL, pp. 1870–1879. ACL, Stroudsburg (2017)
Google Scholar
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on EMNLP, pp. 2383–2392. ACL, Texas (2016)
Google Scholar
Wang, S., et al.: R3: reinforced ranker-reader for open-domain question answering. In: AAAI-18. Louisiana (2018)
Google Scholar
Wang, S., et al.: Evidence aggregation for answer re-ranking in open-domain question answering. In: 6th ICLR. ICLR, Vancouver (2018)
Google Scholar
Lee, J., Yun, S., Kim, H., Ko, M., Kang, J.: Ranking paragraphs for improving answer recall in open-domain question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 565–569. ACL, Brussels (2018)
Google Scholar
Kratzwald, B., Eigenmann, A., Feuerriegel, S.: RankQA: neural question answering with answer re-ranking. In: 57th ACL, pp. 6076–6085. ACL, Florence (2019)
Google Scholar
Kratzwald, B., Feuerriegel, S.: Adaptive document retrieval for deep question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 576–581. ACL, Brussels (2018)
Google Scholar
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pretraining of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the NAACL, pp 4171–4186. ACL, Minnesota (2019)
Google Scholar
Yang, W., et al.: End-to-end open-domain question answering with BERTserini. In: Proceedings of the 2019 conference of the NAACL, pp. 72–77. ACL, Minnesota (2019)
Google Scholar
Yang, P., Fang, H., Lin, J.J.: Anserini: enabling the use of lucene for information retrieval research. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 1253–1256. ACM, Tokyo (2017)
Google Scholar
Wang, Z., Ng, P., Ma, X., Nallapati, R., Xiang, B.: Multi-passage BERT: a globally normalized BERT Model for open-domain question answering. In: Proceedings of the 2019 Conference on EMNLP and 9th IJCNLP), pp 5878–5882. ACL, Hong Kong (2019
Google Scholar
Lee, K., Chang, M., Toutanova, K.: Latent retrieval for weakly supervised open domain question answering. In: 57th ACL, pp. 6086–6096. ACL, Florence (2019)
Google Scholar
Karpukhin, V., et al.: Dense passage retrieval for open-domain question answering. In: Proceedings of the 2020 conference on EMNLP, pp. 6769–6781. ACL, Online (2020)
Google Scholar
Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: Retrieval augmented language model pre-training. In: Hal, D., III, Aarti, S. (eds.) Proceedings of the 37th international conference on machine learning, vol. 119, pp. 3929-3938. PMLR, Proceedings of Machine Learning Research (2020)
Google Scholar
Nishida, K., Saito, I., Otsuka, A., Asano, H., Tomita, J.: Retrieve-and-read: multi-task learning of information retrieval and reading comprehension. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp. 647–656. ACM, Torino (2018)
Google Scholar
Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603 (2016)
Nie, Y., Wang, S., Bansal, M.: Revealing the importance of semantic retrieval for machine reading at scale. In: Proceedings of the 2019 conference on EMNLP and 9th IJCNLP, pp. 2553–2566. ACL, Hong Kong (2019)
Google Scholar
Zhang, Y., et al.: DC-BERT: decoupling question and document for efficient contextual encoding. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp. 1829–1832. Association for Computing Machinery (2020)
Google Scholar
Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd international acm sigir conference on research and development in information retrieval, pp. 39–48. Association for Computing Machinery (2020)
Google Scholar
Zhao, T., Lu, X., Lee, K.: SPARTA: efficient open-domain question answering via sparse transformer matching retrieval. In: Proceedings of the 2021 conference of the NAACL, pp. 565–575. ACL, Online (2021)
Google Scholar
Yang, Z., et al.: HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: Proceedings of the 2018 conference on EMNLP, pp. 2369–2380. ACL, Brussels (2018)
Google Scholar
Qi, P,. Lin, X., Mehr, L., Wang, Z., Manning, C.D.: Answering complex open-domain questions through iterative query generation. In: Proceedings of the 2019 conference on EMNLP and 9th IJCNLP, pp. 2590–2602. ACL, Hong Kong (2019)
Google Scholar
Das, R., Dhuliawala, S., Zaheer, M., McCallum, A.: Multi-step retriever-reader interaction for scalable open-domain question answering. In: 7th ICLR. ICLR, Louisiana (2019)
Google Scholar
Feldman, Y., El-Yaniv, R.: Multi-Hop paragraph retrieval for open-domain question answering. In: 57th ACL, pp. 2296–2309. ACL, Florence (2019)
Google Scholar
Asai, A., Hashimoto, K., Hajishirzi, H., Socher, R., Xiong, C.: Learning to retrieve reasoning paths over wikipedia graph for question answering. In: 7th ICLR. ICLR, Louisiana (2019)
Google Scholar
Lin, Y., Ji, H., Liu, Z., Sun, M.: Denoising distantly supervised open-domain question answering. In: 56th ACL, pp. 1736–1745. ACL, Melbourne (2018)
Google Scholar
Min, S., Chen, D., Zettlemoyer, L., Hajishirzi, H.: Knowledge guided text retrieval and reading for open domain question answering. arXiv preprint arXiv:1911.03868 (2019)
Liu, Y., et al.: Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692(2019)
Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., Levy, O.: SpanBERT: improving pretraining by representing and predicting spans. Trans. Assoc Comput. Linguistics 8, 64–77 (2019)
Article Google Scholar
Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: 58th ACL, pp. 7871–7880. ACL, Online (2020)
Google Scholar
Lewis, P., et al.: Retrieval-augmented generation for knowledge-intensive nlp tasks. arXiv preprint arXiv:2005.11401 (2020)
Izacard, G., Grave, E.: Leveraging passage retrieval with generative models for open domain question answering. In: 16th EACL, pp. 874–880. ACL, Online (2021)
Google Scholar
Qu, Y., et al.: RocketQA: an optimized training approach to dense passage retrieval for open-domain question answering. In: Proceedings of the 2021 conference of the NAACL, pp. 5835–5847. ACL, Online (2021)
Google Scholar
Seo, M., et al.: Real-time open-domain question answering with dense-sparse phrase index. In: 57th ACL, pp. 4430–4441. ACL, Florence (2019)
Google Scholar
Seo, M., Kwiatkowski, T., Parikh, A.P., Farhadi, A., Hajishirzi, H.: Phrase-indexed question answering: a new challenge for scalable document comprehension. In: Proceedings of the 2018 conference on EMNLP, pp. 559–564. ACL, Brussels (2018)
Google Scholar
Lee, J., Seo, M., Hajishirzi, H., Kang, J.: Contextualized sparse representations for real-time open-domain question answering. In: 58th ACL, pp. 912–919. ACL, Online (2020)
Google Scholar
Lee, J., Sung, M., Kang, J., Chen, D.: Learning dense representations of phrases at scale. In: 59th ACL and 11th IJCNLP, pp. 6634–6647. ACL, Online (2021)
Google Scholar
Yamada, I., Asai, A., Hajishirzi, H.: Efficient passage retrieval with hashing for open-domain question answering. In: 59th ACL and 11th IJCNLP, pp. 979–986. ACL, Online (2021)
Google Scholar
Baudiš, P., Šedivý, J.: Modeling of the question answering task in the yodaqa system. In: International conference of the cross-language evaluation forum for European languages, pp. 222–228. Springer (2015)
Google Scholar
Berant, J., Chou, A.K., Frostig, R., Liang, P.: Semantic parsing on freebase from question-answer pairs. In: Proceedings of the 2013 Conference on EMNLP, pp. 1533–1544. ACL, Washington (2013)
Google Scholar
Kwiatkowski, T., et al.: Natural questions: a benchmark for question answering research, vol. 7, pp. 452–466 (2019)
Google Scholar
Dhingra, B., Mazaitis, K., Cohen, W.W.: Quasar: datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904 (2017)

Download references

Author information

Authors and Affiliations

School of Computer and Information Engineering, Harbin University of Commerce, Harbin, 150028, Heilongjiang, China
Dequan Zheng, Jing Yang & Baishuo Yong
Institue of System Engineering, Harbin University of Commerce, Harbin, 150028, China
Dequan Zheng

Authors

Dequan Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Jing Yang
View author publications
You can also search for this author in PubMed Google Scholar
Baishuo Yong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dequan Zheng .

Editor information

Editors and Affiliations

Information Technology Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
Yaoqun Xu
School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
Zhijie Zhao
Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
Sabah Mohammed
School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
Zhipeng Fan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zheng, D., Yang, J., Yong, B. (2022). Open Domain Question Answering Based on Retriever-Reader Architecture. In: Hassanien, A.E., Xu, Y., Zhao, Z., Mohammed, S., Fan, Z. (eds) Business Intelligence and Information Technology. BIIT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 107. Springer, Cham. https://doi.org/10.1007/978-3-030-92632-8_68

Download citation

DOI: https://doi.org/10.1007/978-3-030-92632-8_68
Published: 16 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92631-1
Online ISBN: 978-3-030-92632-8
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Open Domain Question Answering Based on Retriever-Reader Architecture

Abstract

Access this chapter

Similar content being viewed by others

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-Based Textual Knowledge Source

Open-Domain Textual Question Answering Systems

Open-Domain Long-Form Question–Answering Using Transformer-Based Pipeline

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Open Domain Question Answering Based on Retriever-Reader Architecture

Abstract

Access this chapter

Similar content being viewed by others

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-Based Textual Knowledge Source

Open-Domain Textual Question Answering Systems

Open-Domain Long-Form Question–Answering Using Transformer-Based Pipeline

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation