Abstract
Transformer-based language models (LMs) have been shown to perform question answering (QA) competitively even when the context is removed and only the question is used as input (called closed-book QA). Previous work on closed-book QA has mainly used simple questions that require a single reasoning step (i.e., single-hop questions). In this study, we find that using multi-hop questions, which require multiple reasoning steps, drastically degrades performance. We investigate how to close this gap using two methods: fine-tuning with explicit question decomposition using three decomposition systems, and few-shot learning with chain-of-thought (CoT) prompting for implicit question decomposition. We experiment on three multi-hop datasets, considering different multi-hop question types (e.g., compositional, comparison). We demonstrate where the methods fail and identify the future directions most promising for closing the gap between single-hop and multi-hop closed-book QA. We release the code: https://github.com/talkhaldi/mh_cbqa.
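The few-shot CoT setup described above can be sketched as a prompt-construction step: exemplars showing intermediate reasoning are prepended to the target question so the LM decomposes the multi-hop question implicitly before emitting the final answer. The exemplar wording and prompt format below are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of a few-shot chain-of-thought (CoT) prompt for
# closed-book multi-hop QA. The exemplar is a hypothetical
# two-hop question; the format is an assumption for illustration.

COT_EXEMPLAR = (
    "Q: Who directed the film in which the lead actor of Cast Away "
    "starred in 1994?\n"
    "A: The lead actor of Cast Away is Tom Hanks. In 1994, Tom Hanks "
    "starred in Forrest Gump, which was directed by Robert Zemeckis. "
    "So the answer is Robert Zemeckis.\n"
)

def build_cot_prompt(question: str,
                     exemplars: tuple[str, ...] = (COT_EXEMPLAR,)) -> str:
    """Prepend CoT exemplars so the LM verbalizes intermediate hops
    before producing the final answer (implicit decomposition)."""
    return "\n".join(exemplars) + f"\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "What is the capital of the country where Mozart was born?")
```

The resulting string would then be fed to the LM without any supporting context, so all intermediate facts must come from the model's parameters.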
Notes
- 1. We tried zero-shot mode, but without few-shot examples the LM produced a full sentence as the answer, which did not match the target. We therefore exclude zero-shot experiments.
- 2. We also tried T5-11b but achieved similar accuracy.
- 3. Previous work experimented on datasets whose answers were numerical, yes/no, or multiple choice.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Alkhaldi, T., Chu, C., Kurohashi, S. (2023). Investigating the Gap Between Single-Hop and Multi-Hop Questions in Closed-Book Question Answering via Question Decomposition. In: Mehmood, R., et al. (eds.) Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference (DCAI 2023). Lecture Notes in Networks and Systems, vol. 741. Springer, Cham. https://doi.org/10.1007/978-3-031-38318-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38317-5
Online ISBN: 978-3-031-38318-2
eBook Packages: Intelligent Technologies and Robotics (R0)