qaAskeR$^+$: a novel testing method for question answering software via asking recursive questions

Xie, Xiaoyuan; Jin, Shuo; Chen, Songqiang

doi:10.1007/s10515-023-00380-2

Published: 28 March 2023

Automated Software Engineering, volume 30, article number 14 (2023)

Abstract

Question Answering (QA) is an attractive and challenging area in the NLP community. With the development of QA techniques, plenty of QA software has been applied in daily life to provide convenient access to information retrieval. To investigate the performance of QA software, many benchmark datasets have been constructed to provide various test cases. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated, with considerable human effort, before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this work, we propose a novel testing method, qaAskeR$^+$, with five new Metamorphic Relations for QA software. qaAskeR$^+$ does not refer to the annotated labels of test cases. Instead, based on the idea that a correct answer should imply a piece of reliable knowledge that always conforms with any other correct answer, qaAskeR$^+$ tests QA software by inspecting its behavior on multiple recursively asked questions that are relevant to the same or some further enriched knowledge. Experimental results show that qaAskeR$^+$ can reveal quite a few violations that indicate actual answering issues in various mainstream QA software without using any pre-annotated labels.

Notes

Some studies refer to them as “open-domain QA” and “closed-domain QA”.
https://spacy.io/
aux is a non-main verb of the clause, including a modal auxiliary and a form of be, do or have in a periphrastic tense. Details can be found at https://universaldependencies.org/.
The similarity score is defined as $s^R(a, b)=\min(\text{R1Precision}(a, b), \text{R1Recall}(a, b))$, where a and b are two strings, and R1Precision and R1Recall are two sub-metrics of the ROUGE-1 score (token-wise ROUGE similarity between two sentences).
NatQA includes some questions in miscellaneous and informal forms, e.g., “total number of death row inmates in the us?”.
The official models can be found in their replication packages as follows:
NSM+h: https://github.com/RichardHGL/WSDM2021_NSM.
Multi-hop Complex KBQA: https://github.com/lanyunshi/Multi-hopComplexKBQA.
Macaw: https://github.com/allenai/macaw.
Our method solves the test oracle problem, so in practical application the size of the source test suite for each MR, as well as the number of eligible test cases, can be adjusted freely. We therefore report the violation rate, which reflects the average violation detection ability of our MRs, as their overall effectiveness.
https://www.google.com/
All the questions in MKQA can be answered with the public knowledge from the web.
Based on the search results obtained on April 10, 2021.
Based on the search results obtained on April 8, 2022.
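
The similarity score $s^R$ defined in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rouge1_min_similarity`, the lowercasing, and the whitespace tokenization are assumptions made here, and an actual ROUGE-1 implementation may tokenize or normalize text differently. Because the score takes the minimum of precision and recall, it does not matter which string is treated as the candidate and which as the reference.

```python
from collections import Counter


def rouge1_min_similarity(a: str, b: str) -> float:
    """Return min(ROUGE-1 precision, ROUGE-1 recall) for two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    if not tokens_a or not tokens_b:
        # No tokens on one side: treat the strings as entirely dissimilar.
        return 0.0

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both strings.
    overlap = sum((Counter(tokens_a) & Counter(tokens_b)).values())

    precision = overlap / len(tokens_a)  # overlap relative to a
    recall = overlap / len(tokens_b)     # overlap relative to b
    return min(precision, recall)


if __name__ == "__main__":
    print(rouge1_min_similarity("the cat sat", "the cat"))
    print(rouge1_min_similarity("Wuhan University", "wuhan university"))
```

Taking the minimum rather than an F-score means that a short answer fully contained in a much longer one still receives a low score, since one of the two ratios is then small.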

Acknowledgements

We sincerely thank the anonymous reviewers of both our conference paper and this extended journal paper for their positive acknowledgment and very kind suggestions. This work was partially supported by the National Natural Science Foundation of China under grant numbers 62250610224, 61972289, and 61832009. The numerical calculations in this work were partially performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, China
Xiaoyuan Xie, Shuo Jin & Songqiang Chen

Corresponding authors

Correspondence to Xiaoyuan Xie or Songqiang Chen.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, X., Jin, S. & Chen, S. qaAskeR$^+$: a novel testing method for question answering software via asking recursive questions. Autom Softw Eng 30, 14 (2023). https://doi.org/10.1007/s10515-023-00380-2

Received: 16 April 2022
Accepted: 26 February 2023
Published: 28 March 2023
DOI: https://doi.org/10.1007/s10515-023-00380-2
