qaAskeR\(^+\): a novel testing method for question answering software via asking recursive questions

Automated Software Engineering

Abstract

Question Answering (QA) is an attractive and challenging area in the NLP community. With the development of QA techniques, plenty of QA software has been applied in daily life to provide convenient access to information. To investigate the performance of QA software, many benchmark datasets have been constructed to provide various test cases. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with much human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this work, we propose a novel testing method, qaAskeR\(^+\), with five new Metamorphic Relations (MRs) for QA software. qaAskeR\(^+\) does not refer to the annotated labels of test cases. Instead, based on the idea that a correct answer should imply a piece of reliable knowledge that always conforms with any other correct answer, qaAskeR\(^+\) tests QA software by inspecting its behaviors on multiple recursively asked questions that are relevant to the same or some further enriched knowledge. Experimental results show that qaAskeR\(^+\) can reveal quite a few violations that indicate actual answering issues on various mainstream QA software without using any pre-annotated labels.
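
The label-free idea described above can be illustrated with a toy sketch. It is not one of the five MRs of qaAskeR\(^+\), whose definitions are not given in this preview; it only shows the general shape of a recursive consistency check. The function `violates_consistency`, the follow-up template, and the two dictionary-backed stand-ins for QA software are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so answers compare loosely."""
    return " ".join(text.lower().split())


def violates_consistency(
    qa: Callable[[str], str],
    source_question: str,
    followup_template: str,
    expected_followup_answer: str,
) -> bool:
    """Return True when the follow-up answer contradicts the source answer.

    No reference label is consulted: the follow-up question is built from
    the system's own answer to the source question.
    """
    source_answer = qa(source_question)
    followup_question = followup_template.format(answer=source_answer)
    followup_answer = qa(followup_question)
    return normalize(followup_answer) != normalize(expected_followup_answer)


def make_toy_qa(table: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for QA software: a plain lookup table."""
    def qa(question: str) -> str:
        return table.get(question, "unknown")
    return qa


consistent_qa = make_toy_qa({
    "Who wrote Hamlet?": "William Shakespeare",
    "Did William Shakespeare write Hamlet?": "yes",
})
inconsistent_qa = make_toy_qa({
    "Who wrote Hamlet?": "Christopher Marlowe",
    "Did Christopher Marlowe write Hamlet?": "no",
})

SOURCE = "Who wrote Hamlet?"
TEMPLATE = "Did {answer} write Hamlet?"  # assumed follow-up pattern

print(violates_consistency(consistent_qa, SOURCE, TEMPLATE, "yes"))
print(violates_consistency(inconsistent_qa, SOURCE, TEMPLATE, "yes"))
```

In this sketch a violation only signals that at least one of the two answers is questionable; it does not say which one, and a system that is wrong but self-consistent would pass.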

Notes

  1. Some studies refer to them as “open-domain QA” and “closed-domain QA”.

  2. https://spacy.io/

  3. An aux is a non-main verb of the clause, e.g., a modal auxiliary or a form of be, do, or have in a periphrastic tense. Details can be found at https://universaldependencies.org/.

  4. The similarity score is defined as \(s^R(a, b)=\min (\text{R1Precision}(a, b), \text{R1Recall}(a, b))\), where \(a\) and \(b\) are two strings, and R1Precision and R1Recall are two sub-metrics of the ROUGE-1 score (token-wise ROUGE similarity between two sentences).

  5. NatQA includes some questions in miscellaneous and informal forms, e.g., “total number of death row inmates in the us?”.

  6. The official models can be found in their replication packages as follows:

    NSM+h: https://github.com/RichardHGL/WSDM2021_NSM.

    Multi-hop Complex KBQA: https://github.com/lanyunshi/Multi-hopComplexKBQA.

    Macaw: https://github.com/allenai/macaw.

  7. Our method solves the test oracle problem, so in practical application the size of the source test suite for each MR, and hence the number of eligible test cases, can be adjusted freely. We therefore report the violation rate, which reflects the average violation detection ability of our MRs, as their overall effectiveness.

  8. https://www.google.com/

  9. All the questions in MKQA can be answered with the public knowledge from the web.

  10. Based on the search results obtained on April 10, 2021.

  11. Based on the search results obtained on April 8, 2022.
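
The similarity score in note 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: tokens are taken to be lowercased, whitespace-separated words, whereas a ROUGE package may tokenize or stem differently. The function names `rouge1_precision_recall` and `similarity` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def _tokens(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization (no stemming).
    return text.lower().split()


def rouge1_precision_recall(candidate: str, reference: str) -> Tuple[float, float]:
    """Unigram-overlap precision and recall of candidate against reference."""
    cand = Counter(_tokens(candidate))
    ref = Counter(_tokens(reference))
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


def similarity(a: str, b: str) -> float:
    """s^R(a, b) = min(R1Precision(a, b), R1Recall(a, b))."""
    precision, recall = rouge1_precision_recall(a, b)
    return min(precision, recall)


print(similarity("the cat sat", "the cat"))
print(similarity("total number of inmates", "total number of inmates"))
```

Because swapping the two strings swaps unigram precision and recall, taking the minimum makes the score symmetric in \(a\) and \(b\): it equals the overlap divided by the length of the longer string.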

Acknowledgements

We sincerely thank the anonymous reviewers of both our conference paper and this extended journal paper for their positive feedback and kind suggestions. This work was partially supported by the National Natural Science Foundation of China under grant numbers 62250610224, 61972289, and 61832009. The numerical calculations in this work were partially performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Author information

Corresponding authors

Correspondence to Xiaoyuan Xie or Songqiang Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, X., Jin, S. & Chen, S. qaAskeR\(^+\): a novel testing method for question answering software via asking recursive questions. Autom Softw Eng 30, 14 (2023). https://doi.org/10.1007/s10515-023-00380-2

  • DOI: https://doi.org/10.1007/s10515-023-00380-2
