SMART: A Stratified Machine Reading Test

Yao, Jiarui; Feng, Minxuan; Feng, Haixia; Wang, Zhiguo; Zhang, Yuchen; Xue, Nianwen

doi:10.1007/978-3-030-32233-5_6

Conference paper
First Online: 30 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

We present a Stratified MAchine Reading Test (SMART) data set for Chinese in which each question is assigned a “level” that reflects the type of reasoning needed to answer it. The data set consists of close to 40K question-answer pairs, and its stratified design allows machine reading researchers to focus quickly on the areas that pose the greatest challenge for a machine comprehension system. We further establish a baseline for future research with BERT and present results showing that the levels we have designed correspond well with the difficulty BERT experiences in answering these questions, as reflected by its lower accuracy on higher levels. We have also collected human answers to the questions in the test portion of this data set, and show that humans and the machine face different challenges when answering these questions. This means that even though the machine is approaching human-level performance on this task, humans and the machine perform it with very different mechanisms.
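The claim that the levels track difficulty, "as reflected by lower accuracy for higher levels", comes down to breaking accuracy down by level and checking how it changes from one level to the next; the same breakdown, computed for human answers, shows where humans and the machine struggle differently. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes each scored answer is available as a hypothetical `(level, correct)` pair, and the function names and the toy predictions in the demo are illustrative only (they are not SMART results).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format (an assumption, not the released data format):
# (level of the question, whether the reader's answer was judged correct).
Record = Tuple[int, bool]


def per_level_accuracy(records: Iterable[Record]) -> Dict[int, float]:
    """Return accuracy for each level, keyed by level in ascending order."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for level, correct in records:
        totals[level] += 1
        hits[level] += int(correct)
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}


def declines_with_level(acc: Dict[int, float]) -> bool:
    """True if accuracy never increases from one level to the next."""
    values = [acc[lvl] for lvl in sorted(acc)]
    return all(a >= b for a, b in zip(values, values[1:]))


def accuracy_gap(human: Dict[int, float], machine: Dict[int, float]) -> Dict[int, float]:
    """Human minus machine accuracy on each level both readers were scored on."""
    shared = sorted(human.keys() & machine.keys())
    return {lvl: human[lvl] - machine[lvl] for lvl in shared}


if __name__ == "__main__":
    # Toy, made-up predictions for illustration only; NOT results from the paper.
    machine = per_level_accuracy(
        [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
    )
    human = per_level_accuracy(
        [(1, True), (1, False), (2, True), (2, True), (3, True), (3, False)]
    )
    print(machine)                       # {1: 1.0, 2: 0.5, 3: 0.0}
    print(declines_with_level(machine))  # True
    print(accuracy_gap(human, machine))  # {1: -0.5, 2: 0.5, 3: 0.5}
```

A non-increasing per-level curve for the model is the pattern the abstract describes for BERT, while a gap that changes sign across levels is one simple way to see that two readers find different levels hard even when their overall accuracy is similar.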

We would like to thank the students from Ludong University, particularly Liang Jian, Xu Yuanyuan and Shang Guofeng, and the students from Nanjing Normal University, particularly Liu Han, Cao Ziyan and Mao Xuefen, for their assistance with data preparation. The second author would like to acknowledge the support of a National Language Committee project (YB135-23) and a Jiangsu Higher Institutions’ Excellent Innovative Team for Philosophy and Social Sciences project (2017STD006). The third author would like to acknowledge the support of a National Language Committee “13th Five-Year” Research Plan project (ZD|135-22).

Notes

1.
See the leaderboard at https://rajpurkar.github.io/SQuAD-explorer/. On SQuAD 1.0, a number of systems have surpassed human performance, and on SQuAD 2.0, state-of-the-art systems are approaching human performance.
2.
Data will be made available here: https://www.cs.brandeis.edu/~clp/smart.
3.
https://github.com/attardi/wikiextractor.

References

Chen, C., Ng, V.: Chinese zero pronoun resolution: some recent advances. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
Clark, P., et al.: Think you have solved question answering? Try ARC, the AI2 reasoning challenge. CoRR abs/1803.05457 (2018). http://arxiv.org/abs/1803.05457
Cui, Y., Liu, T., Chen, Z., Wang, S., Hu, G.: Consensus attention-based neural networks for Chinese reading comprehension. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (2016)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dunn, M., Sagun, L., Higgins, M., Güney, V.U., Cirik, V., Cho, K.: SearchQA: a new Q&A dataset augmented with context from a search engine. CoRR abs/1704.05179 (2017). http://arxiv.org/abs/1704.05179
He, W., et al.: DuReader: a Chinese machine reading comprehension dataset from real-world applications. In: Proceedings of the Workshop on Machine Reading for Question Answering, pp. 37–46 (2018)
Joshi, M., Choi, E., Weld, D.S., Zettlemoyer, L.: TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada (2017)
Khashabi, D., Chaturvedi, S., Roth, M., Upadhyay, S., Roth, D.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long Papers), pp. 252–262 (2018)
Kocisky, T., et al.: The NarrativeQA reading comprehension challenge. Trans. Assoc. Comput. Linguist. 6, 317–328 (2018)
Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: large-scale reading comprehension dataset from examinations. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017)
Lee, K., He, L., Lewis, M., Zettlemoyer, L.: End-to-end neural coreference resolution. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark (2017)
Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (2002)
Raghunathan, K., et al.: A multi-pass sieve for coreference resolution. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (2010)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)
Richardson, M., Burges, C.J., Renshaw, E.: MCTest: a challenge dataset for the open-domain machine comprehension of text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
Shao, C., Liu, T., Lai, Y., Tseng, Y., Tsai, S.: DRCD: a Chinese machine reading comprehension dataset. CoRR abs/1806.00920 (2018). http://arxiv.org/abs/1806.00920
Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27(4), 521–544 (2001)
Trischler, A., et al.: NewsQA: a machine comprehension dataset. In: Proceedings of the 2nd Workshop on Representation Learning for NLP (2017)
Welbl, J., Stenetorp, P., Riedel, S.: Constructing datasets for multi-hop reading comprehension across documents. Trans. Assoc. Comput. Linguist. 6, 287–302 (2018)
Xue, N., Ng, H.T., Pradhan, S., Prasad, R., Bryant, C., Rutherford, A.: The CoNLL-2015 shared task on shallow discourse parsing. In: Proceedings of the Nineteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–16 (2015)
Xue, N., et al.: CoNLL 2016 shared task on multilingual shallow discourse parsing. In: Proceedings of the CoNLL-16 Shared Task (2016)
Zhao, S., Ng, H.T.: Identification and resolution of Chinese zero pronouns: a machine learning approach. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (2007)

Author information

Authors and Affiliations

Brandeis University, Waltham, USA
Jiarui Yao, Yuchen Zhang & Nianwen Xue
Nanjing Normal University, Nanjing, China
Minxuan Feng
Ludong University, Yantai, China
Haixia Feng
Amazon Web Services, Seattle, USA
Zhiguo Wang

Corresponding author

Correspondence to Nianwen Xue.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Yao, J., Feng, M., Feng, H., Wang, Z., Zhang, Y., Xue, N. (2019). SMART: A Stratified Machine Reading Test. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_6

DOI: https://doi.org/10.1007/978-3-030-32233-5_6
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32232-8
Online ISBN: 978-3-030-32233-5
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships: the China Computer Federation (CCF)