Quora Question Answer Dataset

Aghaebrahimian, Ahmad

doi:10.1007/978-3-319-64206-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

We report on ongoing work to compile the Quora Question Answer dataset. The dataset is composed of questions posed on the Quora question answering site. It is the only dataset that provides sentence-level and word-level answers at the same time. Moreover, the questions in the dataset are authentic, which makes the dataset a more realistic benchmark for question answering systems. We test the performance of a state-of-the-art question answering system on the dataset and compare it with human performance to establish an upper bound.

Notes

1. The Quora dataset is available at https://github.com/Q2AD.
2. The size of the development set is left to the preference of researchers and the attributes of their experiments.
3. Some Quora users accompany their questions with a comment that helps to clarify the question.

Acknowledgments

This research was partially funded by the Ministry of Education, Youth and Sports of the Czech Republic under SVV project number 260 453, core research funding, and GAUK 207-10/250098 of Charles University in Prague.

Author information

Authors and Affiliations

Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University in Prague, Malostranske nam. 25, 11800, Praha 1, Czech Republic
Ahmad Aghaebrahimian

Corresponding author

Correspondence to Ahmad Aghaebrahimian.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Aghaebrahimian, A. (2017). Quora Question Answer Dataset. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_8

DOI: https://doi.org/10.1007/978-3-319-64206-2_8
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science (R0)
