The First Resource for Bengali Question Answering Research

Banerjee, Somnath; Lohar, Pintu; Naskar, Sudip Kumar; Bandyopadhyay, Sivaji

doi:10.1007/978-3-319-10888-9_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Abstract

This paper reports the development of the first tagged resource for question answering research in Bengali, a less computerized Indian language. We developed a tagging scheme for annotating questions by type. The expected answer type and the question's topical target are also marked to facilitate answer search. Because canonical Bengali documents are scarce on the web, we could not use the web as a source, and most of the resource data was collected from authentic books. Six highly qualified annotators carried out this work. At present, the resource contains 47 documents from three domains: history, geography and agriculture. Question answering based annotation was performed to prepare more than 2250 question-answer pairs. The inter-annotator agreement scores, measured with the non-weighted kappa statistic, are satisfactory.
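As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of unweighted (Cohen's) kappa for a single pair of annotators. It is not the authors' procedure: the abstract does not say how scores were computed across the six annotators, and the function name `cohen_kappa`, the tag names, and the sample labels are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical question-type tags; not taken from the paper's tagging scheme.
annotator_1 = ["FACTOID", "LIST", "FACTOID", "WHY", "FACTOID", "LIST"]
annotator_2 = ["FACTOID", "LIST", "WHY", "WHY", "FACTOID", "FACTOID"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With more than two annotators, one common option is to average this pairwise score over all annotator pairs; whether the paper does so is not stated in the abstract.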

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, India
Somnath Banerjee, Pintu Lohar, Sudip Kumar Naskar & Sivaji Bandyopadhyay

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Adam Przepiórkowski & Maciej Ogrodniczuk

Cite this paper

Banerjee, S., Lohar, P., Naskar, S.K., Bandyopadhyay, S. (2014). The First Resource for Bengali Question Answering Research. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_30

DOI: https://doi.org/10.1007/978-3-319-10888-9_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)
