WebShodh: A Code Mixed Factoid Question Answering System for Web

Chandu, Khyathi Raghavi; Chinnakotla, Manoj; Black, Alan W.; Shrivastava, Manish

doi:10.1007/978-3-319-65813-1_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10456)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Code-Mixing (CM) is a natural phenomenon observed in many multilingual societies and is becoming the preferred medium of expression and communication in online and social media fora. In spite of this, current Question Answering (QA) systems do not support CM and are designed to work with only a single interaction language. This assumption makes it inconvenient for multilingual users to interact naturally with a QA system, especially in scenarios where they do not know the right word in the target language. In this paper, we present WebShodh, an end-to-end, web-based Factoid QA system for CM languages. We demonstrate our system with two CM language pairs: Hinglish (Matrix language: Hindi, Embedded language: English) and Tenglish (Matrix language: Telugu, Embedded language: English). The lack of language resources such as annotated corpora, POS taggers, or parsers for CM languages poses a major challenge for automated processing and analysis. In view of this resource scarcity, we assume only the existence of bilingual dictionaries from the matrix languages to English and use them to lexically translate the question into English. We then use this loosely translated question for downstream analysis such as Answer Type (AType) prediction, answer retrieval, and ranking. Our system achieves a Mean Reciprocal Rank (MRR) of 0.37 for Hinglish and 0.32 for Tenglish. We have hosted the system online and plan to use it to collect more CM question-answer data for further improvement.
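As a rough illustration of two computational steps the abstract names, the following Python sketch shows word-by-word lexical translation of a romanized Hinglish question using a bilingual dictionary, and the Mean Reciprocal Rank metric used for evaluation. The dictionary `HI_EN`, its entries, and the function names are hypothetical; the paper's actual dictionaries, tokenization, and handling of out-of-vocabulary words are not described here. We assume that tokens missing from the dictionary are English words or named entities and pass them through unchanged.

```python
import re

# Hypothetical toy romanized-Hindi -> English dictionary (illustration only;
# the real system relies on full bilingual dictionaries not shown here).
HI_EN = {
    "ka": "of",
    "kya": "what",
    "hai": "is",
    "kaun": "who",
    "kab": "when",
}


def lexical_translate(question: str, dictionary: dict[str, str]) -> str:
    """Translate a code-mixed question word by word.

    Each lowercased token is looked up in the dictionary; tokens that are
    absent (assumed to be English words or named entities) are kept as-is.
    Punctuation is dropped by the tokenizer.
    """
    tokens = re.findall(r"\w+", question.lower())
    return " ".join(dictionary.get(tok, tok) for tok in tokens)


def mean_reciprocal_rank(ranked_answers: list[list[str]], gold: list[str]) -> float:
    """Average over questions of 1/rank of the first correct answer.

    A question whose candidate list contains no correct answer contributes 0.
    """
    total = 0.0
    for candidates, answer in zip(ranked_answers, gold):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
    return total / len(gold)


if __name__ == "__main__":
    print(lexical_translate("India ka capital kya hai?", HI_EN))
    print(mean_reciprocal_rank(
        [["Delhi", "Mumbai"], ["Paris", "Hyderabad", "Chennai"], ["X"]],
        ["Delhi", "Hyderabad", "Y"],
    ))
```

Here `lexical_translate("India ka capital kya hai?", HI_EN)` yields `"india of capital what is"`, a loose, ungrammatical English rendering of the kind the abstract describes as input to AType prediction and retrieval. Passing unknown tokens through keeps embedded-language words such as "India" and "capital" intact. In the MRR example, the three questions score 1, 1/2, and 0, giving 0.5.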

Notes

1.
Mixing of Spanish-English, Hindi-English, Telugu-English, Portuguese-Spanish and French-Japanese language pairs, respectively.
2.
Hindi is one of the most spoken languages in India, with 370 million native speakers, and is an official language of India along with English. Telugu is the most spoken Dravidian language in South India, with about 70 million native speakers.
3.
http://emnlp2014.org/workshops/CodeSwitch/call.html.
4.
http://fire.irsi.res.in/fire/home.
5.
This video was recorded in real time to demonstrate the speed of the system in practical use.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, USA
Khyathi Raghavi Chandu & Alan W. Black
Microsoft India, Hyderabad, India
Manoj Chinnakotla
IIIT Hyderabad, Hyderabad, India
Manish Shrivastava

Corresponding author

Correspondence to Khyathi Raghavi Chandu.

Editor information

Editors and Affiliations

Dublin City University, Dublin, Ireland
Gareth J.F. Jones
Trinity College Dublin, Dublin, Ireland
Séamus Lawless
National University of Distance Education, Madrid, Spain
Julio Gonzalo
Dublin City University, Dublin, Ireland
Liadh Kelly
Université Grenoble Alpes, Grenoble, France
Lorraine Goeuriot
University of Hildesheim, Hildesheim, Germany
Thomas Mandl
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Chandu, K.R., Chinnakotla, M., Black, A.W., Shrivastava, M. (2017). WebShodh: A Code Mixed Factoid Question Answering System for Web. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_9

DOI: https://doi.org/10.1007/978-3-319-65813-1_9
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65812-4
Online ISBN: 978-3-319-65813-1
eBook Packages: Computer Science, Computer Science (R0)