Abstract
Question Answering Systems (QASs) are tools that retrieve precise answers to user questions from a large set of text documents. Researchers in the information retrieval and natural language processing communities have put tremendous effort into improving the performance of QASs across several languages. However, Hindi, the fourth most spoken language, has not seen development in question answering sufficient for information seekers to accept QASs as a good alternative to search engines. In this chapter, a pipelined architecture for the development of QASs is explained in the context of the English and Hindi languages. The chapter also reviews developments in Hindi QASs and explains the challenges researchers face in building them. To encourage and support new researchers in conducting research on Hindi QASs, the techniques, tools, and linguistic resources required to implement the components of a QAS are described in a simple and persuasive manner. Finally, future directions for research in Hindi QASs are proposed.
Notes
- 1.
Note that one question can be paraphrased using more than one pattern, yielding several surface forms that share the same meaning. For example, the following questions should all receive the same answer: “During which month do tourists visit Kashmir the most?”, “What month do tourists visit Kashmir the most?”, “Which month do tourists visit Kashmir the most?”, and “When do tourists visit Kashmir the most?”.
- 2.
List of Hindi stopwords, http://members.unine.ch/jacques.savoy/clef/hindiST.txt.
- 3.
A shallow parser, http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.
- 4.
A Hindi stemmer, http://research.variancia.com/hindi_stemmer/.
- 5.
Hindi POS Tagger, http://sivareddy.in/downloads#hindi_tools.
- 6.
Apache OpenNLP, http://opennlp.apache.org/download.html.
- 7.
Pre-trained models for OpenNLP, http://opennlp.sourceforge.net/models-1.5/.
- 8.
- 9.
Hindi WordNet, http://www.cfilt.iitb.ac.in/wordnet/webhwn/.
- 10.
Python implementation of Hindi WordNet, http://sivareddy.in/downloads#python-hindi-wordnet.
- 11.
IndoWordNet, http://www.cfilt.iitb.ac.in/indowordnet/index.jsp.
- 12.
Hindi Wikipedia, https://hi.wikipedia.org/wiki/विशेष:/Statistics, accessed on January 25, 2017.
- 13.
DBPedia, http://wiki.dbpedia.org/about, accessed on January 25, 2017.
- 14.
HindiWaC corpus, https://www.sketchengine.co.uk/hindiwac-corpus/.
- 15.
Lucene, http://lucene.apache.org/core/.
- 16.
Lucene classes for Hindi, https://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/hi/package-summary.html.
- 17.
GATE, http://gate.ac.uk/.
- 18.
QANUS, http://www.qanus.com/.
- 19.
True Knowledge, http://www.evi.com/.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Ray, S.K., Ahmad, A., Shaalan, K. (2018). A Review of the State of the Art in Hindi Question Answering Systems. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_14
DOI: https://doi.org/10.1007/978-3-319-67056-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)