Language Modeling Approach to Retrieval for SMS and FAQ Matching
Short Messaging service popularly known as “SMS” has seen growth due to the growth in Mobile phone users. A mobile phone is considered as a cheap and easy device for communication. It is also used as a source to acquire and spread information. SMS based FAQ Retrieval task proposed in FIRE 2011 aims to provide the required information from frequently asked questions (FAQs). Challenge is to find a question from corpora of FAQs that best answers/matches with the SMS query. But, SMS queries are noisy as users tend to compress text by omitting letters, using slang, etc. This is observed due to a cap on the length of messages (160 characters constitute one SMS), lack of screen space (which makes reading large amounts of text difficult). In this paper, we propose a method using language modeling approach to match noisy SMS text with right FAQ. We extended this framework to match SMS queries with Cross-language FAQs. Results are promising for monolingual retrieval applied on English, Hindi and Malayalam languages.
Unable to display preview. Download preview PDF.
- 1.Contractor, D., Faruquie, T., Subramaniam, L.: Unsupervised cleansing of noisy text. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 189–196 (2010)Google Scholar
- 2.Kothari, G., Negi, S., Faruquie, T., Chakravarthy, V., Subramaniam, L.V.: SMS based Interface for FAQ Retrieval. In: Annual Meeting of the Association for Computation Linguistics (2009)Google Scholar
- 3.Sneiders, E.: Automated FAQ Answering: Continued Experience with Shallow Language Understanding Question Answering Systems. In: AAAI Fall Symposium. Technical Report FS-99-02, pp. 97–107. AAAI Press (1999)Google Scholar
- 4.Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. In: AAAI (2006)Google Scholar
- 5.Sahami, M., Heilman, T.: A web-based kernel function for measuring the similarity of short text snippets. In: World Wide Web. ACM Press (2006)Google Scholar
- 6.Pedersen, T.: Computational approaches to measuring the similarity of short contexts: A review of applications and methods. CoRR, abs/0806.3787 (2008)Google Scholar
- 7.Shrestha, P.: Corpus-based methods for short text similarity. In: 15th Rencontre des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, vol. 2, pp. 297–302 (2011)Google Scholar
- 8.Bharadwaj, R., Tandon, N., Varma, V.: An Iterative approach to extract dictionaries from Wikipedia for under-resourced languages. In: 8th International Conference on Natural Language Processing, ICON (2010)Google Scholar
- 9.Ponte, J.M., Bruce Croft, W.: A language modeling approach to information retrieval. In: 21st ACM SIGIR, pp. 275–281 (1998)Google Scholar
- 10.Berger, A., Lafferty, J.: Information retrieval as statistical translation. In: ACM SIGIR, pp. 222–229 (1999)Google Scholar