Language Modeling Approach to Retrieval for SMS and FAQ Matching

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7536)


The Short Message Service, popularly known as "SMS", has grown with the number of mobile phone users. A mobile phone is considered a cheap and easy device for communication, and it is also used to acquire and spread information. The SMS-based FAQ Retrieval task proposed at FIRE 2011 aims to provide the required information from frequently asked questions (FAQs). The challenge is to find the question in a corpus of FAQs that best matches, and so best answers, the SMS query. SMS queries are noisy, however, because users tend to compress text by omitting letters, using slang, and so on. This stems from the cap on message length (160 characters constitute one SMS) and from the lack of screen space, which makes reading large amounts of text difficult. In this paper, we propose a method that uses a language modeling approach to match noisy SMS text with the right FAQ. We also extend this framework to match SMS queries with cross-language FAQs. Results are promising for monolingual retrieval on English, Hindi and Malayalam.
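As an illustration of what a language modeling approach to retrieval can look like, the sketch below ranks FAQ questions by the smoothed log-likelihood of the query terms under each question's unigram model. It is a generic query-likelihood ranker with Jelinek-Mercer smoothing, not the method of this paper, whose details the abstract does not give. The function names, the whitespace tokenizer, the smoothing weight `lam` and the floor value for unseen terms are all assumptions made for the example, and no SMS noise normalization is attempted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real SMS text would
    # need normalization of abbreviations and dropped letters first.
    return text.lower().split()


def rank_faqs(query, faqs, lam=0.5):
    """Rank FAQ questions by smoothed query log-likelihood.

    P(term | faq) = (1 - lam) * P_ml(term | faq) + lam * P_ml(term | collection)
    """
    docs = [Counter(tokenize(q)) for q in faqs]

    # Collection model: term counts pooled over all FAQ questions.
    collection = Counter()
    for doc in docs:
        collection.update(doc)
    total = sum(collection.values())

    scored = []
    for idx, doc in enumerate(docs):
        doc_len = sum(doc.values())
        score = 0.0
        for term in tokenize(query):
            p_doc = doc[term] / doc_len if doc_len else 0.0
            p_col = collection[term] / total if total else 0.0
            p = (1 - lam) * p_doc + lam * p_col
            # Terms absent from the whole collection get a small floor
            # (hypothetical value) so the logarithm stays defined.
            score += math.log(p) if p > 0 else math.log(1e-10)
        scored.append((score, idx))

    # Highest log-likelihood first; ties keep the original FAQ order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [(faqs[idx], score) for score, idx in ranked]
```

Calling `rank_faqs("train fare", faqs)` on a small list of FAQ questions returns the questions paired with their scores, best match first. A term that appears in no FAQ at all, such as a heavily abbreviated SMS token, lowers every score by the same amount and so does not change the ranking, which is why a real system would need a noise-handling step before or inside the model.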





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. International Institute of Information Technology, Hyderabad, India
