Detecting Missing Content Queries in an SMS-Based HIV/AIDS FAQ Retrieval System

  • Edwin Thuma
  • Simon Rogers
  • Iadh Ounis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Automated Frequently Asked Question (FAQ) answering systems use pre-stored sets of question-answer pairs as an information source to answer natural language questions posed by users. The main problem with this kind of information source is that there is no guarantee that a relevant question-answer pair exists for every user query. In this paper, we propose deploying a binary classifier in an existing SMS-Based HIV/AIDS FAQ retrieval system to detect user queries for which the FAQ document collection contains no relevant question-answer pair. Before deploying such a classifier, we first evaluate different feature sets for training, in order to determine which features yield the model with the best classification accuracy. We carry out our evaluation using seven different feature sets generated from a query log before and after retrieval by the FAQ retrieval system. Our results suggest that combining different feature sets markedly improves classification accuracy.
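
The evaluation described above, comparing feature sets by the classification accuracy they yield, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names (query_len, avg_idf, top_score, score_gap), the toy query log values, and the nearest-centroid classifier, which stands in for whatever learner the paper actually uses. It shows only the shape of the comparison (pre-retrieval features, post-retrieval features, and their combination, each scored by leave-one-out accuracy), not the paper's seven feature sets or its results.

```python
from statistics import mean

# Hypothetical toy query log: every value below is invented for illustration.
# label 1 = missing content query (no relevant FAQ pair), 0 = answerable query.
QUERY_LOG = [
    {"query_len": 4, "avg_idf": 2.0, "top_score": 9.1, "score_gap": 3.8, "label": 0},
    {"query_len": 5, "avg_idf": 2.4, "top_score": 8.7, "score_gap": 3.1, "label": 0},
    {"query_len": 3, "avg_idf": 1.9, "top_score": 9.6, "score_gap": 4.2, "label": 0},
    {"query_len": 6, "avg_idf": 2.2, "top_score": 8.2, "score_gap": 2.9, "label": 0},
    {"query_len": 7, "avg_idf": 4.1, "top_score": 3.3, "score_gap": 0.4, "label": 1},
    {"query_len": 8, "avg_idf": 3.8, "top_score": 2.9, "score_gap": 0.6, "label": 1},
    {"query_len": 5, "avg_idf": 4.4, "top_score": 3.8, "score_gap": 0.3, "label": 1},
    {"query_len": 9, "avg_idf": 3.9, "top_score": 2.5, "score_gap": 0.5, "label": 1},
]

# Assumed grouping: features available before vs. after retrieval.
FEATURE_SETS = {
    "pre_retrieval": ["query_len", "avg_idf"],
    "post_retrieval": ["top_score", "score_gap"],
}
FEATURE_SETS["combined"] = (
    FEATURE_SETS["pre_retrieval"] + FEATURE_SETS["post_retrieval"]
)


def to_vector(record, names):
    """Project a log record onto the chosen feature names."""
    return [float(record[n]) for n in names]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train, names, record):
    """Nearest-centroid binary classifier (a stand-in, not the paper's model).

    Assumes the training data contains at least one record of each class.
    Features are used unscaled, which is a simplification.
    """
    centroids = {}
    for label in (0, 1):
        vectors = [to_vector(r, names) for r in train if r["label"] == label]
        centroids[label] = centroid(vectors)
    x = to_vector(record, names)
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def loo_accuracy(records, names):
    """Leave-one-out accuracy for one feature set."""
    hits = 0
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        hits += predict(train, names, held_out) == held_out["label"]
    return hits / len(records)


def compare_feature_sets(records, feature_sets):
    """Map each feature-set name to its leave-one-out accuracy."""
    return {name: loo_accuracy(records, cols) for name, cols in feature_sets.items()}


if __name__ == "__main__":
    for name, acc in compare_feature_sets(QUERY_LOG, FEATURE_SETS).items():
        print(f"{name}: {acc:.2f}")
```

On real data, the records would come from the system's query log and the accuracies would be estimated with a proper cross-validation protocol; the numbers this toy data produces say nothing about the paper's findings.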


Keywords: Frequently Asked Question, Missing Content Queries, Text Classification





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Edwin Thuma (1, 2)
  • Simon Rogers (1)
  • Iadh Ounis (1)
  1. School of Computing Science, University of Glasgow, Glasgow, UK
  2. Department of Computer Science, University of Botswana, Gaborone, Botswana
