ECIR 2014: Advances in Information Retrieval, pp. 247-259
Detecting Missing Content Queries in an SMS-Based HIV/AIDS FAQ Retrieval System
Abstract
Automated Frequently Asked Question (FAQ) answering systems use pre-stored sets of question-answer pairs as an information source to answer natural language questions posed by users. The main problem with this kind of information source is that there is no guarantee that a relevant question-answer pair exists for every user query. In this paper, we propose to deploy a binary classifier in an existing SMS-Based HIV/AIDS FAQ retrieval system to detect user queries that have no relevant question-answer pair in the FAQ document collection. Before deploying such a classifier, we first evaluate different feature sets for training in order to determine which sets of features build the model that yields the best classification accuracy. We carry out our evaluation using seven different feature sets generated from a query log before and after retrieval by the FAQ retrieval system. Our results suggest that combining different feature sets markedly improves the classification accuracy.
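The idea of combining features computed before and after retrieval can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy FAQ questions, the training queries, the five features (query length, average IDF, out-of-vocabulary ratio, top score, score gap), the IDF-weighted overlap scorer, and the nearest-centroid rule are hypothetical stand-ins, not the seven feature sets, the retrieval model, or the classifier evaluated in the paper.

```python
import math
from collections import Counter

# Hypothetical toy FAQ collection (illustrative only, not the paper's data).
FAQ_QUESTIONS = [
    "how is hiv transmitted",
    "what are the symptoms of hiv",
    "where can i get tested for hiv",
]

def tokenize(text):
    return text.lower().split()

DOCS = [tokenize(q) for q in FAQ_QUESTIONS]
DOC_FREQ = Counter(term for doc in DOCS for term in set(doc))

def idf(term):
    # Smoothed inverse document frequency; unseen terms get the highest value.
    return math.log((len(DOCS) + 1) / (DOC_FREQ.get(term, 0) + 1)) + 1.0

def pre_retrieval_features(query):
    # Computed from the query and collection statistics only.
    terms = tokenize(query)
    if not terms:
        return [0.0, 0.0, 1.0]
    avg_idf = sum(idf(t) for t in terms) / len(terms)
    oov_ratio = sum(1 for t in terms if t not in DOC_FREQ) / len(terms)
    return [float(len(terms)), avg_idf, oov_ratio]

def overlap_score(query_terms, doc):
    # IDF-weighted term overlap, a stand-in for a real ranking function.
    return sum(idf(t) for t in set(query_terms) if t in doc)

def post_retrieval_features(query):
    # Computed from the ranked scores returned for the query.
    terms = tokenize(query)
    scores = sorted((overlap_score(terms, d) for d in DOCS), reverse=True)
    return [scores[0], scores[0] - scores[1]]

def combined_features(query):
    return pre_retrieval_features(query) + post_retrieval_features(query)

def train_centroids(examples):
    # examples: (query, label) pairs; label 1 = missing content, 0 = answerable.
    sums, counts = {}, Counter()
    for query, label in examples:
        vec = combined_features(query)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, query):
    # Assign the label whose centroid is closest in Euclidean distance.
    vec = combined_features(query)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

# Hypothetical labelled queries standing in for an annotated query log.
TRAINING = [
    ("how is hiv spread", 0),
    ("symptoms of hiv", 0),
    ("what is the weather today", 1),
    ("cheap flights to london", 1),
]
CENTROIDS = train_centroids(TRAINING)
```

Calling `predict(CENTROIDS, query)` returns 1 when the query looks like a missing content query and 0 otherwise. Replacing `combined_features` with `pre_retrieval_features` or `post_retrieval_features` alone inside the two functions is one simple way to compare feature sets, which is the kind of comparison the abstract describes.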
Keywords
Frequently Asked Question, Missing Content Queries, Text Classification