Detecting Non-covered Questions in Frequently Asked Questions Collections
Frequently asked questions (FAQ) collections are a popular and effective way of representing information, and FAQ retrieval systems provide a natural-language interface to such collections. An important aspect of efficient and trustworthy FAQ retrieval is to maintain a low fall-out rate by detecting non-covered questions. In this paper we address the task of detecting non-covered questions. We experiment with threshold-based methods as well as unsupervised one-class and supervised binary classifiers, considering tf-idf and word embeddings text representations. Experiments, carried out on a domain-specific FAQ collection, indicate that a cluster-based model with query paraphrases outperforms threshold-based, one-class, and binary classifiers.
KeywordsFAQ retrieval Novelty detection Question answering
- 1.Burke, R.D., Hammond, K.J., Kulyukin, V., Lytinen, S.L., Tomuro, N., Schoenberg, S.: Question answering from frequently asked question files: experiences with the FAQ finder system. AI Mag. 18(2), 57 (1997)Google Scholar
- 2.Sneiders, E.: Automated FAQ answering with question-specific knowledge representation for web self-service. In: 2nd Conference on Human System Interactions 2009, HSI 2009, pp. 298–305. IEEE (2009)Google Scholar
- 4.Feng, M., Xiang, B., Glass, M.R., Wang, L., Zhou, B.: Applying deep learning to answer selection: a study and an open task. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 813–820. IEEE (2015)Google Scholar
- 7.Wieting, J., Bansal, M., Gimpel, K., Livescu, K., Roth, D.: From paraphrase database to compositional paraphrase model and back. Trans. Assoc. Comput. Linguist. 3, 345–358 (2015)Google Scholar