Detecting Non-covered Questions in Frequently Asked Questions Collections

  • Mladen Karan
  • Jan Šnajder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10260)


Frequently asked questions (FAQ) collections are a popular and effective way of representing information, and FAQ retrieval systems provide a natural-language interface to such collections. An important aspect of efficient and trustworthy FAQ retrieval is maintaining a low fall-out rate by detecting non-covered questions, that is, user questions that no entry in the collection answers. In this paper, we address this detection task. We experiment with threshold-based methods as well as unsupervised one-class and supervised binary classifiers, considering tf-idf and word-embedding text representations. Experiments carried out on a domain-specific FAQ collection indicate that a cluster-based model with query paraphrases outperforms threshold-based, one-class, and binary classifiers.
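As a rough illustration of the simplest family of methods named above, the following sketch shows a threshold-based detector over tf-idf vectors: a query is flagged as non-covered when its best cosine similarity to any FAQ question falls below a threshold. This is not the authors' implementation. The toy FAQ entries, the function names, the smoothed idf formula, and the threshold value of 0.3 are all assumptions made for illustration; in practice the threshold would be tuned on held-out data. The cluster-based model with query paraphrases that the abstract reports as the best performer is not shown here.

```python
import math
from collections import Counter

# Hypothetical toy FAQ collection, for illustration only.
TOY_FAQ = [
    "How do I reset my password?",
    "How can I change my email address?",
    "Where can I download my invoice?",
]


def tokenize(text):
    # Lowercase the text and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_idf(docs):
    # Smoothed inverse document frequency computed over the FAQ questions.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Sparse tf-idf vector; terms unseen in the FAQ collection are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_non_covered(query, faq_questions, threshold=0.3):
    # Flag the query as non-covered if no FAQ question is similar enough.
    # The threshold is an assumed value, not one taken from the paper.
    idf = build_idf(faq_questions)
    q = tfidf_vector(query, idf)
    best = max(
        (cosine(q, tfidf_vector(d, idf)) for d in faq_questions),
        default=0.0,
    )
    return best < threshold, best


if __name__ == "__main__":
    for query in ["reset password", "What is the weather in Zagreb tomorrow?"]:
        flagged, score = is_non_covered(query, TOY_FAQ)
        print(f"{query!r}: non-covered={flagged}, best similarity={score:.2f}")
```

A fixed threshold like this is easy to deploy but treats every FAQ entry identically, which is one plausible reason to compare it against one-class and binary classifiers, as the abstract describes.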


Keywords: FAQ retrieval, Novelty detection, Question answering



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Text Analysis and Knowledge Engineering Lab, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
