FAQIR – A Frequently Asked Questions Retrieval Test Collection

  • Mladen Karan
  • Jan Šnajder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)


Frequently asked question (FAQ) collections are commonly used across the web to provide information about a specific domain (e.g., the services of a company). Compared with traditional information retrieval, FAQ retrieval poses additional challenges, the main ones being (1) the brevity of FAQ texts and (2) the need for topic-specific knowledge. The primary contribution of our work is a new domain-specific FAQ collection that provides a large number of queries with manually annotated relevance judgments. On this collection, we test several unsupervised baseline models, including count-based models, semantic-embedding-based models, and a combined model. We evaluate their performance across different setups and identify potential avenues for improvement. The collection constitutes a solid basis for research in supervised machine-learning-based FAQ retrieval.
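To make the distinction between count-based, embedding-based, and combined scoring concrete, the following is a minimal sketch and not the models evaluated in the paper. It assumes TF-IDF cosine similarity as the count-based score, the cosine of averaged word vectors as the embedding-based score, and a linear interpolation with a hypothetical weight alpha as the combination. The FAQ texts, the two-dimensional word vectors, and all function names are made up for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def cosine_sparse(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cosine_dense(a, b):
    # Cosine similarity between two dense vectors stored as lists.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_embedding(tokens, emb, dim):
    # Average the vectors of known tokens; zero vector if none are known.
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def rank(query, faqs, emb, dim, alpha=0.5):
    """Return FAQ indices ordered by the combined score, best first.

    alpha is a hypothetical interpolation weight: 1.0 uses only the
    count-based score, 0.0 uses only the embedding-based score.
    """
    docs = [tokenize(f) for f in faqs]
    q = tokenize(query)
    # Smoothed inverse document frequency over the FAQ collection.
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    # Query terms unseen in the collection are ignored by the count model.
    q_vec = {t: tf * idf[t] for t, tf in Counter(q).items() if t in idf}
    q_emb = mean_embedding(q, emb, dim)
    scored = []
    for i, doc in enumerate(docs):
        d_vec = {t: tf * idf[t] for t, tf in Counter(doc).items()}
        count_score = cosine_sparse(q_vec, d_vec)
        emb_score = cosine_dense(q_emb, mean_embedding(doc, emb, dim))
        scored.append((alpha * count_score + (1 - alpha) * emb_score, i))
    # Sort by descending score, breaking ties by original position.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Toy data, invented for this sketch only.
FAQS = [
    "how do i reset my password",
    "what are the roaming charges abroad",
]
TOY_EMB = {
    "password": [1.0, 0.0], "reset": [1.0, 0.0], "forgot": [0.9, 0.1],
    "roaming": [0.0, 1.0], "charges": [0.0, 1.0], "abroad": [0.0, 1.0],
}

print(rank("forgot password", FAQS, TOY_EMB, dim=2))
```

Short FAQ texts often share few words with a query, which is one reason an embedding-based score can complement a count-based one. In this sketch the query word "forgot" contributes nothing to the count-based score but still moves the averaged query vector toward the password entry.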


  • Frequently asked questions
  • Information retrieval
  • Question answering



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Faculty of Electrical Engineering and Computing, Text Analysis and Knowledge Engineering Lab, University of Zagreb, Zagreb, Croatia
