An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering

  • Haocheng Wu
  • Zuohui Tian
  • Wei Wu
  • Enhong Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10178)


Community Question Answering (CQA) sites such as Yahoo! Answers provide rich knowledge for people to access. However, the quality of answers posted to CQA sites varies widely, from precise and useful to irrelevant and useless. Hence, automatic detection of low-quality answers helps site managers organize the accumulated knowledge efficiently and provide high-quality content to users. In this paper, we propose a novel unsupervised approach to detecting low-quality answers at a CQA site. The key ideas in our model are: (1) most answers are normal; (2) a low-quality answer can be found by checking it against its “peer” answers under the same question; (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm that assigns soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.
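The three key ideas can be illustrated with a toy peer-comparison scorer. This is only a minimal sketch under stated assumptions, not the model proposed in the paper: the bag-of-words representation, the cosine similarity measure, the per-question min-max rescaling, and the function names `cosine` and `peer_quality_scores` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def peer_quality_scores(answers):
    """Return a soft score in [0, 1] for each answer to ONE question.

    Illustrative assumption: an answer that resembles its peers is
    "normal"; an answer unlike all of its peers is a low-quality candidate.
    """
    vectors = [Counter(text.lower().split()) for text in answers]
    if len(vectors) < 2:
        return [1.0] * len(vectors)  # no peers to compare against

    # Idea (2): compare each answer with its peers under the same question.
    raw = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, u) for j, u in enumerate(vectors) if j != i]
        raw.append(sum(sims) / len(sims))

    # Idea (3): rescale within the question, so that every question
    # gets its own quality criterion instead of one global threshold.
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(raw)  # all answers look alike
    return [(r - lo) / (hi - lo) for r in raw]


answers = [
    "restart the router and check the cable",
    "check the cable then restart the router",
    "unplug the router and restart it",
    "buy cheap watches online now",
]
scores = peer_quality_scores(answers)
```

In this example the last answer shares no words with its peers, so it receives the lowest score for the question. Idea (1), that most answers are normal, is what makes the peer majority a usable reference point without any labels.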


Community question answering · Answer quality evaluation



This research was partially supported by grants from the National Key Research and Development Program of China (Grant No. 2016YFB1000904), the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the National Natural Science Foundation of China (Grant No. 61672483), and the Fundamental Research Funds for the Central Universities of China (Grant No. WK2350000001).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Science and Technology of China, Hefei, China
  2. Harbin Institute of Technology, Harbin, China
  3. Microsoft Research, Beijing, China
