Similar question retrieval with incorporation of multi-dimensional quality analysis for community question answering

  • Original Article
Neural Computing and Applications

Abstract

Semantic-based question retrieval is an important approach to finding similar questions in community question answering (CQA). Its major challenges are polysemy and lexical gaps between questions; in addition, the similar questions returned by a semantic retrieval model may not be of high enough quality to resolve a user's doubts. To address these challenges, a high-quality, multi-level semantic analysis-based similar question retrieval framework named HQML-QR is proposed. It consists of two components: question retrieval based on tag-level and sentence-level semantic representations (TS-QR) and multi-dimensional quality analysis (MDQQ). First, TS-QR extracts multi-level semantic features from the question content: a graph embedding model learns the coarse-grained semantics of questions at the tag level, while a pre-trained language model based on the self-attention mechanism identifies polysemy and extracts the fine-grained sentence-level semantics of questions, which ensures the accuracy of question retrieval. Second, based on four quality factors in CQA (popularity, question, answer and user), MDQQ constructs a multi-dimensional quality evaluation model that provides a reasonable standard for measuring question quality. Guided by question quality, the similarity score obtained by semantic vector matching is then updated so that the retrieved questions are both semantically similar and of high quality. Finally, experiments on the CQADupStack dataset from Stack Overflow show that the P@N of HQML-QR is on average 5.65%, 4.44% and 4.34% higher than that of LDA-VSM-SEM, WET-QR and RCM-QR, respectively.
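The idea of updating a semantic similarity score with a quality score, and then evaluating with P@N, can be illustrated with a minimal Python sketch. The abstract does not give the update formula, so the linear interpolation controlled by `alpha`, the equal factor weights, and the toy two-dimensional vectors below are all assumptions made for illustration and are not the authors' method. In HQML-QR the question vectors would come from the graph embedding model and the pre-trained language model, and the quality factors from MDQQ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def quality_score(factors, weights):
    """Weighted mean of the quality factors (each assumed to lie in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * factors[name] for name in weights) / total

def rerank(query_vec, candidates, weights, alpha=0.7):
    """Rank candidates by an assumed mix of similarity and quality.

    score = alpha * similarity + (1 - alpha) * quality
    This interpolation is hypothetical; the paper's update rule may differ.
    """
    scored = []
    for cand in candidates:
        sim = cosine(query_vec, cand["vec"])
        qual = quality_score(cand["quality"], weights)
        scored.append((alpha * sim + (1 - alpha) * qual, cand["id"]))
    scored.sort(reverse=True)  # highest combined score first
    return [cand_id for _, cand_id in scored]

def precision_at_n(ranked_ids, relevant_ids, n):
    """P@N: fraction of the top-n retrieved questions that are relevant."""
    return sum(1 for cand_id in ranked_ids[:n] if cand_id in relevant_ids) / n

# Toy data: the four factor names follow the abstract; the values are made up.
FACTORS = ("popularity", "question", "answer", "user")
weights = {name: 1.0 for name in FACTORS}   # equal weights (assumption)
low = {name: 0.0 for name in FACTORS}
high = {name: 1.0 for name in FACTORS}

candidates = [
    {"id": "q1", "vec": [1.0, 0.0], "quality": low},   # most similar, low quality
    {"id": "q2", "vec": [0.8, 0.6], "quality": high},  # similar, high quality
    {"id": "q3", "vec": [0.0, 1.0], "quality": high},  # dissimilar, high quality
]

ranking = rerank([1.0, 0.0], candidates, weights)
p_at_2 = precision_at_n(ranking, {"q2", "q3"}, 2)
print(ranking, p_at_2)
```

In this toy setting the high-quality question `q2` overtakes the slightly more similar but low-quality `q1`, which is the behaviour the quality-guided update is meant to produce. A larger `alpha` would shift the ranking back toward pure semantic similarity.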

Data availability

Data are openly available in public repositories. The Program dataset that supports the findings of this study is openly available at https://archive.org/details/stackexchange. The Stack Overflow dataset that supports the findings of this study is openly available at http://nlp.cis.unimelb.edu.au/resources/cqadupstack/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 52073169 and 92270124). We appreciate the support of the High Performance Computing Center of Shanghai University, and Shanghai Engineering Research Center of Intelligent Computing System.

Author information

Contributions

Yue Liu was involved in the conceptualization, methodology, validation, formal analysis, writing—original draft, writing—review and editing, and supervision. Weize Tang contributed to the methodology, software, validation, data curation, writing—original draft, and writing—review and editing. Zitu Liu contributed to the software, validation, writing—original draft, writing—review and editing. Aihua Tang assisted in the methodology, formal analysis, and writing—original draft. Lipeng Zhang performed formal analysis and writing—review and editing.

Corresponding author

Correspondence to Yue Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical and informed consent

The experimental data used in our research consist entirely of publicly available datasets.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Tang, W., Liu, Z. et al. Similar question retrieval with incorporation of multi-dimensional quality analysis for community question answering. Neural Comput & Applic 36, 3663–3679 (2024). https://doi.org/10.1007/s00521-023-09266-6

  • DOI: https://doi.org/10.1007/s00521-023-09266-6
