The Journal of Supercomputing, Volume 72, Issue 11, pp 4276–4295

A novel framework for social web forums’ thread ranking based on semantics and post quality features

Abstract

Online discussion forums are a valuable source of knowledge, where users share and exchange ideas by posting questions and answers. As the volume of forum content grows, finding relevant information becomes challenging, and knowledge management and quality assurance of this content become critically important. Although online discussion forums offer search services, in most cases only keyword search is provided. Keyword search techniques, such as cosine similarity, consider the lexical overlap between query and document terms; however, they do not consider the context or meaning of the terms and can therefore fail to retrieve relevant documents. Earlier content-based research efforts to improve thread retrieval performance were primarily based on the cosine similarity technique, which assigns term weights based on term frequency and inverse document frequency. Because this technique does not consider discussion semantics, it may lead to less effective document retrieval. To address these issues, we propose two thread ranking techniques for online discussion forums: (1) threads are ranked on the basis of a semantic similarity score between posts and (2) threads are ranked based on their participants’ reputation and posts’ quality. The proposed work provides a performance comparison between semantic similarity techniques and cosine similarity techniques, along with reputation and post quality features, in the thread ranking process. Experimental results obtained using a real online forum dataset demonstrate that the proposed techniques significantly improve thread ranking performance.
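The limitation of keyword matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tfidf_vectors`, `cosine`), the toy query and threads, and the specific weighting (relative term frequency times the natural log of inverse document frequency) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: thread_a answers the query using synonyms only,
# thread_b shares two query words but is about something else.
query = ["fix", "laptop", "screen"]
thread_a = ["repair", "notebook", "display"]
thread_b = ["laptop", "screen", "price"]

q_vec, a_vec, b_vec = tfidf_vectors([query, thread_a, thread_b])
sim_a = cosine(q_vec, a_vec)  # 0.0: no shared terms, despite the same meaning
sim_b = cosine(q_vec, b_vec)  # positive: shared terms, regardless of meaning

print(sim_a, sim_b)
```

In this toy setup the synonym-only thread scores zero while the lexically overlapping thread scores higher, which is the gap a semantic similarity measure is meant to close.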

Keywords

Thread ranking; Knowledge sharing; Semantic similarity; Link analysis; Online forums

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science, COMSATS Institute of IT, Attock, Pakistan
  2. Department of Computer Science and Software Engineering, International Islamic University, Islamabad, Pakistan
  3. Faculty of Computing and Information Technology, King Abdul Aziz University, Jeddah, Saudi Arabia
  4. Department of Media Software, Sungkyul University, Anyang, Korea
