Finding and Ranking High-Quality Answers in Community Question Answering Sites

  • Pradeep Kumar Roy
  • Zishan Ahmad
  • Jyoti Prakash Singh
  • Mohammad Abdallah Ali Alryalat
  • Nripendra P. Rana
  • Yogesh K. Dwivedi
Original Research

Abstract

Community Question Answering (CQA) sites have become a very popular place to ask questions and give answers to a large community of users on the Internet. Stack Exchange is one of the popular CQA sites, where a large amount of content is posted every day in the form of questions, answers and comments. Answers on Stack Exchange are listed by recent activity, time of posting, or votes received from peer users, under three tabs called active, oldest and votes, respectively. The votes tab is the default setting on the site and is also the tab users prefer, because answers under this tab have been voted as good answers by other users. The problem with vote-based sorting is that new answers, which have yet to receive any votes, are placed at the bottom of the votes tab. A new answer may be of sufficiently high quality to be placed at the top, but having few or no votes (because it was posted later) keeps it at the bottom. We introduce a new tab, called the promising answers tab, where answers are listed by their usefulness, which our proposed system calculates using classification and regression models. Several textual features of the answers, together with users' reputation, are used as features to predict the usefulness of the answers. The results are validated with good values of precision, recall, F1-score, area under the receiver operating characteristic curve (AUC) and root mean squared error. We also compared the top ten answers predicted by our system with the actual top ten answers based on votes and found that they are in high agreement.
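The comparison between the predicted and the vote-based top ten can be illustrated with a minimal sketch. The abstract does not state which agreement measure was used, so the set-overlap fraction below is an assumption, and the function name `top_k_agreement`, the answer identifiers and all scores are hypothetical values chosen for illustration only.

```python
def top_k_agreement(predicted_scores, votes, k=10):
    """Fraction of the top-k answers by votes that also appear in the
    top-k answers by predicted usefulness (assumed overlap measure)."""
    if k <= 0:
        raise ValueError("k must be positive")
    answer_ids = list(predicted_scores)
    if not answer_ids:
        raise ValueError("no answers to compare")
    # Never ask for more answers than the question actually has.
    k = min(k, len(answer_ids))
    top_predicted = set(
        sorted(answer_ids, key=lambda a: predicted_scores[a], reverse=True)[:k]
    )
    top_voted = set(
        sorted(answer_ids, key=lambda a: votes[a], reverse=True)[:k]
    )
    return len(top_predicted & top_voted) / k


# Hypothetical data: "a3" is a recent answer with no votes yet,
# but the (assumed) model scores it as useful.
predicted = {"a1": 0.91, "a2": 0.40, "a3": 0.75, "a4": 0.10}
votes = {"a1": 12, "a2": 7, "a3": 0, "a4": 1}

print(top_k_agreement(predicted, votes, k=2))
```

In this toy case the two rankings share only one of their top two answers, because the model promotes the unvoted answer "a3". That is the situation the promising answers tab is meant to address.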

Keywords

Answer ranking; Classification; Community Question Answering; Data imbalance; Regression

Copyright information

© Global Institute of Flexible Systems Management 2017

Authors and Affiliations

  • Pradeep Kumar Roy (1)
  • Zishan Ahmad (1)
  • Jyoti Prakash Singh (1)
  • Mohammad Abdallah Ali Alryalat (2)
  • Nripendra P. Rana (3)
  • Yogesh K. Dwivedi (3)

  1. National Institute of Technology Patna, Patna, India
  2. Al-Balqa’ Applied University, Salt, Jordan
  3. Emerging Markets Research Centre (EMaRC), School of Management, Swansea University Bay Campus, Swansea, UK
