Finding and Ranking High-Quality Answers in Community Question Answering Sites

Original Research
Global Journal of Flexible Systems Management

Abstract

Community Question Answering (CQA) sites have become a very popular place to ask questions and give answers to a large community of users on the Internet. Stack Exchange is one of the popular CQA sites, where a large volume of content is posted every day in the form of questions, answers and comments. Answers on Stack Exchange are listed by recent activity, time of posting, or votes received from peer users, under three tabs called active, oldest and votes, respectively. The votes tab is the default setting on the site and is also the tab users prefer, because answers under this tab have been voted good answers by other users. The problem with vote-based sorting is that new answers that have yet to receive any votes are placed at the bottom of the votes tab. A new answer may be of sufficiently high quality to be placed at the top, but because it was posted later it has few or no votes and stays at the bottom. We introduce a new tab, called the promising answers tab, in which answers are listed by their usefulness, which our proposed system calculates using classification and regression models. Several textual features of the answers, together with user reputation, are used to predict the usefulness of the answers. The results are validated with good values of precision, recall, F1-score, area under the receiver operating characteristic curve (AUC) and root mean squared error. We also compare the top ten answers predicted by our system with the actual top ten answers based on votes and find that they are in high agreement.
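
The contrast between vote-based ordering and usefulness-based ordering, and the kind of evaluation named above, can be illustrated with a minimal Python sketch. It is not the authors' system: the `Answer` class, the answer identifiers, and the usefulness scores are hypothetical values chosen for illustration, and the scores stand in for the output of some trained classification or regression model that is not shown here.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Answer:
    answer_id: str      # hypothetical identifier
    votes: int          # votes received so far
    usefulness: float   # hypothetical model score in [0, 1]


def rank_by_votes(answers):
    """Order answers as a votes tab would: highest vote count first."""
    return sorted(answers, key=lambda a: a.votes, reverse=True)


def rank_by_usefulness(answers):
    """Order answers by predicted usefulness, highest first."""
    return sorted(answers, key=lambda a: a.usefulness, reverse=True)


def top_k_agreement(ranking_a, ranking_b, k):
    """Fraction of answers shared by the top k of two rankings."""
    top_a = {a.answer_id for a in ranking_a[:k]}
    top_b = {a.answer_id for a in ranking_b[:k]}
    return len(top_a & top_b) / k


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = useful answer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Illustrative data only: two older answers with votes, two new ones without.
answers = [
    Answer("old_popular", votes=40, usefulness=0.70),
    Answer("old_average", votes=12, usefulness=0.35),
    Answer("new_strong", votes=0, usefulness=0.90),
    Answer("new_weak", votes=0, usefulness=0.10),
]

by_votes = rank_by_votes(answers)
by_usefulness = rank_by_usefulness(answers)
```

In this toy data, `new_strong` sits below both voted answers under vote-based ordering but moves to the top under usefulness-based ordering, which is the situation the promising answers tab is meant to address. `top_k_agreement(by_votes, by_usefulness, k)` gives one simple way to express how closely two orderings agree at the top; whether the authors measure agreement in exactly this way is not stated in the abstract.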

Notes

  1. Activeness of a post is defined by the number of times it has been modified.

Author information

Corresponding author

Correspondence to Yogesh K. Dwivedi.

About this article

Cite this article

Roy, P.K., Ahmad, Z., Singh, J.P. et al. Finding and Ranking High-Quality Answers in Community Question Answering Sites. Glob J Flex Syst Manag 19, 53–68 (2018). https://doi.org/10.1007/s40171-017-0172-6

  • DOI: https://doi.org/10.1007/s40171-017-0172-6
