Abstract
Community question–answering (CQA) has become a popular method of online information seeking. In these services, peers ask questions and answer questions posed by others. Content repositories created through CQA sites have long supported general-purpose tasks, but they can also serve as online digital libraries that satisfy specific educational needs. Horizontal CQA services, such as Yahoo! Answers, and vertical CQA services, such as Brainly, aim to help students improve their learning through Q&A exchanges; Stack Overflow, another vertical CQA service, serves a similar purpose for topics relevant to programmers. Receiving high-quality answers to a posed question is critical to both user satisfaction and the learning these services support. This process breaks down when experts do not answer questions or when askers lack the knowledge and skills needed to evaluate the quality of the answers they receive. In such circumstances, learners may build a faulty knowledge base from inaccurate information acquired online. Site moderators could alleviate this problem by reviewing answer quality, but their subjective assessments can be inconsistent, and human assessors alone cannot keep pace with the volume of content on a CQA site. This study addresses these issues by proposing a framework for automatically assessing answer quality. We integrate four groups of features (personal, community-based, textual, and contextual) to build a classification model and determine what constitutes answer quality. To test this evaluation framework, we collected more than 10 million educational answers posted by more than 3 million users on Brainly and 7.7 million answers on Stack Overflow.
Experiments on these data sets show that a model using random forest achieves high accuracy in identifying high-quality answers, and that personal and community-based features carry the most predictive power in assessing answer quality. The approach also achieves high values on other key metrics, such as F1-score and area under the ROC curve. The work reported here can be useful in many other contexts that strive to provide automatic quality assessment in a digital repository.
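The evaluation pipeline described above can be illustrated with a minimal sketch. This is not the authors' code: the data here is synthetic, and the four feature columns stand in for hypothetical members of the personal, community-based, textual, and contextual feature groups. It shows a random-forest classifier being trained and scored with accuracy, F1, and area under the ROC curve, mirroring the metrics reported in the paper.

```python
# Minimal sketch (assumed pipeline, synthetic data): a random-forest
# classifier over four hypothetical feature groups, evaluated by
# accuracy, F1-score, and ROC AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# One placeholder column per feature group.
X = np.column_stack([
    rng.normal(size=n),                # personal: e.g. answerer's past rating
    rng.poisson(5, size=n),            # community-based: e.g. thanks count
    rng.integers(10, 500, size=n),     # textual: e.g. answer length
    rng.integers(0, 24, size=n),       # contextual: e.g. posting hour
])
# Synthetic label loosely tied to the community-based signal.
y = (X[:, 1] + rng.normal(scale=2, size=n) > 5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
# Feature importances hint at which feature groups carry predictive power.
print("importances:", clf.feature_importances_)
```

On real CQA data, each group would contribute many features rather than one column, and `feature_importances_` (or ablation over whole groups) would support the kind of per-group analysis the study reports.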
Acknowledgements
The work reported in this paper is supported by the US Institute of Museum and Library Services (IMLS) Grant #LG-81-16-0025. A portion of the work reported here was possible due to funds and data access provided by Brainly. We are grateful to Michal Labedz and Mateusz Burdzel from Brainly for their help and insights into the topics discussed in this work, and we thank Stack Overflow for sharing their data.
Cite this article
Le, L.T., Shah, C. & Choi, E. Assessing the quality of answers autonomously in community question–answering. Int J Digit Libr 20, 351–367 (2019). https://doi.org/10.1007/s00799-019-00272-5