Reliability Prediction for Health-Related Content: A Replicability Study

Conference paper in Advances in Information Retrieval (ECIR 2021)

Abstract

Determining the reliability of online data is a challenge that has recently received increasing attention. In particular, unreliable health-related content has become pervasive during the COVID-19 pandemic. Previous research [37] approached this problem with standard classification technology, using a set of features that included, among others, linguistic and external variables. In this work, we aim to replicate parts of the study conducted by Sondhi et al. [37] using our own code, which we make available to the research community (https://github.com/MarcosFP97/Health-Rel). The performance we obtain is as strong as that reported by the original authors, and our replicability study confirms their conclusions. We report on the challenges involved in replication, including the impossibility of recomputing some features, because some of the tools or services originally used are now outdated or unavailable. Finally, we report on a generalisation effort to evaluate our predictive technology on new datasets [20, 35].
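The abstract describes the original approach [37] only as standard classification over linguistic and external features. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' code. The three mini-lexicons, the four features and the toy pages are hypothetical; the standard-library html.parser stands in for BeautifulSoup; and a small linear classifier trained by subgradient descent on the hinge loss stands in for the SVM tooling listed in the notes.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-lexicons for illustration only; they are NOT the
# lexicon files distributed in the Health-Rel repository.
PRIVACY_TERMS = {"privacy", "privacy policy"}
CONTACT_TERMS = {"contact", "contact us"}
COMMERCIAL_TERMS = {"buy", "cheap", "discount", "order"}


class PageParser(HTMLParser):
    """Collects all text nodes and the anchor text of each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.link_texts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        self.text_parts.append(data)
        if self.in_link:
            self.link_texts[-1] += data


def extract_features(html):
    """Map a page to [has_privacy_link, has_contact_link, commercial_ratio, bias]."""
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    links = [text.strip().lower() for text in parser.link_texts]
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    commercial = sum(word in COMMERCIAL_TERMS for word in words)
    return [
        1.0 if any(t in PRIVACY_TERMS for t in links) else 0.0,
        1.0 if any(t in CONTACT_TERMS for t in links) else 0.0,
        commercial / max(len(words), 1),
        1.0,  # constant feature acting as the bias term
    ]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss (labels in {-1, +1})."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # regulariser (also applied to the bias, a simplification)
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:  # hinge loss is active
                for j, xj in enumerate(xi):
                    grad[j] -= yi * xj / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return +1 (reliable) or -1 (unreliable) from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy pages invented for this example (not taken from any cited dataset).
pages = [
    ('<p>Treatment guidance reviewed by clinicians.</p>'
     '<a href="/privacy">Privacy policy</a> <a href="/contact">Contact us</a>', 1),
    ('<p>Symptoms and side effects explained.</p>'
     '<a href="/privacy">Privacy</a> <a href="/contact">Contact</a>', 1),
    ('<p>Buy cheap pills now, order today!</p><a href="/shop">Shop</a>', -1),
    ('<p>Miracle cure: discount if you order now.</p>', -1),
]
X = [extract_features(html) for html, _ in pages]
y = [label for _, label in pages]
w = train_linear_svm(X, y)
```

A faithful replication would instead use the repository's own lexicons, the full feature set and an off-the-shelf SVM implementation; the hand-written trainer is kept here only so the sketch runs without third-party packages.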

Notes

  1. https://www.hon.ch/cgi-bin/HONcode/principles.pl?English.

  2. https://www.hon.ch/en/.

  3. https://github.com/MarcosFP97/Health-Rel/blob/master/lexicon/privacy.txt.

  4. https://github.com/MarcosFP97/Health-Rel/blob/master/lexicon/contact.txt.

  5. https://www.crummy.com/software/BeautifulSoup/bs4/doc/.

  6. https://github.com/MarcosFP97/Health-Rel/blob/master/lexicon/comm_list.txt.

  7. http://elinks.or.cz.

  8. https://www.nltk.org/nltk_data.

  9. https://github.com/MarcosFP97/Health-Rel/blob/master/lexicon/stopwords.txt.

  10. https://bitbucket.org/wcauchois/pysvmlight.

  11. http://commoncrawl.org.

  12. https://trec-health-misinfo.github.io.

  13. https://github.com/MarcosFP97/Health-Rel.

References

  1. Abbasi, M.-A., Liu, H.: Measuring user credibility in social media. In: Greenberg, A.M., Kennedy, W.G., Bos, N.D. (eds.) SBP 2013. LNCS, vol. 7812, pp. 441–448. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37210-0_48

  2. Abualsaud, M., Smucker, M.D.: Exposure and order effects of misinformation on health search decisions. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Rome (2019)

  3. Andersen, R., et al.: Robust PageRank and locally computable spam detection features. In: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web, pp. 69–76 (2008)

  4. Becchetti, L., Castillo, C., Donato, D., Baeza-Yates, R., Leonardi, S.: Link analysis for web spam detection. ACM Trans. Web (TWEB) 2(1), 1–42 (2008)

  5. Borodin, A., Roberts, G.O., Rosenthal, J.S., Tsaparas, P.: Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Internet Technol. (TOIT) 5(1), 231–297 (2005)

  6. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  8. Do, C.B., Ng, A.Y.: Transfer learning for text classification. Adv. Neural Inf. Process. Syst. 18, 299–306 (2005)

  9. Eysenbach, G.: Infodemiology: the epidemiology of (mis)information. Am. J. Med. 113(9), 763–765 (2002)

  10. Fogg, B.J.: Prominence-interpretation theory: explaining how people assess credibility online. In: CHI 2003 Extended Abstracts on Human Factors in Computing Systems, pp. 722–723 (2003)

  11. Ginsca, A.L., Popescu, A., Lupu, M.: Credibility in information retrieval. Found. Trends Inf. Retr. 9(5), 355–475 (2015). https://doi.org/10.1561/1500000046

  12. Griffiths, K.M., Tang, T.T., Hawking, D., Christensen, H.: Automated assessment of the quality of depression websites. J. Med. Internet Res. 7(5), e59 (2005)

  13. Gupta, A., Kumaraguru, P., Castillo, C., Meier, P.: TweetCred: real-time credibility assessment of content on Twitter. In: Aiello, L.M., McFarland, D. (eds.) SocInfo 2014. LNCS, vol. 8851, pp. 228–243. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13734-6_16

  14. Hahnel, C., Goldhammer, F., Kröhne, U., Naumann, J.: The role of reading skills in the evaluation of online information gathered from search engine environments. Comput. Hum. Behav. 78, 223–234 (2018)

  15. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)

  16. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7

  17. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

  18. Hoens, T.R., Chawla, N.V.: Imbalanced datasets: from sampling to classifiers. Imbalanced Learning: Foundations, Algorithms, and Applications, pp. 43–59 (2013)

  19. Islam, M.S., et al.: Covid-19-related infodemic and its impact on public health: a global social media analysis. Am. J. Trop. Med. Hyg. 103(4), 1621–1629 (2020)

  20. Jimmy, J., Zuccon, G., Palotti, J., Goeuriot, L., Kelly, L.: Overview of the CLEF 2018 consumer health search task. In: International Conference of the Cross-Language Evaluation Forum for European Languages (2018)

  21. Kakol, M., Nielek, R., Wierzbicki, A.: Understanding and predicting web content credibility using the content credibility corpus. Inf. Process. Manag. 53(5), 1043–1061 (2017)

  22. Kattenbeck, M., Elsweiler, D.: Understanding credibility judgements for web search snippets. Aslib J. Inf. Manag. 71, 368–391 (2019)

  23. Liao, Q.V., Fu, W.T.: Age differences in credibility judgments of online health information. ACM Trans. Comput.-Hum. Interact. (TOCHI) 21(1), 1–23 (2014)

  24. Matsumoto, D., Hwang, H.C., Sandoval, V.A.: Cross-language applicability of linguistic features associated with veracity and deception. J. Police Crim. Psychol. 30(4), 229–241 (2015)

  25. Matthews, S.C., Camacho, A., Mills, P.J., Dimsdale, J.E.: The Internet for medical information about cancer: help or hindrance? Psychosomatics 44(2), 100–103 (2003)

  26. McKnight, D.H., Kacmar, C.J.: Factors and effects of information credibility. In: Proceedings of the Ninth International Conference on Electronic Commerce, pp. 423–432 (2007)

  27. Moffat, A., Zobel, J.: Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. (TOIS) 27(1), 1–27 (2008)

  28. Mukherjee, S., Weikum, G.: Leveraging joint interactions for credibility analysis in news communities. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 353–362 (2015)

  29. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)

  30. Pennycook, G., McPhetres, J., Zhang, Y., Lu, J.G., Rand, D.G.: Fighting Covid-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychol. Sci. 31(7), 770–780 (2020)

  31. Pogacar, F.A., Ghenai, A., Smucker, M.D., Clarke, C.L.: The positive and negative influence of search results on people’s decisions about the efficacy of medical treatments. In: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pp. 209–216 (2017)

  32. Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of textual claims on the web. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2173–2178 (2016)

  33. Reuters Institute, University of Oxford: Reuters Digital News Report 2020 (2020). https://www.digitalnewsreport.org/survey/2020. Accessed 16 Nov 2020

  34. Rieh, S.Y.: Judgment of information quality and cognitive authority in the web. J. Am. Soc. Inf. Sci. Technol. 53(2), 145–161 (2002)

  35. Schwarz, J., Morris, M.: Augmenting web pages and search results to support credibility assessment. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1245–1254 (2011)

  36. Sharma, K., Qian, F., Jiang, H., Ruchansky, N., Zhang, M., Liu, Y.: Combating fake news: a survey on identification and mitigation techniques. ACM Trans. Intell. Syst. Technol. (TIST) 10(3), 1–42 (2019)

  37. Sondhi, P., Vydiswaran, V.G.V., Zhai, C.X.: Reliability prediction of webpages in the medical domain. In: Baeza-Yates, R., et al. (eds.) ECIR 2012. LNCS, vol. 7224, pp. 219–231. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28997-2_19

  38. Viviani, M., Pasi, G.: Credibility in social media: opinions, news, and health information-a survey. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 7(5), e1209 (2017)

  39. Vydiswaran, V.V., Zhai, C., Roth, D.: Content-driven trust propagation framework. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 974–982 (2011)

  40. Yamamoto, Y., Tanaka, K.: Enhancing credibility judgment of web search results. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1235–1244 (2011)

  41. Zha, W., Wu, H.D.: The impact of online disruptive ads on users’ comprehension, evaluation of site credibility, and sentiment of intrusiveness. Am. Commun. J. 16(2), 15–28 (2014)

Acknowledgements

This work was funded by FEDER/Ministerio de Ciencia, Innovación y Universidades – Agencia Estatal de Investigación/Project (RTI2018-093336-B-C21). This work has received financial support from the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/04, ED431C 2018/29, ED431C 2018/19) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

Author information

Corresponding author: Marcos Fernández-Pichel.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernández-Pichel, M., Losada, D.E., Pichel, J.C., Elsweiler, D. (2021). Reliability Prediction for Health-Related Content: A Replicability Study. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_4

  • DOI: https://doi.org/10.1007/978-3-030-72240-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72239-5

  • Online ISBN: 978-3-030-72240-1

  • eBook Packages: Computer Science, Computer Science (R0)
