Machine Learning Reveals Adaptive COVID-19 Narratives in Online Anti-Vaccination Network

Sear, Richard; Leahy, Rhys; Restrepo, Nicholas Johnson; Johnson, Neil

doi:10.1007/978-3-030-96188-6_12

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Included in the following conference series:

Conference of the Computational Social Science Society of the Americas

The original version of this chapter was revised: “Yonatan Lupu” was erroneously included as a co-author, which has now been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-96188-6_14

Abstract

The COVID-19 pandemic sparked an online “infodemic” of potentially dangerous misinformation. We use machine learning to quantify COVID-19 content from opponents of establishment health guidance, in particular vaccination. We quantify this content in two different ways: number of topics and evolution of keywords. We find that, even in the early stages of the pandemic, the anti-vaccination community had the infrastructure to more effectively garner support than their pro-vaccination counterparts by exhibiting a broader array of discussion topics. This provided an advantage in terms of attracting new users seeking COVID-19 guidance online. We also find that our machine learning framework can pick up on the adaptive nature of discussions within the anti-vaccination community, tracking distrust of authorities, opposition to lockdown orders, and an interest in early vaccine trials. Our approach is scalable and hence tackles the urgent problem facing social media platforms of having to analyze huge volumes of online health misinformation. With vaccine booster shots being approved and vaccination rates stagnating, such an automated approach is key in understanding how to combat the misinformation that slows the eradication of the pandemic.

Change history

19 February 2023
In the original version of the chapter, “Yonatan Lupu” was erroneously included as a co-author, which has now been corrected. The chapter and book have been updated with the changes.

Notes

1. https://radimrehurek.com/gensim/

References

1. Community standards enforcement report, second quarter 2021. About Facebook, 18 August 2021. https://about.fb.com/news/2021/08/community-standards-enforcement-report-q2-2021/. Accessed 15 Sep 2021
2. Sear, R.F., et al.: Quantifying COVID-19 content in the online health opinion war using machine learning. IEEE Access 8, 91886–91893 (2020). https://doi.org/10.1109/ACCESS.2020.2993967
3. Larson, H.J.: Blocking information on COVID-19 can fuel the spread of misinformation. Nature 580(7803), 306–306 (2020). https://doi.org/10.1038/d41586-020-00920-w
4. Kata, A.: A postmodern Pandora’s box: anti-vaccination misinformation on the internet. Vaccine 28(7), 1709–1716 (2010). https://doi.org/10.1016/j.vaccine.2009.12.022
5. Coronavirus: scientists brand 5G claims ‘complete rubbish,’ BBC News, 15 April 2020. https://www.bbc.com/news/52168096. Accessed 03 Sep 2021
6. Mythbusters. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public/myth-busters. Accessed 03 Sep 2021
7. A man thought aquarium cleaner with the same name as the anti-viral drug chloroquine would prevent coronavirus. It killed him. Washington Post. http://www.washingtonpost.com/nation/2020/03/24/coronavirus-chloroquine-poisoning-death/. Accessed 16 Sep 2021
8. Frenkel, S., Alba, D., Zhong, R.: Surge of virus misinformation stumps Facebook and Twitter. The New York Times, 08 Mar 2020. https://www.nytimes.com/2020/03/08/technology/coronavirus-misinformation-social-media.html. Accessed 03 Sep 2021
9. Iyengar, R.: The coronavirus is stretching Facebook to its limits. CNN. https://www.cnn.com/2020/03/18/tech/zuckerberg-facebook-coronavirus-response/index.html. Accessed 03 Sep 2021
10. Broniatowski, D.A., et al.: Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. Am. J. Public Health 108(10), 1378–1384 (2018). https://doi.org/10.2105/AJPH.2018.304567
11. Lama, Y., Chen, T., Dredze, M., Jamison, A., Quinn, S.C., Broniatowski, D.A.: Discordance between human papillomavirus Twitter images and disparities in human papillomavirus risk and disease in the United States: mixed-methods analysis. J. Med. Internet Res. 20(9), e10244 (2018). https://doi.org/10.2196/10244
12. Ammari, T., Schoenebeck, S.: ‘Thanks for your interest in our Facebook group, but it’s only for dads’: social roles of stay-at-home dads. In: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing, New York, pp. 1363–1375, February 2016. https://doi.org/10.1145/2818048.2819927
13. Johnson, N.F., et al.: Hidden resilience and adaptive dynamics of the global online hate ecology. Nature 573(7773), 261–265 (2019). https://doi.org/10.1038/s41586-019-1494-7
14. Johnson, N.F., et al.: New online ecology of adversarial aggregates: ISIS and beyond. Science, June 2016. https://www.science.org/doi/abs/10.1126/science.aaf0675. Accessed 03 Sep 2021
15. Facebook. https://www.facebook.com/policies_center/pages_groups_events. Accessed 03 Sep 2021
16. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
17. Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning - ICML 2006, Pittsburgh, Pennsylvania, pp. 113–120 (2006). https://doi.org/10.1145/1143844.1143859
18. Syed, S., Spruit, M.: Full-text or abstract? Examining topic coherence scores using latent Dirichlet allocation. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 165–174, October 2017. https://doi.org/10.1109/DSAA.2017.61
19. CDC newsroom, CDC, 01 January 2016. https://www.cdc.gov/media/releases/2020/s0229-COVID-19-first-death.html. Accessed 03 Sep 2021
20. Johnson, N.F., et al.: The online competition between pro- and anti-vaccination views. Nature 582(7811), 230–233 (2020). https://doi.org/10.1038/s41586-020-2281-1

Acknowledgement

CrowdTangle data are made available through The George Washington University. We are grateful for funding for this research from the U.S. Air Force Office of Scientific Research under award numbers FA9550-20-1-0382 and FA9550-20-1-0383.

Author information

Authors and Affiliations

The George Washington University, Washington, DC, 20052, USA
Richard Sear, Rhys Leahy & Neil Johnson
ClustrX, LLC, Washington, USA
Nicholas Johnson Restrepo

Corresponding author

Correspondence to Richard Sear.

Editor information

Editors and Affiliations

Technology and Operations Management, California State Polytechnic University, Pomona, CA, USA
Zining Yang
Assistant Professor of Computer Science, Elon University, Elon, NC, USA
Elizabeth von Briesen

Appendix

As mentioned in the main text, the methodology starts with a seed of manually identified Facebook Pages discussing vaccines, public policies about vaccination, or the pro-vs-anti vaccination debate. Their connections to other fan pages are then indexed. At each step, new findings are vetted through a combination of human coding and computer-assisted filters. This snowball process is continued; because new links often lead back to pages already in the list, some form of closure can in principle be achieved. The process yields a set containing many hundreds of pages for both the anti-vaccination and pro-vaccination communities. Before training the LDA models, several steps are employed to clean the content of these pages, in a similar way to other LDA analyses in the literature:

1. Mentions of URL shorteners, such as “bit.ly”, are removed. These are fragments output by Facebook’s CrowdTangle API.
2. Many of the posts link to external websites. The fact that these specific websites were mentioned could itself be an interesting component of the COVID-19 conversation. Hence, instead of removing them completely, the pieces “.gov”, “.com”, and “.org” are replaced with “__gov”, “__com”, and “__org”, respectively. This operation effectively concatenates domains into a form that will not be filtered out by the later preprocessing steps.
3. The posts are then run through Gensim’s simple_preprocess function, which tokenizes the post on spaces and removes tokens that are only 1 or 2 characters long. This step also removes numeric and punctuation characters.
4. Tokens that are in Gensim’s list of stopwords are removed. For example, “the” is not a good indication of a topic.
5. Tokens are lemmatized using the WordNetLemmatizer from the Natural Language Toolkit (NLTK), which converts all words to singular form and/or present tense.
6. Tokens are stemmed using the SnowballStemmer from NLTK, which removes affixes on words.
7. Any remaining fragments of URLs (other than domains) that are left over after stemming, such as “http” and “www”, are removed.
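
The cleaning steps above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the shortener list, the tiny stopword set, and the tokenizing regular expression are stand-ins for the CrowdTangle fragments, Gensim's stopword list, and Gensim's simple_preprocess, and the function name clean_post is hypothetical. Lemmatization and stemming (steps 5 and 6) are left as an optional callable so that NLTK is not required to run the example.

```python
import re
from typing import Callable, List, Optional

# Illustrative stand-ins (assumptions), not the lists used in the paper.
URL_SHORTENERS = ("bit.ly",)                    # step 1
STOPWORDS = {"the", "and", "for", "are", "with", "this", "that"}  # step 4
URL_FRAGMENTS = {"http", "https", "www"}        # step 7

# Letters and underscores only, so digits and punctuation are dropped
# while protected domains such as "cdc__gov" survive as one token.
TOKEN_RE = re.compile(r"[a-z_]+")


def clean_post(text: str,
               normalize: Optional[Callable[[str], str]] = None) -> List[str]:
    """Apply a simplified version of cleaning steps 1-7 to one post."""
    text = text.lower()
    # Step 1: drop mentions of URL shorteners.
    for shortener in URL_SHORTENERS:
        text = text.replace(shortener, " ")
    # Step 2: rewrite ".gov"/".com"/".org" so the domain stays one token.
    text = re.sub(r"\.(gov|com|org)\b", r"__\1", text)
    # Step 3: tokenize and drop tokens of only 1 or 2 characters.
    tokens = [t for t in TOKEN_RE.findall(text) if len(t) >= 3]
    # Step 4: remove stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Steps 5-6: optional lemmatizer/stemmer supplied by the caller.
    if normalize is not None:
        tokens = [normalize(t) for t in tokens]
    # Step 7: remove leftover URL fragments.
    return [t for t in tokens if t not in URL_FRAGMENTS]


example = "Read the CDC guidance at https://www.cdc.gov/vaccines (via bit.ly) in 2020"
print(clean_post(example))
# ['read', 'cdc', 'guidance', 'cdc__gov', 'vaccines', 'via']
```

With NLTK installed and its WordNet data downloaded, the normalize argument could be a function that chains WordNetLemmatizer().lemmatize and SnowballStemmer("english").stem, which would correspond to steps 5 and 6.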

Steps 5 and 6 help ensure that words are compared fairly during training: if a particular word is a strong indicator of a topic, its signal is not lost just because the word appears in many different forms. These steps rely on words existing in NLTK’s pretrained vocabulary; any word not in this vocabulary is left unchanged.

After this preprocessing, we train the LDA models on the cleaned data. We refer to [2] for a complete discussion of the standard LDA models employed. Eight dynamic LDA models were trained, with the “number of topics” parameter ranging from 3 to 10 (inclusive) and each time frame consisting of the data gathered from the anti-vaccination groups in a 1-week period. While the amount of data available in each time frame is not uniform, we believe there is sufficient data in each time frame for the model to make useful inferences.
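
As a rough illustration of the 1-week time frames, the sketch below orders tokenized posts by date and counts how many fall into each 7-day window. The function name weekly_time_slices, the (date, tokens) input format, and the start date are assumptions made for the example; the source does not describe how the weekly frames were built in code.

```python
from datetime import date
from typing import List, Sequence, Tuple

# The eight "number of topics" settings described in the text: 3..10 inclusive.
TOPIC_COUNTS = range(3, 11)


def weekly_time_slices(
    posts: Sequence[Tuple[date, List[str]]],
    start: date,
) -> Tuple[List[List[str]], List[int]]:
    """Return documents in chronological order plus per-week document counts.

    `start` (a hypothetical parameter) marks the first day of week 0.
    """
    ordered = sorted(posts, key=lambda post: post[0])
    docs: List[List[str]] = []
    counts: List[int] = []
    for posted_on, tokens in ordered:
        week = (posted_on - start).days // 7
        if week < 0:
            raise ValueError(f"post dated {posted_on} precedes start {start}")
        # Grow the list so that weeks without posts are recorded as 0.
        while len(counts) <= week:
            counts.append(0)
        counts[week] += 1
        docs.append(tokens)
    return docs, counts


posts = [
    (date(2020, 1, 20), ["lockdown"]),
    (date(2020, 1, 1), ["vaccine"]),
    (date(2020, 1, 3), ["trial"]),
    (date(2020, 1, 8), ["mask"]),
]
docs, counts = weekly_time_slices(posts, start=date(2020, 1, 1))
print(counts)  # [2, 1, 1]
```

The non-uniform counts mirror the remark that the amount of data differs between time frames. If a dynamic topic model implementation such as Gensim's LdaSeqModel were used (an assumption; the text does not name the implementation), a list of per-slice counts like this is what its time_slice argument expects, and one model would be trained for each value in TOPIC_COUNTS. Weeks with a count of 0 would need to be merged or handled separately first.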

The code used to run our experiments is available and documented here: https://github.com/gwdonlab/topic-modeling. It is meant as a framework that can be used to run similar experiments on any text dataset.
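
For readers who want a concrete picture of the snowball process described at the start of this appendix, the following is a minimal sketch under stated assumptions. It is not taken from the repository above: the function name snowball, the neighbors callable (standing in for looking up a page's links to other fan pages), and the is_relevant callable (standing in for the human coding and computer-assisted filters) are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Set


def snowball(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],
    is_relevant: Callable[[str], bool],
) -> Set[str]:
    """Expand a seed set of pages by following links, vetting each page once."""
    seed_list = list(seeds)
    seen: Set[str] = set(seed_list)   # pages already queued or examined
    accepted: Set[str] = set()        # pages that passed vetting
    queue = deque(seed_list)
    while queue:
        page = queue.popleft()
        if not is_relevant(page):
            continue  # rejected pages are not expanded further
        accepted.add(page)
        for linked in neighbors(page):
            # Links that lead back to known pages add nothing new,
            # which is how the process eventually reaches closure.
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return accepted


# Toy link structure (invented for illustration only).
links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["E"], "D": ["B"], "E": []}
found = snowball(["A"], lambda p: links.get(p, []), lambda p: p != "C")
print(sorted(found))  # ['A', 'B', 'D']
```

In this toy run, page C fails vetting, so page E, which is reachable only through C, is never examined.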

About this paper

Cite this paper

Sear, R., Leahy, R., Restrepo, N.J., Johnson, N. (2022). Machine Learning Reveals Adaptive COVID-19 Narratives in Online Anti-Vaccination Network. In: Yang, Z., von Briesen, E. (eds) Proceedings of the 2021 Conference of The Computational Social Science Society of the Americas. CSSSA 2021. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-96188-6_12

DOI: https://doi.org/10.1007/978-3-030-96188-6_12
Published: 29 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96187-9
Online ISBN: 978-3-030-96188-6
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
