COVID-19-FAKES: A Twitter (Arabic/English) Dataset for Detecting Misleading Information on COVID-19

Elhadad, Mohamed K.; Li, Kin Fun; Gebali, Fayez

doi:10.1007/978-3-030-57796-4_25

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1263)

Included in the following conference series:

International Conference on Intelligent Networking and Collaborative Systems

Abstract

This paper aims to support ongoing research on combating the infodemic related to COVID-19. We provide an automatically annotated, bilingual (Arabic/English) COVID-19 Twitter dataset, COVID-19-FAKES. The dataset was collected continuously from February 04, 2020, to March 10, 2020. To annotate it, we built a ground-truth database from two sources: information shared on the official websites and official Twitter accounts of the WHO, UNICEF, and UN, which we treat as reliable, and pre-checked COVID-19 facts collected from different fact-checking websites. The tweets in the COVID-19-FAKES dataset are then annotated using 13 different machine learning algorithms and 7 different feature extraction techniques. We are making our dataset publicly available to the research community (https://github.com/mohaddad/COVID-FAKES). This work will help researchers understand the dynamics behind the COVID-19 outbreak on Twitter. It could also support studies on sentiment analysis, the propagation of misleading information related to the outbreak, users’ behavior during the crisis, the detection of botnets, and the performance of different classification algorithms with the various feature extraction techniques used in text mining. Note that in this paper we use the terms misleading information, misinformation, and fake news interchangeably.
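To make the annotation idea concrete, the following is a minimal sketch of labeling tweets from a ground-truth database. It does not reproduce the paper's pipeline: the abstract does not name the 13 algorithms or the 7 feature extraction techniques, so the sketch assumes plain term-frequency features and a nearest-centroid classifier with cosine similarity. The toy ground-truth texts, the label names "real" and "misleading", and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_features(text):
    """Term-frequency features: a Counter over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(ground_truth):
    """Sum the feature vectors of all ground-truth texts, one centroid per label."""
    centroids = {}
    for text, label in ground_truth:
        centroids.setdefault(label, Counter()).update(tf_features(text))
    return centroids


def annotate(tweet, centroids):
    """Assign the label whose centroid is most similar to the tweet."""
    feats = tf_features(tweet)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Hypothetical toy ground truth (assumption: not taken from the dataset).
GROUND_TRUTH = [
    ("wash your hands often with soap and water", "real"),
    ("vaccines are tested for safety before approval", "real"),
    ("drinking bleach cures the virus", "misleading"),
    ("garlic soup cures the virus overnight", "misleading"),
]

centroids = train_centroids(GROUND_TRUTH)
tweets = ["wash hands with soap", "bleach cures everything"]
labels = [annotate(t, centroids) for t in tweets]
```

Swapping `tf_features` for another feature extractor, or `annotate` for another classifier, is how one would compare combinations of techniques in the spirit of the study described above.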

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
Mohamed K. Elhadad, Kin Fun Li & Fayez Gebali

Corresponding author

Correspondence to Kin Fun Li.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
Kin Fun Li
School of Science and Technology, Kwansei Gakuin University, Sanda, Japan
Hiroyoshi Miwa

About this paper

Cite this paper

Elhadad, M.K., Li, K.F., Gebali, F. (2021). COVID-19-FAKES: A Twitter (Arabic/English) Dataset for Detecting Misleading Information on COVID-19. In: Barolli, L., Li, K., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2020. Advances in Intelligent Systems and Computing, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-57796-4_25

DOI: https://doi.org/10.1007/978-3-030-57796-4_25
Published: 21 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57795-7
Online ISBN: 978-3-030-57796-4
eBook Packages: Intelligent Technologies and Robotics (R0)
