COVID-19-FAKES: A Twitter (Arabic/English) Dataset for Detecting Misleading Information on COVID-19

  • Conference paper
Advances in Intelligent Networking and Collaborative Systems (INCoS 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1263)

Abstract

This paper aims to aid ongoing research efforts to combat the infodemic related to COVID-19. We provide an automatically annotated, bilingual (Arabic/English) COVID-19 Twitter dataset, COVID-19-FAKES, which was collected continuously from February 4, 2020, to March 10, 2020. To annotate the collected dataset, we built a ground-truth database from two sources: the information shared on the official websites and official Twitter accounts of the WHO, UNICEF, and the UN, which we treat as reliable, and pre-checked COVID-19 facts collected from several fact-checking websites. The tweets in the COVID-19-FAKES dataset are then annotated using 13 different machine learning algorithms and 7 different feature extraction techniques. We are making our dataset publicly available to the research community (https://github.com/mohaddad/COVID-FAKES). This work will help researchers understand the dynamics behind the COVID-19 outbreak on Twitter. Furthermore, it could support studies on sentiment analysis, the propagation of misleading information related to the outbreak, users’ behavior during the crisis, botnet detection, and the performance of different classification algorithms combined with the various feature extraction techniques used in text mining. Note that in this paper we use the terms misleading information, misinformation, and fake news interchangeably.
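The annotation step described above (train on a ground-truth database, then label unlabeled tweets) can be illustrated with a minimal sketch. It assumes a tiny, hypothetical ground-truth list: the sentences in `GROUND_TRUTH` are invented for illustration and are not taken from the dataset. Feature extraction here is plain bag-of-words counting, and the learner is a hand-written multinomial Naive Bayes classifier with add-alpha smoothing. This is one illustrative algorithm/feature pairing, not necessarily one of the 13 algorithms or 7 feature extraction techniques the authors used, and the names `train_nb`, `predict_nb`, and `annotate` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy ground truth: invented sentences standing in for reliable
# statements (label "real") and fact-checked false claims (label "misleading").
GROUND_TRUTH = [
    ("washing hands with soap helps prevent the spread of the virus", "real"),
    ("wearing a mask in crowded places reduces transmission", "real"),
    ("drinking hot water kills the coronavirus", "misleading"),
    ("garlic cures covid 19 infection", "misleading"),
]


def tokenize(text):
    """Lowercase and split into Unicode word tokens (Latin or Arabic script).

    No Arabic-specific normalization (diacritics, letter variants) is applied.
    """
    return re.findall(r"\w+", text.lower())


def train_nb(examples, alpha=1.0):
    """Bag-of-words feature extraction plus multinomial Naive Bayes training."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # token frequencies per label
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, text):
    """Return the label with the highest log-posterior for `text`."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def annotate(tweets, model):
    """Attach a predicted label to each unlabeled tweet text."""
    return [(tweet, predict_nb(model, tweet)) for tweet in tweets]


if __name__ == "__main__":
    model = train_nb(GROUND_TRUTH)
    for tweet, label in annotate(["hot water kills the virus",
                                  "wash your hands with soap"], model):
        print(f"{label:>10}: {tweet}")
```

Swapping in other classifiers or feature extractors (for example, TF-IDF weights instead of raw counts) follows the same train-then-annotate pattern, yielding one label per algorithm/feature pair for each tweet.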

Author information

Corresponding author

Correspondence to Kin Fun Li.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Elhadad, M.K., Li, K.F., Gebali, F. (2021). COVID-19-FAKES: A Twitter (Arabic/English) Dataset for Detecting Misleading Information on COVID-19. In: Barolli, L., Li, K., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2020. Advances in Intelligent Systems and Computing, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-57796-4_25
