Abstract
This paper aims to support ongoing research efforts to combat the infodemic related to COVID-19. We provide an automatically annotated, bilingual (Arabic/English) COVID-19 Twitter dataset, COVID-19-FAKES, collected continuously from February 4, 2020, to March 10, 2020. To annotate the dataset, we built a ground-truth database from two sources: information shared on the official websites and official Twitter accounts of the WHO, UNICEF, and UN, which we treat as reliable, and pre-checked COVID-19 facts collected from various fact-checking websites. The tweets in COVID-19-FAKES were then annotated using 13 machine learning algorithms combined with 7 feature extraction techniques. We are making the dataset publicly available to the research community (https://github.com/mohaddad/COVID-FAKES). This work can help researchers understand the dynamics behind the COVID-19 outbreak on Twitter. It could also support studies of sentiment analysis, the propagation of misleading information about the outbreak, user behavior during the crisis, botnet detection, and the performance of different classification algorithms combined with the feature extraction techniques used in text mining. Throughout this paper, we use the terms misleading information, misinformation, and fake news interchangeably.
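The abstract names neither the 13 algorithms nor the 7 feature extraction techniques. As a minimal, hypothetical sketch of the general idea (train a text classifier on a labeled ground-truth set, then use it to label unseen tweets), the Python code below pairs one feature extraction technique, bag-of-words counts, with one classifier, multinomial Naive Bayes with Laplace smoothing, using only the standard library. The ground-truth sentences, the `real`/`misleading` labels, and the names `tokenize` and `NaiveBayes` are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens; \\w is Unicode-aware, so Arabic letters match too."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)     # documents per label (for priors)
        self.word_counts = defaultdict(Counter) # token counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods of each known token.
            score = math.log(self.class_counts[label] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in the ground truth
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical ground-truth examples standing in for verified facts and debunked claims.
ground_truth = [
    ("Washing hands with soap and water helps prevent infection", "real"),
    ("WHO recommends keeping physical distance in crowded places", "real"),
    ("Drinking hot water kills the coronavirus", "misleading"),
    ("5G networks spread the coronavirus", "misleading"),
]

model = NaiveBayes().fit([t for t, _ in ground_truth], [l for _, l in ground_truth])
print(model.predict("5G towers spread coronavirus"))  # misleading
```

Swapping in a different feature extractor (for example, TF-IDF weights or character n-grams) or a different classifier would give one more cell of the algorithm-by-feature grid the abstract describes; this sketch covers only a single pairing.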
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Elhadad, M.K., Li, K.F., Gebali, F. (2021). COVID-19-FAKES: A Twitter (Arabic/English) Dataset for Detecting Misleading Information on COVID-19. In: Barolli, L., Li, K., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2020. Advances in Intelligent Systems and Computing, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-57796-4_25
DOI: https://doi.org/10.1007/978-3-030-57796-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57795-7
Online ISBN: 978-3-030-57796-4
eBook Packages: Intelligent Technologies and Robotics (R0)