IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

COVID-19, which emerged in Wuhan, China, in December 2019, continues to spread rapidly across the world even though authorities have made a number of vaccines available. Although the coronavirus has been around for a significant period of time, people and authorities still feel the need for awareness because the virus mutates, and its symptoms and prevention strategies vary accordingly. People and authorities rely most on social media platforms to share awareness information and voice their opinions, because these platforms spread the word to a massive audience in practically no time. People communicate on social media in a number of languages, chosen according to their familiarity with the language, its reach, and its availability on the platform. The entire world has been hit by the coronavirus, and India is the second worst-hit country in terms of the number of active coronavirus cases. India, being a multilingual country, offers a great opportunity to study the reach of the various languages actively used across social media platforms. In this study, we examine a COVID-19-related dataset collected between February 2020 and July 2020, focusing specifically on regional languages in India. This could help the Government of India, various state governments, NGOs, researchers, and policymakers study different issues related to the pandemic. We found that English was the mode of communication in over 64% of tweets, while as many as twelve regional languages in India account for approximately 4.77% of tweets.
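
As an illustration of how per-language shares like those reported in the abstract could be computed, here is a minimal sketch. It assumes each tweet is a dictionary carrying a `lang` code, as Twitter API tweet objects do. The sample records, the function names, and the `REGIONAL_CODES` set are hypothetical; they are not the paper's data, method, or list of twelve languages.

```python
from collections import Counter

# Hypothetical set of language codes treated as "regional" for this sketch;
# the abstract does not list the twelve languages the authors studied.
REGIONAL_CODES = {"hi", "bn", "ta", "te", "mr", "gu"}


def language_shares(tweets):
    """Return {language code: percentage of tweets} for an iterable of tweet dicts."""
    # Tweets without a language tag are grouped under "und" (undetermined).
    counts = Counter(t.get("lang", "und") for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


def regional_share(shares, regional_codes=REGIONAL_CODES):
    """Sum the percentage shares of the languages listed in regional_codes."""
    return sum(pct for lang, pct in shares.items() if lang in regional_codes)


# Made-up sample records, only to exercise the functions.
sample = [
    {"id": 1, "lang": "en"},
    {"id": 2, "lang": "en"},
    {"id": 3, "lang": "hi"},
    {"id": 4, "lang": "ta"},
    {"id": 5},  # no language tag
]

shares = language_shares(sample)
print(shares)
print(regional_share(shares))
```

On a real collection, the same two functions would be applied to the full stream of tweet records, with `REGIONAL_CODES` replaced by whichever language codes are of interest.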

D. Uniyal and A. Agarwal—Equal contribution.

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Uniyal, D., Agarwal, A. (2021). IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_22

  • DOI: https://doi.org/10.1007/978-3-030-93733-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93732-4

  • Online ISBN: 978-3-030-93733-1

  • eBook Packages: Computer Science, Computer Science (R0)