India reported its first case of COVID-19 on January 30, 2020. Six months later, COVID-19 remains a growing crisis in India, with over 1.6 million reported cases. In this communication, we assess the quality of COVID-19 data reporting by the state and union territory governments of India between July 12 and July 25, 2020, and compare our findings with those of an earlier assessment conducted in May 2020. We conclude that, 6 months into the pandemic, the quality of COVID-19 data reporting across India remains highly disparate, which could hinder public health efforts.
Two key components in containing the COVID-19 pandemic are public awareness and public trust in the government. Both depend critically on the government disseminating COVID-19 data in a timely and accessible manner1. While studies have documented disparities in access to personal healthcare in India2,3, very little was known about the quality of access to public health data across India, especially during the early months of the COVID-19 pandemic. To address this gap, we developed a semi-quantitative framework to assess the quality of COVID-19 data reporting and used it to calculate a COVID-19 Data Reporting Score (CDRS) for 29 state and union territory (UT) governments of India4. This assessment was conducted during the 2-week period from May 19 to June 1, 2020. It revealed a strong disparity in the quality of COVID-19 data reporting across India: CDRS ranged from 0.61 (good) to 0.0 (poor), with a median of 0.26.
In this communication, we present the findings from a second assessment of the quality of COVID-19 data reporting across India. This study was conducted during the 2-week period from July 12 to July 25, 2020, and covers 35 states and UTs of India (Footnote 1). Hereafter, this 2-week period is referred to as the scoring period. Lakshadweep was excluded because it had no COVID-19-positive cases as of July 12, 2020. Hereafter, the first assessment (May) is referred to as study-1 and the second assessment (July) as study-2.
Our scoring framework consists of 45 indicators spanning four key dimensions of public health data reporting: availability, accessibility, granularity, and privacy4,5. These indicators capture whether a piece of information is present in the reported data and the format in which it is reported. We emphasize that our framework does not assess the accuracy of the reported data.
In the availability dimension, we check whether basic data, such as the daily and cumulative numbers of confirmed cases, deaths, and recoveries in the state, are available5. To assess the accessibility of data, we check for the presence of trend graphics, the availability of data in English, and the ease of reaching the web page where data are reported. Trend graphics are important because they make patterns in the data easier to see. To evaluate the granularity of data, we check whether the state reports cumulative data stratified by age, gender, comorbidity, and district. Granular data help a layperson connect with the data at a personal level. To assess whether a state protects privacy while reporting data, we check whether any personally identifiable information of COVID-19 suspects or patients is made publicly available on the state’s COVID-19 data reporting page. The report items shown as column headers in Table 1 represent five possible stages in which an individual can find themselves during the pandemic.
Each “Metric-Report Item” pair shown in Table 1 is an indicator. The entries in the table represent the possible scores an indicator can earn4. The table is filled in for each state during the scoring period by checking the data reported by that state. For example, if a state reports the total number of confirmed COVID-19 cases, a score of 1 is assigned to that indicator. The scores recorded in the table are collectively referred to as the scoring data.
Using the scoring data, four categorical scores (one for each dimension) and an overall score are calculated for each state. Each categorical score is the sum of the scores earned by the indicators in that dimension. The overall score is the normalized sum of the four categorical scores and is referred to as the COVID-19 Data Reporting Score (CDRS). For further details on the scoring metrics, scoring process, and score calculation, refer to our article introducing the CDRS framework4.
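The aggregation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the indicator keys, the example scores, and each dimension's maximum and minimum possible scores are hypothetical, and the overall score is assumed to be the sum of the categorical scores divided by the sum of the dimensions' score ranges, mirroring how the categorical scores are normalized (by the difference between the maximum and minimum possible scores).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A scoring dimension and the range of totals its indicators can earn.

    The max/min values used below are hypothetical placeholders, not the
    framework's actual ranges.
    """
    name: str
    max_score: float
    min_score: float


def categorical_scores(scoring_data):
    """Sum indicator scores within each dimension.

    scoring_data maps (dimension, metric, report_item) -> indicator score.
    """
    totals = {}
    for (dimension, _metric, _report_item), score in scoring_data.items():
        totals[dimension] = totals.get(dimension, 0.0) + score
    return totals


def cdrs(scoring_data, dimensions):
    """Return (normalized categorical scores, overall score).

    Assumption: each categorical score is divided by (max - min) for its
    dimension, and the overall score is the sum of the categorical scores
    divided by the sum of all dimension ranges.
    """
    totals = categorical_scores(scoring_data)
    normalized = {
        d.name: totals.get(d.name, 0.0) / (d.max_score - d.min_score)
        for d in dimensions
    }
    total_range = sum(d.max_score - d.min_score for d in dimensions)
    overall = sum(totals.get(d.name, 0.0) for d in dimensions) / total_range
    return normalized, overall


# Hypothetical dimension ranges and scoring data for one state.
DIMENSIONS = [
    Dimension("availability", max_score=10, min_score=0),
    Dimension("accessibility", max_score=5, min_score=0),
    Dimension("granularity", max_score=15, min_score=0),
    Dimension("privacy", max_score=1, min_score=-1),
]
EXAMPLE_SCORING = {
    ("availability", "Total", "Confirmed"): 1,  # state reports total confirmed cases
    ("availability", "Daily", "Deaths"): 1,
    ("accessibility", "Trend graphic", "Confirmed"): 1,
    ("granularity", "Age", "Confirmed"): 1,
    ("privacy", "Identifiable information", "Quarantined"): 0,
}

normalized, overall = cdrs(EXAMPLE_SCORING, DIMENSIONS)
print(normalized)
print(f"CDRS = {overall:.3f}")
```

With these placeholder values the overall score is 4/32 = 0.125. Reproducing Table 2 would require the framework's actual indicators and score ranges from the CDRS framework article4.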
Results and Discussion
CDRS and the normalized categorical scores for the states in India are tabulated in Table 2. Each categorical score is normalized by the difference between the maximum and minimum scores possible in that category. The spread of CDRS across states indicates a strong disparity in the quality of COVID-19 data reporting in India. The five-number summary of CDRS is: minimum = 0.00, first quartile = 0.20, median = 0.30, third quartile = 0.35, and maximum = 0.63. The geographical disparity in CDRS is evident from the map shown in Fig. 1 (Footnote 2).
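A five-number summary like the one above can be computed with Python's standard library. The sketch below is illustrative only: the input values are hypothetical, not the study's scores, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is one of several quartile conventions. The convention used for the reported summary is not stated, so Q1 and Q3 from other tools may differ slightly.

```python
import statistics


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of CDRS values.

    Quartiles use the "inclusive" method of statistics.quantiles; other
    quartile definitions can shift Q1 and Q3 slightly.
    """
    ordered = sorted(scores)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical CDRS values, for illustration only.
example_scores = [0.0, 0.0, 0.2, 0.3, 0.35, 0.63]
print(five_number_summary(example_scores))
```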
Figure 2 lists states in decreasing order of CDRS. Karnataka is at the top, and Bihar and Uttar Pradesh are at the bottom. Bihar and Uttar Pradesh get a CDRS of 0 because they do not release any COVID-19 data on their government or health department websites. Figure 2 also shows the change in each state’s CDRS from its value in study-1, conducted between May 19 and June 1, 2020. CDRS increased in 12 states and decreased in 5 states between the two studies. Figure 3 presents boxplots of CDRS across India from study-1 and study-2; the median increased slightly, from 0.26 to 0.30.
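Classifying the change in each state's CDRS between the two studies is a simple per-state comparison. The sketch below uses hypothetical state labels and scores (not the study's values) and, as an assumption, treats states scored only in study-2 (coverage grew from 29 to 35 states) as a separate group rather than as increases.

```python
def classify_changes(study1, study2, tol=1e-9):
    """Group states by the direction of change in CDRS between two studies.

    study1 and study2 map a state name to its CDRS. Differences within
    `tol` are treated as no change, to avoid floating-point noise.
    """
    groups = {"increased": [], "decreased": [], "unchanged": [], "new": []}
    for state in sorted(study2):
        if state not in study1:
            groups["new"].append(state)  # not assessed in study-1
            continue
        delta = study2[state] - study1[state]
        if delta > tol:
            groups["increased"].append(state)
        elif delta < -tol:
            groups["decreased"].append(state)
        else:
            groups["unchanged"].append(state)
    return groups


# Hypothetical scores for illustration only.
study1 = {"State A": 0.26, "State B": 0.40, "State C": 0.10}
study2 = {"State A": 0.30, "State B": 0.35, "State C": 0.10, "State D": 0.20}
print(classify_changes(study1, study2))
```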
Figure 4 shows the number of states that earn a non-zero score on each indicator in our framework. Of the 35 states assessed in this study, 33 report some data on the COVID-19 situation in the state. Bihar and Uttar Pradesh still do not publish any data on their government or health department websites. All 33 reporting states report total deaths and recovered cases, but only 32 of them report total confirmed cases: Gujarat does not report total confirmed cases, although it reports the number of active cases.
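The tally behind Figure 4 amounts to counting, for each indicator, the states with a non-zero score. A minimal sketch, assuming the scoring data are held as a hypothetical nested mapping from state to indicator to score (the indicator names and values below are invented for illustration):

```python
from collections import Counter


def states_with_nonzero_score(scoring_data):
    """Count, per indicator, how many states earn a non-zero score.

    scoring_data maps state -> {indicator: score}. An indicator missing
    from a state's record is treated as a score of 0 (an assumption).
    """
    counts = Counter()
    for indicators in scoring_data.values():
        for indicator, score in indicators.items():
            if score != 0:
                counts[indicator] += 1
    return counts


# Hypothetical scoring data for three states.
scoring_data = {
    "State A": {"Total confirmed": 1, "Total deaths": 1, "Active cases": 0},
    "State B": {"Total confirmed": 0, "Total deaths": 1, "Active cases": 1},
    "State C": {"Total confirmed": 0, "Total deaths": 0, "Active cases": 0},  # publishes nothing
}
print(states_with_nonzero_score(scoring_data))
```

Because `Counter` returns 0 for keys it has never seen, an indicator that no state satisfies simply reads as 0.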
The CDRS of 12 states improved in study-2 relative to study-1. Nine of these 12 states (Andhra Pradesh, Chhattisgarh, Goa, Haryana, Karnataka, Kerala, Ladakh, Uttarakhand, and West Bengal) have started reporting more granular data. This is encouraging and a step in the right direction.
In general, states continue to score lowest in the granularity dimension. Jharkhand, which had the highest granularity score in study-1, has not reported age- and gender-stratified data for total confirmed cases, deaths, and recoveries since June 8, 2020. As a result, its normalized granularity score dropped from 0.50 in study-1 to 0.17 in this study. It may be worthwhile to investigate what led the Jharkhand government to stop reporting age- and gender-stratified data.
Punjab and Chandigarh compromised the privacy of individuals under quarantine by releasing personally identifiable information on their official websites. Chandigarh releases the name and address of people under home quarantine daily. On May 10, 2020, Punjab released the name, age, gender, and mobile number of persons arriving in the state from New Delhi4; as of July 25, 2020, the document was still available on the Punjab government’s health department website.
Testing: The strategy recommended by the Indian Council of Medical Research (ICMR) for COVID-19 testing in India has evolved over time6–8. How much testing data reveal about the spread of COVID-19 within a state depends on the testing strategy (e.g., how people are chosen for testing). Therefore, our framework does not include an indicator for the reporting of testing data. We note, however, that all states in India report some data on testing. But in most states, the reported testing data do not distinguish total samples tested from total persons tested; that is, most states report the total number of samples tested without specifying how many unique persons they came from. This is an important limitation of the data available to track testing in a state9. For instance, Tamil Nadu, which reports both total samples and total persons tested, reported a difference of more than one lakh (100,000) between the two numbers as of August 7, 2020, in its media bulletin10.
Age brackets: Karnataka, Odisha, and Tamil Nadu report the total number of confirmed cases stratified by age, and Karnataka and Kerala report the total number of deaths stratified by age. However, these states use different numbers of age brackets, which makes it difficult to compare the age distribution of confirmed and deceased individuals across states. For example, Karnataka, Odisha, and Tamil Nadu use eight, four, and three age brackets, respectively, to report the total number of confirmed cases stratified by age.
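One way to compare age distributions reported with different brackets is to aggregate each state's counts onto a common, coarser set of brackets. This works only when every source bracket falls entirely within one common bracket; otherwise the counts cannot be split without further assumptions. The sketch below is hypothetical: the bracket boundaries and counts are invented for illustration and are not any state's actual brackets.

```python
import math


def rebin(counts, target_edges):
    """Aggregate age-bracket counts onto coarser target brackets.

    counts maps (lower, upper) age bounds, upper bound exclusive, to a
    count. target_edges is an increasing list of boundaries; math.inf may
    close an open-ended top bracket. Raises ValueError if a source
    bracket straddles a target boundary.
    """
    targets = list(zip(target_edges[:-1], target_edges[1:]))
    result = {target: 0 for target in targets}
    for (lower, upper), n in counts.items():
        for t_lower, t_upper in targets:
            if t_lower <= lower and upper <= t_upper:
                result[(t_lower, t_upper)] += n
                break
        else:  # no target bracket contains this source bracket
            raise ValueError(f"bracket [{lower}, {upper}) straddles a target boundary")
    return result


# Hypothetical counts in five fine brackets, aggregated to three common ones.
fine = {(0, 10): 5, (10, 20): 15, (20, 40): 60, (40, 60): 45, (60, math.inf): 25}
print(rebin(fine, [0, 20, 60, math.inf]))
```

The `ValueError` branch reflects the underlying problem: if one state reports, say, a 15 to 30 bracket and another uses 20 as a boundary, no exact common grid exists at that boundary.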
Aarogya Setu mobile app: On April 2, 2020, the Indian government launched the Aarogya Setu mobile app with the objective of enabling Bluetooth-based contact tracing, mapping likely hotspots, and disseminating relevant COVID-19 information11. To use the app, one must register with a mobile number, agree to its data-sharing policy, and grant it access to Bluetooth and location information. While access to the phone number, Bluetooth, and location information might be necessary for contact tracing, we believe that expecting people to provide such information just to access critical COVID-19 data is unreasonable. Therefore, we did not consider data reported through the Aarogya Setu app when scoring the states. We note, however, that the app reports cumulative and daily data on confirmed cases, deaths, and recoveries for all states, both as text and as trend graphics.
Data aggregation platforms: covid19india.org is a volunteer-driven, nationwide COVID-19 data aggregation initiative that collects and reports COVID-19 data from across the country. While the initiative is noteworthy, it does not replace the need for high-quality data reporting on official government websites. Such an initiative can fill gaps in the accessibility dimension of our framework, but it cannot fill gaps in the availability and granularity dimensions that result from the government not releasing the corresponding data.
Our assessment informs public health efforts in India about the disparity in the quality of COVID-19 data reporting across the country. The available evidence shows that the quality of data reporting needs to improve across India. The disparity in CDRS reflects the lack of a unified framework for reporting COVID-19 data in India and highlights the need for a national agency such as the Indian Council of Medical Research (ICMR) to monitor or audit the quality of data reporting by the states. The disparity in reporting scores also points to inequality in individuals’ access to public health information and privacy protection based on their state of residence4.
Overall, there is an urgent need to fill the gaps in COVID-19 data reporting across the states. There has been only a marginal improvement in the quality of COVID-19 data reporting done by the states between May and July. With the pandemic being far from over, it is imperative that the states continue to learn from each other and improve their data reporting. We conclude this communication by quoting the following from the Economic Survey of India, “Given that sophisticated technologies already exist to protect privacy and share confidential information, governments can create data as a public good within the legal framework of data privacy. In the spirit of the Constitution of India, data should be ‘of the people, by the people, for the people’.”12
The curated scoring data used to calculate CDRS is publicly available at https://github.com/varun-vasudevan/CDRS-India/tree/master/study2_july. The states can use the scoring data to identify the limitations in their data reporting and improve upon them.
Footnote 1: Hereafter, unless specified otherwise, the word “state” refers to both states and union territories of India.
Footnote 2: The map was generated using Tableau Desktop software version 2020.2.1, and the boundary information for regions in India was obtained as shapefiles from Datameet Org (http://projects.datameet.org/maps/).
O’Malley P, Rainford J, Thompson A (2009) Transparency during public health emergencies: from rhetoric to reality. Bull World Health Organ 87:614–618
Fullman N, Yearwood J, Abay SM, Abbafati C, Abd-Allah F, Abdela J et al (2018) Measuring performance on the Healthcare Access and Quality Index for 195 countries and territories and selected subnational locations: a systematic analysis from the Global Burden of Disease Study 2016. Lancet 391(10136):2236–2271
Barik D, Thorat A (2015) Issues of unequal access to public health in India. Front Public Health 3:245
Vasudevan V, Gnanasekaran A, Sankar V, Vasudevan SA, Zou J (2020) Disparity in the quality of COVID-19 data reporting across India. medRxiv (in submission). https://doi.org/10.1101/2020.07.19.20157248. https://www.medrxiv.org/content/early/2020/07/21/2020.07.19.20157248. Accessed 8 Aug 2020
WHO, Global surveillance for COVID-19 caused by human infection with COVID-19 virus: interim guidance. https://apps.who.int/iris/rest/bitstreams/1272502/retrieve. Accessed 14 July 2020
ICMR, Testing strategy for COVID-19 testing in India (March 09, 2020). https://www.mohfw.gov.in/pdf/ICMRstrategyforCOVID19testinginIndia.pdf. Accessed 08 Aug 2020
ICMR, Revised strategy of COVID-19 testing in India (Version 3, March 20, 2020). https://www.icmr.gov.in/pdf/covid/strategy/2020-03-20_covid19_test_v3.pdf. Accessed 08 Aug 2020
ICMR, Strategy for COVID-19 testing in India (May 18, 2020). https://www.icmr.gov.in/pdf/covid/strategy/Testing_Strategy_v5_18052020.pdf. Accessed 08 Aug 2020
JHU Coronavirus Resource Center, Which U.S. states meet WHO recommended testing criteria? https://coronavirus.jhu.edu/testing/testing-positivity. Accessed 08 Aug 2020
Health and Family Welfare Department, Tamil Nadu, Media Bulletin August 07, 2020. https://stopcorona.tn.gov.in/wp-content/uploads/2020/03/Media-Bulletin-07-08-20-COVID-19-6-PM.pdf. Accessed 07 Aug 2020
Ministry of Electronics and Information Technology, Aarogya Setu is now open source. https://static.mygov.in/rest/s3fs-public/mygov_159050700051307401.pdf. Accessed 08 Aug 2020
Ministry of Finance (2019) Economic Survey 2018–2019, vol 1. https://www.indiabudget.gov.in/budget2019-20/economicsurvey/index.php. Accessed 25 June 2020
The authors would like to thank Suhas Javagal for providing comments on a version of the manuscript.
Cite this article
Vasudevan, V., Gnanasekaran, A., Sankar, V. et al. Variation in COVID-19 Data Reporting Across India: 6 Months into the Pandemic. J Indian Inst Sci 100, 885–892 (2020). https://doi.org/10.1007/s41745-020-00188-z