Introduction

In February 2020, two early reports on the epidemiological characteristics of the COVID-19 outbreak in large populations were made public, both from China. The first, published by the China Center for Disease Control (China CDC) on February 17, 2020, included 72,314 individuals suspected of being infected, of whom 44,672 (61.8%) were confirmed by laboratory testing (Zhang, 2020). The reported male-to-female sex ratio among these cases was 1.06:1 with modest geographical variation in Wuhan (0.99:1) and Hubei (1.04:1). The second report, published less than 2 weeks later by the World Health Organization (WHO)-China Joint Mission on COVID-19, confirmed that 51.1% of the 55,924 laboratory-confirmed cases were in men (Aylward & Liang, 2020). Both studies also calculated a case fatality rate (or lethality) and found a large sexual dimorphism. The earlier report (from China CDC) found that 2.8% of infected men vs. 1.7% of infected women had died. By the time the WHO report was released, lethality had increased but a similar ratio was found: the crude fatality rate was 4.7% in men vs. 2.8% in women. For a roughly similar infection rate, the distribution of lethal cases was almost two-thirds men and one-third women, meaning that almost twice as many men as women had died of the disease. A New York Times article on March 20, 2020, described a similar sexual dimorphism for the Italian outbreak (Rabin, 2020b): according to an analysis of over 25,000 COVID-19 cases by the Higher Health Institute of Rome, 8% of infected men and 5% of infected women had died. Possible epidemiological differences also began to emerge: in the Italian population, the proportion of men in the group who tested positive was significantly higher (about 60% were men) than in the Chinese cohorts.

This sex difference in COVID-19 lethality is reminiscent of the two previous outbreaks due to other coronavirus strains: Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) and Middle East Respiratory Syndrome coronavirus (MERS-CoV). For example, the lethality in infected men during the 2003 SARS-CoV outbreak in Hong Kong was 21.9% versus 13.2% in women (Karlberg, Chong, & Lai, 2004); lethality has also been higher in men in the MERS-CoV outbreak (e.g., 30.5% vs. 25.8%; Mobaraki & Ahmadzadeh, 2019). Possible explanations for the sex differences in SARS-CoV-2, the virus causing COVID-19 infection, have been extensively discussed elsewhere (BMJ Global Health Blogs, 2020; Polglase, Mezzofiore, & Foster, 2020; Rabin, 2020a) and include sexual dimorphism in immune response to infection (Klein & Flanagan, 2016; Voskuhl, 2011) as well as the incidence of comorbidities and risk factors, such as chronic lung disease, and the behaviors that may underlie those, such as smoking.

While the United Nations (UN) has requested sex-stratified data be collected since at least the 1975 first World Conference on Women, a December 2010 report from the UN Economic and Social Council on the Global Gender Statistics Program highlighted that many countries do not provide these data (Rosenthal, 2010). This was also true for health-related data as documented during the third Global Forum on Gender Statistics (United Nations, 2010). Stunningly, it took the WHO until 2019 to publish its annual World Health Statistics report with data disaggregated by sex (Chaib, 2019). This report highlighted the vast sex differences in all aspects of health epidemiology, including access to and attitudes toward care, and revealed that “of the 40 leading causes of death, 33 causes contribute more to reduced life expectancy in men than in women” (World Health Organization, 2019, p. 5).

In the field of epidemics specifically, the importance of, and suggestions for, metrics to document communicable diseases were detailed in a 2007 WHO report (Anker, 2007). Recent experiences with the Ebola and Zika viruses highlight the importance of documenting epidemiological metrics by sex, adapting aid response to the specificity of local gender roles, and involving both men and women in surveillance and decision-making (Wenham, Smith, & Morgan, 2020). However, as Wenham et al. alerted in The Lancet on March 6, 2020: “Policies and public health efforts have not addressed the gendered impacts of disease outbreaks. The response to coronavirus disease 2019 (COVID-19) appears no different. We are not aware of any gender analysis of the outbreak by global health institutions or governments in affected countries or in preparedness phases” (p. 846).

Mandates to address sex in scientific research have been established for decades with limited success (see European Commission Directorate-General for Research & Innovation, 2016; Johnson, Sharman, Vissandjée, & Stewart, 2014; National Institutes of Health, 2015). Guidelines to design such studies have been published (e.g., Rich-Edwards, Kaiser, Chen, Manson, & Goldstein, 2018). While researchers now report the distribution of sex in their samples, the full potential of analyzing sex-disaggregated data to uncover potential sexual dimorphisms has yet to be realized. For instance, the first COVID-19 randomized clinical trial reported that lopinavir-ritonavir, an antiviral combination used in the treatment of HIV infection, was not effective against SARS-CoV-2 (Cao et al., 2020). A similar number of men and women were included in the trial but drug efficacy was not analyzed by sex. Similarly, a report demonstrating that heart damage is an independent risk factor for death in COVID-19 infection did not analyze data with sex as a biological variable (Shi et al., 2020a). A similar number of men and women were included in the final report of 416 consecutive cases but a larger proportion of women had been excluded (57% of 229) because critical laboratory results were not available for them. This seems a missed opportunity given the well-known sex differences in the burden of cardiovascular disease (e.g., Mosca, Barrett-Connor, & Kass Wenger, 2011).

An emerging hypothesis to explain the severity of COVID-19 disease progression in some patients is the so-called cytokine storm, or cytokine release syndrome (Mehta et al., 2020; Shi et al., 2020). Accounts are emerging in the news media that critically ill patients were saved by experimental treatment with anti-inflammatory drugs (e.g., Brumfiel, 2020; Read, 2020). As the immune response is one of the best documented sexually dimorphic biological processes in humans (Klein & Flanagan, 2016; Ortona, Pierdominici, & Rider, 2019), collecting and analyzing data regarding the immune and inflammatory status of patients in a sex-disaggregated way will be critical in devising appropriate therapies for COVID-19 infection. This will, of course, also be critical when testing vaccine efficacy and toxicity. The sexual dimorphism of vaccine response is clearly established, with women typically mounting more robust antibody responses and experiencing more adverse events following vaccination than men (Fischinger, Boudreau, Butler, Streeck, & Alter, 2019; Flanagan, Fink, Plebanski, & Klein, 2017).

In the current report, we sought to document what sex-disaggregated information was made available to the public and the research community through governmental websites. To account for the extreme fluidity of the situation, we accessed official websites at the transnational, governmental, and US state levels on two occasions within a span of approximately 3 weeks, on March 21–24 and April 5–10, 2020. We hope our analysis not only serves as a review of the publicly available data at the onset of the recent COVID-19 outbreak in the context of the apparent sex bias but also sheds light on the frequent lack of harmonized, sex-disaggregated statistics in the data released for epidemiological research. These data are used to inform healthcare and policy decisions, and this report should serve as a global call to action to better meet the needs and requirements of epidemiological data release.

Method

On March 22, 2020, we accessed the list of countries ranked by number of reported cases of COVID-19 maintained by Worldometers.org (see “Limitations” below). Between March 22 and March 24, we accessed the sources quoted by Worldometers to search for raw data. These sources included national news outlets reporting official press releases (e.g., El País for Spain), as well as official governmental websites. When no governmental website was quoted, we independently searched for one. We visited and took time-stamped screenshots of all the websites. Websites were searched by native speakers for English, French, Chinese, and Swedish; by authors with fluent or functional knowledge of the language for Spanish, Italian, and German; and with the further aid of online translators for Portuguese, Dutch, Norwegian, and Danish. Available metrics, source URLs, and date of accession are collated in Table 1.

Table 1 Sex-disaggregated metrics documenting the COVID-19 epidemic available on official websites

We also accessed the websites of trans-national entities, such as the WHO and the European CDC. Deficiencies of the WHO website, including a change in reporting methods on March 18 and an inability to correct errors in a timely fashion, were documented by ourworldindata.org (Roser, Ritchie, Ortiz-Ospina, & Hasell, 2020). The European CDC, which reports daily statistics for the whole world, beyond Europe, has been used as a reliable source by other reputed outlets. Johns Hopkins University also maintains a tally of cases on its Coronavirus Report Center webpage used as a source by news outlets. Because the U.S. CDC reported that they had stopped tallying tests and recovery rates, deferring to each state’s authorities to do so, we also sought data provided by the six states with the highest number of cases on March 23–24, 2020: New York, New Jersey, Washington, California, Michigan, and Illinois. We performed a second data capture to detail the diversity of reported metrics. Websites were accessed again between April 4 and 10, 2020 by one or more authors to verify and complete the data sets (see Table 1). (Details of reported metrics not related to sex can be found in LoTempio et al., 2020). To verify whether the longitudinal decrease in sex ratio among confirmed cases also held in the USA, the same six state websites were reassessed on April 19 (by which time New York City had more confirmed cases than any world country).

Results

We were unable to locate any sex-disaggregated metrics on the European CDC (accessed March 20, March 22, April 2), Johns Hopkins (March 20, April 2), WHO (March 21; April 12), Worldometers (daily March 20-24), or the CDC (March 20, April 2) dashboards.

Reported Metrics in the First Data Capture: March 21–24, 2020

Worldometers did not provide sex-disaggregated statistics. This website cited governmental websites for 8 countries, news agencies for 8 countries, and no sources for 2 countries (Table 1). For the remaining two countries, websites with uncredited sources were quoted: one was a GitHub for the Italian data (later identified as from the Dipartimento della Protezione Civile); for the other, on platz.se, for Sweden, we could not identify the authors or where the data were sourced.

We found sex documented among confirmed cases on the main governmental interface and/or the Worldometers source for 6 out of 20 countries, 4 of 6 states in the U.S., and on the WHO-Europe website. Italian data were reported in the New York Times (Rabin, 2020b). Sweden and Australia provided graphs of cases disaggregated by both sex and age but did not provide numerical values. Only Denmark provided both the graph and proportion of confirmed cases of men by age (see Fig. 4). Sex-disaggregated information was located in published reports for China (Aylward & Liang, 2020; Zhang, 2020) but we were unable to find sex-disaggregated data on the interface of either the China CDC or the state-sponsored news agency (Sina), which both showed the same data when simultaneously accessed on April 4, 2020. We were able to locate the sex ratio across the number of deaths reported for two countries (Italy and South Korea) via the news media, and on the websites of two states (Washington State and New York City) and the Europe-WHO region.

Several countries reported admissions to an Intensive Care Unit (ICU) as a proxy of severity of disease (e.g., Italy, Belgium) but did not report admissions by sex of the patient. None of the other metrics of the COVID-19 pandemic shown on various dashboards, such as regional distribution, possible source of infection, ICU hospitalization, number of tests performed, recovery rate, or comorbidities, were reported by sex.

Reported Metrics in the Second Data Capture: April 7–19, 2020

By the time of our second data capture in early April 2020, most, but not all, countries had started reporting cases by sex (typically as percentage of confirmed cases; Table 2). Websites with already detailed information from the first data capture provided richer data and analyses 2 to 3 weeks later. For example, Denmark, the first country to provide numerical data for cases disaggregated by sex and age, provided both graph and numerical data disaggregated by sex, age, and comorbidity for cases, deaths, and hospitalizations in April. For some countries, such as the USA or France, sex-disaggregated data was not provided on the main public dashboard but could be located by accessing links to more specific reports.

Table 2 Diversity and evolution of reported sex ratios for COVID-19 infection

Cases. Graphical or numerical data by sex were reported for 14 countries: in graph-only format for 3 countries (Australia, Germany, Netherlands), and numerical values for another 11 countries. In France, we were able to find sex-disaggregated data only for ICU cases. Brazil provided graphical and numerical data disaggregated by sex and age but reported all patients with Severe Acute Respiratory Distress Syndrome in aggregate, including those caused by influenza A, SARS-CoV-2 or, for the vast majority, still under investigation or undiagnosed.

Deaths. Deaths were reported by sex for 12 of the 20 countries: Australia and Belgium in graph-only format and numerical data for Brazil, Canada, Denmark, France, Italy, Norway, Spain, Sweden, Switzerland, and the USA (CDC, IL, WA, NY City).

Comorbidities: The first (and only) country to report comorbidities (and symptoms) disaggregated by sex was Spain.

Tests: The first (and only) report of sex information on the number of tests performed was found on April 19 for the state of Illinois.

Other metrics: At the time of writing, no information had been found by sex for suspected source of infection or recovery rate.

Variability of Sex Ratio Among Confirmed COVID-19 Cases by Country: March 21–24, 2020

The February reports from China indicated a similar number of men (51.1%) and women (48.9%) among confirmed cases (Aylward & Liang, 2020). However, soon South Korean data began to emerge that looked very different with a larger proportion of women (61.5% vs. 38.5% men) among 8799 confirmed cases (“COVID-19/Coronavirus: Facts and Figures,” 2020; Klein, 2020). Data from Italy at around the same time showed an equally strong sex bias but in an inverse direction with over 60% of infections found in men among 25,000 confirmed cases (Rabin, 2020b). Information was provided from a press release (Italy), the Twitter feed of a reputed sex differences researcher (South Korea), or the data analysis from a private company (South Korea, see legend in Fig. 1). The only other two countries for which we found numerical data in March—Denmark and Portugal—failed to bring any clarity to the disparate data. The proportion of men and women infected with SARS-CoV-2 reported in Denmark was similar to the distribution reported in Italy, but cases reported in Portugal were closer to China with 48.7% of cases found in men. In the USA, the proportion of men among cases ranged from 43% in the state of Washington to 56.2% in California (Table 2).

Fig. 1

Visual comparison of sex ratio among cases by pie charts: Screen shots of pie charts found for 2 and 5 countries in March and April, respectively. Longitudinal comparison in Portugal shows an increase in the percentage of women among cases. In the same time period, proportions have been roughly unchanged in South Korea (with ~ 60% women), although the change of color convention of this statista.com graph is misleading. Sex ratios are inverse in Italy (52.9% men) and Switzerland (53% women), while equal numbers of affected men and women were reported in Austria. (Text in English in appropriate color was overlaid over the screenshot when needed for increased legibility). Date of data is indicated

Variability of Sex Ratio Among Confirmed COVID-19 Cases by Country: April 7–19, 2020

More countries reported cases by sex at our subsequent data capture but the disparity between sexes persisted. The format in which those were reported was also variable: number of cases, percentage of men, and/or pie charts. Pie charts were visually impactful for data comparison for the countries that made them available (see Fig. 1). All available numerical data are compiled in Table 3.

Table 3 Availability of sex-disaggregated data captured April 5-10, 2020 for cases and deaths on the official websites of the 20 countries with the highest numbers of cases as of March 21, 2020

We were unable to locate data for the total number of cases disaggregated by sex for 6 countries: Brazil, China, France, Iran, UK, and the USA. Three countries reported cases disaggregated by both sex and age but only in graphical form (Australia, Germany, Netherlands). Visual inspection of the graphs suggested a disproportionate number of men among cases in the Netherlands but not in Germany or Australia (see Fig. 2). Among the 11 countries where numerical data were available (Table 3), the proportion of men in confirmed cases varied by over 13 percentage points, ranging from 53.1% (Italy) to 40% (South Korea). Two countries reported an equal number of men and women among confirmed cases (Austria and Norway; with Spain at 50.8%). Belgium, Canada, Denmark, Portugal, South Korea, and Switzerland reported more cases in women by at least 6 percentage points.

Fig. 2

Wide disparity of data display and age-binning for COVID-19 cases disaggregated by sex and age. Data disaggregated by both sex and age were found for 9 of the 20 countries and Europe (Screen captures are shown; Denmark is shown in Fig. 3). ‘Men’ and ‘women’ labels have been added to the screenshots, in the appropriate colors, when not in English. Blue arrows on the Spain graph were added to indicate features (see description in text). Age-binning, reported metric, and type of graphic representation were different for all. We could not find any stated justification on any governmental website for the wide array of age-binning categories. 2A: European Center for Disease Control (CDC), 2B: World Health Organization (WHO) Europe; 2C Germany, 2D Australia, 2E Belgium, 2F Italy, 2G Sweden, 2H Canada [see Table 1 for full URLs for left (www.canada.ca) and right (https://experience.arcgis.com) graphs], 2I Spain, 2J Netherlands

Changes in Confirmed COVID-19 Cases Among Women Between the Two Data Collection Waves

The pie chart representation of the Portuguese data highlights the change in sex ratio among cases between March 22 and April 6. As shown in Fig. 1a, longitudinal comparison in Portugal revealed an increase in the proportion of women among confirmed SARS-CoV-2 cases with 51.3% reported on March 22 and 56.7% reported on April 7. The trend continued with an increase to 58.5% just 6 days later.

During this same time period, the proportion of men and women appeared roughly unchanged in South Korea with ~ 60% of cases reported in women (see Fig. 1b but please note the change of colors representing men and women). Among the other countries which provided pie charts by the time of the second data capture (Fig. 1c), Austria had equal numbers of confirmed cases of infected men and women but Switzerland reported more women among cases (53%). The higher proportion of men among Italian cases noted during the first wave of data collection (60%) fell to roughly 53% by the second wave of data collection.

To verify if the trend held, we also recaptured data for the USA (6 states observed on April 19; see Table 2). Data by sex could no longer be found on the Michigan state website. Washington had a similar proportion of female cases (53%) at both waves of data collection but, in both New York City and California, the proportion of infected women had increased to 47% and 49%, respectively.

To further understand the trend, we turned to the reports of data disaggregated by both sex and age. On March 22, the only country providing data by sex and age was Denmark (Fig. 3). While the data showed many more infected men than women in all age groups on March 22, the trend had inverted by April 8, with many more infected women than men between the ages of 20 and 60 (confirmed cases increased almost eightfold in that timespan; see Fig. 3c).

Fig. 3

Mirror bar graphs of the number of confirmed cases by age [Y-axis: age in 10-year bins] and sex [X-axis: number of cases, scale 0–650 (left, right) or 0–260 (center)] in Denmark illustrate widely different distributions over time. The three graphs illustrate the difficulty faced by scholars analyzing data in real time to try and derive evidence-based recommendations. The left (a; cumulative cases up to March 12) and right (c: cumulative cases March 13-April 8) graphs were captured on April 9 on the same website as the March 22 capture (b; center). Red: women; blue: men. (Numbers were also provided in accompanying tables)

Disaggregation of Confirmed Cases by Both Sex and Age

While most countries provided disaggregated data by age, disaggregation by both sex and age was rare in the first data capture. That information, while likely available, was also not included in the reports on Chinese data (Aylward & Liang, 2020; Zhang, 2020).

Data stratified by sex and age was only available in three countries as of March 24. Two—Sweden and Australia—provided graphs without numerical values—one a pyramid graph and the other a bar graph using different age categories (Figs. 2d, g). Denmark provided a pyramid graph (Fig. 3b) and a table with the numbers used to create it. Both Sweden and Denmark clearly reported more SARS-CoV-2-infected men than women at all ages. In the Australian cohort, counts were similar across age groups, except in the 40–49 age range, where women represented only about 40% of the cases.

In the second round of data capture, we were able to find information disaggregated by both age and sex for 9 out of the 20 countries. We also found data across Europe by accessing analytical PDFs linked from the main dashboard. For example, a bar graph, with indicated numbers, could now be found in the rich PDF, updated daily by the Sciensano Institute in Belgium, available only in French. (The dashboard of the Belgian Federal Public Health, visible in French, English, Dutch and German, has no information disaggregated by sex). The Istituto Superiore di Sanità in Rome, Italy, also publishes a daily-updated table with cases and deaths by sex and age, which is available in Italian, in PDF format, as a link off the Italian or English interfaces of the Epicentro website. Switzerland displayed sex and age data on a highly interactive dashboard with cases, deaths numbers, percentages and clickable illustrations (in French and German).

While more information was made available between the first and second data capture, the extreme diversity of representation and metrics made comparing data between countries difficult, as illustrated in Fig. 2. The European CDC (Fig. 2a) displayed the number of cases in Europe in side-by-side bar graphs for men and women, accounting for missing data (“unknown”), while WHO Europe (Fig. 2b) displayed the percentage within each age group among cases of men and women in a mirror bar graph using an oddly expanded X-axis scale. Age was binned in 5-, 10-, 15- or 25-year increments across the lifespan by the European CDC and by grouping ages 0–29 years, then in 10-year bins by WHO Europe. A graph for Germany (Fig. 2c) was similar to that of the European CDC, except it showed only 6 different age groups (0–4, 5–14, 15–34, 35–59, 60–79, 80+) in irregular (5-, 10-, 20-, or 25-year) increments. Two other countries that also displayed numbers of cases by sex in side-by-side bar graphs, Australia and Belgium, used regular 10-year age bins (shown in Fig. 2d, e, respectively; in addition to the graph, numbers were provided for Belgium but not Australia).

No figures were provided for the Italian data but a very complete table included, in bins of 10 years, the number and proportions of cases (and deaths) by sex and the total number of cases including those cases where sex was not documented.

Two linked websites illustrated the Canadian data with different metrics and graphs: a mirror bar graph of case percentages by age in ten 10-year bins (plotted age 100 to 0, opposite to all other graphs) and a stacked bar graph of number of cases. The former did not have a legend for the color by sex on the graph (when we returned 5 days later to verify the data, the URL had become inactive). For the latter, the x-axis indicates 4 non-continuous age groups (+ unknown) but 9 bars are shown, and it is unclear what age ranges are actually shown. The Netherlands also showed the number of cases by sex in a stacked bar graph but used different age-binning with 20 categories of regular 5-year increments (see Fig. 2j).

Finally, mirror bar graphs were chosen to display data from Sweden (on March 22, Fig. 2g, but were no longer available in April), Spain (available in April but not in March, Fig. 2i), and Denmark (Fig. 3). Sweden and Denmark graphs showed the number of cases while Spain plotted the proportion of cases in each age bin. Age was binned differently by the three countries: increments of 10 years for Sweden, except between ages 10-30, which were split into 5-year bins; 19 groups of 5 years for Spain; 10 groups of regular 10-year bins for Denmark. Denmark and Spain also provided numerical data in attached tables. Spain also provided an overlaid pyramid of ages of the general population, which illustrated the low rate of infection in people under the age of 20 and the high rate among men above age 50. Interestingly, for women, it suggests a bimodal effect depending on age (starting around the time of menopause and above age 60; blue arrows in Fig. 2i).
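The age-binning mismatch described above can be made concrete with a minimal sketch of how regular 5-year bins (as used by Spain or the Netherlands) could be merged into 10-year bins (as used by Denmark) before comparison. The function name and the counts are hypothetical and serve only as an illustration; no dashboard values are reproduced here.

```python
from collections import defaultdict


def rebin_to_decades(counts_by_bin):
    """Aggregate counts keyed by (lower_age, upper_age) into 10-year bins.

    Each input bin must fall entirely inside one decade (for example 20-24
    or 25-29). Bins that straddle a decade boundary cannot be reassigned
    without further assumptions, so they raise an error.
    """
    decades = defaultdict(int)
    for (low, high), count in counts_by_bin.items():
        if low // 10 != high // 10:
            raise ValueError(f"bin {low}-{high} spans more than one decade")
        decades[(low // 10) * 10] += count
    return dict(sorted(decades.items()))


# Hypothetical counts for illustration only (not taken from any dashboard).
five_year_bins = {(20, 24): 30, (25, 29): 50, (30, 34): 40, (35, 39): 45}
print(rebin_to_decades(five_year_bins))  # {20: 80, 30: 85}
```

Irregular bins such as the German 15–34 group cannot be merged this way, which is one reason why only countries publishing numerical data in compatible bins can be compared directly.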

Sex Ratios Vary by Age in a Consistent Pattern Across Countries

Figure 2 illustrates the difficulty of comparing results when representations, age-binning, and metrics (number vs. percentage of each age group among cases) are not standardized, and no clear trends were immediately apparent. We used the numerical data made available by Belgium, Denmark, Italy, Norway, Switzerland, and Spain to plot the sex ratio across the ages (Fig. 4; numerical data and data sources are shown in Supplementary Table 1). Both datasets from March 22 and April 9 for Denmark are represented in the graph. These data showed that the sex ratio among cases varied with age following a complex trend. For all 6 countries, the sex ratio decreased from birth to a low at age 20–30 (when ~ 2/3 of cases were in women in Belgium). It then increased up to age 60, plateaued until age 80, and decreased again, likely due in part to the greater proportion of women in the general population in that age range. This trend was observed in all six countries despite their very divergent average sex ratios (range = 45–53%). Strikingly, the sex ratios calculated for the Denmark April 9 data followed the same trend as the other five countries, in sharp contrast to the monophasic trend observed for the March 22 data (trendline, Fig. 4).
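As an illustration of the metric plotted by age, the following minimal sketch computes the percentage of men among confirmed cases in each age bin from sex-disaggregated counts. The function name and the counts are hypothetical; the actual values and calculations are those of the Supplementary Table.

```python
def percent_men_by_age(men, women):
    """Return the percentage of men among cases for each age bin.

    `men` and `women` map the same age-bin labels to case counts.
    Bins with no cases return None rather than dividing by zero.
    """
    result = {}
    for age_bin in men:
        total = men[age_bin] + women[age_bin]
        result[age_bin] = round(100 * men[age_bin] / total, 1) if total else None
    return result


# Hypothetical counts for illustration only.
men_cases = {"20-29": 100, "50-59": 210}
women_cases = {"20-29": 200, "50-59": 190}
print(percent_men_by_age(men_cases, women_cases))
# {'20-29': 33.3, '50-59': 52.5}
```

Expressing the metric as a percentage within each bin, rather than as raw counts, is what allows countries with very different case numbers to be overlaid on a single graph.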

Fig. 4

The sex ratio of confirmed infections varies by age and with time. Proportion of men by age, binned in 10-year increments, is shown. Data for Denmark are graphed at 2 stages of the epidemic: Mar. 22, when it was the only such data available (dark blue line), and Apr. 9 (medium blue bars), when the sex ratio at all ages mirrored trends in the other countries captured at a similar time. The average percentage of men across all age groups for each country is shown in the inset table. All data, data source, and calculations are shown in Supplemental Table S1. Captures were on Apr. 5 for Belgium, Apr. 7 for Switzerland and Italy (captured data also shown in Fig. 2), Apr. 9 for Spain (April 6 data) and Norway

Men Accounted for the Majority of Deaths of COVID-19 in All Countries

Sex information about deaths had been made available for 3 countries by March 24, 2020: Italy (data March 24, 2020; New York Times), South Korea (March 21, 2020; Klein, 2020, Twitter feed), and China (February, 2020; Aylward & Liang, 2020; Zhang, 2020). We did not find sex-disaggregated data for deaths on any of the websites for the 20 countries but data were available from the WHO-Europe region, Washington State, and New York City.

China had reported a much higher lethality (deaths among confirmed cases, also known as case fatality) of 4.8% in men versus 2.8% in women (~ 64% of total deaths were among men; Aylward & Liang, 2020). WHO-Europe reported 71.4% of men among the total of 1032 deaths during the week of March 9–15, very similar to the 71% reported for Italy (cumulative to March 20) and 68.2% for New York City. In contrast, South Korea and Washington State reported much higher proportions of women: 47% (of 102 deaths) and 55% (of 108), respectively. As indicated above, both locations had a much higher proportion of infected women as well, even early into the epidemic.
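The ~ 64% share of deaths among men follows from the sex-specific lethality once the sex distribution of cases is known. The sketch below reproduces the arithmetic under the assumption that the 51.1% share of men among confirmed cases and the lethality figures quoted above describe the same cohort; the function name is hypothetical.

```python
def male_share_of_deaths(share_men_cases, cfr_men, cfr_women):
    """Fraction of all deaths occurring in men.

    share_men_cases: fraction of confirmed cases that are men (0-1)
    cfr_men, cfr_women: case fatality in each sex (0-1)
    """
    deaths_men = share_men_cases * cfr_men
    deaths_women = (1 - share_men_cases) * cfr_women
    return deaths_men / (deaths_men + deaths_women)


# Figures quoted in the text: 51.1% of cases in men, lethality 4.8% vs. 2.8%.
share = male_share_of_deaths(0.511, 0.048, 0.028)
print(f"{100 * share:.1f}% of deaths in men")  # 64.2% of deaths in men
```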

During the second round of data capture, we were still unable to find death data stratified by sex for Austria, China, Germany, Iran, the Netherlands, Portugal, South Korea, the UK, or USA (California, Michigan, New Jersey). The graph-only representations for Australia and Belgium appeared to indicate a much higher number of men than women (Figs. 5c–e). For the other 10 countries, the proportion of men ranged from 55% (Norway) to 68% (Italy) (Table 3).

Fig. 5

Sexual dimorphism in lethality of COVID-19 infection. a Lethality (or case fatality, calculated as the percentage of deaths among confirmed cases) by sex for data sets where disaggregated number of cases and deaths was found for the same cohort. Data source and calculations shown in Table S2. b Calculated sex ratios (number of men/number of women) for cases and deaths quantify the excess of men among deaths and ICU cases. Three entities (c, d, e) provided graphic visualization of deaths by sex and age. Age-binning, type of graph, and metrics were different in all three. No numbers were provided for Belgium or Australia, and we were unable to calculate a sex ratio

When the numbers of cases and deaths were available in the same cohort, we systematically calculated the lethality for each sex (fraction of deaths per confirmed cases, in men and in women, shown in Fig. 5a) and the excess of men (male-to-female sex ratio) among cases and deaths (Fig. 5b). (Raw data and calculations shown in Supplementary Table S2 and Table 4). Sex ratios ranged from 1.2 (20% more men) in Norway to 2.1 (more than twice as many men) in Italy. All countries and states also showed higher lethality in men compared to women, irrespective of the number or sex ratio in cases or average lethality in the country. Lethality varied greatly from over 15% in Italian men (almost twice as high as in Italian women) to under 2% in Norway (which showed the smallest disparity in lethality between men and women). Country-specific policies, infrastructure, climate, and/or lifestyle may underlie the puzzling differences in lethality. Denmark, with a similar number of cases as Norway (both in absolute numbers and per capita, 1329 and 1326/1 M population respectively; data retrieved from Worldometers.com on April 21, 2020) had a lethality three times higher, with a strong male bias. Sweden, with a slightly higher per capita COVID-19 infection rate (1517), recorded the second highest lethality at almost 12% in men (~ 8.5% in women).
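The two metrics defined above can be sketched as follows. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to follow; the actual data and calculations are those of Supplementary Table S2.

```python
def lethality(deaths, cases):
    """Case fatality as a percentage of confirmed cases."""
    return 100 * deaths / cases


def sex_ratio(men, women):
    """Male-to-female ratio; values above 1 indicate an excess of men."""
    return men / women


# Hypothetical cohort for illustration only.
cases = {"men": 4000, "women": 5000}
deaths = {"men": 320, "women": 200}

for sex in ("men", "women"):
    print(f"lethality in {sex}: {lethality(deaths[sex], cases[sex]):.1f}%")
print(f"sex ratio among cases:  {sex_ratio(cases['men'], cases['women']):.2f}")
print(f"sex ratio among deaths: {sex_ratio(deaths['men'], deaths['women']):.2f}")
# lethality in men: 8.0%
# lethality in women: 4.0%
# sex ratio among cases:  0.80
# sex ratio among deaths: 1.60
```

In this made-up cohort, women account for more cases but men for more deaths, the same qualitative pattern as a sex ratio below 1 among cases combined with a sex ratio above 1 among deaths.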

Table 4 Sex ratio among hospitalizations and ICU cases

Switzerland, which had the third highest per capita infection rate among countries with at least 5 million inhabitants (3243/1 M; behind only Spain and Belgium and ahead of Italy), had the same sex ratio in cases (~ 0.87) and deaths (~ 1.59) as Denmark, but a much lower lethality, in both sexes. Lethality increased dramatically in New York City as the epidemic progressed (from < 1% to 3.9% for women and 6% for men) but the sex difference was maintained. This is similar to what was observed in China, where the lethality had doubled between the two February reports but the difference in lethality between the sexes remained similar. Washington state, which was the first recognized site of COVID-19 outbreak in the USA, already had a lethality around 5% in late March, for both sexes. By April, average lethality was unchanged, but the sex difference had emerged, with men experiencing more deaths than women (sex ratio 1.32).

Severity of Disease: ICU and Hospitalizations

A March 20 report from the Intensive Care National Audit and Research Centre (icnarc.org) disclosed that, of the 196 critically ill patients in the UK, 70.9% were men (ICNARC, 2020). None of the other reported health characteristics of the cohort (body mass index, comorbidities, length of stay, deaths, and therapies) were disaggregated by sex.

Inpatient observation days in the China CDC report were 342,063 for men and 319,546 for women (Zhang, 2020). This metric has typically been used as a proxy for severity but, in the case of a lethal infectious disease, there can be ambiguity as to whether observation days ended because the patient had died or recovered (i.e., hospital discharges). Additionally, numbers could be confounded by cultural or biological factors that may have increased the likelihood of one sex being admitted at an earlier versus later stage in the disease progression. Nevertheless, we calculated that the average numbers of observation days per patient were similar in both sexes (342,063 days/22,981 cases = 14.88 for men; 319,546 days/21,691 cases = 14.73 for women) in the China cohort.
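The per-patient averages can be reproduced with a few lines of Python. The figures are those quoted above from the China CDC report; the variable names are ours.

```python
# Observation days and confirmed cases by sex, as quoted from the China CDC report.
observation_days = {"men": 342_063, "women": 319_546}
confirmed_cases = {"men": 22_981, "women": 21_691}

# Average observation days per confirmed case, for each sex.
avg_days = {
    sex: observation_days[sex] / confirmed_cases[sex]
    for sex in observation_days
}

for sex, days in avg_days.items():
    print(f"{sex}: {days:.2f} observation days per patient")
```

The two case counts sum to 44,672, the number of laboratory-confirmed cases in that report, which is a quick consistency check on the denominators.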

Where sex-disaggregated numbers were available (Table 4), data showed that men represented the vast majority of cases admitted to the ICU, with percentages ranging from 64% in Canada to 76% in Norway. (All the countries for which the information was known had roughly similar numbers of women and men, or a higher proportion of women, among the total cases.) A similar trend, with a lower sex ratio divergence, was seen among hospital admissions, where women represented 41–45%. Similarly, in an analysis of 1099 patients hospitalized with SARS-CoV-2 in China, 41.9% were women (Guan et al., 2020).

Two countries provided numbers of cases and deaths on the same cohorts: Canada among severe hospitalized cases and France among ICU cases. For these cohorts we calculated lethality by sex (Table 4, in bold). In striking contrast to lethality in the infected population at large, we found similar (19.5% in Canada) or slightly higher lethality in women than men (France, 9.3% vs. 8.1%) among those severe cases. This suggests that while women may be developing a less severe (or different, see below) form of COVID-19, those women who do reach the ICU in deep respiratory distress may have a similarly poor chance of survival as men.

Comorbidities and Symptoms

The U.S. National Center for Health Statistics reported that 1097 out of 1879 (58.4%) deaths with pneumonia and COVID-19 were men as of April 4, 2020. In contrast, only 52% of deaths with pneumonia without COVID-19 were among men, and similar numbers of men and women had died from influenza in the same period (2214 men vs. 2253 women; 49.6% men).

Spain was the only country for which we could find a detailed report of comorbidities by sex (data April 6, 2020; data captures and sources for the USA and Spain shown in Supplementary Figure S1). Numbers and percentages for eleven different symptoms and four comorbidities were reported by sex and statistically examined for between-sex differences. While it was unclear what the statistical calculations referenced, numerical data indicated that sore throat, vomiting, and diarrhea were more frequent in women. The widely described symptoms of COVID-19 appear to be those more frequently found in men than in women: fever, pneumonia, severe acute respiratory distress syndrome, and other respiratory symptoms. Pneumonia, for example, was found in 65% of men, but only 49% of women.

Available data are insufficient to understand what sexually dimorphic traits will eventually be relevant to COVID-19 comorbidities. Further references on the state of knowledge about sex differences in human disease and physiology and on SABV research can be found in the repository maintained by Northwestern University (Northwestern University Women’s Health Research Institute, 2020) and in an NIH workshop report (Institute of Medicine (US) Forum on Neuroscience and Nervous System Disorders, 2011).

Discussion

Our review indicated that reporting of sex-disaggregated data is not widespread. It is unclear whether the information is collected but not reported, or if it is not collected at all. Whatever data existed early on seemed to have been available only to the institutions that collected them, and to have been released preferentially to news outlets. While this allowed a report of early sexually dimorphic trends (in blog format, prior to peer review, published after we had performed the first data capture), it highlighted an inequality in accessibility to data.

When available, the information was presented in a wide variety of ways, strikingly illustrated in Fig. 2, with extreme diversity in reported metrics: number, rate per 100,000 population, fraction of various variables, percentage of men or of women, and so forth. When possible, we used available data to reverse calculate and present complete data sets with comparable metrics. This was time-consuming, error-prone, and sometimes not possible. For sex- and age-disaggregated data, systematic comparison was not possible because of the variety of age range options chosen by the various entities to bin their data. Arbitrarily binning continuous variables (e.g., age) not only makes comparison between data sets difficult, it might hide—or artificially create—trends. For example, grouping children aged 10–19, across puberty, is likely to mask hormonally influenced outcomes. Statistical analysis needs to be applied to raw data to reveal trends (and guide age-binning), rather than the converse. For sex, provision for a third option, beyond the binary, would be useful to accommodate both various country-specific laws and missing data. Besides legal sex, we would also recommend capturing gender (in a non-binary way), as this is a critical parameter to consider for population health equity research and policy. For example, both gender (social or cultural influences) and sex (biological differences) are thought to influence cardiovascular disease (Spence & Pilote, 2015), a comorbidity of COVID-19 infection, as well as immune response to vaccines (Fink & Klein, 2015).
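The kind of reverse calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for this study: the record layouts, field names, and all numbers are hypothetical, and counts recovered from rounded rates or percentages are only approximate.

```python
def to_counts(record):
    """Convert one reported record into (men, women) counts.

    Three hypothetical reporting styles are handled:
      - "counts": absolute numbers by sex
      - "rate_per_100k": rate per 100,000 population by sex, plus population
      - "percent_men": total count plus percentage of men
    """
    kind = record["kind"]
    if kind == "counts":
        return record["men"], record["women"]
    if kind == "rate_per_100k":
        men = round(record["rate_men"] * record["pop_men"] / 100_000)
        women = round(record["rate_women"] * record["pop_women"] / 100_000)
        return men, women
    if kind == "percent_men":
        men = round(record["total"] * record["pct_men"] / 100)
        return men, record["total"] - men
    # Refuse to guess when the metric is not one we know how to invert.
    raise ValueError(f"cannot reverse-calculate metric: {kind!r}")


# Hypothetical reports in three different formats (illustration only).
reports = [
    {"kind": "counts", "men": 120, "women": 100},
    {"kind": "rate_per_100k", "rate_men": 50.0, "pop_men": 2_000_000,
     "rate_women": 40.0, "pop_women": 2_100_000},
    {"kind": "percent_men", "total": 500, "pct_men": 60.0},
]

harmonized = [to_counts(r) for r in reports]
print(harmonized)
```

Raising an error for unknown metrics, rather than silently skipping them, reflects the point made above that some reported formats could not be converted at all.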

Where countries have put online high-quality disaggregated data, those are typically coded (contrast coding, e.g., 1 = male, 2 = female). This is useful for automated analysis but increases potential for data entry error. We suggest capturing harmonized data (e.g., country-level male, female, other) sensitive to local usages. Software can then be designed to code the entry (i.e., transform woman, femme, kvinna, donna, mujer, sieviete, babae, or Frau into “2”), merge the standardized data sets, and redeploy them in the native language for easy use by local researchers.
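A minimal sketch of such a coding step is shown below. The codes 1 = male and 2 = female follow the example above; the code 3 for "other", the handful of male labels, and the function name are our own assumptions for illustration, and a real label table would need to be built with local experts.

```python
# Numeric codes: 1 and 2 follow the example in the text; 3 is an assumed code.
SEX_CODES = {"male": 1, "female": 2, "other": 3}

# Illustrative label table mapping local-language entries to a category.
LABELS = {
    "woman": "female", "femme": "female", "kvinna": "female",
    "donna": "female", "mujer": "female", "sieviete": "female",
    "babae": "female", "frau": "female",
    "man": "male", "homme": "male", "mann": "male",
    "hombre": "male", "uomo": "male",
    "other": "other",
}


def code_sex(label):
    """Return the numeric code for a free-text label, or None if unrecognized.

    Unrecognized or missing entries are returned as None so that they can be
    reviewed rather than silently assigned to a category.
    """
    if label is None:
        return None
    category = LABELS.get(label.strip().casefold())
    return SEX_CODES[category] if category is not None else None


entries = ["Frau", " mujer ", "homme", "kvinna", "not recorded"]
print([code_sex(e) for e in entries])
```

Keeping the label table separate from the numeric codes means the same table can be inverted later to present merged data sets in the local language.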

The Danish data illustrate the difficulty of interpreting daily updated data in the exponential growth phase of a pandemic. Policies based on the early trend showing a massive excess of men at all ages would have been obsolete just 3 weeks later. Several hypotheses can be made to explain the increase in the proportion of women over time, all of which require further, currently unavailable sex-disaggregated data to prove or disprove. It is possible that more accurate statistics were obtained with increasing cohort size or that higher lethality in men started reducing the sex ratio of cases as the epidemic progressed. It is also possible that expansion of testing availability resulted in milder cases, or cases with different presentation, becoming counted in women. It could also be a true increase in female rate of infection over time, possibly due to evolution of exposure type. For example, in societies where fewer women work outside of the home, infection of women may become more widespread in a second wave after infected men have brought the disease back to their communities and their caretakers. Sex-disaggregated data about method of transmission or disease symptoms will be required to test these hypotheses.

Men appear more likely than women to die after being diagnosed with COVID-19, but the similar lethality for men and women among severe cases in France and Canada suggests a more nuanced picture. If the presentation of the disease is different in men and women, the criteria for testing, established on the earlier, male-biased statistics, may be inadequate to identify infected women. There may also be regional, cultural, and infrastructural specificities resulting in differential reporting of outcomes for men and women. Several anecdotal reports suggested that women may be dying at home, and thus be underreported by overwhelmed local authorities.

It is unclear whether the higher lethality reported in patients with comorbidities such as cardiovascular disease, diabetes, lung disease, or cancer, is sexually dimorphic. Both the US CDC and Spanish data suggest that severe lung disease is a feature more frequently associated with COVID-19 in men, for example. Sex differences in lung physiology are well established (Pinkerton et al., 2015; Townsend, Miller, & Prakash, 2012). The Spanish data also indicate that other symptoms may be sexually dimorphic, with diarrhea, vomiting or sore throat more frequently observed in women. The extent of infection in women may not be fully recognized if presentation is different.

We found a single preprint in which the authors designed in silico experiments in search of a mechanistic explanation for the sex differences in SARS-CoV-2 infection (Wei et al., 2020). Using publicly available human tissue gene expression and ChIP-Seq data sets, they proposed an intriguing hypothesis in which the androgen receptor (AR) could directly control expression of ACE2, one of the two main proteins for SARS-CoV-2 entry into human cells (the other, TMPRSS2, is a known androgen-responsive gene; Clinckemalie et al., 2013). Analysis of single-cell RNA-sequencing found that more pulmonary alveolar type II cells express ACE2 in men than in women, and that ACE2 is expressed in the prostate and in Sertoli and Leydig cells of the testis, possibly providing another entry path for the virus in men. If these in silico explorations withstand the rigors of peer review and of in vitro or in vivo experiments, it may mean that aspects of COVID-19 pathology will be susceptible to anti-androgen therapy.

Limitations

Instability of weblinks has been a constant issue in the data collection process for this study. For example, worldometers.com is now worldometers.info. The names of the reporting institutions have been provided to help find information in case more hyperlinks become obsolete in the future. While we aimed to provide an unbiased, comprehensive review of all publicly available COVID-19 data, the use of specific search engines, which are built on machine learning algorithms to personalize searchable content, may have inadvertently obstructed our team’s ability to view the full scope of all data available. This should, however, be alleviated by the fact that (1) all websites were accessed by multiple users using a variety of browsers (Safari, Chrome, IE) and search engines (Google, Bing, DuckDuckGo) and (2) for the most part, we did not do searches per se, but went directly to the official websites.

A more complete discussion of metrics standardization strategies and hypotheses about mechanisms of sex differences in COVID-19 can be found in our companion preprint (Kocher et al., 2020).

Conclusions

The abundance of non-harmonized information illustrates both the nimbleness of digital systems to support responsiveness to a new pandemic and the lack of preparedness of governmental health authorities worldwide for such an event. The paucity of accessible raw datasets and the disparate metrics used to capture data make it difficult to inform public health policy. The ascending curve of a pandemic is obviously not the best time to build infrastructure to collect global standardized data for efficient surveillance. Over time, we noted improved reporting, but countries had clearly disparate resources available, and preparedness might have helped alleviate the reporting burden for lower-resourced regions. Lack of data harmonization and sex-disaggregation, however, were ubiquitous issues. It is critical that sex as a biological variable be considered an essential metric, rather than an afterthought. Beyond SABV, the global fight against public health inequalities and the devising of inclusive policies for COVID-19 and all future pandemics will require collecting comprehensive and harmonized public health data (LoTempio et al., 2020), and the thorough intersectional analysis of these data (e.g., Hankivsky et al., 2010), incorporating socioeconomic factors and ethnicity along with sex, gender, and age, to address the higher morbidity and mortality rates in the most vulnerable communities and to target limited health funds optimally.