Background & Summary

Exposure to moderate and extreme heat is associated with increased risk of illness and death1. Although most epidemiologic studies contributing to this conclusion use dry-bulb temperature (i.e., ambient air temperature measured in the shade) as the exposure metric of interest2,3,4,5,6, others have assessed alternative metrics that incorporate humidity – such as the heat index, a combined temperature and humidity metric used by the U.S. National Weather Service7, or the humidex, which is used by the Meteorological Service of Canada – to better estimate the physiologic impact of heat on the human thermoregulatory system8,9,10. Increasingly, public health research on heat is considering metrics that combine additional meteorological conditions, including solar radiation and wind speed, to further contextualize the actual heat stress experienced by populations11,12,13.

Of particular interest is the wet-bulb globe temperature (WBGT), a thermal index originally developed in the 1950s to establish epidemiologically relevant thermal thresholds to prevent heat-related illnesses at US military training camps14. The WBGT is a weighted average of the ambient, wet-bulb, and globe temperatures, which collectively incorporate thermal, solar, and convective heat transfers from ambient temperature, humidity, solar radiation, and wind speed15. In contrast to simpler, more commonly used metrics, such as ambient temperature or heat index, WBGT is measured in conditions of direct solar radiation and is partially mitigated by wind speed, making it an appealing measure for estimating thermal conditions experienced by outdoor workers and athletes. In recognition of this utility, it has been approved by the International Organization for Standardization (ISO), the American Conference of Governmental Industrial Hygienists, and other national and international organizations for use as a thermal stress screening tool16.

However, it should be noted that WBGT has limitations, including its potential underestimation of thermal stress in conditions where sweating is restricted, susceptibility to measurement errors, and variability based on clothing and activity14. Others have also noted that the scale of the measurement is prone to misinterpretation, given that extreme values of WBGT are much lower than what would be considered extreme by ambient temperature standards16. Newer metrics have been developed that may reduce some of these limitations; of interest here is the Universal Thermal Climate Index (UTCI), a heat metric derived from human energy balance models with the goal of being universally applicable, physiologically relevant, and appropriate for use in a range of bioclimatic applications17. Although the UTCI is correlated with WBGT and may similarly reflect thermal perception18, some have advocated for replacing WBGT with UTCI in operational use cases, particularly in athletics19,20.

While WBGT and UTCI have clear utility in bioclimatic contexts, there are challenges for population health researchers hoping to use them in their analyses. For example, measuring WBGT requires specialized equipment that is not widely deployed operationally, and UTCI similarly requires data that are frequently unavailable in meteorological data sets. Although it would be advantageous for public health researchers to have easy access to population-scale estimates of WBGT, UTCI, and other heat metrics, no such database presently exists. Furthermore, no single measure of heat will be universally superior in all contexts10,19, suggesting the need for a single dataset containing multiple metrics for intercomparison.

Concurrently, heat-health researchers are increasingly aware of the value in using gridded meteorological data sets, both for their potential to provide population-applicable estimates of weather experienced across large areas and to avail rural areas without weather stations to epidemiologic analyses21. Recent studies have demonstrated the utility of a range of gridded data sets for such applications22,23,24,25. Although these data sets are a valuable source of information, they contain massive amounts of data that require time, computational resources, and expertise to process that are not available to many public health researchers.

Given these barriers to enhanced epidemiologic research, the broader goal of this data set is to provide: (1) a data set of pre-processed gridded data, population-weighted to the daily county level; and (2) code for calculating various heat metrics, including wet-bulb globe temperature and the Universal Thermal Climate Index using gold-standard approaches. The potential reuse value of these data extends not only to epidemiologic analyses, but to any research that requires daily, county-level estimates of temperature or other heat metrics.

Methods

Overview

Our dataset provides daily minimum, maximum, and mean values of ambient temperature, dew-point temperature, net effective temperature, heat index, humidex, wet-bulb globe temperature, and the Universal Thermal Climate Index, all population-weighted at the county level. We used a high-resolution reanalysis dataset to calculate these hourly heat metrics across the entire contiguous United States (CONUS) from January 2000 to December 2020. We developed an accompanying R package (heatmetrics) that can be used to replicate the calculations on other data sets (Fig. 1).

Fig. 1
figure 1

Schematic overview of the creation of the heatmetrics database.

Reanalysis data

We derived a series of heat metrics using data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 Land product (ERA5-Land)26,27. The ERA5-Land variables we obtained included: (1) two-meter air temperature, (2) two-meter dew point temperature, (3) surface pressure, (4) ten-meter zonal and meridional wind velocity vectors, (5) surface solar radiation downward, (6) surface thermal radiation downward, (7) surface net solar radiation, and (8) surface net thermal radiation. The ERA5-Land data are available hourly at a spatial resolution of 0.1 degrees (approximately 9 km) globally over land. We also obtained total sky direct solar radiation at surface from ERA528,29, the reanalysis from which ERA5-Land is derived. We interpolated this from the 0.25-degree ERA5 grid to the 0.1-degree ERA5-Land grid using nearest-neighbor interpolation, following the approach of Yan et al.30.

Converting to local standard time

Data from ERA5-Land are provided hourly in Coordinated Universal Time (UTC), also known as Greenwich Mean Time (GMT). Since many users need minimum and maximum values relative to the local day, we created raster stacks of the calculated hourly data that aligned with the local time zone. To do this, we created rasters of ERA5-Land grids containing centroids of latitude and longitude and then used the lutz R package31 to identify the time zone of each grid cell, which was represented as a numeric offset from UTC time. We created separate rasters for standard time and daylight saving time (for example, Eastern Standard Time has a UTC offset of −5 hours while Eastern Daylight Time has an offset of −4 hours). While most locations had consistent time zones throughout the study period, we manually accounted for time zone changes that occurred in the state of Indiana in 2006 and 2007 by creating shapefiles of affected counties and adjusting the ERA5-Land grid offsets accordingly. The final result was eight rasters representing grid-level UTC offsets: four each for local standard time (LST) and local daylight time (LDT) for years 2000–2005, 2006, 2007, and 2008–2020. We used these time zone rasters as masks to subset the calculated heat metrics corresponding to the local day. The result was stacks of 24 hourly observations for each day that reflected data from 00 local time to 23 local time. We note that although hourly time steps are the highest temporal resolution available in reanalysis data, “true” daily minimum and maximum temperatures can occur between hours. Others have found that, in some instances, this can lead to small differences in estimated relative risks of exposure to ambient heat, but that such differences have considerably smaller magnitudes than the differences in relative risks found between different heat metrics32.

Calculating heat metrics

We calculated hourly heat metrics across all of CONUS between January 2000 and December 2020 on the stacks of ERA5-Land data converted to local time. We then calculated, at the pixel level, the minimum, maximum, and mean values for each variable on every local day. The final results were rasters of CONUS for daily observations of each of the heat metrics described in this section. The database may be updated in the future as ERA5-Land data are updated.

Wet-bulb globe temperature

Lemke and Kjellstrom33 summarized and compared approaches to estimating wet-bulb globe temperature, and they found that the algorithm by Liljegren et al.15 performed most accurately across a range of conditions. We therefore employed this algorithm in our calculations here. In contrast to most estimates of WBGT – which use relatively simple equations – the Liljegren approach applies fundamental laws of physics, thermodynamics, and mass balance to separately model natural wet bulb (TW) and globe temperatures (TG), which together comprise 90% of the WBGT (the remaining 10% is the ambient temperature [TA]; Eq. 1).

$$WBGT=0.70\cdot {T}_{W}+0.20\cdot {T}_{G}+0.1\cdot {T}_{A}$$
(1)

The Liljegren algorithm was originally written in the C programming language but was translated to Javascript and made available in a web interface by the Occupational Safety and Health Administration (OSHA)34. We translated this Javascript into an R package and processed it on raster stacks of the ERA5-Land data.

The WBGT algorithm requires as inputs the following variables (Table 1): year, month, day, hour, minute, UTC offset, averaging time of meteorological observations, latitude, longitude, temperature, relative humidity, incident solar radiation, wind speed, surface pressure, height of wind speed measurement, vertical temperature difference between observed temperature and wind-speed-height temperature, and urban or rural land cover. Although most of these variables are provided directly by ERA5-Land, a few needed to be derived first. Relative humidity was calculated using air temperature and dew-point temperature, and the wind speed was calculated using the zonal and meridional wind velocity vectors (Table 1). To identify ERA5-Land grid cells as urban or rural, we used the 2011 National Land Cover Database35 and assigned grid cells as “urban” (1) if at least 33% of the 30-meter by 30-meter land cover pixels within each ERA5-Land grid cell were classified as “developed, low intensity,” “developed, medium intensity,” or “developed, high intensity.” All other grid cells were classified as “rural” (0). Note that the contribution of urban/rural to the WBGT calculation is marginal and only affects the conversion of wind speed from higher altitudes to two-meter equivalents.

Table 1 Inputs to wet-bulb globe temperature (WBGT) algorithm.

In addition to the pre-processing described above, we made a few small modifications to the Liljegren algorithm, as described here and as comments throughout the R source code. First, in the calculation of the natural wet-bulb temperature, the original algorithm used a static enhancement factor of 1.004 when calculating the saturation vapor pressure; however, this factor assumes a barometric pressure of at least 800 hPa. To accommodate lower pressures for high-elevation locations, we applied an enhancement factor that is a function of barometric pressure (second quantity of Eq. 2), following Equation 8 in Buck (1981, p. 1532)36.

$${e}_{sat}=\left(6.1121\cdot \exp \left[\frac{17.502\cdot \left({T}_{air}-273.15\right)}{{T}_{air}-32.18}\right]\right)\cdot \left(1.0007\cdot 3.46\cdot 1{0}^{-6}\cdot {P}_{air}\right)$$
(2)

Where Tair is the ambient temperature in Kelvin and Pair is the barometric pressure in hPa.

Second, we fixed a small error in the stability classes, which are used to estimate the 2-meter wind speeds when the input wind speeds are measured at a different height (as is the case in ERA5-Land, which reports 10-meter wind speeds). In the original algorithm, nighttime conditions with wind speeds between 2 and 2.5 m/s were given stability classes of “E” and “F” for positive and negative lapse rates, respectively; we changed this to the correct values of “D” and “E,” consistent with guidance from EPA documentation (see Table 6–7 in reference)37.

Third, we updated the algorithm for calculating the heat of vaporization to follow the approach of Meyra et al.38, which was found to be more accurate than the traditional Watson equation. Although we believe this to be a more-accurate method, we find that it typically changes the estimate of the wet-bulb temperature by less than 0.1 degrees Celsius.

Finally, we changed the minimum wind speed from 0.13 m/s to 0.5 m/s for a more-conservative estimate of WBGT that prevents unreasonably high estimates that result from very low wind speeds. Lemke and Kjellstrom (2012)33 recommend a more-conservative minimum wind speed of 1 m/s for assessing WBGT effects on outdoor workers, noting that typical bodily movement results in an apparent wind speed on the skin of at least 1 m/s. Our value of 0.5 m/s is a balance between these values and helps capture the WBGT for stationary individuals, as our index is not exclusively for outdoor workers. Note that this wind speed adjustment was done as a pre-processing step and that the minimum wind speed is only directly set to 0.5 m/s in the WBGT function of the heatmetrics R package when supplying wind speeds at a height other than two meters.

We conducted a sensitivity analysis to determine the impact that these changes to the Liljegren algorithm had on the final county-level mean values. We found that daily maximum WBGT estimates for August 2020 across all available CONUS counties using WBGT algorithms with and without the aforementioned alterations were extremely similar, with r2 = 99.99%, a mean absolute difference of 0.08 °C, and a maximum absolute difference of 0.25 °C.

Universal Thermal Climate Index (UTCI)

We calculated UTCI following the approach of Di Napoli et al.39, as implemented in the ECMWF python library, thermofeel40. This method uses the sixth-order polynomial regression approximation given by Bröde et al. (2011), which is a highly accurate approximation of UTCI with a root-mean square error of 1.1 degrees Celsius41. The equation takes as inputs the ambient temperature, 10-meter wind speed, vapor pressure, and mean radiant temperature (Tmrt). We calculated mean radiant temperature following the approach of Di Napoli et al.42, which approximates Tmrt using hourly measurements of total downward surface solar radiation (direct and diffuse), surface net solar radiation, downward surface thermal radiation, surface net thermal radiation, downward direct surface solar radiation, and cosine of the solar zenith angle (cza). As was done in Di Napoli et al.42, we calculated the average daytime cza in order to minimize errors that arise during sunrise and sunset hours, described comprehensively by Hogan and Hirahara43. We used this same integrated cza approach in the calculation of WBGT and note that, although it is the most-accurate approach, others have found that it actually has a very small impact (<0.01 °C on average) on the overall estimation of UTCI40. Although the Bröde et al. (2011) algorithm is suitable for wind speeds up to 30.3 m/s, we followed Di Napoli et al. (2021) in capping wind speeds at 17 m/s and marking as missing (“NA”) observations above this threshold, based on findings of extremely low UTCI values at these tropical-storm-force wind speeds44. Finally, consistent with recommendations in Bröde et al.41, we constrained the vapor pressure input to be consistent with a relative humidity of ≥5%, the lower bound for which their algorithm is validated: in cases where relative humidity was less than 5%, we set the vapor pressure equal to the saturation vapor pressure multiplied by 0.05. This had only a minimal impact on the final county UTCI values: 99.93% of county-day maximum UTCI values were completely unaffected. Among the county-days that did have the adjustment, the mean absolute difference in maximum UTCI was 0.06 °C and the maximum absolute difference was 0.58 °C.

Other heat metrics

Most of the other heat metrics in this data set use relatively straightforward equations (Table 2). The one exception is heat index: we calculated this variable using the weathermetrics R package7, which implements the approach to calculating heat index that is used by the National Weather Service. Note that, for consistency with how we calculated WBGT, we set the minimum wind speed to 0.5 m/s for all variable calculations that use wind speed.

Table 2 Summary of the methods used to calculate the heat metrics in the data set.

Calculating population-weighted county means

Daily heat metric values are reported as population-weighted county mean values. We used high-resolution (approximately 250 m × 250 m) population data from the Joint Research Centre (JRC) of the European Commission45 to calculate spatial weights for each ERA5-Land grid cell within each county based on the proportion of the county population residing in that grid cell. For example, if the sum of the high-resolution population points within a particular grid cell in County A were equal to 10% of the total county population, then that grid cell would get weighted 10% toward the overall county mean. Temperatures in the more densely populated parts of counties were thus weighted more heavily than the less-populous parts, resulting in metrics that are likely more relevant to population-based studies. To account for potential population shifts over the period analyzed, we used two sets of population estimates based on availability in the JRC data: we used 2000 population distributions for county means from 2000–2009 and 2015 populations for 2010–2020.

To account for missing data, we added flag variables to indicate whether the county estimates for a particular day were based on non-missing grid cells representing less than 50% of the population. This is applicable because ERA5-Land is available only over land areas in which the grid cell is comprised of no more than 50% ocean; this ensures that the meteorological data are representative of land areas, but it also means that some small island and coastal areas are excluded. An additional source of missing data is from particular hourly values being marked as NA (for example, hourly UTCI values when wind speeds exceed 17 m/s). A grid cell was marked as NA for a particular variable-day if fewer than 21 hourly observations were available. We calculated the percent of county populations that were represented by non-missing ERA5-Land data for every variable on every day and added flags as follows: “0” means the estimate is based on data representing ≥50% of the population, “1” covers 10–49% of the population, “2” covers <10%, and “3” means the data are completely missing (variable-days in this case are marked as NA). These flags affect only an extremely small portion of the data set: >99.7% of county-days have no flag, and only two counties in all of CONUS are missing entirely from the data set (Monroe County, Florida [containing the Key West archipelago] and Nantucket County, Massachusetts [containing the island of Nantucket]).

Data Records

The heatmetrics data are accessible via figshare46. The variables currently available for download are described in Table 3. At present, the data set includes population-weighted estimates at the county level, which can be queried using the state-county Federal Information Processing Standard (FIPS) identifier. We also provide a separate data set of unweighted county mean values, which were created by taking a simple average of all grid cells in a county, which are also available via figshare47. Variable names are the same between the two data sets, so users should take care to download the applicable file for their needs and rename variables as appropriate if using both data sets simultaneously.

Table 3 Description of variables available in the heatmetrics database.

Technical Validation

The heatmetrics data set employs existing algorithms and an established reanalysis product that have all been peer-reviewed and frequently cited in the literature. Please see accompanying references and citations therein for the input data set used, ERA5-Land, for model development and validation27. The WBGT algorithm used here is based on the Liljegren approach, which was found to be accurate to within 1 °C in the developers’ testing15, and independently verified as being the most accurate across different estimation methods33. Similarly, we followed the approach of Di Napoli et al.39, as implemented by Brimicombe et al.40 for operational distribution through the European Centre for Medium-Range Weather Forecasts (ECMWF); this algorithm employs the UTCI approximation reported by Bröde et al. (2011), which was found to have a root-mean square error of approximately 1.1 °C.

Disclaimers

This data set contains modified Copernicus Climate Change Service information (2022), as described and cited in the manuscript. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. The data set and software are provided by the manuscript authors “as is” with no warranty of any kind.