Introduction

Understanding temperature variations across depths and geographies is significant to many domains. Temperature is fundamental to modeling heat distribution, thermal anomalies, groundwater flow and management, volcanic eruptions, stresses and earthquakes, hydrocarbon maturity, carbon sequestration, and subsurface energy storage, among others (Kukkonen et al. 1994; Chen et al. 2016; Vilarrasa and Rutqvist 2017; Head et al. 2003). In particular, accurate modeling of temperatures at significant depths within the Earth’s crust is central to the exploration and development of geothermal energy. A conventional geothermal resource requires three ingredients: (1) a heat source, (2) fluid for heat transport, and (3) permeable/fractured rock for fluid flow. Other unconventional geothermal systems can also be viable based on site-specific circumstances (DiPippo 1991; Mock et al. 1997). Specifically, enhanced geothermal systems (EGS) only require sufficiently high-temperature resources, as heat transport and fluid flow can be achieved through water injection and rock stimulation, respectively (Kana et al. 2015). Many recent efforts have focused on assessing the EGS resource potential across the United States (Blackwell et al. 2006; Augustine 2016; Augustine et al. 2019; Tester et al. 2006; Williams et al. 2008) using either regional or national temperature-at-depth maps.

Regional surface heat flow and three-dimensional temperature maps have been constructed for the Appalachian Basin (Frone et al. 2015; Smith 2019), Great Basin (Coolbaugh et al. 2005; DeAngelo et al. 2023; Burns et al. 2024; Ayling et al. 2022), Snake River Plain (Batir et al. 2020; Nielson et al. 2012), and Cascades (Frone et al. 2015; Blackwell et al. 1990). Methods typically included analytical or numerical models of conductive heat flow for one-dimensional vertical or two-dimensional cross-sectional domains, and the compilation of these into three-dimensional temperature maps. Whereas regional models are useful, they are spatially limited and would result in inconsistencies if concatenated. This prompts the need for national models to guide conventional and unconventional geothermal exploration and development. A similar approach was generally followed to construct national surface heat flow and temperature maps (Blackwell et al. 2011; Morgan and Gosnold 1989; Lachenbruch and Sass 1977; Boyd 2019). Most notably, researchers at Southern Methodist University (SMU) developed temperature-at-depth maps as part of a 2006 study on The Future of Geothermal Energy, mainly focused on assessing the potential of EGS across the United States (Blackwell et al. 2011; Tester et al. 2006; Blackwell et al. 2006). However, the SMU maps have several limitations:

  • They were developed at a gridding interval of 5 min (or \(0.08333^{\circ }\)) of spatial resolution, which translates to an average spacing of about 8 km, representing an area of about \(64 \ km^2\) per grid cell. A typical 250 MWe EGS plant might require about \(10-20 \ km^2\) of reservoir planar area to accommodate the thermal resource needed, assuming that heat removal occurs in a 0.5-km-thick region of hot rock at depth (Tester et al. 2006). With such a large areal extent, a \(64 \ km^2\) grid cell is likely to filter out many local heat anomalies. These maps also ignore EGS resources shallower than 3 km entirely, modeling only depth intervals between 3 and 10 km (Augustine 2016).

  • The analytical approach used in developing these temperature-at-depth maps requires knowledge of heat flow across the conterminous United States. Note that all hydrothermal system-influenced data (very high values, i.e., generally greater than \(120 \ mW/m^2\)) were conveniently excluded in this process to avoid the difficulty involved in analytically modeling regions with hot fluid upflow without overestimating temperatures at neighboring sites. This inconveniently leads to smoothing out many local heat anomalies, which developers would like to capture for economic EGS development.

  • In developing such heat flow maps, a minimum curvature algorithm was used. However, such interpolation algorithms (1) are less effective with irregularly spaced data points, and (2) tend to generate smooth fits that dismiss potential heat anomalies in undersampled regions.

  • SMU adopted a two-layer model where the sediment and basement contributions were captured separately. This requires knowledge of either unavailable/assumed or interpolated spatial properties across layers, such as rock thermal conductivity, measured sediment heat flow, basement heat flow, radioactive depth variable, and radioactive heat generation.

Interested in estimating shallow, low-temperature geothermal resources across the conterminous United States, researchers at the National Renewable Energy Laboratory (NREL) developed a temperature-at-depth model for depths shallower than \(3 \ km\) (Mullane et al. 2016). Approximately 300,000 bottomhole temperature (BHT) measurements were used in that study. To reduce the nugget effect associated with the variogram of the kriging algorithm used in that study, the bottomhole temperature observations were calibrated to the nearest \(500 \ m\) depth interval using geothermal gradients obtained from the 3.5-km temperatures originally predicted by the SMU temperature-at-depth maps. Bottomhole temperature measurements were aggregated into \(16 \ km^2\) grid cells using the median statistic, resulting in a reduced dataset of 71,000 records. They then fit a linear regression model to predict bottomhole temperature using (1) depth, and (2) the 3.5-km temperatures originally predicted by the SMU temperature-at-depth maps. Next, they applied ordinary kriging to the linear regression residuals to improve local fit. In addition to the aforementioned caveats associated with the SMU temperature-at-depth maps, these shallower maps lack spatial and depth-wise resolution due to downsampling of real bottomhole temperature measurements. Our investigation shows that subtracting the 3-km temperature-at-depth map developed by NREL from the 3.5-km temperature-at-depth map developed by SMU results in an average difference below \(-40^{\circ } \ C\); that is, the shallower NREL map is, on average, more than \(40^{\circ } \ C\) hotter than the deeper SMU map, highlighting the inconsistency between the two models.

Herein, we describe an alternative approach to developing a national thermal model of the Earth’s crust. In particular, we aim to predict three quantities simultaneously: temperature-at-depth, surface heat flow, and rock thermal conductivity, from the surface down to \(7 \ km\), at a spatial resolution of \(18 \ km^2\) and a depth-wise resolution of \(1 \ km\). Our approach involved rigorous data aggregation and curation of various spatial and physical properties, such as thermal quantities, depth, geographic coordinates, elevation, sediment thickness, magnetic anomaly, gravity anomaly, gamma-ray flux of radioactive elements, seismicity, electrical conductivity, and proximity to faults and volcanoes. Integrated into a graph of points spread across the contiguous United States, this dataset was then fed into an Interpolative Physics-Informed Graph Neural Network (InterPIGNN) to interpolate between points of supporting data while approximating Fourier’s Law of heat conduction.

Data collection

Various physical quantities and data sources are required to accurately estimate temperature-at-depth across the conterminous United States. Measurements span various geographic locations (Easting/Northing) and/or depths. We collected measurements for three thermal quantities: bottomhole temperature, heat flow, and rock thermal conductivity. Additionally, we collected measurements, at sufficiently high spatial resolution, of other physical quantities that can be explicitly or implicitly correlated with temperature-at-depth, i.e., average surface temperature, elevation, sediment thickness, magnetic anomaly, gravity anomaly, uranium radiation, thorium radiation, potassium radiation, seismicity, electrical conductivity, and proximity to faults and volcanoes. Note that these measurements have various spatial resolutions, hence we downsampled/upsampled them to populate the contiguous United States at the target resolution of \(18 \ km^2\) using inverse distance weighting (Lu and Wong 2008). All visualizations reported in this article were generated by the authors accordingly. In the following, we briefly describe the source and distribution of each input quantity, and the reason for its inclusion as an input feature.

Temperature, heat flow, and thermal conductivity

Constructing a three-dimensional thermal model of the subsurface requires measurements of thermal quantities, such as subsurface temperature, heat flow, and rock thermal conductivity. Bottomhole temperature measurements are acquired through direct wellbore temperature measurements at one or more depths. The collected temperature measurements could be associated with geothermal, oil and gas, groundwater, observation, and monitoring wells. After an exhaustive search and curation of bottomhole temperature measurements, we identified nine different raw and aggregated data sources, which we integrated into a single database. These sources are the SMU node of the National Geothermal Data System, Association of American State Geologists (AASG), USGS, Utah Geological Survey (UGS), Colorado Geological Survey (CGS), Maine Geological Survey (MGS), Washington State Department of Natural Resources (WSDNR), Geothermal Information Layer for Oregon (GTILO), and Great Basin Center for Geothermal Energy (GBCGE) (SMU 2023; National Renewable Energy Laboratory 2023; Blackwell et al. 2011; Augustine 2013; Darton 1920; Blackett et al. 2013; Korosec and Kaler 1980; Black 1994; Bowen et al. 1978; Calvin 2010).

Bottomhole temperature measurements need to be corrected because of the temperature differences between the rock and the wellbore fluid (Jessop 1990). Due to mud circulation and borehole temperature disequilibrium, this thermal disturbance is not equally distributed along depth. Consequently, compared to the adjacent rock, deeper and shallower wellbore sections indicate lower and higher temperatures, respectively (Schumacher and Moeck 2020). Motivated by other studies on temperature modeling in the United States (Mullane et al. 2016; Elizondo et al. 2023; Blackwell et al. 2010), we used the Harrison correction (Harrison et al. 1982) to correct bottomhole temperature measurements. We found this method more suitable for this study, as it requires only temperature and depth measurements. Meanwhile, duplicated bottomhole temperature measurements across the different data sources were detected and filtered out based on the median statistic. Records were considered duplicates if they had the same depth, Easting, and Northing when rounded to the nearest 0.1 m, 0.1 km, and 0.1 km, respectively. We also applied threshold-based filters to keep only points with depth and temperature in the ranges of \(0-7 \ km\) and \(0-500 \ ^\circ C\), respectively. As shown in Fig. 1, our database included a total of 400,134 unique bottomhole temperature measurements, with minimum and maximum corrected bottomhole temperatures of 0.16 and \(360^{\circ } \ C\), respectively (going forward, we drop the term “corrected” for brevity). Fig. 2 shows all data points projected onto a two-dimensional plane, colored by data source. Note that bottomhole temperature measurements are taken at specific depths, which is essential for constructing a three-dimensional model of temperature.
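
To make this pipeline concrete, the following is a minimal sketch of the correction, deduplication, and filtering steps, assuming a pandas table with hypothetical column names. The Harrison coefficients shown are the values commonly quoted in the literature for this quadratic and should be verified against Harrison et al. (1982) before reuse.

```python
import pandas as pd

def harrison_correction(depth_m):
    """Depth-dependent BHT correction in degrees C.

    Coefficients as commonly quoted for the Harrison et al. (1982)
    quadratic; verify against the original source before reuse.
    """
    return -16.512 + 1.8268e-2 * depth_m - 2.3449e-6 * depth_m**2

# Hypothetical input table with one row per raw BHT record.
df = pd.DataFrame({
    "depth_m":     [1500.0, 1500.04, 3200.0],
    "easting_km":  [512.31, 512.29, 780.10],
    "northing_km": [4100.52, 4100.49, 3950.00],
    "bht_c":       [55.0, 57.0, 121.0],
})
df["bht_corr_c"] = df["bht_c"] + harrison_correction(df["depth_m"])

# Duplicates: same depth, Easting, and Northing after rounding to the
# nearest 0.1 m, 0.1 km, and 0.1 km; collapse them with the median.
df[["dk", "ek", "nk"]] = (
    df[["depth_m", "easting_km", "northing_km"]].round(1).to_numpy())
dedup = df.groupby(["dk", "ek", "nk"], as_index=False)["bht_corr_c"].median()

# Threshold filters: keep 0-7 km depth and 0-500 C temperature.
dedup = dedup[dedup["dk"].between(0.0, 7000.0)
              & dedup["bht_corr_c"].between(0.0, 500.0)]
```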

Fig. 1

Log-scale plot of bottomhole temperature record count per data source

Fig. 2

Bottomhole temperature records projected on the conterminous United States, colored based on data source (SMU 2023; National Renewable Energy Laboratory 2023; Blackwell et al. 2011; Augustine 2013; Darton 1920; Blackett et al. 2013; Korosec and Kaler 1980; Black 1994; Bowen et al. 1978; Calvin 2010)

The SMU dataset also aggregated data for heat flow and rock thermal conductivity (Blackwell et al. 2011). As seen in Fig. 3, this dataset indicates heat flow estimates as low as nearly zero and as high as \(23,012 \ mW/m^2\) at Yellowstone, with a median of \(59 \ mW/m^2\). Generally, we observed several anomalous heat flow estimates around calderas, such as Yellowstone, Newberry, and Valles. Meanwhile, records indicate rock thermal conductivity values ranging between 0.33 and \(7.24 \ W/m-K\), with a median of \(2.13 \ W/m-K\), as seen in Fig. 4. It is important to note that laboratory measurements of rock thermal conductivity \(\lambda ^{(lab)}\) do not reflect in situ conditions. Generally, rock thermal conductivity varies directly with pressure (or, depth) and inversely with temperature (Abdulagatov et al. 2006). Hence, we used the relation in Eq. 1 to account for changes in rock thermal conductivity \(\lambda\) with respect to temperature T and depth z. A typical value for the pressure coefficient c is \(1.5 \times 10^{-3} \ km^{-1}\) (Chapman 1986). Meanwhile, the temperature coefficient b can vary significantly across the crustal thickness, typically between \(1.0 \times 10^{-4} \ C^{-1}\) and \(1.5 \times 10^{-3} \ C^{-1}\) (Chapman 1986). We chose b of \(1.5 \times 10^{-3} \ C^{-1}\), a choice we justify in the "Physics-informed graph neural networks" section, where we discuss our algorithmic approach. With respect to our collected bottomhole temperature dataset, this choice of b and c resulted in variable corrections of thermal conductivity, as seen in Fig. 5.

$$\begin{aligned} \lambda (T,z) = \lambda ^{(lab)} \frac{1+cz}{1+bT} \end{aligned}$$
(1)
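
A direct translation of Eq. 1 into code, using the b and c values adopted above, might look as follows (a sketch, not the authors' implementation):

```python
def in_situ_conductivity(lam_lab, temp_c, depth_km, b=1.5e-3, c=1.5e-3):
    """Eq. 1: scale a laboratory thermal conductivity (W/m-K) to in situ
    conditions; b in 1/degC and c in 1/km follow Chapman (1986)."""
    return lam_lab * (1.0 + c * depth_km) / (1.0 + b * temp_c)

# e.g., the median lab value of 2.13 W/m-K at 3 km depth and 90 degC
print(in_situ_conductivity(2.13, 90.0, 3.0))  # -> ~1.89 W/m-K
```
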
Fig. 3

Heat flow records for the conterminous United States, colored based on magnitude. Whereas the gathered heat flow data reached as high as \(23,012 \ mW/m^2\), we set \(300 \ mW/m^2\) as an upper threshold for visual convenience. Data were retrieved from the SMU node of the National Geothermal Data System (SMU 2023)

Fig. 4

Thermal conductivity records for the conterminous United States, colored based on magnitude. Data were retrieved from the SMU node of the National Geothermal Data System (SMU 2023)

Fig. 5

Laboratory-measured values of thermal conductivity were corrected for in situ pressure and temperature conditions, and these values were used for interpolation to create in situ maps of thermal conductivity

We also collected surface temperature data based on mean air temperature maps developed by the National Weather Service for the period of 1990-2020 (National Weather Service 2020). As seen in Fig. 6, the average surface temperature ranged between \(-7\) and \(25^{\circ } \ C\), with mean and standard deviation of \(6.5\pm 7.6^{\circ } \ C\).

Fig. 6

Average annual surface temperature for the period of 1990-2020 across the conterminous United States, colored based on magnitude. Data were retrieved from the National Weather Service (National Weather Service 2020)

Elevation and sediment thickness

Additionally, we gathered high-resolution maps depicting elevation and sediment thickness across the conterminous United States. Elevation and sediment thickness are indicative of erosion, sedimentation, and rock properties along depth, which govern heat flow (Fukahata and Matsu’ura 2001; Kim et al. 2020). Elevation data were queried from the National Map’s Elevation Point Query Service offered by the USGS, which has an overall resolution of 0.53 m (The United States Geological Survey (USGS) 2023; Arundel et al. 2018). As seen in Fig. 7, elevation ranged between \(-84\) and \(4,192 \ m\), with mean and standard deviation of \(791\pm 726 \ m\). Sediment thickness data were acquired through the Oak Ridge National Laboratory Distributed Active Archive Center (2023). This dataset provides estimates of the thickness of the permeable layers above bedrock with a spatial resolution of \(1 \ km^2\), based on input data for topography, climate, and geology. These data are modeled to represent estimated thicknesses by landform type for the geological present (Pelletier et al. 2016). As seen in Fig. 9, this dataset saturates at \(50 \ m\): values of \(50 \ m\) indicate sediment thickness of \(\ge 50 \ m\), while lower values are the exact estimates reported by the data source. DeAngelo et al. (2023) postulated that topography is a mixed signal for the Great Basin, with local topography related to geologic structure while regional trends are associated with crustal thickening or regional processes. For this reason, we included detrended elevation to emphasize local topography. We additionally included detrended sediment thickness to delineate structural variations at shallow depths. Detrended maps for both elevation and sediment thickness are seen in Figs. 8 and 10. We followed a simple procedure using a \(30 \ km \ \times \ 30 \ km\) moving window to compute a trend map for each quantity, which was then subtracted from the corresponding raw map to arrive at the detrended map, as sketched below.
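
A minimal sketch of this detrending step, assuming the raster has already been resampled to the paper's \(18 \ km^2\) cells (about 4.24 km spacing); a uniform moving-average filter stands in here for whatever trend estimator the authors actually used:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def detrend(grid, cell_km=4.24, window_km=30.0):
    """Subtract a 30 km x 30 km moving-average trend from a 2-D raster.

    `cell_km` is the grid spacing; ~4.24 km corresponds to 18 km^2
    cells (an assumption, since detrending could equally be performed
    at the native resolution of each raster).
    """
    size = max(3, int(round(window_km / cell_km)))
    trend = uniform_filter(grid, size=size, mode="nearest")
    return grid - trend

# Toy example on a synthetic elevation raster
elevation = np.random.default_rng(0).normal(791.0, 726.0, size=(200, 300))
local_topography = detrend(elevation)
```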

Fig. 7

Elevation records for the conterminous United States, colored based on magnitude. Elevation data were queried from the National Map’s Elevation Point Query Service offered by the USGS (The United States Geological Survey (USGS) 2023; Arundel et al. 2018)

Fig. 8

Detrended elevation map for the conterminous United States, computed using a \(30 \ km \ \times \ 30 \ km\) moving window to emphasize variations in local topography

Fig. 9

Sediment thickness records for the conterminous United States, colored based on magnitude. Values of \(50 \ m\) indicate sediment thickness of \(\ge 50 \ m\), while lower values are the exact estimates reported by the data source. Sediment thickness data were acquired through the Oak Ridge National Laboratory Distributed Active Archive Center (2023)

Fig. 10

Detrended sediment thickness map for the conterminous United States, computed using a \(30 \ km \ \times \ 30 \ km\) moving window to delineate structural variations at shallow depths

Magnetic and gravity anomalies

Temperature variations affect rock magnetic properties, hence the latter could be a useful input feature to our model (Kars et al. 2023). Additionally, gravity was found to correlate well with geothermal gradients and to assist in estimating sediment thickness, which in turn is correlated with heat flow (Atef et al. 2016). Magnetic and gravity anomalies were acquired through the USGS Mineral Resources Online Spatial Data (The United States Geological Survey (USGS) 2023). These data are the result of a joint effort between the USGS, the Geological Survey of Canada, and the Consejo de Recursos Minerales of Mexico (Bankey et al. 2002). The airborne measurement of the Earth’s magnetic field over North America describes the magnetic anomaly caused by variations in Earth materials and structure. We accessed the corrected aeromagnetic grid data with a resolution of \(1 \ km^2\) per grid cell. As seen in Fig. 11, magnetic anomalies for the conterminous United States ranged between \(-1,513\) and \(2,614 \ nanoteslas\), with mean and standard deviation of \(41\pm 206 \ nanoteslas\). Bouguer gravity anomaly data were based on National Information Mapping Agency files, which were gathered and curated by the USGS with a spatial resolution of nearly \(10 \ km^2\) per grid cell. Gravity anomalies are produced by density variations within the rocks of the Earth’s crust and upper mantle. The free-air correction reduces the measurement to sea level by assuming there is no intervening mass, while the Bouguer correction treats the intervening topography as a uniform slab of constant density and includes its effects within 166.7 km of the measurement location (Kucks 1999). As seen in Fig. 12, the Bouguer gravity anomaly ranged between \(-224\) and \(105 \ milligal\), with mean and standard deviation of \(-8.9\pm 19.8 \ milligal\).

Fig. 11

Magnetic anomaly for the conterminous United States, colored based on magnitude. Whereas the gathered magnetic anomaly data ranged between \(-1399\) and \(2418 \ nanoteslas\), we set \(-1000\) and \(1000 \ nanoteslas\) as lower and upper thresholds, respectively, for visual convenience. Magnetic anomaly data were acquired through the USGS Mineral Resources Online Spatial Data (The United States Geological Survey (USGS) 2023)

Fig. 12

Bouguer gravity anomaly for the conterminous United States, colored based on magnitude. Gravity anomaly data were acquired through the USGS Mineral Resources Online Spatial Data (The United States Geological Survey (USGS) 2023)

Radioactive elements

Decay of radioactive isotopes is a major source of heat within the Earth (Baumgardner 2000). Aerial gamma-ray surveys measure the gamma-ray flux produced by radioactive decay of the naturally occurring elements K-40, U-238, and Th-232 in the top few centimeters of rock or soil. With a spatial resolution of nearly \(2.5 \ km^2\), the USGS aggregated and curated aeroradiometric data for the conterminous United States, which originated from the National Uranium Resource Evaluation Program of the United States Department of Energy (Duval et al. 2005; Hill et al. 2009). Radioactive decay data for K-40, U-238, and Th-232 are reported in units of percent potassium (% K), parts per million equivalent uranium (ppm eU), and parts per million equivalent thorium (ppm eTh), respectively. As seen in Figs. 13, 14, and 15, radioactive decay of K-40, U-238, and Th-232 shows mean and standard deviation of \(1.17\pm 1.53 \ \% \ K\), \(1.72\pm 1.65 \ ppm \ eU\), and \(6.19\pm 3.64 \ ppm \ eTh\), respectively.

Fig. 13

K-40 radioactive decay for the conterminous United States, colored based on magnitude. Whereas measurements indicated values as high as \(100 \ \% K\), we set \(5 \ \% K\) as an upper threshold for visual convenience. Data were retrieved from the National Uranium Resource Evaluation Program of the United States Department of Energy (Duval et al. 2005; Hill et al. 2009)

Fig. 14

U-238 radioactive decay for the conterminous United States, colored based on magnitude. Whereas measurements indicated values as high as 264 ppm eU, we set 5 ppm eU as an upper threshold for visual convenience. Data were retrieved from the National Uranium Resource Evaluation Program of the US Department of Energy (Duval et al. 2005; Hill et al. 2009)

Fig. 15

Th-232 radioactive decay for the conterminous United States, colored based on magnitude. Whereas measurements indicated values as high as 359 ppm eTh, we set 30 ppm eTh as an upper threshold for visual convenience. Data were retrieved from the National Uranium Resource Evaluation Program of the United States Department of Energy (Duval et al. 2005; Hill et al. 2009)

Seismicity and electrical conductivity

Studies indicate that seismic wave velocity is inversely correlated with rock temperature, while electrical conductivity is proportionally correlated with rock thermal conductivity (Poletto et al. 2018; Schwarz and Bertermann 2020). Seismic wave velocity and electrical conductivity are highly affected by the subsurface rock and fluid properties, and by in situ conditions (e.g., porosity, pore pressure, temperature, elastic moduli). Generally, with increasing in situ temperatures, seismic wave velocity and thermal conductivity decrease (Poletto et al. 2018; Yoshino 2011). We acquired seismic and electrical conductivity measurements nationwide through the USArray project, which deployed transportable seismic and magnetic stations across the United States over several years to passively capture the Earth’s seismic activity and magnetic field. We particularly used inverted and processed maps generated by various studies based on data captured by this project, including the average geothermal gradient at the mantle (SAGE Facility 2023; Shinevar et al. 2023), crustal thickness (SAGE Facility 2015; Schmandt et al. 2015), compressional-shear wave velocity ratio (SAGE Facility 2016; Shen and Ritzwoller 2016), velocity perturbation (SAGE Facility 2020; Golos et al. 2020), and electrical conductivity (SAGE Facility 2023; Murphy et al. 2023). Figs. 16, 17, and 18 show some of these properties after we upsampled them to our target spatial resolution of \(18 \ km^2\). A depth of \(2 \ km\) was chosen for visualizing compressional wave velocity because it shows several highly porous sedimentary basins where seismic velocity is inversely correlated with rock porosity, demonstrating that seismic velocity is driven by various factors besides temperature. Because these properties vary with depth (e.g., seismic velocities, velocity ratio, electrical conductivity), we additionally normalized each quantity along depth to eliminate its naturally increasing or decreasing trend with depth.

Fig. 16

Average geothermal gradient at a mantle depth of \(60 \ km\) across the contiguous United States, as estimated using seismic measurements. Data were retrieved from the USArray project (SAGE Facility 2023; Shinevar et al. 2023)

Fig. 17

Crustal thickness across the contiguous United States as estimated using seismic measurements. Data were retrieved from the USArray project (SAGE Facility 2015; Schmandt et al. 2015)

Fig. 18

An example of the compressional wave velocity at a depth of \(2 \ km\). Compressional wave velocity down to a depth of \(7 \ km\) was used, providing a three-dimensional input to our model. Data were retrieved from the USArray project (SAGE Facility 2016; Shen and Ritzwoller 2016)

Faults and volcanoes

Faults and fractures serve as pathways for the movement of thermal fluids from deep within the Earth’s crust to the surface, creating hotspots of geothermal activity. For instance, in the Basin and Range Province, faulting resulted in permeable pathways for convective heat flow (Faulds et al. 2013). Similarly, volcanic areas such as those in the Cascade Range are associated with magmatic intrusions that heat surrounding rock and groundwater, forming high-enthalpy geothermal reservoirs (Blackwell et al. 1990). We acquired locations of faults in the United States through the USGS Earthquake Hazards Program (2024). Meanwhile, locations of volcanoes in the United States were determined using a global dataset published by the National Centers for Environmental Information (2024). We processed both datasets to create two feature layers representing the Euclidean distance to the nearest fault and volcano in meters, as seen in Figs. 19 and 20, respectively. The distribution of distances to faults and volcanoes is naturally skewed (i.e., most faults and volcanoes are in the Western region), hence we applied a logarithmic transformation to treat skewness and approximately conform to normality.
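
A sketch of how such distance features can be computed with a k-d tree; the use of log1p to guard the zero distances at cells containing a fault trace is our assumption, as the text only specifies a logarithmic transformation:

```python
import numpy as np
from scipy.spatial import cKDTree

def log_distance_to_nearest(grid_xy_m, feature_xy_m):
    """Log-transformed Euclidean distance (m) from each grid cell
    center to the nearest fault or volcano."""
    dist_m, _ = cKDTree(feature_xy_m).query(grid_xy_m, k=1)
    return np.log1p(dist_m)

# Toy usage with hypothetical projected coordinates
rng = np.random.default_rng(1)
grid = rng.uniform(0.0, 4.5e6, size=(1000, 2))   # grid cell centers
faults = rng.uniform(0.0, 4.5e6, size=(200, 2))  # fault vertices
fault_feature = log_distance_to_nearest(grid, faults)
```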

Fig. 19

Log-transformed Euclidean distance to the nearest fault across the United States

Fig. 20

Log-transformed Euclidean distance to the nearest volcano across the United States

Methods

Irregular three-dimensional point clouds are typically unordered, sparse, and unevenly distributed. They are often associated with point-level features but with various spatial resolutions (e.g., sensor data where measurements are taken at different three-dimensional locations). We seek to interpolate irregularly spaced points onto a regular three-dimensional grid. We use methods involving graphical data structures (i.e., graph-based) to supplement our desired variables (i.e., temperature-at-depth, surface heat flow, and rock thermal conductivity) with other correlated variables measured or estimated on regular three-dimensional grids (e.g., geophysical property maps).

Popular three-dimensional spatial interpolation techniques include nearest neighbors (Rukundo and Cao 2012), inverse distance weighting (Lu and Wong 2008), cokriging (Nerini et al. 2010), radial basis function kernels (Bhatia and Arora 2016; Mo et al. 2022), and spline methods (Hutchinson 1995; Wahba 1990). Each of these methods suffers from one or more caveats, e.g., discrete and abrupt surfaces in the presence of dense spatial measurements, smoothing, dismissal of spatial structure information and correlation, lack of sufficient non-linearity, reliance on feature engineering and heuristics, and local trends (Qiao et al. 2019; Li and Heap 2014). Multiple deep learning architectures have been proposed to treat irregularly spaced points for various tasks, including classification, regression, and segmentation, amongst others. They can be lumped into four categories: (1) point-based methods (e.g., PointNet, PointNet++, PointWeb) (Qi et al. 2017a, 2017b; Zhao et al. 2019); (2) convolution-based methods (e.g., Relation-shape CNN, DensePoint, Pointwise-CNN) (Liu et al. 2019a, 2019b; Hua et al. 2018); (3) registration methods (e.g., PPFNet, FoldNet, PointNetLK) (Besl and McKay 1992; Deng et al. 2018a, 2018b; Gojcic et al. 2019; Choy et al. 2019; Aoki et al. 2019); and (4) graph-based methods (e.g., Edge-Conditioned Convolution, EdgeConv, 3DGCN) (Simonovsky and Komodakis 2017; Wang et al. 2019; Lin et al. 2020). Whereas these architectures can input irregular three-dimensional point clouds, they are not particularly designed for interpolation, where the target quantity can explicitly inform predictions at neighboring grid cells. Additionally, the presence of extensive sparsity in point cloud data makes it challenging to generalize spatially. We treated these challenges using a novel Interpolative Physics-Informed Graph Neural Network (InterPIGNN). The interpolative aspect of this architecture incorporates a graph-based operator which, unlike convolutional neural networks, does not require input data to lie on regular grids. Graph-based architectures are capable of extracting point cloud features without the need for heuristically projecting them onto a regular grid. Meanwhile, the physics-informed aspect involves satisfying physical laws related to the problem of interest. In this section, we describe the InterPIGNN module along with the data preprocessing and modeling steps. We use the three-dimensional thermal Earth modeling task as an example interpolation task to further motivate the InterPIGNN module.

Interpolative convolutional module

The InterNet convolutional module, the interpolative core of InterPIGNN, extends a classical graph neural network (GNN) to spatially interpolate irregularly spaced measurements of one or more properties (e.g., temperature) while using supplemental point and gridded measurements or estimates of other quantities (i.e., correlated datasets) to inform the interpolation. It particularly augments the EdgeConv architecture (Wang et al. 2019) by incorporating interpolative operations suitable for our application. In three-dimensional point cloud interpolation tasks across a space of interest \(\Omega\), a target quantity (or multiple target quantities) y is measured at a sparsely distributed finite set of points, which we denote as the set of anchor nodes \(V_a\) with \(|V_a| = n_a\), where \(|\cdot |\) denotes the size of a set. The objective is to predict or interpolate the target quantity at a finite set of desired points (regular or irregular) sampled from \(\Omega\), which we denote as the set of grid nodes \(V_g\) with \(|V_g| = n_g\). We denote the union set of the anchor and grid nodes as \(V = V_a \bigcup V_g\) with \(|V| = n_a + n_g = n\). Each node \(v \in V\) is associated with a position vector \(\varvec{p_v} \in \Omega\) (in three-dimensional point clouds, \(\Omega = {\mathbb {R}}^3\)).

We convert the point cloud into a directed graph representation \(G = (V, E)\) by constructing a set of directed edges \(E \subseteq V_a \times V\) with \(|E| = m\). A directed edge \(e_{uv}\) is constructed between nodes \(u \in V_a\) and \(v \in V\) if \({\varvec{p}}_u\) belongs to (1) the k-nearest neighbors of \({\varvec{p}}_v\), and/or (2) a sphere centered at \({\varvec{p}}_v\) with radius r, where k and r are hyperparameters. An edge \(e_{uv}\) has edge weight \(w_{uv}\) defined as the inverse Euclidean distance and normalized such that \(\sum _{u \in {\mathcal {N}}(v)} w_{uv} = 1\), where \({\mathcal {N}}(v)\) is the set of neighbors of the node v.
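
The following sketch builds such an edge set with SciPy's cKDTree, under the k-nearest-neighbors strategy only (the radius-based variant is omitted); variable names are ours:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_knn_edges(anchor_pos, all_pos, k=5):
    """Directed edges u -> v from the k nearest anchor nodes u to every
    node v, weighted by normalized inverse distance as in the text.

    Caveat: an anchor node is its own nearest anchor; during training
    such self-loops should be dropped so labels do not leak (omitted
    here for brevity)."""
    dist, nbr = cKDTree(anchor_pos).query(all_pos, k=k)  # (n, k) arrays
    src = nbr.ravel()                                    # anchors u
    dst = np.repeat(np.arange(len(all_pos)), k)          # targets v
    w = 1.0 / np.maximum(dist.ravel(), 1e-9)             # inverse distance
    w = w / np.bincount(dst, weights=w)[dst]             # sum_u w_uv = 1
    return np.stack([src, dst]), w

# Toy usage: 100 anchors plus 500 grid points in 3-D
rng = np.random.default_rng(0)
anchors = rng.uniform(size=(100, 3))
nodes = np.concatenate([anchors, rng.uniform(size=(500, 3))])
edge_index, edge_weight = build_knn_edges(anchors, nodes, k=5)
```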

Following the GNN framework of message passing, aggregation, and combination, we define our InterPIGNN convolutional module, which can be used flexibly as a nonlinear preprocessor or transformer in any deep learning algorithm inputting point clouds. Consider a GNN architecture based on InterPIGNN with L neural layers. At layer \(l=0\), the hidden state of each node \(v \in V\) is initialized using the raw feature vector, \({\varvec{h}}_v^{(0)} = {\varvec{x}}_{v}\). We also assign each edge \(e_{uv}\) an attribute \({\varvec{a}}_{uv}^{(l)} = {\varvec{h}}_u^{(l)} - {\varvec{h}}_v^{(l)}\) that is updated across convolutional layers. For a target node \(v \in V\), the InterPIGNN module, seen in Eqs. 2 and 3, consists of four neural networks: (1) a root network \(f_{\theta _r}\) to convolve root features; (2) a neighbor network \(f_{\theta _n}\) to transform each individual message passed by neighboring nodes, including features, positions, edge attributes, and labels; (3) a global network \(f_{\theta _g}\) to transform aggregated messages; and (4) a post-processing network \(f_{\theta _p}\) to jointly transform the outputs of the root and global networks. Note that \(\parallel\) represents the vector concatenation operator. Also, note that permutation-invariant aggregation is achieved in the average pooling operation. The final hidden state \({\varvec{h}}_v^{(L)}\) for each node \(v \in V\) can then be used for the desired task:

$$\begin{aligned} {\varvec{h}}_v^{(l+1)}= f_{\theta _p} \Bigg [f_{\theta _r}({\varvec{h}}_v^{(l)}) + f_{\theta _g}\Bigg (\frac{1}{|{\mathcal {N}}(v)|} \sum _{u \in {\mathcal {N}}(v)} \Big [w_{uv} \cdot f_{\theta _n}({\varvec{x}}_u \parallel {\varvec{p}}_v \parallel {\varvec{a}}_{uv}^{(l)} \parallel y_u)\Big ]\Bigg )\Bigg ], \end{aligned}$$
(2)
$$\begin{aligned} {\varvec{a}}_{uv}^{(l)}= {\varvec{h}}_{u}^{(l)} - {\varvec{h}}_{v}^{(l)}. \end{aligned}$$
(3)
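
Below is a plain-PyTorch sketch of Eqs. 2 and 3; the 128/64 two-layer perceptrons and Mish activation follow the "Physics-informed graph neural networks" section, but the authors' exact implementation may differ:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    # two-layer perceptron with 128/64-style widths and Mish activation
    return nn.Sequential(nn.Linear(d_in, 128), nn.Mish(),
                         nn.Linear(128, d_out), nn.Mish())

class InterConv(nn.Module):
    """Sketch of the interpolative convolution in Eqs. 2 and 3."""

    def __init__(self, d_x, d_h, d_p=3, d_y=1, d_mid=64):
        super().__init__()
        self.f_root = mlp(d_h, d_mid)                    # f_theta_r
        self.f_nbr = mlp(d_x + d_p + d_h + d_y, d_mid)   # f_theta_n
        self.f_glob = mlp(d_mid, d_mid)                  # f_theta_g
        self.f_post = mlp(d_mid, d_h)                    # f_theta_p

    def forward(self, h, x, p, y, edge_index, w):
        src, dst = edge_index                 # directed edges u -> v
        a = h[src] - h[dst]                   # edge attributes, Eq. 3
        # per-edge messages: neighbor features, target position, edge
        # attribute, and the anchor label y_u, scaled by weight w_uv
        msg = w.unsqueeze(-1) * self.f_nbr(
            torch.cat([x[src], p[dst], a, y[src]], dim=-1))
        # permutation-invariant average pooling over N(v), Eq. 2
        agg = h.new_zeros(h.size(0), msg.size(-1)).index_add_(0, dst, msg)
        deg = h.new_zeros(h.size(0)).index_add_(
            0, dst, torch.ones_like(w)).clamp(min=1.0).unsqueeze(-1)
        return self.f_post(self.f_root(h) + self.f_glob(agg / deg))

# Toy usage: initialize hidden states with the raw features (h^(0) = x)
n, m, d = 200, 1000, 16
x = torch.randn(n, d); p = torch.randn(n, 3); y = torch.randn(n, 1)
edge_index = torch.randint(0, n, (2, m)); w = torch.rand(m)
h1 = InterConv(d_x=d, d_h=d)(x, x, p, y, edge_index, w)  # -> (n, d)
```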

Three-dimensional heat conduction

Temperature within the Earth’s subsurface is governed by a complex interplay of physical processes, mainly thermal conduction, which drives heat transfer through the subsurface materials. A significant contribution also comes from the radioactive decay of isotopes such as uranium, thorium, and potassium, which generates heat. In addition, convective movements of magma and groundwater can redistribute heat vertically and horizontally through advection, altering temperature profiles. The latent heat associated with phase changes of water and tectonic activities, such as volcanic upwelling, further influence subsurface temperatures (Jaupart et al. 2007). Collectively, these factors establish the thermal state of the subsurface, varying significantly with geographic location, depth, and local geophysical and hydrological conditions.

Using the collected data for this study, we could satisfy steady-state three-dimensional heat conduction in the Earth’s subsurface, represented by the partial differential equation in Eq. 4 and the heat flow balance in Eq. 5 (Cengel et al. 2012; Mareschal and Jaupart 2013). Subsurface temperature T, rock thermal conductivity \(\lambda\), crustal heat flow \(Q_c\), and local heat generation \({\dot{g}}\) vary spatially according to Easting x, Northing y, and depth z. Meanwhile, surface heat flow Q is only a function of x and y:

$$\begin{aligned}{} & {} \frac{\partial }{\partial x} \Big (\lambda \frac{\partial T}{\partial x}\Big ) + \frac{\partial }{\partial y} \Big (\lambda \frac{\partial T}{\partial y}\Big ) + \frac{\partial }{\partial z} \Big (\lambda \frac{\partial T}{\partial z}\Big ) + {\dot{g}} = 0, \end{aligned}$$
(4)
$$\begin{aligned}{} & {} Q_c(z) = Q - \int _{0}^{z} {\dot{g}} \ dz'. \end{aligned}$$
(5)

Physics-informed graph neural networks

Anchor nodes were defined as points where measurements were available for one or more of the target thermal quantities. We additionally constructed a total of \(n_g = 4,279,536\) grid points to cover the desired spatial extent of our task. Edges were constructed following the k-nearest neighbors strategy with \(k=5\). The three-dimensional position vector \({\varvec{p}}\) was defined using depth, Easting, and Northing. Node labels y were set to the corresponding measurements of thermal quantities at anchor nodes, but they are unknown at grid nodes.

Anchor nodes were split into \(80\%\), \(10\%\), and \(10\%\) for training, validation, and testing, respectively. To decorrelate spatial features and ensure proper model evaluation, we split the data using three-dimensional cubic blocks, each with a side length of \(10 \ km\), as sketched below. Feature vectors were normalized using the training split mean \(\varvec{\mu }\) and standard deviation \(\varvec{\sigma }\) statistics, seen in Eq. 6. Toward satisfying the three-dimensional conductive heat transfer laws, we created three GNNs, namely \(T_{\theta }\), \(Q_{\theta }\), and \(\lambda _{\theta }\), to predict subsurface temperature, heat flow, and thermal conductivity simultaneously. Each of these networks utilized the interpolative convolutional module, where \(f_{\theta _r}\), \(f_{\theta _n}\), \(f_{\theta _g}\), and \(f_{\theta _p}\) were each chosen as two-layer perceptrons with 128 and 64 neurons. To ensure proper first- and second-order gradient flow when training InterPIGNN, we used Mish as the activation function. It has an unbounded positive domain which alleviates gradient saturation, a bounded negative domain which enhances generalization, and a continuously differentiable range (Misra 2019). Moreover, we incorporated two types of dropout: (1) traditional neuron dropout of \(10\%\) to encourage generalization across input samples (Srivastava et al. 2014), and (2) graph edge dropout of \(1\%\) to encourage generalization across neighboring nodes (Papp et al. 2021).
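
A sketch of the block-based split mentioned above; assigning whole 10-km cubes, rather than individual points, to splits keeps spatially correlated neighbors out of the held-out sets (the hashing scheme is our assumption):

```python
import numpy as np

def block_split(pos_km, block_km=10.0, fracs=(0.8, 0.1, 0.1), seed=0):
    """Assign anchor nodes to train/val/test by 10-km cubic blocks so
    that nearby (spatially correlated) points always share a split."""
    ids = np.floor(np.asarray(pos_km) / block_km).astype(np.int64)
    _, block = np.unique(ids, axis=0, return_inverse=True)  # block per node
    n_blocks = block.max() + 1
    # give each block a random rank in [0, 1), then threshold by fracs
    rank = np.random.default_rng(seed).permutation(n_blocks)[block] / n_blocks
    return np.where(rank < fracs[0], 0,
                    np.where(rank < fracs[0] + fracs[1], 1, 2))

# pos_km: (n, 3) array of Easting, Northing, depth in km (hypothetical)
pos_km = np.random.default_rng(1).uniform(0.0, 100.0, size=(1000, 3))
split = block_split(pos_km)   # 0=train, 1=val, 2=test, ~80/10/10 by block
```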

Whereas \(Q_{\theta }\) and \(\lambda _{\theta }\) were each designed to output a single quantity, i.e., \({\hat{Q}}\) and \({\hat{\lambda }}\), respectively, \(T_{\theta }\) involved a final node-level feedforward neural layer to output two quantities: average surface-to-depth geothermal gradient \({\hat{g}}\), and surface temperature correction \(\Delta {\hat{T}}_0\). Consequently, the predicted subsurface temperature at depth z was computed as \({\hat{T}}(z) = T_0 + \Delta {\hat{T}}_0 + {\hat{g}} \cdot z\), where \(T_0\) was surface temperature as seen in Fig. 6. We also enforced positivity using the absolute value of all network outputs, such that \({\hat{T}}, {\hat{Q}}, {\hat{\lambda }} \ge 0\).

$$\begin{aligned} \varvec{{\overline{x}}} = \frac{{\varvec{x}} - \varvec{\mu }}{\varvec{\sigma }} \end{aligned}$$
(6)

In this physics-informed supervised regression task, we incorporated multiple loss terms. Temperature loss \(L_T\), heat flow loss \(L_Q\), and thermal conductivity loss \(L_{\lambda }\) were used to learn the measured data of thermal quantities, seen in Eqs. 7, 8, and 9, where \(V_{a}^{(train)}\) represents the training anchor nodes. Note that \(\lambda ^{(lab)}\) was only used as a lower bound: the model was not penalized during training for \(\lambda _v^{(lab)} \le {\hat{\lambda }}_v \le \lambda _v\). Coupling this formulation with a large b, seen in Eq. 1, we only set a relatively relaxed lower bound \(\lambda _v^{(lab)}\) for nodes \(v \in V_{a}^{(train)}\) to avoid making assumptions in correcting rock thermal conductivity measurements from laboratory to in situ conditions. Considering a finite and homogeneous element (i.e., fixed average thermal conductivity \({\overline{\lambda }}\)) in three-dimensional space at depth z, we combined Eqs. 4 and 5 to quantify the deviation from three-dimensional conductive heat transfer as a loss term \(L_{pde}\), seen in Eq. 10. Additionally, we noted that deep subsurface temperatures generally increase with depth over intervals of hundreds of meters (except in rare scenarios, such as subduction zones and deep aquifers undergoing convection to shallower depths (Ziagos and Blackwell 1986)), hence we introduced a loss term \(L_{gg}\) to learn positive geothermal gradients, seen in Eq. 11. Furthermore, we incorporated a boundary condition loss term \(L_{bc}\), seen in Eq. 12, to enforce that the predicted heat flow across depths is greater than the minimum heat flow at the Moho, which was estimated as the product of the upper mantle thermal conductivity \(\lambda _{um}\) and the \(60 \ km\) geothermal gradient \(\Big (\frac{\partial T_v}{\partial z}\Big )_{60km}\) seen in Fig. 16. Whereas studies estimated that the upper mantle thermal conductivity ranges between \(4-7 \ W/m-K\) (Grose and Afonso 2019; Gubbins and Herrero-Bervera 2007), we set it to \(4 \ W/m-K\) to approximate a lower bound on the Moho heat flow. This was a soft inequality constraint, minimized when the predicted heat flow exceeded the estimated minimum Moho heat flow. Across all loss terms, derivatives were computed using automatic differentiation of neural networks (Baydin et al. 2018). Finally, the total loss L was computed as a weighted sum using hyperparameters \(\{\gamma _T, \gamma _Q, \gamma _\lambda , \gamma _{pde}, \gamma _{gg}, \gamma _{bc}\}\), seen in Eq. 13. Minimizing multi-objective loss functions is an inherently challenging aspect of physics-informed neural networks. In this work, we alleviated this challenge using a self-adaptive algorithm called ReLoBRaLo, relative loss balancing with random lookback (Bischof and Kraus 2021) (see Appendix A1).

$$L_{T} = {\mathbb{E}}_{{v\sim V_{a}^{{(train)}} }} \left[ {\left( {\hat{T}_{v} - T_{v} } \right)^{2} } \right],$$
(7)
$$L_{Q} = {\mathbb{E}}_{{v\sim V_{a}^{{(train)}} }} \left[ { {\left(\hat{Q}_{v} - Q_{v} \right)^{2} } } \right],$$
(8)
$$L_{\lambda } = {\mathbb{E}}_{{v\sim V_{a}^{{(train)}} |\;\hat{\lambda }_{v} \in \left[\lambda _{v}^{{(lab)}} ,\lambda _{v} \right]}} \left[ {\left( {\hat{\lambda }_{v} - \lambda _{v} } \right)^{2} } \right],$$
(9)
$$\begin{aligned}{} & {} \begin{aligned} L_{pde}&= {\mathbb {E}}_{v \sim V_g} \Bigg [\Bigg (\overline{\lambda _v} \Big (\frac{\partial {\hat{T}}_v}{\partial x} + \frac{\partial {\hat{T}}_v}{\partial y} + \frac{\partial {\hat{T}}_v}{\partial z}\Big ) + {\hat{Q}}_v \\&\quad + \int _{0}^{z_v} \Bigg (\frac{\partial }{\partial x} \Big ({\hat{\lambda }}_v \frac{\partial {\hat{T}}_v}{\partial x}\Big ) + \frac{\partial }{\partial y} \Big ({\hat{\lambda }}_v \frac{\partial {\hat{T}}_v}{\partial y}\Big ) + \frac{\partial }{\partial z'} \Big ({\hat{\lambda }}_v \frac{\partial {\hat{T}}_v}{\partial z'}\Big )\Bigg ) dz'\Bigg )^2 \Bigg ],\\ \end{aligned} \end{aligned}$$
(10)
$$\begin{aligned} L_{gg}= {\mathbb {E}}_{v \sim V_g} \Big [ \max \Big (-\frac{\partial {\hat{T}}_v}{\partial z}, 0\Big )^2\Big ], \end{aligned}$$
(11)
$$\begin{aligned} L_{bc}= {\mathbb {E}}_{v \sim V_g} \left [ \max \left ( \lambda _{um} \cdot \left (\frac{\partial T_v}{\partial z}\right )_{60\, \text{km}} - {\hat{Q}}_v, 0\right )^2\right ], \end{aligned}$$
(12)
$$\begin{aligned} L= \gamma _T L_T + \gamma _Q L_Q + \gamma _{\lambda } L_{\lambda } + \gamma _{pde} L_{pde} + \gamma _{gg} L_{gg} + \gamma _{bc} L_{bc}. \end{aligned}$$
(13)
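
To illustrate how the inequality losses of Eqs. 11 and 12 can be assembled with automatic differentiation, here is a hedged sketch; the stand-in temperature profile and heat flow values are hypothetical, and we rely on the convenient fact that W/m-K multiplied by degC/km yields mW/m^2:

```python
import torch

def geothermal_gradient(T_hat, z):
    """dT/dz via automatic differentiation; z must require grad and
    T_hat must be computed from it (create_graph keeps the losses
    differentiable with respect to the network weights)."""
    (dTdz,) = torch.autograd.grad(T_hat.sum(), z, create_graph=True)
    return dTdz

def soft_constraint_losses(dTdz, Q_hat, dTdz_60km, lam_um=4.0):
    # Eq. 11: penalize negative geothermal gradients
    L_gg = dTdz.clamp(max=0.0).pow(2).mean()
    # Eq. 12: predicted heat flow (mW/m^2) must exceed the minimum
    # Moho heat flow, lam_um (W/m-K) times the 60-km gradient (degC/km)
    L_bc = (lam_um * dTdz_60km - Q_hat).clamp(min=0.0).pow(2).mean()
    return L_gg, L_bc

# Toy usage with a differentiable stand-in for T_theta(z), z in km
z = torch.linspace(0.0, 7.0, 50, requires_grad=True)
T_hat = 10.0 + 25.0 * z - 0.5 * z**2           # hypothetical profile
Q_hat = torch.full((50,), 70.0)                 # mW/m^2, hypothetical
L_gg, L_bc = soft_constraint_losses(
    geothermal_gradient(T_hat, z), Q_hat,
    dTdz_60km=torch.full((50,), 15.0))          # degC/km, hypothetical
```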

In this study, we minimized L using the Adam optimizer (Kingma and Ba 2014) for a total of 400 epochs, with an initial learning rate of 0.001, which was reduced when the validation loss stopped improving. Unlike uncorrelated data structures, records (i.e., nodes) are interdependent in graph settings, where the prediction for a given node is affected by its neighborhood. In sampling a batch of nodes, we created a node-induced subgraph. However, especially in large and sparse graphs, it is very likely that the neighborhood of the sampled nodes would be misrepresented in this subgraph. Consequently, the InterPIGNN model would not be able to properly leverage neighborhood information during training, which would hinder the learning process. Our graph was fairly large, with \(|V| = n = 4,679,670\) nodes and \(|E|=23,398,350\) edges, and hence extremely sparse (graph density being defined as \(|E|/(|V|(|V|-1))\)). To resolve this challenge, we further included the neighborhoods of the randomly sampled batch nodes. Finally, we used Monte Carlo Dropout (Gal and Ghahramani 2016) to quantify model uncertainty (see Appendix A2) and Integrated Gradients (Sundararajan et al. 2017) to infer which features were most important to the model predictions of the target thermal quantities (see Appendix A3).
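
A minimal sketch of Monte Carlo Dropout at inference time (assuming the model contains dropout but no batch-normalization layers, which train() would also toggle):

```python
import torch

def mc_dropout_predict(model, inputs, n_samples=50):
    """Epistemic uncertainty via Monte Carlo Dropout: keep dropout
    layers active at inference and summarize repeated stochastic
    forward passes."""
    model.train()                       # enables dropout at inference
    with torch.no_grad():
        preds = torch.stack([model(inputs) for _ in range(n_samples)])
    model.eval()
    return preds.mean(dim=0), preds.std(dim=0)

# Toy usage with a dropout-bearing stand-in network
net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Mish(),
                          torch.nn.Dropout(0.1), torch.nn.Linear(64, 1))
mean, std = mc_dropout_predict(net, torch.randn(32, 8))
```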

Results

We first examined the performance of our InterPIGNN model compared to the combined SMU/NREL temperature-at-depth model, linear regression, a feedforward neural network, and EdgeConv. Except for the combined SMU/NREL model, all of these models made use of the physical quantities described in this study. Whereas these algorithms have inherently different architectures, we ensured that they were trained and tested using the same preprocessing and postprocessing steps. As seen in Fig. 21, we compared the different models using the mean absolute error on the \(10\%\) hold-out test set and found that the combined SMU/NREL model, linear regression, feedforward network, EdgeConv, and InterPIGNN achieved 49.5, 13.0, 6.5, 5.7, and \(6.4^\circ C\), respectively. When discarding the physics-informed loss terms, our InterNet model had a mean absolute error of \(4.8^\circ C\) but, despite honoring the data most closely compared to the other models, it did not generalize well across the United States, which had originally prompted the need for incorporating physical laws in this work (see the "Discussion" section). InterPIGNN simultaneously generated predictions of surface heat flow and rock thermal conductivity, which were found to have mean absolute errors of \(6.9 \ mW/m^2\) and \(0.04 \ W/m-K\), respectively (note that the rock thermal conductivity error was reported with respect to corrected measurements assuming b of \(1.5 \times 10^{-3} \ C^{-1}\)). Meanwhile, our model closely satisfied Fourier’s Law of conductive heat transfer with a mean absolute error of \(0.7 \ mW/m^2\).

Fig. 21

Comparison of the mean absolute error of the predicted temperature-at-depth on the \(10\%\) hold-out test set for different algorithms (smaller values are better)

Fig. 22 shows the predicted surface heat flow across the contiguous United States. Heat flow values were limited to a range of \(0-150 \ mW/m^2\) for visual convenience. Whereas the SMU maps conveniently excluded heat flow measurements greater than \(120 \ mW/m^2\), our model overcame this limitation by incorporating all heat flow measurements. The minimum, mean, and maximum predicted surface heat flow values were \(28.9 \ mW/m^2\), \(73.6 \ mW/m^2\), and \(377.1 \ mW/m^2\), respectively. To account for outliers, we computed the median surface heat flow of \(67.2 \ mW/m^2\), which we found to be closely comparable to the global average continental heat flow of \(65 \ mW/m^2\) (Pollack et al. 1993). Figs. 40, 41, 42, 43, 44, 45, and 46 show the spatial spread of thermal conductivity predictions. For visual convenience, values were bounded to a range of \(1.0-5.0 \ W/m-K\). The minimum, mean, and maximum predicted rock thermal conductivity values were \(1.08 \ W/m-K\), \(2.25 \ W/m-K\), and \(5.05 \ W/m-K\), respectively.

Fig. 22

Predicted surface heat flow map. Heat flow values were limited to a range of \(0-150 \ mW/m^2\) for visual convenience

Figs. 23, 24, 25, 26, 27, 28, and 29 show the predicted temperature-at-depth across the conterminous United States for depths of 1-7 km, respectively. Temperatures were limited to a range of \(25-300^\circ C\) for visual convenience. The greatest temperature predicted at a depth of 7 km was \(485.5^\circ C\) at the Roosevelt Hot Springs area in Utah. As anticipated, multiple high-temperature spots were also observed around volcanic and thermally active areas, such as Yellowstone Caldera, Valles Caldera in New Mexico, La Garita Caldera in Colorado, Newberry Volcano in Oregon, The Great Basin in Nevada and neighboring states, The Geysers in California, Coso Volcanic Field in California, Salton Buttes in California, amongst others. Seen in Figs. 30, 31 and 32, our model resulted in physical predictions with generally increasing temperatures along depth and declining geothermal gradients.

Fig. 23

Predicted temperature-at-depth map at a depth of 1 km

Fig. 24

Predicted temperature-at-depth map at a depth of 2 km

Fig. 25

Predicted temperature-at-depth map at a depth of 3 km

Fig. 26

Predicted temperature-at-depth map at a depth of 4 km

Fig. 27

Predicted temperature-at-depth map at a depth of 5 km

Fig. 28

Predicted temperature-at-depth map at a depth of 6 km

Fig. 29

Predicted temperature-at-depth map at a depth of 7 km

Fig. 30

Subsurface temperature profiles across grid nodes \(V_g\) (gray lines) and on average (red line)

Fig. 31

Average geothermal gradient profiles across grid nodes \(V_g\) (gray lines) and on average (red line). This quantity represents the average over all geothermal gradients from surface to the corresponding depth, yielding smoother trends across locations

Fig. 32

Local geothermal gradient profiles across grid nodes \(V_g\) (gray lines) and on average (red line). Herein, the local geothermal gradient reflects variations across locations for \(1-km\) depth intervals

With uncertainties surrounding subsurface physical quantities, we used Monte Carlo Dropout to estimate the model epistemic uncertainty associated with our predictions of temperature, surface heat flow, and rock thermal conductivity. Fig. 33 shows the distribution of uncertainty across our target thermal quantities. The median uncertainty in temperature, surface heat flow, and rock thermal conductivity was \(9.63\%\), \(8.04\%\), and \(3.71\%\), respectively. Uncertainty in temperature predictions followed a bimodal distribution such that the near-zero mode corresponded to zero-depth predictions, which were easier to capture since air temperatures were used as input to our model. Uncertainties were also computed at each grid node in \(V_g\) and visualized as spatial maps in Figs. 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, and 61. For visual convenience, temperature and surface heat flow uncertainties were bounded to ranges of \(0-50^\circ C\) and \(0-50 \ mW/m^2\), respectively, while the uncertainty in rock thermal conductivity was bounded to a range of \(0.0-0.5 \ W/(m \cdot C)\). We generally observed that temperature uncertainty increased on average with depth, which we attributed to the increasing sparsity of bottomhole temperature measurements along depth.

Fig. 33

Temperature-at-depth uncertainty distribution as modeled using Monte Carlo Dropout

To verify the model performance and demonstrate its utility, we evaluated its predictions against measured temperature log data. This involved 15 wells from multiple geothermal projects: Utah FORGE wells 58-32, 78-32, OH1, OH4, and Acord 1-26, Fervo Energy well Frisco-1, Cornell University Borehole Observatory (CUBO) well, Fallon Forge 88-24, Snake River Wells WO-2 and Camas-1, HOTSPOT Kimberly well, Paynton, Texas well, California well 5190016, Coso Forge well 83-11, and Brady’s Hot Springs well SP-2 (Allis et al. 2018; Cladouhos et al. 2015; Purwamaska and Fulton 2023; Podgorney 1991; Blankenship 2017; Shervais et al. 2013). In the case of temperature logs that were run soon after well drilling and completion, we applied the Harrison correction (Harrison 1983) to account for the disequilibrated thermal conditions in the wellbore resulting from drilling and completion fluid circulation. It is important to note that these corrections are correlations derived from data measured in the field before and after thermal equilibrium across wellbores; thus, the log temperature data might be a few degrees Celsius off compared to the actual rock temperatures at those locations. We also included the NREL and SMU predictions to demonstrate how they differ from our model. As seen in Fig. 34, the InterPIGNN model predictions closely matched most of the temperature log data. Due to the increasing spatial sparsity of bottomhole temperature measurements along depth, our model showed reasonable uncertainty magnitudes that generally increased with depth. We noted that our model was superior to the NREL model, which often overestimated subsurface temperatures. Additionally, unlike the SMU model, whose predictions tend to show uniform temperatures along deep intervals, our model showed increasing temperatures along depth.

Fig. 34

Verification of the proposed InterPIGNN model using temperature log data across different locations and projects. Whereas the solid line represents the InterPIGNN model prediction, the shaded area represents the uncertainty band. The NREL and SMU models were also visualized in dashed lines

To further understand the behavior of the InterPIGNN model and attribute its spatial predictions to the different physical quantities, we conducted an explainability analysis using Integrated Gradients. We approximated the global feature importances by summing attributions per feature. We excluded the thermal and spatial quantities in this process because the former are our target variables, while the latter are thoroughly expressed in the maps across figures. The attributions were then normalized, and the top six predictors were identified for the target quantities of temperature, surface heat flow, and thermal conductivity, as seen in Figs. 35, 36 and 37, respectively. The magnitude and sign of each bar in these charts indicate the level and direction of influence, respectively, of a feature with respect to the corresponding target thermal quantity.
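
For readers wishing to reproduce this kind of analysis, the following sketch uses Captum's IntegratedGradients on a toy stand-in model; attributing a graph model additionally requires holding the graph structure fixed while perturbing node features, which is omitted here:

```python
import torch
from captum.attr import IntegratedGradients

# Hypothetical stand-in: a linear model over 16 node features
model = torch.nn.Sequential(torch.nn.Linear(16, 1))
features = torch.randn(100, 16)

# Attribute predictions to features against an all-zero baseline
ig = IntegratedGradients(model)
attr = ig.attribute(features, baselines=torch.zeros_like(features),
                    target=0)

# Global importances: signed per-feature sums, then normalized so the
# magnitude and sign convey the level and direction of influence
global_attr = attr.sum(dim=0)
global_attr = global_attr / global_attr.abs().max()
```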

Fig. 35

Normalized and sorted global feature attributions computed using Integrated Gradients with respect to temperature predictions

Fig. 36

Normalized and sorted global feature attributions computed using Integrated Gradients with respect to surface heat flow predictions

Fig. 37

Normalized and sorted global feature attributions computed using Integrated Gradients with respect to thermal conductivity predictions

Discussion

We first noted that our model was an improvement over the combined SMU/NREL model, which indicated that the considered physical quantities were overall useful predictors of bottomhole temperature. Whereas using GNN algorithms was advantageous, the proposed InterPIGNN model was found to provide improved extrapolation capabilities by including the target quantities (e.g., temperature, surface heat flow, and rock thermal conductivity) as neighboring features in the message-passing computation and by incorporating physical laws. Beyond comparison on the basis of mean absolute error, we observed that incorporating Fourier’s Law of heat conduction resulted in predictions of temperature-at-depth and surface heat flow that were diffusive with minimal spatial artifacts. We also explored the effect of incorporating Fourier’s Law in the model training process. Fig. 38 shows temperature predictions at \(3 \ km\) for a model that did not train on physics-informed loss terms, where it yielded non-physical results with spatial speckles and artifacts in comparison to the physics-informed predictions seen in Fig. 25. Beyond the visual inspection, the solely data-driven predictions of Fig. 38 violate Fourier’s Law of conductive heat transfer, seen in Eq. 4. Hence, they are less likely to generalize well when extrapolating temperature across zones where bottomhole temperature measurements are not available, such as deep rocks at \(> 5 \ km\).

Fig. 38

Temperature-at-depth predictions for a model that was not physics-informed, showing significant non-physical spatial speckles and artifacts

Seen in Fig. 22, our predictions showed elevated surface heat flow across the Western United States, largely due to the presence of the Pacific Ring of Fire where tectonic movements favor volcanic activities and geothermal heat transfer. In particular, multiple locations stand out such as Yellowstone in Wyoming, Basin and Range Province extending mainly across Nevada and neighboring states with thinner crustal thickness, Cascade Range starting from northern California and reaching to British Columbia, Canada and including active volcanoes like Mount St. Helens and Mount Rainier, Coso Volcanic Field in California, Rio Grande Rift stretching from New Mexico to Colorado, and Imperial Valley in California partly due to the San Andreas Fault system. Away from the Western flank, elevated surface heat flow values were also observed at the Appalachian Mountains and along the Gulf Coast.

Unlike the diffusive heat flow and temperature quantities, thermal conductivity is a rock property with compartmentalized distribution, as can be observed in the raw measurements and model predictions. Thermal conductivity is governed by various parameters such as rock composition, porosity, fluid saturation, temperature, and pressure. Thermal conductivity tends to be lower in dry and unconsolidated rocks and varies with rock composition (Song et al. 2023). Seen in Fig. 39, our predictions showed a fairly constant average rock thermal conductivity along depth.

Fig. 39

Subsurface thermal conductivity profiles across grid nodes \(V_g\) (gray lines) and on average (red line)

Whereas we evaluated epistemic uncertainty using Monte Carlo Dropout, we also emphasize the importance of aleatoric uncertainty. Inaccuracies in borehole and laboratory measurements of rock temperature-at-depth and thermal conductivity were likely present due to sensor error, measurement conditions, corruption of input data, and biased spatial oversampling of measurements at shallower depths and around locations with known natural resources (e.g., geothermal, hydrocarbon, underground water). Whereas we used the Harrison correction (Harrison et al. 1982) to correct bottomhole temperature to rock temperature at the same location, and treated uncertainty in rock thermal conductivity in our loss function formulation, seen in Eq. 9, aleatoric uncertainty was not fully eliminated. Introducing more raw and inferred input features (e.g., detrended elevation, normalized seismic quantities) statistically reduced model overfitting in the presence of aleatoric uncertainty (Hüllermeier and Waegeman 2021). However, additional investigation and handling of this type of uncertainty could further improve predictions of subsurface thermal quantities.

Using Integrated Gradients and accumulating the model prediction gradients with respect to the input quantities, we estimated which features, on average, most govern model predictions of thermal quantities, as seen in Figs. 35, 36, and 37. Raw and inferred seismic features across the crust and mantle (i.e., depths of \(20 \ km\), \(40 \ km\), and \(60 \ km\)) were notably important in predicting thermal quantities. Electrical conductivity was the feature most positively correlated with temperature and heat flow, as higher temperatures lead to more active ionic mobility in the brine that saturates the rock (Han et al. 2024). We observed higher temperature predictions in locations with spatially increased concentrations of thorium radiation at shallow layers. Proximity to faults was found to be positively correlated with temperature, which is sensible as convective heat flow and hot springs are associated with faulted zones (López and Smith 1996). Normalized P-wave velocity, detrended along depth, was strongly indicative of decreasing temperature-at-depth, which is physically accurate: rising rock temperature causes rock rigidity and bulk modulus to decrease, which reduces seismic velocities (Qi et al. 2021). Compressional wave velocity was negatively correlated with rock thermal conductivity, which we attributed to the strongly negative correlation between compressional wave velocity and rock porosity, where increased porous volumes translate to decreased rock thermal conductivity (Boulanouar et al. 2013). In the presence of interdependence across input features (i.e., the physical quantities in our study), it is important to recall that explainability of neural networks is nuanced and needs to be thoughtfully interpreted.

The proposed study approach has several limitations. First, whereas our modeling approach approximately complies with Fourier’s Law of conductive heat transfer, it does not explicitly incorporate physical laws of convective heat transfer, which would enhance predictions in convectively active areas around the United States. Second, we used the Harrison correction (Harrison et al. 1982) to account for the disequilibrated thermal conditions in the wellbore resulting from drilling and completion fluid circulation. Since this correction was originally developed based on temperature measurements in Oklahoma, modeling could be improved by considering different corrections across locations. Third, aleatoric uncertainty was present across measurements due to various reasons (e.g., sensor error, measurement conditions, corruption of input data) and was not modeled in this study. Fourth, we incorporated physical laws using loss terms in the machine learning framework, requiring robust multi-objective optimization, which is challenging. An alternative data-driven approach could embed physical laws in the model design (e.g., the neural network architecture) rather than the loss function to avoid the need for multi-objective optimization. Finally, it is important to note that our feature explainability technique, i.e., Integrated Gradients, does not resolve feature interdependence, hence improved interpretation techniques could yield a deeper understanding of the model predictions.

Conclusion

We aggregated and processed bottomhole temperature measurements for the conterminous United States from multiple sources. To construct a thermal Earth model using these records, we introduced a novel interpolative physics-informed graph neural network module, named InterPIGNN, suitable for the interpolation of point cloud data structures. We produced surface heat flow, temperature, and thermal conductivity predictions for depths of \(0-7 \ km\) at an interval of \(1 \ km\), with a spatial resolution of \(18 \ km^2\) per grid cell. Our model achieved superior mean absolute errors of \(6.4^\circ C\), \(6.9 \ mW/m^2\), and \(0.04 \ W/m-K\) for temperature, surface heat flow, and thermal conductivity, respectively. It closely satisfied Fourier’s Law of conductive heat transfer with a mean absolute error of \(0.7 \ mW/m^2\). We estimated the median epistemic uncertainty associated with our predictions of temperature, surface heat flow, and rock thermal conductivity to be \(9.63 \%\), \(8.04 \%\), and \(3.71 \%\), respectively. We also found that elevation, seismic, and electrical conductivity measurements had the most significant influence on the model predictions of the target thermal quantities.