1 Introduction

Trees have evolved strategies to endure moderate drought episodes through physiological and morphological adaptations. These adaptations help balance crown cooling against excessive water loss and carbon starvation. They include regulating stomatal conductance, reducing leaf surface area, and solar tracking [1, 2]. Additionally, certain wood traits enable xylem to withstand hydraulic failure [3]. Even moderate drought periods increase the likelihood of mortality [4] and can lead to reduced growth [5, 6] regardless of climate change [7]. Recent extreme drought events, such as those in Europe in 2018 and 2019 [8, 9], caused by climate change-induced warming and shifts in precipitation patterns [10], have raised concerns about amplified tree mortality and die-off across various climate zones [11].

European beech (Fagus sylvatica), known for its high shade tolerance, has historically outcompeted other tree species in many parts of Europe [12]. However, in recent years, beech has shown declining growth throughout Europe [13,14,15,16]. Nevertheless, it may gradually acclimatise to drought over time [17].

Typical drought episodes can result in reduced carbon uptake due to decreased stomatal conductance, premature leaf senescence [18, 19], and a decrease in foliage in the following year due to reduced bud availability [20]. Prolonged drought episodes, especially for anisohydric plant species, can cause irreversible damage, including xylem embolism [15], resulting in permanent damage to the hydraulic system [21, 22].

One approach to assessing drought-tolerant species involves classifying a tree’s hydraulic strategy along the anisohydric and isohydric spectrum [18, 23]. Despite numerous studies on tree hydric behaviour, there is no mathematical model describing this trait, which is typically categorised based on the relationship between stomatal conductance (gs) and leaf water potential (Ψl) [24]. Isohydric plants are known for reducing transpiration by closing stomata during water shortages, which reduces CO2 assimilation [25]. Anisohydric plants, on the other hand, keep stomata open for longer periods during water shortage, making them more vulnerable to hydraulic failure but maintaining higher CO2 uptake during drought episodes [8, 26]. The anisohydric strategy, in essence, necessitates more water to keep leaves cool during extreme heat and involves significant fluctuations in tree stem (xylem) water content, often relying on nocturnal refilling [27]. Within each species, variations in hydric behaviour can also occur due to genetic variation in drought stress tolerance [8, 10]. Species with high phenotypic plasticity may allow individuals to adapt to changing climate conditions [15]. Categorizing tree species, and even specific provenances [10], into hydric behavioural classes through quantification of stomatal conductance with gas exchange measurements and leaf temperature can assist in assessing drought stress tolerance in the face of climate change. However, it is important not to rely on such categorization alone and to adopt a comprehensive, holistic approach [8], particularly in terms of whole-tree carbon balance [21]. In practical terms, central European species are rarely strictly either anisohydric or isohydric; rather, they are typically evaluated in reference to other species. For instance, Quercus species tend to be more anisohydric than Fagus, while Pinus is often more isohydric than Fagus.
A better understanding of hydric behaviour among species and individuals at the regional scale could significantly assist in focusing forest management goals.

An increased awareness of the impact of drought on tree productivity and survival [17] is essential for selecting appropriate species and provenances to enhance silvicultural practices, particularly regarding drought stress adaptation [28]. However, the effects of prolonged and extreme drought conditions on forests in the future remain relatively unknown, and the capacity of trees to acclimate is often underestimated [17, 29, 30]. Pretzsch et al. [17] demonstrated that during a 5-year experiment involving induced drought, beech acclimated faster than spruce, while spruce acclimatised more rapidly when mixed with beech. This suggests that some species could acclimatise to extended drought stress over time, within a generation, providing hydraulic failure is avoided. Detecting hydric behaviour can assist in determining tree species mixing strategies, identifying species that coexist well during drought, and assessing hydric variability within species.

The use of thermal infrared (TIR) sensors has proven valuable for non-destructive water content retrieval and stomata closure detection in plants [31,32,33]. Recent advancements in sensors mounted on unoccupied aerial vehicles (UAVs) have created opportunities to acquire thermal imagery from above crop or forest canopies. Gómez-Candón et al. [34] utilised UAV-based thermal imagery to detect elevated canopy temperatures in non-irrigated trees while using reference ground targets for temperature accuracy validation. Simpson et al. [35] implemented UAV thermography and multispectral data to produce evapotranspiration maps for oak trees. However, achieving accuracy with low-cost thermal imagers can be challenging, leading to several studies aimed at assessing and enhancing thermal imaging acquisition methods [36,37,38,39,40,41,42,43]. Challenges affecting the accuracy of thermal imaging can arise from the influence of meteorological variables, including air temperature (AT), relative humidity (RH), solar radiation (SR), and wind speed (WS) which not only impact tree canopy temperature but also influence the sensor itself. Additionally, other issues can affect thermal sensors, such as sensor drift, internal calibration, “bad pixels”, variations in “spot-size”, variations in leaf angle due to solar tracking, and the need to exclude certain pixels through masking.

The primary objective of this study is to investigate the feasibility of obtaining precise thermal imagery at the individual tree level, a topic of interest for intensive forest monitoring plots (i.e., ICP Forests Level II). In pursuit of this, we developed a single-shot method using the Micasense Altum sensor to capture tree crown temperature data. These data serve the dual purpose of calculating the leaf-to-air vapor pressure deficit (LVPD) and constructing a model for tree water deficit (TWD). Our approach involves both indoor and outdoor experiments, incorporating leaf temperature sensors for validation purposes. The specific aims of our study are as follows:

  I. To determine the minimum number of pixels required to obtain accurate temperature measurements and to investigate whether dry vegetation with lower emissivity has an impact on accuracy.

  II. To evaluate the accuracy of tree crown temperature measurements in comparison to upper canopy-mounted leaf temperature sensors across repeated missions under varying weather conditions throughout the growth season.

  III. To assess the dispersion of TIR values obtained through grid-type acquisition methods against single-shot acquisition techniques.

  IV. To explore the potential for modelling TWD using TIR tree crown data in conjunction with meteorological data.

2 Materials and Methods

2.1 Study Area

The Britz Research Station is situated roughly 50 km northeast of Berlin, Germany (52.87° N, 13.83° E, 42 m above sea level) and is under the management of the Thünen Institute for Forest Ecosystems (www.thuenen.de). Established in 1972, the research station was originally designed for forest hydrology research and has expanded over the years to encompass various facets of intensive forest monitoring. Since 2018, the research station has integrated UAV technology in conjunction with multispectral sensors for research purposes, specifically for intensive forest monitoring (i.e., Level II). This research encompasses a wide range of activities, including comprehensive tree geometry measurements, phenology studies, growth assessments, leaf-area-index evaluations, plot mapping, drought stress analysis, and the validation and evaluation of multispectral sensors. An overview map of the research station is available in Krause et al. [44].

The beech stand under consideration for this study is approximately 50 years old. Among the trees selected for this study, nine are designated as long-term phenological observation trees and are equipped with six-point dendrometers. Some of these trees also are equipped with both analogue and digital band dendrometers, along with sap-flow sensors. Notably, two of the selected trees in this study have been equipped with leaf temperature sensors positioned in the upper tree crown.

2.2 Altum Sensor

The sensor utilised for the study was the Micasense Altum (micasense.com). The sensor is composed of six synchronised bands: blue, green, red, near-infrared, red-edge, and longwave thermal infrared (LWIR). Technical specifications of the Altum sensor are presented in Table 1. The thermal band is provided by the radiometric-capable Lepton LWIR sensor from FLIR Systems (FLIR, 2022), which operates within a wavelength range of 8 to 14 μm with a centre wavelength of 11 μm. It also integrates thermal image processing features such as automatic thermal environment compensation, noise filters, and thermal non-uniformity correction (NUC) [45]. The NUC function automatically recalibrates the sensor every five minutes or whenever the internal temperature changes by 2 K. The manufacturer reports a thermal accuracy of ± 5 K, while the thermal sensitivity is noted to be less than 50 mK (0.05 K). In the context of this study, data derived from the thermal band is referred to as thermal infrared (TIR).

Table 1 Parameters for the Micasense Altum sensor

2.3 UAV

The UAV employed for the study was a DJI Matrice M210 RTK, equipped with dual gimbals that simultaneously carried both the Micasense Altum and Zenmuse X7 (RGB) sensors. According to the manufacturer, the hovering accuracy in P-mode with Global Navigation Satellite System (GNSS) capability is approximately ± 0.5 m vertically and ± 1.5 m horizontally. The maximum take-off weight is 6.14 kg, and the flying time is approximately 24 min, even when powering both the Altum and X7 sensors concurrently.

2.4 Flight Planning

The DJI Matrice M210 RTK was operated using the DJI Pilot App (www.dji.com). Flight plans were generated based on the central positions of tree crowns extracted from an existing Orthomosaic created from data obtained with the Zenmuse X7 and Real-Time Kinematic (RTK)-GNSS positioning. The single-tree waypoint flight plan was created using the R software [46], which produced a custom Keyhole Markup Language (KML) file conforming to the required DJI format. The selected flying height was set to a minimum of 10 m above the beech canopy, typically an altitude of approximately 30 m above ground level. This choice was determined after testing, as it was found that maintaining a minimum distance of 10 m from the canopy was essential to prevent immediate disturbance of the leaves due to the downward airflow generated by the propellers. The single-image waypoint flight plan (see Fig. 1) required about 5–7 min to execute. Each waypoint was maintained for a duration of at least 10 s, typically allowing for the capture of five images from the Altum sensor, with an intervalometer set at two-second intervals. For the trees equipped with leaf temperature sensors, a separate flight plan was devised, which was typically repeated at least twice within a single mission.
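The waypoint generation step can be illustrated with a minimal sketch. The study produced its KML file in R; the Python below is only an illustration that writes generic KML Placemarks, one per crown centre. The function name `waypoints_to_kml`, the tree coordinates, and the default altitude argument are hypothetical, and the DJI-specific schema required by the DJI Pilot App is not reproduced here.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def waypoints_to_kml(crowns, altitude_m=30.0):
    """Build a generic KML document with one Placemark per tree crown centre.

    crowns: iterable of (tree_id, lon_deg, lat_deg) tuples (hypothetical input).
    altitude_m: flight altitude above ground (about 30 m in the study).
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for tree_id, lon, lat in crowns:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(tree_id)
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude,altitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{lon:.7f},{lat:.7f},{altitude_m:.1f}"
        )
    return ET.tostring(kml, encoding="unicode")


# Made-up crown centres, for illustration only
example_crowns = [("328", 13.8301, 52.8702), ("347", 13.8304, 52.8703)]
kml_text = waypoints_to_kml(example_crowns)
```

In practice the crown centres would come from the RTK-referenced orthomosaic, and the resulting file would still need to be adapted to whatever structure the flight app expects.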

Fig. 1

An example of a waypoint flight plan for single shot images. The UAV would hover over each waypoint for 10 seconds while acquiring five thermal single-shot images of the tree crown

The standard flight plan for weekly coverage of the entire Britz research station involved the gridded flight. In this flight plan, a forward- and side-overlap of 80 to 85% was maintained, while the UAV operated at an altitude of 75 m. Achieving forward overlap was accomplished by utilising a two-second intervalometer for the Altum sensor, allowing for a flying speed of three meters per second.
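The relation between flying speed, trigger interval, and forward overlap can be checked with a short calculation. The sketch below assumes the usual geometric relation overlap = 1 − spacing / footprint, where the footprint is the along-track ground coverage of one image. The sensor's field of view is not given in this section, so the actual footprint at 75 m is not computed; the function names are illustrative.

```python
def along_track_spacing(speed_m_s, interval_s):
    """Distance travelled between consecutive exposures (m)."""
    return speed_m_s * interval_s


def min_footprint_for_overlap(spacing_m, overlap):
    """Smallest along-track footprint (m) that still yields the requested overlap.

    overlap is a fraction (0.80 for 80 %), from overlap = 1 - spacing / footprint.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    return spacing_m / (1.0 - overlap)


# Values from the gridded flight plan: 3 m/s and a two-second intervalometer
spacing = along_track_spacing(3.0, 2.0)
footprint_80 = min_footprint_for_overlap(spacing, 0.80)
footprint_85 = min_footprint_for_overlap(spacing, 0.85)
```

With a 6 m spacing between exposures, each image would need to cover roughly 30 m along track for 80% overlap and roughly 40 m for 85% overlap.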

2.5 Research Station Sensors

The leaf temperature sensors utilised for validating leaf temperature were the LAT-B2 sensors [47]. These sensors are capable of measuring both the absolute leaf surface temperature and the surrounding ambient leaf temperature (see Fig. 2). According to the manufacturer, these leaf temperature sensors offer an absolute accuracy of ± 0.2 K for both leaf surface and ambient temperature measurements. Installation of these sensors typically takes place at the onset of the growth season, following the conclusion of all phenological phases for this location, which usually occurs at the end of June.

Fig. 2

The LAT-B2 [47] leaf temperature sensor which was positioned in the upper tree crown and used to validate the TIR imagery

The study employed DR1W point dendrometers [47], which were installed on six of the study trees. These point dendrometers were permanently installed before the growth season and, according to the manufacturer, have a maximum error of 4.5% of the measured value. They measure changes in the tree’s diameter at breast height (1.3 m) with a precision in the micrometer (μm) range and were configured to record data at five-minute intervals. For the study, five of the six trees were chosen based on the completeness of available data.

The weather data was acquired from a Thies weather station [48], which was configured to capture various meteorological parameters such as air temperature (AT), wind direction (WD), wind speed (WS), solar radiation (SR), relative humidity (RH), and air pressure (AP). The weather station is permanently situated in an open field at a height of two meters above ground level, approximately 180 m away from the beech study plot.

2.6 Feature Selection

Working with UAV-based multispectral imagery alongside near real-time meteorological data provides numerous modelling opportunities that are typically unavailable when solely working with pixel values. When dealing with thermal imagery, comparing raw thermal values from day to day can often be misleading. The thermal temperature of plant or tree leaves can exhibit rapid variations, especially during hot summer days with changing cloud conditions. By incorporating meteorological variables such as RH, AT, and SR synchronised at the time of thermal data acquisition, it becomes possible to account for various influences stemming from fluctuating weather conditions. This approach not only proves useful for calibrating thermal imagery but also for modelling tree water status, such as the TWD [49, 50].

It is essential to rigorously assess these features regarding their effectiveness in predicting TWD to develop the most accurate models using the available data. Certain features, such as the vapor pressure deficit (VPD), which is derived from RH and AT, can be employed on their own to predict TWD [50]. Integrating UAV-based TIR imagery into TWD modelling holds the potential to introduce a spatial dimension to the model’s predictions, which could prove valuable in enhancing the overall accuracy of the predictions.

Utilising feature engineering has the potential to enhance modelling performance by creating more effective data representations while mitigating issues related to multicollinearity and the impact of irrelevant predictors [51]. One straightforward example of altering the representation of a feature involves the VPD, which is derived from relative humidity (RH) and air temperature (AT) features. VPD has been demonstrated to influence atmospheric demand and stomatal conductance [52, 53]. Furthermore, the leaf-to-air vapor pressure deficit (LVPD) incorporates the difference between the UAV-based TIR temperature and AT, in addition to VPD. LVPD has the potential to serve as an input feature for TWD modelling, and it can also function as a standalone index that reflects the current status of the tree crown.

The VPD represents the difference between the current moisture content of the air and the air’s moisture-holding capacity when saturated [54]. The maximum capacity for moisture retention depends on the AT, where higher AT values correspond to greater potential saturation, a concept referred to as saturation vapor pressure (SVP) (see Eq. 1). When combined with RH, SVP yields the actual vapor pressure (AVP) (see Eq. 2), which represents the actual quantity of water vapor at a given AT. VPD is subsequently calculated by subtracting AVP from SVP as shown in Eq. 3.

$$SVP=0.6108\exp \left(\frac{17.27 AT}{AT+237.3}\right)$$
(1)
$$AVP= SVP\frac{RH}{100}$$
(2)
$$VPD= SVP- AVP$$
(3)
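Equations 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it uses the standard Tetens coefficient of 17.27 in the exponent, takes AT in °C and RH in percent, and returns values in kPa. The function names are illustrative.

```python
import math


def svp_kpa(temp_c):
    """Saturation vapour pressure in kPa (Eq. 1, Tetens form, coefficient 17.27)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def avp_kpa(air_temp_c, rh_percent):
    """Actual vapour pressure in kPa (Eq. 2)."""
    return svp_kpa(air_temp_c) * rh_percent / 100.0


def vpd_kpa(air_temp_c, rh_percent):
    """Vapour pressure deficit in kPa (Eq. 3): SVP minus AVP."""
    return svp_kpa(air_temp_c) - avp_kpa(air_temp_c, rh_percent)


# Example: a warm afternoon at 25 °C and 50 % relative humidity
example_vpd = vpd_kpa(25.0, 50.0)
```

At 100% relative humidity the deficit is zero, and at a fixed RH it grows with AT because SVP increases exponentially with temperature.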

The LVPD is calculated by accounting for the difference between the water vapor pressure within the leaf and the water vapor pressure of the ambient air [52, 55,56,57]. In this calculation, the leaf vapor pressure (LVP) is obtained in the same way as AVP (see Eq. 2), except that SVP is evaluated at the leaf surface temperature (written here as SVP_leaf) instead of the ambient AT (see Eq. 4). LVPD is subsequently derived by subtracting AVP from LVP as shown in Eq. 5.

$$LVP= SVP_{leaf}\frac{RH}{100}$$
(4)
$$LVPD= LVP- AVP$$
(5)
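A minimal sketch of Eqs. 4 and 5 follows, based on our reading of the text that the saturation term in Eq. 4 is evaluated at leaf surface temperature while Eq. 2 uses AT. The Tetens coefficient of 17.27, the kPa units, and the function names are assumptions of this illustration.

```python
import math


def svp_kpa(temp_c):
    """Saturation vapour pressure in kPa (Tetens form, coefficient 17.27)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def lvpd_kpa(leaf_temp_c, air_temp_c, rh_percent):
    """Leaf-to-air vapour pressure deficit in kPa as described by Eqs. 4 and 5."""
    lvp = svp_kpa(leaf_temp_c) * rh_percent / 100.0  # Eq. 4, SVP at leaf temperature
    avp = svp_kpa(air_temp_c) * rh_percent / 100.0   # Eq. 2, SVP at air temperature
    return lvp - avp                                 # Eq. 5


# Example: crown 3 K warmer than the air at 50 % relative humidity
example_lvpd = lvpd_kpa(28.0, 25.0, 50.0)
```

Under this formulation the index is positive when the crown is warmer than the air, zero when both temperatures match, and negative when the crown is cooler.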

In this research, leaf surface temperature was obtained through UAV-based TIR imagery as well as from leaf temperature sensors directly affixed to the leaves. Table 2 provides a summary of the features and indices, along with their corresponding abbreviations, utilised in this study.

Table 2 Feature and indices with abbreviations and units

2.7 Tree Water Deficit as Drought Stress Validation

Diurnal fluctuations in stem diameter occur because the stem is depleted by water loss through the leaves, typically during the day, and is subsequently replenished during the night. The magnitude of these variations in stem size serves as an indicator of the current tree water status and can be quantified accurately using point dendrometers positioned at breast height. Point dendrometers record changes in stem size attributable to the formation of new xylem and phloem tissues, which accumulate over the course of the growth season [49, 50], as well as the expansion and contraction of elastic tissues [50, 58]. Fig. 3 illustrates the pattern of daily fluctuations, wherein stem contraction, resulting from water loss through transpiring leaves during the day, is followed by nocturnal expansion, continuing into the early morning. The red dashed line indicates the timing of the UAV missions for TIR acquisition.

Fig. 3

A representation of the daily stem fluctuations. Nocturnal refilling typically initiates in the late afternoon or early evening, concluding when leaf transpiration commences in the morning. TIR data was generally collected around solar noon, close to the time when the stem approaches its lowest water content (indicated by the red dashed line)

As shown in Fig. 4a, following the methodology outlined by Zweifel et al. [59], we established segments between successive maximum stem radius points observed throughout the growth season. These segments capture the irreversible growth occurring between these maxima. Within each segment, isolating the reversible changes in stem size (TWD) involves a multi-step process. Initially, we detrend the data to remove the underlying growth trend, isolating variations primarily linked to daily water dynamics. Subsequently, we normalise the data to ensure consistent scaling, facilitating meaningful comparisons across different days and trees. Finally, we invert the values, essentially flipping the data so that the highest point on each curve represents the maximum stem shrinkage for that particular day, as illustrated in Fig. 4b. This approach enables precise capture and analysis of maximum daily stem shrinkage instances, offering valuable insights into the tree’s water status and its response to environmental conditions.
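The idea behind these steps can be sketched in simplified form. Instead of the segment-wise detrending described above, the illustration below uses the running maximum of the stem radius as the growth line, so that the deficit below this line is already detrended and inverted; a min-max normalisation then rescales it to the range 0 to 1. The function names and the radius series are made up for this example.

```python
def tree_water_deficit(radius_um):
    """Return (gro, twd): running-maximum growth line and the deficit below it."""
    gro, twd = [], []
    current_max = float("-inf")
    for r in radius_um:
        current_max = max(current_max, r)
        gro.append(current_max)
        # 0 when the stem is at its previous maximum, positive when it has shrunk
        twd.append(current_max - r)
    return gro, twd


def normalise(values):
    """Min-max scale a sequence to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up stem radius readings in micrometres
radius = [100.0, 98.0, 95.0, 97.0, 101.0, 99.0, 96.0, 102.0]
gro, twd = tree_water_deficit(radius)
twd_norm = normalise(twd)
```

In this toy series the growth line steps up only when a new maximum is reached, and the largest normalised values mark the moments of greatest stem shrinkage.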

Fig. 4

On the left (a), the diagram illustrates a growth segment (dotted line) delineated between two instances of maximum stem radius. The arrows indicate the extent to which the stem deviates from the growth trend (GRO), highlighting the deficit in stem size. On the right (b), the same segment is presented following a series of transformations: detrended, inverted, and normalised. These adjustments position the maximum shrinkage instances at the top of the graph, enabling the determination of TWD on an hourly basis. Note that the UAV missions typically occurred around solar noon, approximately 3 to 6 h before the stem reached its peak daily shrinkage

2.8 Image Processing

The Micasense Altum supports the co-registration of single-shot TIR imagery with the synchronised multispectral bands. This co-registration is performed during post-processing using the Micasense “image processing” Python library available at www.github.com/micasense/imageprocessing. In addition, the non-thermal multispectral bands undergo conversion to radiance and subsequent radiometric calibration. This calibration step is facilitated through the use of a Micasense calibrated reflectance panel, which is sampled both before and after each flight mission. As a result of this calibration, the resulting images exhibit reflectance values ranging from 0 to 1. Subsequent to radiometric calibration, an image alignment process is executed, which encompasses unwarping the images via built-in lens calibration, applying an affine transformation, and cropping to eliminate extraneous pixels [60]. The full process was carried out in Python 3.6 using the Micasense Python library. The customised Python workflow was executed in a loop for each corresponding calibration panel.

Further processing was carried out in R [46] where multiple vegetation indices were calculated to facilitate testing and masking procedures. It is important to note that the raw digital number (DN) values of the TIR band, as provided by the sensor, are expressed in centi-Kelvin. To conform to common temperature units, these values were converted to Celsius, following the manufacturer’s recommended conversion equation:

$$TIR_{{}^{\circ}C}=\frac{DN}{100}-273.15$$
(6)
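As a quick illustration of Eq. 6, the conversion from centikelvin digital numbers to degrees Celsius can be written as a one-line function. The function name and the sample values are illustrative only.

```python
def tir_dn_to_celsius(dn):
    """Convert a raw TIR digital number in centikelvin to degrees Celsius (Eq. 6)."""
    return dn / 100.0 - 273.15


# Made-up digital numbers for three crown pixels
sample_dn = [29815, 30015, 29565]
sample_celsius = [tir_dn_to_celsius(dn) for dn in sample_dn]
```

A digital number of 29815, for instance, corresponds to 298.15 K, which is 25 °C.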

The management of images and their derivatives involved organizing them into layered stacks, with each stack categorised by its corresponding mission ID number for clarity and easy access. Tree crowns within individual images were identified manually using a rapid annotation method developed in R with the terra package [61]. In this process, images were displayed in either RGB or colour-infrared (CIR), and crown polygons were created by selecting the top left and bottom right points of the tree crown, forming the blue rectangle shown in Fig. 5. The centroid of this rectangle was then calculated and used together with the rectangle’s length and width to generate an ellipse or, in some cases, a near circle representing the shape of the tree crown. Given the considerable volume of images, a semi-automated approach was employed, which would systematically move to the next image stack following user inputs. Subsequently, polygons and cropped images were stored using specific naming conventions to ensure repeatability and consistency in the dataset.
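The geometry of this annotation step can be sketched as follows. The study implemented it in R; the Python below is only an illustration of how an elliptical crown polygon can be derived from two corner clicks, and the function name, vertex count, and pixel coordinates are hypothetical.

```python
import math


def crown_ellipse(top_left, bottom_right, n_vertices=36):
    """Return a closed ring of (x, y) vertices for the ellipse inscribed in the
    rectangle spanned by two corner clicks."""
    x0, y0 = top_left
    x1, y1 = bottom_right
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0       # centroid of the rectangle
    a, b = abs(x1 - x0) / 2.0, abs(y1 - y0) / 2.0   # semi-axes from width and height
    ring = [
        (cx + a * math.cos(2 * math.pi * k / n_vertices),
         cy + b * math.sin(2 * math.pi * k / n_vertices))
        for k in range(n_vertices)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the polygon
    return ring


# Made-up pixel coordinates of two mouse clicks
ring = crown_ellipse((10.0, 20.0), (30.0, 60.0))
```

When the clicked rectangle is square, the two semi-axes are equal and the polygon becomes the near circle mentioned above.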

Fig. 5

Illustration of the rapid annotation method employed for crown pixel extraction in R [46]. The ellipse is generated from the user's mouse clicks at the positions of the tree crown’s top-left and bottom-right corners

During the looping process, the mean extracted tree crown TIR, multispectral, and vegetation index pixel values were appended to a table, alongside acquisition timestamps, utilising the exiftoolr package [62]. Subsequently, the timestamps for the extracted pixel values and dendrometer data were rounded to the nearest hour and then aligned with the corresponding hourly meteorological data obtained from the weather station. The original dendrometer data, collected at 5-min intervals, were synchronised for this study to calculate the TWD at the hourly level. Three different datasets were created to account for variations in lag regarding TWD calculated from the dendrometer data: no lag, one-hour lag, and two-hour lag.
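The timestamp alignment and the lagged datasets can be illustrated with a short sketch. It assumes that a lag of n hours means pairing each TIR acquisition with the TWD value n hours after the rounded acquisition hour, which is our reading of the description above. The function names and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(ts):
    """Round a datetime to the nearest full hour (30 min and above rounds up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored


def join_with_lag(tir_records, twd_hourly, lag_hours=0):
    """Pair each TIR record with the hourly TWD value lag_hours after acquisition.

    tir_records: list of (datetime, mean_crown_tir) tuples.
    twd_hourly: dict mapping hourly datetimes to TWD values.
    Records without a matching TWD hour are skipped.
    """
    pairs = []
    for ts, tir in tir_records:
        key = round_to_nearest_hour(ts) + timedelta(hours=lag_hours)
        if key in twd_hourly:
            pairs.append((tir, twd_hourly[key]))
    return pairs


# Made-up acquisition and hourly TWD values
tir = [(datetime(2022, 7, 19, 12, 40), 27.3)]
twd = {
    datetime(2022, 7, 19, 13, 0): 0.5,
    datetime(2022, 7, 19, 14, 0): 0.7,
    datetime(2022, 7, 19, 15, 0): 0.9,
}
datasets = {lag: join_with_lag(tir, twd, lag) for lag in (0, 1, 2)}
```

Building the three datasets from one function keeps the pairing rule identical across lags, so differences in model fit can be attributed to the lag itself.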

2.9 Analysis and Modelling

Statistical analysis and modelling were carried out in R [46]. All datasets underwent testing for normality and correlation, and modelling approaches were adjusted accordingly. Pearson correlation was employed for analysing parametric data, while the Spearman non-parametric rank correlation was used for data that did not conform to a normal distribution. To average correlation results, correlation coefficients were transformed using the Fisher z method, utilizing the FisherZ function from the DescTools package [63]. These transformed values were then averaged and converted back to correlation coefficients for further analysis [64]. For the further analysis of non-parametric data, second-order polynomial regression and generalised additive models (GAM) were applied. GAMs were trained and validated using a 70/30 training and validation split and evaluated using root-mean-squared error (RMSE), mean absolute error (MAE), and R2. Linear regression was also evaluated using RMSE, MAE, and R2. Models were compared using the Akaike Information Criterion (AIC). It is important to note that due to the absence of actual dendrometer data influenced by drought, the modelling was limited to simple curve regression models, intended for potential extrapolation and proof-of-concept analysis rather than operational use.
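The averaging of correlation coefficients via the Fisher z transformation can be sketched in Python. The study used R for this step; the function below is only an illustrative equivalent in which each coefficient is transformed with the inverse hyperbolic tangent, the transformed values are averaged, and the mean is transformed back. The function name and the sample coefficients are hypothetical.

```python
import math


def mean_correlation(r_values):
    """Average correlation coefficients on the Fisher z scale.

    Coefficients must lie strictly between -1 and 1, since atanh is
    undefined at the endpoints.
    """
    z_values = [math.atanh(r) for r in r_values]   # Fisher z transform
    z_mean = sum(z_values) / len(z_values)
    return math.tanh(z_mean)                       # back-transform to r


# Made-up coefficients from two acquisition days
averaged = mean_correlation([0.9, 0.5])
```

Averaging on the z scale gives more weight to strong correlations than the arithmetic mean does, which is why the result for 0.9 and 0.5 lies above 0.7.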

3 Results

3.1 Thermal Sensor Assessment

3.1.1 Indoor Experiment

To evaluate the performance of the Micasense Altum TIR band, extracted TIR values were compared with leaf temperatures obtained from mounted leaf temperature sensors. An indoor experiment was conducted using two indoor plants (Epipremnum aureum), each equipped with two leaf temperature sensors. The choice of these plants was based on practical considerations, as leafed-out beech, typically used for such experiments, is not readily available during the winter months in central Europe. The experiments were carried out over a duration of 30 to 50 min, during which controlled variations in timing and intensity of heat lamps and a ventilator were applied. The temperature sensor was configured with a two-second intervalometer and activated at the start of each experiment. Although the sensor was not specifically pre-warmed before the indoor experiments, it had been in the same room for at least 30 min prior to operation. Both the positioning of the heat lamps and ventilator were carefully arranged to affect not only the plants but also the sensor.

Thermal values were extracted from leaves in close proximity to the leaf temperature sensors (see Fig. 6). Three different types of polygons were tested across three trials: extracting all pixels from the entire leaf, extracting approximately 10 pixels around the leaf temperature sensors, and extracting approximately three pixels near the leaf temperature sensor. Table 3 shows the results of the three trials. Using approximately 10 pixels (trial 2) for extraction yielded a mean difference of 0.02 K. However, employing pixels from the entire leaf or only three pixels for extraction resulted in mean differences of 0.11 K and 1.18 K, respectively.

Fig. 6

The thermal image on the left, captured by the Micasense Altum during indoor experiments, displays two plants. The watered plant on the left of the thermal image contains leaf 1 (bottom) and leaf 2 (top), while the unwatered plant on the right has leaf 3 (bottom) and leaf 4 (top). In the second trial, polygons were created in close proximity to the mounted leaf temperature sensors. These polygons consist of approximately 10 pixels each, which were utilised to extract the mean temperature at two-second intervals. The RGB image on the right shows the two plants with mounted leaf temperature sensors and the effects of the heat lamps

Table 3 The mean difference in temperature for the three separate trials. The use of approximately 10 pixels produced the best results, while using three pixels or fewer resulted in considerably more error

In the accuracy assessment of the second trial, the results for the watered plant were as follows. Leaf one had an RMSE of 0.55 K and MAE of 0.42 K, while leaf two had an RMSE of 0.53 K and MAE of 0.40 K. Both leaves of the watered plant achieved an R2 of 0.9. For the unwatered plant, the RMSE was slightly higher, at 0.66 K for leaf three and 0.74 K for leaf four. The MAE showed a similar decrease in accuracy, at 0.53 K for leaf three and 0.59 K for leaf four. The R2 values for leaves three and four were 0.88 and 0.87, respectively. Table 4 provides an overview of the RMSE, MAE, and R2 of both plants for the second trial. Fig. 7 shows the four separate leaves of the second trial, modelled with second-order polynomial regression. Similar patterns are evident among all four leaves due to consistent artificial environmental effects (i.e., heat lamps and a ventilator). However, there is a wider dispersion in the leaves of the unwatered plant, particularly for leaf four.

Table 4 RMSE, MAE, and R2 for the four leaves of both plants in the second trial

Fig. 7

Results of the indoor experiment modelled using second-order polynomial regression. The point distribution indicates lower R2 values and higher dispersion for the unwatered plant, particularly for leaf four. Variations in point clusters can be attributed to controlled heat lamp and ventilator fluctuations, as well as potential sensor drift

The second trial lasted for approximately 50 min, during which the plants and sensor were exposed to varying heat lamp and ventilator intensities. Fig. 8 visually compares the extracted thermal pixel values from leaf four to values derived from the leaf temperature sensor. The near infrared (NIR) band of the Micasense Altum sensor indicates the timing of the heat lamp fluctuations. The spikes observed in the thermal imagery (Leaf TIR 4) are likely caused by the thermal non-uniformity correction (NUC) of the Micasense Altum sensor, which, as reported by the manufacturer, recalibrates automatically every five minutes or when the temperature of the camera changes by 2 K [45].

Fig. 8

Visual comparison between the extracted thermal camera pixel values (leaf TIR 4) and the leaf temperature sensor readings. The spikes observed in the thermal dataset are likely attributable to the thermal non-uniformity correction (NUC), which occurs automatically at five-minute intervals

3.1.2 Field Experiments

During the field experiments, the thermal sensor’s performance was further assessed by comparing the mean crown temperature of two trees, each equipped with a leaf temperature sensor in the upper tree crown. While validation was limited to a single sensor in the upper canopy of each tree, it provided a rough estimation of the upper tree crown temperature, serving as a control for understanding how the leaf surface responded to varying meteorological factors. Fig. 9 shows the results of this comparison using linear regression (upper) and a time-based visual representation (lower). The average thermal values extracted from the crown of tree 328 over 13 acquisition days resulted in an RMSE of 3.31 K, a MAE of 2.95 K, and an R2 of 0.89 when compared to the tree crown-based leaf temperature sensor. Similarly, thermal values extracted from tree 347 yielded comparable results with an RMSE of 3.12 K, a MAE of 2.78 K, and an R2 of 0.93. Notably, there is a systematic error, indicating a consistent underestimation of leaf temperature, as evident in the time-based visualisation for both trees.
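The distinction between a high R2 and a systematic offset can be illustrated with a small sketch of the accuracy metrics. The function below computes RMSE, MAE, the mean signed error (bias), and R2 as the squared Pearson correlation, which corresponds to the R2 of a simple linear regression. The function name and the temperature values are made up and are not data from this study.

```python
import math


def accuracy_metrics(reference, estimate):
    """Return RMSE, MAE, bias (estimate minus reference), and R2."""
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mae = sum(abs(d) for d in errors) / n
    bias = sum(errors) / n
    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_est = sum((e - mean_est) ** 2 for e in estimate)
    # R2 is undefined when either series is constant
    r2 = cov * cov / (var_ref * var_est) if var_ref > 0 and var_est > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "bias": bias, "r2": r2}


# Made-up example: the imagery reads a constant 3 K below the leaf sensor
leaf_sensor = [20.0, 22.0, 24.0, 26.0]
crown_tir = [17.0, 19.0, 21.0, 23.0]
metrics = accuracy_metrics(leaf_sensor, crown_tir)
```

In this toy case the R2 is 1 even though RMSE and MAE are both 3 K, because the error is a pure offset; a negative bias of this kind is what a consistent underestimation looks like.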

Fig. 9

Linear regression and accuracy reporting (top) as well as a time-based visualisation (bottom) for tree 328 (left) and tree 347 (right)
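The accuracy measures reported for the two validation trees can be illustrated with a minimal sketch. It assumes that R2 is the squared Pearson correlation of a simple linear regression and that the systematic error is the mean signed difference (camera minus leaf sensor). The function name and the temperature values are hypothetical and are not the study's data.

```python
import math

def accuracy_metrics(reference, estimate):
    """Return RMSE, MAE, bias (estimate - reference) and squared Pearson r."""
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    bias = sum(errors) / n  # negative when the camera reads colder than the leaf sensor
    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_est = sum((e - mean_est) ** 2 for e in estimate)
    r2 = cov ** 2 / (var_ref * var_est)  # equals R2 of a simple linear regression
    return {"rmse": rmse, "mae": mae, "bias": bias, "r2": r2}

# Hypothetical values in degrees Celsius (illustration only).
leaf_sensor = [18.0, 21.5, 24.0, 27.5, 30.0]
crown_tir = [15.2, 18.4, 21.3, 24.1, 27.2]
metrics = accuracy_metrics(leaf_sensor, crown_tir)
```

A high R2 combined with a clearly negative bias, as in this toy example, is the pattern expected when the camera tracks the leaf sensor closely but reads consistently lower.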

To assess the accuracy requirements for a thermal sensor for tree crown temperature extraction throughout the growth season, the mean absolute difference for trees 328 and 347 was calculated across the full time series (n = 13). Tree 328 resulted in a mean absolute difference of 5.27 K, while tree 347 resulted in a similar mean absolute difference of 5.45 K. This shows that in order to consider the day-to-day differences at the stand level, an accuracy of approximately 5 K or better could be necessary.

To assess the ability to differentiate between extracted tree crown temperatures within a flight mission, the mean absolute difference was calculated for each epoch separately across the five trees equipped with point dendrometers, resulting in a minimum of 0.7 K and a maximum of 1.3 K. Fig. 10 displays a correlation matrix for the mean absolute difference of extracted thermal values for the growth season in comparison to typically available hourly meteorological data. It is evident that AT and VPD exhibit moderate correlations with daily differences in tree crown temperatures, while RH and WS also play a role but with minor significance. In this case, SR does not appear to have been an influencing factor. The masking of thermal pixels to the upper tree crown to eliminate shaded parts of the lower crown as well as ground pixels did not show any significant difference in correlation, except for a slight improvement for RH. On the other hand, the mean crown temperature from the thermal sensor correlates well with SR (0.68) while RH shows moderate correlation (0.49), and WS almost no correlation. VPD, and particularly AT, demonstrate high correlations with the TIR crown temperature at 0.88 and 0.98, respectively.

Fig. 10

Pearson correlation matrix showing the relationships between the extracted thermal infrared values for all of the tree crowns (n = 9) summarised for each acquisition day (n = 13). TIR: unmasked extracted TIR values; TIR Mask: masked extracted upper crown TIR values; Difference: the mean difference between extracted TIR values within an acquisition day
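The within-epoch comparison can be sketched as follows, under the assumption that the "mean absolute difference" is the mean of the absolute pairwise differences between crown temperatures on one acquisition day (it may equally be computed as a deviation from the stand mean). All function names and values are hypothetical.

```python
import math
from itertools import combinations

def mean_abs_pairwise_difference(values):
    """Mean of |a - b| over all unordered pairs of crown temperatures."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical crown temperatures (degrees Celsius) of five trees per day of year.
crown_tir_by_day = {
    203: [24.1, 24.9, 25.3, 24.4, 25.0],
    240: [16.0, 16.3, 16.1, 16.4, 16.2],
    250: [27.0, 28.2, 27.5, 28.9, 27.9],
}
vpd_by_day = {203: 1.4, 240: 0.5, 250: 2.1}  # hypothetical VPD in kPa

days = sorted(crown_tir_by_day)
md_tir = [mean_abs_pairwise_difference(crown_tir_by_day[d]) for d in days]
r = pearson_r(md_tir, [vpd_by_day[d] for d in days])
```

With only a handful of acquisition days, such a coefficient is indicative at best; the sketch is meant to show the calculation, not to reproduce the reported matrix.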

3.2 Acquisition Method Assessment

3.2.1 Flight Grid Acquisition Method

Two UAV-based acquisition methods, flight grid and single-shot, were evaluated to assess the effects of various viewing angles and timing on thermal imaging consistency. Four epochs from the growth season campaign were selected for comparison purposes. Fig. 11 provides an overview of various meteorological data acquired within an hour of the TIR data acquisition for each of the four epochs. Fig. 11 also shows the relationship between the acquired TIR imagery of a sample tree crown and the cloud cover on each specific day. All four days experienced varying cloud cover, with day 240 having the most substantial cloud cover, accompanied by low SR and high RH. Day 250 exhibited the highest VPD, which coincided with the highest mean absolute difference of the thermal imagery (mdTIR) and a relatively high TWD. On the other hand, day 257, which had the highest SR, showed the lowest mdTIR.

Fig. 11

Four epochs from the growth season campaign were selected for comparison purposes. Overcast refers to when the sky is completely covered by clouds. The term Cloudy depicts a number of clouds which block out the sun most of the time with the possibility of short episodes which uncover the sun. Partially cloudy is when most of the sky is clear; however, some clouds could potentially block out the sun temporarily. Also reported are the air temperature (AT), relative humidity (RH), solar radiation (SR), wind speed (WS), vapor pressure deficit (VPD), mean TIR (mTIR), mean absolute difference TIR (mdTIR) and mean tree water deficit (mTWD) of the five point-dendrometer-equipped trees of the pure beech stand. DOY: day of year

Fig. 12

The extracted TIR values of the sampled tree crown from three flight lines from the gridded mission shown with timestamps, compared to the leaf temperature of a neighbouring tree. Top right: DOY 203; top left: DOY 240; bottom left: DOY 250; bottom right: DOY 257

In the context of the flight grid mission, the tree crown temperature of a selected tree was extracted from the TIR imagery obtained during three separate flight lines, each conducted at a flying height of 75 m for each of the four selected epochs. The distance from the sensor to the tree crown varied in each image due to changing incidence angles. In the first flight line, the tree crown was positioned at approximately +20° from nadir, and the flight proceeded in the northeast direction. The second flight line had the tree crown positioned at nadir, while the flight headed in the southwest direction. The third flight line had an incidence angle of −20° off nadir in the X-axis. Each flight line comprised five images acquired at two-second intervals. Table 5 provides the mean and SD for each flight line for all four epochs. Notably, Day 250 showed the highest SD (0.59 K) for the flight line with a +20° off-nadir incidence angle, while the lowest SD was 0.06 K on Day 257 with a −20° off-nadir angle. The overall average SD across all flight lines and acquisition days was 0.25 K (n = 12). The highest SD observed on Day 250 corresponds with the highest mdTIR and VPD among the four epochs. It is worth noting that the average TIR temperature could vary by almost two degrees between flight lines.

Table 5 Comparison of the mean and SD of crown temperature of a selected tree from various flight lines at a flying height of 75 m. The higher SD for the northeast direction suggests an influence due to an increased incidence angle from the sensor to the tree crown and possible sunspot reflections

Figure 12 depicts the gridded flight lines, with each individual tree crown extraction compared to the tree crown temperature of a neighbouring tree equipped with a leaf temperature sensor. It is important to note that the leaf temperature of the neighbouring tree is not intended for direct comparison but rather to illustrate how changing cloud cover affects the crown canopy at a given time. Temperature differences are apparent among all three incidence angles, which is expected given varying cloud cover conditions and approximately one minute between flight lines. This is corroborated for the most part by similar fluctuations in the leaf temperature of the neighbouring tree. Notably, except for day 240, there is a significant dispersion in TIR SD. Day 240 also corresponds to the lowest AT, SR, WS, VPD, mTIR, and mTWD, and the highest RH among the four epochs.

3.2.2 Single-Image Acquisition Method

The single-tree acquisition method was accomplished using a waypoint flight planning approach, where the UAV would hover over each tree for 10 s, acquiring five images at two-second intervals. Two passes were carried out on each acquisition day, maintaining a distance of approximately 10 m to the tree crown for all images. The average SD across all passes and acquisition days was 0.09 K (n = 8), with a minimum SD of 0.06 K and a maximum of 0.18 K. Apart from slight variations on day 257, the low SD remained consistent throughout the datasets (see Table 6). Notably, the mean temperature between passes could vary by up to two degrees on day 203, which also had the highest WS. Aside from the second pass of day 257, it can be seen in Fig. 13 that the extracted TIR values within a pass maintain a high consistency when applying the close-range hovering waypoint method. However, the average temperature between passes can vary by up to two degrees during changing cloud cover and high winds.

Table 6 Results displaying the mean and standard deviation of crown temperatures acquired with single-shot thermal imagery at nadir and a distance of approximately 10 m
Fig. 13

The UAV-derived thermal values acquired with two passes separated by a minute, repeated over the four acquisition dates, shown in conjunction with the direct leaf temperature values in the tree crown. Top right: DOY 203; top left: DOY 240; bottom left: DOY 250; bottom right: DOY 257

3.3 Drought Stress Validation

3.3.1 Tree Water Deficit and Meteorological Data

Of the nine point and band dendrometers available at the beech study plot, five were selected for modelling and validation purposes. The TWD was calculated for the entire growth season and compared to meteorological data derived from the weather station located in an open area approximately 170 m from the beech study plot. The hourly growth season dataset (n = 2928) was tested for normality using the Shapiro-Wilk test, and the hypothesis of normality was rejected. For this reason, a GAM was implemented to assess which meteorological features or combination of features best model the TWD estimate. Table 7 displays the adjusted R2 for nine different variations of feature models. Each model was also created using a one- and two-hour lag of the TWD behind the meteorological data, as well as no lag. This was done for the purpose of assessing the temporal offset between transpiration-regulating variables such as VPD, WS, and SR and the timing of lower stem fluctuations expressed by the TWD. The highest R2 scores are shown as bold text and show that a one-hour lag is, for the most part, the best explanation of the physiological delay between the atmospheric drivers and the TWD. The best GAM was achieved with an R2 of 0.667 when using a combination of WS, SR, and VPD as input model variables. Interestingly, WS and SR fared slightly better when a two-hour lag was implemented. Fig. 14 shows the ranking (left to right) of the nine models using the AIC model evaluation method. Here, it can be seen that the combined WS, SR, and VPD model resulted in the lowest AIC value in comparison to all other features and feature combinations.

Table 7 Overview of the generalised additive models (GAMs) where various meteorological features were implemented to predict the tree water deficit across the growth season in 2021 (n = 2928)
Fig. 14

The nine models were compared using the Akaike Information Criterion (AIC) mathematical method to evaluate how well the model fits the data. A combination of WS, SR, and the VPD produced the best results
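The lag comparison and AIC ranking can be illustrated with a simplified sketch. To stay dependency-free, it swaps the GAM for an ordinary least-squares line and uses the Gaussian form AIC = n ln(RSS/n) + 2k; the synthetic hourly series, the function names, and the choice of k = 3 are assumptions made for illustration only.

```python
import math

def lag_pairs(driver, response, lag_hours, max_lag):
    """Pair driver[t - lag_hours] with response[t] for all t >= max_lag.

    Starting every lag at t = max_lag keeps the sample size identical
    across lags, so the resulting AIC values are comparable.
    """
    n = len(response)
    x = list(driver[max_lag - lag_hours:n - lag_hours])
    y = list(response[max_lag:])
    return x, y

def linear_fit_aic(x, y):
    """Gaussian AIC of a least-squares line (k = 3: slope, intercept, variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return n * math.log(rss / n) + 2 * 3

# Synthetic hourly series: the response follows the driver one hour later,
# plus a small deterministic perturbation (illustration only).
hours = range(72)
driver = [1.0 + math.sin(2 * math.pi * t / 24) for t in hours]
response = [40.0 * driver[max(t - 1, 0)] + 0.5 * ((t * 7) % 5 - 2) for t in hours]

max_lag = 2
aic_by_lag = {
    lag: linear_fit_aic(*lag_pairs(driver, response, lag, max_lag))
    for lag in range(max_lag + 1)
}
best_lag = min(aic_by_lag, key=aic_by_lag.get)  # lag with the lowest AIC
```

The same lag-and-rank logic would apply to a GAM: only the fitting step and the effective number of parameters in the AIC change.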

3.3.2 Correlation Analysis Between Tree Water Deficit and Specific Features

Thermal data captured near solar noon (± 90 min) for each of the five point-dendrometer-equipped tree crowns totalled 13 flight missions during the 2021 growth season (n = 65). This dataset did not follow a normal distribution, and therefore, a Spearman correlation matrix was created comparing the calculated TWD for each individual tree against the mean crown TIR temperature, meteorological data, and derivatives (see Fig. 15). Each matrix represents a different lag: no lag, a one-hour lag, and a two-hour lag. The aim here was to determine the physiological delay due to the temporal offset between current stem fluctuations and atmospheric drivers influencing transpiration and leaf temperature at the time of acquisition. Another objective of the correlation matrix analysis was to gauge the feasibility of differentiating between individual tree water statuses, not only on the sampled stand level during acquisition days, but also to assess the heterogeneity among individuals on a given acquisition day. It is evident that some trees correlate less with the available features depending on the time delay. In some cases, it is also noticeable that particular features may influence TWD at different time delays.

Fig. 15

Spearman correlation matrix comparing the TWD calculated from five point dendrometers with the TIR imagery, meteorological data and derivatives. Each section of the matrix from top to bottom coincides with no lag, a one-hour lag, and a two-hour lag

In order to better evaluate the correlations over the various lag times, the mean correlation was calculated for each feature across the three lag variations (see Table 8). The values from each matrix were first subjected to the Fisher-z transformation to mitigate bias, which can be particularly relevant for smaller datasets [65]. It is evident here that no single lag variation is solely responsible for the best correlations. The TIR imagery, arguably the most important feature of this study, tends to correlate best at a one-hour lag; however, the difference an hour before or after is not significant. The most noticeable difference is that SR correlates best with the TWD without any lag. Additionally, the stand-alone LVPD also correlates better with zero lag.

Table 8 The mean correlation of each feature with the TWD, derived from the Spearman correlation matrix at three different time delays. Correlation values were first transformed to Fisher-z values to mitigate bias
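Averaging correlation coefficients via the Fisher-z transformation can be sketched as below: each coefficient is mapped with the inverse hyperbolic tangent, the z values are averaged, and the mean is mapped back with the hyperbolic tangent. The function name and the example coefficients are hypothetical.

```python
import math

def fisher_z_mean(correlations):
    """Average correlation coefficients in Fisher-z space.

    Each r must lie strictly between -1 and 1, because atanh is
    undefined at the end points.
    """
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))

# Hypothetical coefficients for one feature (illustration only).
example_correlations = [0.78, 0.85, 0.81]
mean_r = fisher_z_mean(example_correlations)
```

For positive coefficients the Fisher-z mean is never smaller than the plain arithmetic mean, and the two diverge most when strong and weak correlations are mixed.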

3.3.3 Modelling Tree Water Deficit

The dataset derived from the 13 flight missions of the 2021 growth season was partitioned into a 70/30 training and validation data split for use with a GAM. Three models were trained with varying input features, including the features LVPD, TIR+RH+AT, and TIR+VPD. The model variations were then applied with the three different time delays (0-h lag, 1-h lag, and 2-h lag) (see Table 9). The models were then used to predict the TWD for the validation datasets, where the RMSE, MAE, and R2 were calculated for accuracy assessment. The models have not been further tested on data outside of the Britz research station beech plot due to a lack of accessible point dendrometers and accompanying TIR tree crown data. In terms of R2, the one-hour lag produced the best results across all three models, where the feature combination TIR+RH+AT had the highest R2 of 0.87, an RMSE of 4.92, and an MAE of 4 (see Table 9 and Fig. 16). The one-hour lag model with the features VPD+TIR also resulted in a high R2 of 0.81 but also had a high RMSE of 9.76 and an MAE of 7.01. The LVPD model with a one-hour lag also shows promising results with an RMSE of 6.87, an MAE of 5, and an acceptable R2 of 0.71. Most of the models show a near-linear relationship, except for RH (see Fig. 16).

Table 9 Overview of the GAMs tested with the three lag variations
Fig. 16

The three generalised additive models (GAMs) tested over a lag variation of one hour. a GAM using LVPD. b GAM using TIR, AT, and RH. c GAM using TIR and VPD

4 Discussion

4.1 Thermal Sensor Assessment

Results of the indoor test showed that using a limited number of pixels (< 3), despite being directly adjacent to the leaf temperature sensor, could introduce errors over 1 K. However, the indoor plant experiment provides only an indication of potential error. Different values may occur when testing with beech trees due to their unique leaf morphology and physiology, but the results still support the importance of a minimum pixel spot size [66, 67]. The spot-size effect occurs when a limited number of pixels are used (i.e., < 3) to extract the temperature of an object. In this context, the object’s temperature can be influenced by nearby surfaces [66] and may also be susceptible to “bad pixels”. It is commonly advertised that TIR array sensors will have no more than 0.37–1% bad pixels, depending on the manufacturer’s reporting [68, 69]. This implies that a thermal sensor size of 160 × 120 could potentially have up to 192 (of 19,200) faulty pixels. Therefore, it is crucial to work with a minimum spot-size or minimum number of pixels to ensure a sufficient number of “good” pixels from which to average. We tested a minimum of 10 pixels within an extraction polygon; however, FLIR [66] recommends a spot-size with a 10-pixel diameter, which would, in effect, result in an area of more than 10 pixels.
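The arithmetic behind the bad-pixel estimate and the spot-size recommendation can be checked with a short sketch. It assumes a 160 × 120 array (19,200 pixels), the 0.37% and 1% bad-pixel fractions quoted above, and a circular spot for the 10-pixel-diameter case; the function names are hypothetical.

```python
import math

def max_bad_pixels(width_px, height_px, bad_fraction):
    """Upper bound on faulty pixels for a given array size and bad-pixel fraction."""
    return round(width_px * height_px * bad_fraction)

def circular_spot_area(diameter_px):
    """Approximate pixel count inside a circular spot of the given diameter."""
    return math.pi * (diameter_px / 2) ** 2

total_pixels = 160 * 120                          # 19,200 pixels in the array
bad_at_1_percent = max_bad_pixels(160, 120, 0.01)      # 192 pixels
bad_at_037_percent = max_bad_pixels(160, 120, 0.0037)  # about 71 pixels
spot_area = circular_spot_area(10)                # roughly 78.5 pixels
```

A 10-pixel-diameter spot therefore covers close to eight times as many pixels as a 10-pixel extraction polygon, which leaves far more "good" pixels to average over.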

In terms of accuracy, the indoor experiments showed an RMSE ranging from 0.55 to 0.73 K, where, interestingly, the drier leaves from the unwatered plant showed lower accuracy. This aligns with the concept that drier leaves will typically have a lower emissivity (see Table 10), and it is known that a lower object emissivity can reduce measurement accuracy [66, 68, 69, 71].

Table 10 Emissivity ranges for green healthy vegetation and dry vegetation [70]

As for the sensor’s general accuracy, we showed an RMSE of below 1 K during the indoor experiments, which is below the typical commercial thermal sensor accuracy of ± 2 K or ± 2% as reported by Vollmer and Möllmann [69]. Accuracy in the field, however, proved otherwise and was more challenging to assess.

Temperature measurement accuracy assessment in the field was somewhat challenging as we had limited possibilities to determine and validate leaf temperature within the upper tree crown. Over the 13 acquisition days, we reported an RMSE of 3.22 K (average of both trees), which is higher than the previously mentioned typical accuracy of 2 K. It is worth noting that the flight missions were carried out under varying weather conditions, including high winds, extreme midday temperatures, changing cloud cover and an onset of autumn senescence near the end of the campaign. Additionally, in both the indoor and field experiments, it was evident that the TIR sensor consistently underestimated the validation measurement, suggesting that an offset correction could be applied, along with further calibration using high-resolution meteorological data. Once again, it should be emphasised that the single leaf temperature sensor located in the upper crown served as a rough estimation but should not be considered a reliable validation source. Our goal is to increase the number of upper crown leaf temperature sensors for future studies to at least four, while still staying within practical limitations for maintaining a biological validation source. What is crucial is that the sensors are positioned at well-representative spots within the upper tree crown and capture temperature values that the TIR imaging also detects. One question that remains is to what extent typical shade leaves found in the lower tree crown will contribute to the estimation of the TWD. Further testing in this area could assist in developing an improved masking strategy for pixel extraction.

Additional research is essential for utilising leaf temperature sensors in the tree crowns both for validation and to explore calibration techniques. We plan to introduce two more validation trees at the Britz research station, with a minimum of four leaf temperature sensors installed in the upper crown. Additionally, we aim to develop an affordable outdoor blackbody to further assess aspects like sensor drift, pre-operational warming, and internal periodic NUC calibration [38, 42] as future areas of inquiry.

4.2 Acquisition Method Assessment

The grid acquisition method for flight missions is the conventional approach to capturing TIR imagery via UAVs. In this process, a mosaicked dataset is assembled for the targeted area of interest. Often, RGB imagery is captured simultaneously for better image matching and positional accuracy. Ground control points [72] or onboard RTK systems are also employed for enhanced accuracy [73]. Having a co-registered thermal dataset alongside an RGB or multispectral dataset aids in identifying individual tree crowns, which might otherwise be indistinguishable in aerial TIR imagery alone. However, the gridded method presents challenges, as it is time-consuming and requires prior knowledge of the terrain for effective flight planning. Moreover, extended mission durations may necessitate battery changes and could be hindered by line-of-sight restrictions [74]. Our results indicate that TIR temperature dispersion can have an SD of up to 0.59 K within a single flight line. Temperature variations can also be several K between flight lines and when acquisitions span more than a minute. These discrepancies in TIR temperature readings could become even more pronounced during battery changes, influenced by fluctuating weather conditions.

The single-shot acquisition strategy showcased in this study offers a relatively novel approach. Unlike traditional methods requiring structure-from-motion (SfM) processing, individual images from the Micasense Altum sensor are radiometrically calibrated and affinely transformed. Furthermore, the multispectral bands not only facilitate the segmentation of tree crowns but also the differentiation between sunlit and shaded leaves, as well as woody parts of the tree crown and ground pixels. This single-image acquisition method also offers several advantages. It significantly cuts down on processing time and reduces the duration of flight missions. The centre point of each image serves as a reference, and, using a footprint prediction derived from flying height, azimuth, and sensor intrinsic parameters, it becomes possible to estimate the ground footprint [75]. This estimation then serves as a valuable dataset for ground truthing purposes.
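The footprint prediction mentioned above can be approximated with simple geometry. The sketch below assumes a nadir-pointing camera over flat ground, with the sensor intrinsics reduced to horizontal and vertical field-of-view angles; the field-of-view values, array width, and function names are placeholders rather than the specification of any particular sensor.

```python
import math

def ground_footprint(distance_m, hfov_deg, vfov_deg):
    """Footprint width and height (m) for a nadir image at the given distance."""
    width = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    height = 2 * distance_m * math.tan(math.radians(vfov_deg) / 2)
    return width, height

def footprint_corners(centre_xy, width_m, height_m, azimuth_deg):
    """Corner coordinates (x east, y north) after rotating the footprint
    clockwise from north by the platform azimuth around the image centre."""
    az = math.radians(azimuth_deg)
    cx, cy = centre_xy
    half_w, half_h = width_m / 2, height_m / 2
    corners = []
    for dx, dy in [(-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)]:
        x = cx + dx * math.cos(az) + dy * math.sin(az)
        y = cy - dx * math.sin(az) + dy * math.cos(az)
        corners.append((x, y))
    return corners

# Placeholder values: 10 m sensor-to-crown distance, hypothetical 57 x 44 degree
# field of view, hypothetical 160-pixel image width, 30 degree azimuth.
width_m, height_m = ground_footprint(10.0, 57.0, 44.0)
pixel_size_m = width_m / 160  # approximate ground sample distance
corners = footprint_corners((0.0, 0.0), width_m, height_m, 30.0)
```

Sloping terrain, off-nadir tilt, or lens distortion would require a full camera model; this flat-ground approximation is only meant to show which quantities enter the estimate.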

The lower SD for the single-shot acquisition method may be a result of the closer proximity to the tree crown, which can reduce atmospheric interference, particularly on warm and humid days where the RH is high [66]. In addition, being closer to the object of interest allows for a greater number of pixels within the tree crown to be captured. This increases the pool of pixels that can be averaged and facilitates more complex masking procedures. Consistency in the incidence angle could also contribute to minimising outliers, as it helps control variations in emissivity.

Further research should focus on identifying and mitigating sources of error in field operations. Minkina and Dudzik [71] demonstrated through error modelling that the most significant error sources often arise from a combination of object emissivity, RH, and camera-to-object distance (see Table 11). From a technical standpoint, object emissivity error can be minimised by reducing the incidence angles, especially at extreme angles. Similarly, errors due to RH can be diminished by maintaining closer proximity to the object, which also increases the number of pixels available for averaging. This could help mitigate the impact of outliers, such as those arising from sensor “bad pixels” or areas of the crown affected by sun glint. Integrating near-real-time local meteorological data, such as RH, AT, and WS, synchronously with thermal data acquisition could offer a viable strategy for calibrating thermal imagery under fluctuating environmental conditions. The key challenge remains in reliably obtaining such data during field campaigns.

Table 11 Ranges of potential errors during simulations (adapted from [71])

4.3 Drought Stress Validation

The challenge with detecting drought-induced stress in beech trees lies not only in developing a consistent method for TIR image acquisition but also in finding a practical and reliable approach to validate the extent to which a tree is experiencing drought stress. The TWD can be moderately well explained throughout the growth season by VPD (R2 > 0.659) as well as by RH and AT, from which VPD is derived (R2 > 0.659). Drew et al. [49] found that daily variations in TWD were mainly influenced by soil water availability; however, RH and AT also contributed to variability during some periods. Although soil moisture is an important factor for estimating TWD, it was not included in this study due to the impracticality of collecting soil moisture data during field campaigns. Nevertheless, it will be considered in future experiments.

The best modelling results were obtained when using a one-hour lag, which was relevant for all variables except for WS and SR, which showed a higher R2 at a two-hour lag. This lag in the TWD and VPD relationship was also reported by Zweifel et al. [59] among spruce, pine, and ash, where modelling efficiency was maximised by shifting the VPD in 30-min increments. Interestingly, this was not the case for beech, which contrasts with our findings. This suggests that recording high-resolution meteorological data such as RH and AT alongside TIR image acquisition could help identify appropriate lag parameters. Factors like wind gusts and changing cloud cover make it essential to acquire TIR imagery and meteorological data at times that are more representative of that day’s conditions. For instance, a sudden increase in SR and VPD due to clouds abruptly clearing may not necessarily affect the TWD for that day or even that hour if such occurrences are infrequent. Yet, TIR measurement can be significantly influenced by such rapid changes. Over the 13 missions, this concept was somewhat neglected in this study, and mitigating such factors is not always practical or feasible due to time constraints and limited UAV battery life. Despite these challenges, we demonstrated that TWD is highly correlated with the TIR imagery (r > 0.81) irrespective of lag time and prevailing weather conditions during flight missions. This is promising for the detection of drought at the stand level but does not necessarily imply that within-stand heterogeneity can be reliably obtained for a single acquisition epoch.

With respect to the relationship between TWD and feature variables on an individual tree basis, it is evident that some individuals correlate less at varying lag times (see Fig. 15). This could be interpreted as some individuals having slower reaction times to environmental variables, or it could indicate that the point dendrometer is positioned at a location where stem fluctuations are not well-pronounced. This latter issue can be particularly problematic for larger tree stems, where it is recommended to use multiple point dendrometers to account for potential differences [76]. As for individual feature variables reacting at different lag times, this warrants further research. Modelling could be susceptible to overfitting, especially when accounting for particular microclimates.

The resulting models in this study should be interpreted with caution. In 2021, the research station in Britz did not experience any particular drought stress conditions, especially compared to what was observed from 2018 to 2020. It is important to note that we lack a full range of TWD data, particularly for beech trees, where TWD values did not exceed 60 μm for the year 2021. Zweifel et al. [50] reported TWD values of up to 500 μm for oak, suggesting that in a drought year, we could observe values higher than 60 μm, which would broaden the range. Until then, evaluating our resulting RMSE values remains challenging. However, the current results could prove relevant for establishing within-stand heterogeneity if maintained with a broader dataset range. The GAM was chosen specifically to maintain simple curves, which would be more conducive to extrapolation [77] compared to typical decision tree machine learning algorithms. Nevertheless, extrapolation should be avoided [78], and improvements in TWD modelling will only be possible with more data, especially data ranges obtained during drought stress conditions.

Special care was taken to avoid feature repetition within a model [51]. For instance, when VPD was implemented as a feature, RH and AT were excluded to prevent redundant inclusion, as VPD is derived from both. Another consideration is the potential need for limited feature engineering when modelling TWD. Allowing ML algorithms to discover specific feature weightings may require avoiding complex transformations like VPD or LVPD, as these could mask specific weightings and lag differences among features. LVPD performed moderately well within the TWD model. However, the possibility of using LVPD as a stand-alone index could also be of interest, particularly if an absolute range representing quantifiable drought stress among beech trees for a specific region were established.
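The redundancy between VPD and its inputs can be made concrete with a short sketch. It assumes the Tetens approximation for saturation vapour pressure and the common leaf-to-air definition of LVPD (saturation pressure at leaf temperature minus the actual vapour pressure of the air); the formulation used earlier in this study may differ in detail, and the function names and input values are hypothetical.

```python
import math

def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in kPa (Tetens approximation, temperature in deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def vpd(air_temp_c, rh_percent):
    """Air vapour pressure deficit (kPa) from air temperature and relative humidity."""
    return saturation_vapour_pressure(air_temp_c) * (1 - rh_percent / 100)

def leaf_vpd(leaf_temp_c, air_temp_c, rh_percent):
    """Leaf-to-air VPD (kPa): saturation at leaf temperature minus actual air vapour pressure."""
    actual_vapour_pressure = saturation_vapour_pressure(air_temp_c) * rh_percent / 100
    return saturation_vapour_pressure(leaf_temp_c) - actual_vapour_pressure

# Hypothetical midday conditions (illustration only).
air_vpd = vpd(25.0, 50.0)
crown_lvpd = leaf_vpd(27.0, 25.0, 50.0)  # leaf warmer than air, so LVPD exceeds VPD
```

Because both indices are deterministic functions of AT and RH (plus crown temperature for LVPD), including them alongside their inputs would feed the same information into a model twice.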

5 Conclusion

In this study, we introduced the novel approach of using single-shot TIR imaging to obtain promising results for calculating LVPD and estimating TWD. Even in the absence of actual drought events, we successfully modelled variations in TWD during the 2021 growing season using close-range single-shot thermal imaging in conjunction with synchronous meteorological data. Unlike typical UAV-based gridded flight plans and orthomosaic derivatives, close-range single-shot thermal imaging can mitigate the effects of variations of RH and emissivity. This is achieved by reducing incidence angles and sensor-to-object distance, and by increasing the number of pixels available for thermal data extraction. Further field trials are necessary, particularly those that incorporate high-resolution meteorological data synchronised with thermal imaging for calibration purposes. Additionally, there is a need to improve the validation of thermal imaging accuracy. This can be accomplished by increasing the number of crown-based leaf temperature sensors and employing a field-based blackbody. This research serves as an important stepping stone towards incorporating thermal imaging for quantifying drought stress, among other applications, at intensive monitoring sites.