High-throughput phenotyping of two plant-size traits of Eucalyptus species using neural networks

In forest modeling to estimate the volume of wood, artificial intelligence has been shown to be quite efficient, especially using artificial neural networks (ANNs). Here we tested whether diameter at breast height (DBH) and the total plant height (Ht) of eucalyptus can be predicted at the stand level using spectral bands measured by an unmanned aerial vehicle (UAV) multispectral sensor and vegetation indices. To do so, using the data obtained by the UAV as input variables, we tested different configurations (number of hidden layers and number of neurons in each layer) of ANNs for predicting DBH and Ht at stand level for different Eucalyptus species. The experimental design was randomized blocks with four replicates, with 20 trees in each experimental plot. The treatments comprised five Eucalyptus species (E. camaldulensis, E. uroplylla, E. saligna, E. grandis, and E. urograndis) and Corymbria citriodora. DBH and Ht for each plot at the stand level were measured seven times in separate overflights by the UAV, so that the multispectral sensor could obtain spectral bands to calculate vegetation indices (VIs). ANNs were then constructed using spectral bands and VIs as input layers, in addition to the categorical variable (species), to predict DBH and Ht at the stand level simultaneously. This report represents one of the first applications of high-throughput phenotyping for plant size traits in Eucalyptus species. In general, ANNs containing three hidden layers gave better statistical performance (higher estimated r, lower estimated root mean squared error–RMSE) due to their greater capacity for self-learning. Among these ANNs, the best contained eight neurons in the first layer, seven in the second, and five in the third (8 − 7 − 5). The results reported here reveal the potential of using the generated models to perform accurate forest inventories based on spectral bands and VIs obtained with a UAV multispectral sensor and ANNs, reducing labor and time.


Introduction
With the recent growth in the forestry sector in Brazil, the planted forest area in 2018 reached 7.83 million ha, representing 1.3% of the GDP and 6.9% of the industrial GDP (IBÁ 2019). Species of Eucalyptus are favored for wood production in commercial forests in Brazil as one of the main strategies to reduce the deforestation of native forests and supply wood for energy and timber (Ferraz et al. 2019).
One of the factors determining the success in obtaining high yielding forests is the choice of species since each species has specific characteristics and adaptation to edaphoclimatic conditions. According to Brisola and Demarco (2011), E. grandis grows well but has little resistance to drought, while E. urophylla produces wood with a slightly higher density but slower growth. Breeding programs have helped Abstract In forest modeling to estimate the volume of wood, artificial intelligence has been shown to be quite efficient, especially using artificial neural networks (ANNs). Here we tested whether diameter at breast height (DBH) and the total plant height (Ht) of eucalyptus can be predicted at the stand level using spectral bands measured by an unmanned aerial vehicle (UAV) multispectral sensor and vegetation indices. To do so, using the data obtained by the UAV as input variables, we tested different configurations (number of hidden layers and number of neurons in each layer) of ANNs for predicting DBH and Ht at stand level for different Eucalyptus species. The experimental design was randomized blocks with four replicates, with 20 trees in each experimental plot. The treatments comprised five Eucalyptus species (E. camaldulensis, E. uroplylla, E. saligna, E. grandis, and E. urograndis) and Corymbria citriodora. DBH and Ht for each plot at the stand level were measured seven times in separate overflights by the UAV, so that the multispectral sensor could obtain spectral bands to calculate vegetation indices (VIs). ANNs were then constructed using spectral bands and VIs as input layers, in addition to the categorical 1 3 provide genotypes with high resistance to pests, good adaptability to various climatic conditions, and desirable characteristics for the timber industry, such as E. urograndis, a hybrid of E. grandis and E. urophylla (Fernández et al. 2018). E. saligna is known to have higher wood density than E. grandis (Batista et al. 2010). According to Azevedo et al. (2015), E. camaldulensis has high adaptability in less fertile soils and resistance to drought. Corymbia citriodora a gum tree endemic to Australia, stands out for the high mechanical resistance of its wood (Morais et al. 2010).
For assessing tree performance in forest inventories in delimited field plots, diameter to breast height (DBH) and total height (Ht) of the trees are the most commonly selected dendrometric variables to measure directly. These variables are then used to generate estimates of production variables (volume, biomass), which are then extrapolated to the total plantation area. However, such inventories are expensive; plots that are representative of the entire population must be taken out of production, and the highly precise measurements that are required for decision-making require skilled labor and are complicated due to the complexity of forest formations .
For consecutive evaluations of quantitative and qualitative variables of interest in forest breeding, high-throughput phenotyping can be time-consuming and provide a low level of spatial and temporal information. Thus, to improve the efficiency of phenotyping, new tools with high precision and data quality must be adopted (Sankaran et al. 2019). For this purpose, remote sensing techniques make it possible to measure plant traits with high precision in a shorter field time (Jay et al. 2017). Thus, remote sensing can accurately and quickly provide DBH and Ht data for phenotyping and thus reduce the costs of forest inventories.
Currently, there are multispectral sensors aboard space platforms, and more recently, on aerial platforms such as unmanned aerial vehicles (UAVs) that can assist with phenotyping. Spectral bands are measured in the region from visible to near-infrared spectral bands of the electromagnetic spectrum, allowing a wide range of vegetation indices (VIs), which are mathematical ratios, to be calculated (Rezzouk et al. 2019). One of the main VIs used for high-throughput phenotyping is the normalized difference vegetation index (NDVI) (Hentz et al. 2018). Like all indices created to simplify what are otherwise complex amalgamations of data, the NDVI has great appeal in commercial agriculture and land-use studies because of its ability to quickly delineate vegetation and vegetative stress (Huang et al. 2020).
In forest modeling to estimate the volume of wood, artificial intelligence has been shown to be quite efficient, especially using artificial neural networks (ANNs). This technique has high generalization power and is better for generating nonlinear models unknown to the modeler, among other characteristics, in relation to the regression models (Vieira et al. 2018). Recently, ANNs have been used to estimate the volume of wood using inputs such as DBH, Ht, among others (Soares et al. 2012;Bhering et al. 2015;Miguel et al. 2016;Azevedo et al. 2020). However, to the best of our knowledge, there are no reports of the use of a UAV multispectral sensor to obtain variables as input in ANNs to predict DBH and Ht of Eucalyptus trees.
We hypothesized that the DBH and Ht at the stand level in Eucalyptus can be predicted using spectral bands and vegetation indices. Therefore, we tested different configurations (number of hidden layers and number of neurons in each layer) of ANNs for predicting DBH and Ht at stand level in different Eucalyptus species using input variables obtained by UAV-multispectral sensor.

Experimental site and setup
The experiment was installed in January 2014 in the experimental area of the Federal University of Mato Grosso do Sul, Chapadão do Sul campus. The altitude is 820 m a.s.l. The soil is classified as medium-textured Red Oxisol. According to the Köppen classification, the climate is tropical humid (Aw) with a rainy season from October to April and a dry season between May and September. Average rainfall varies from 750 to 1800 mm a −1 , and average annual temperature varies from 20 °C to 25 °C (Peel et al. 2007).
The experimental design was randomized blocks with four repetitions, with 20 plants in each experimental plot. The treatments were composed of five Eucalyptus species (E. camaldulensis, E. uroplylla, E. saligna, E. grandis and E. urograndis) and Corymbria citriodora.

Evaluated variables
The diameter at breast height (DBH) and total plant height (Ht) at stand level were obtained by measuring the same five trees in each experimental unit throughout the 1-year study. A tape measure was used to measure the circumference at breast height, which was later converted to DBH. The Ht (m) was obtained with the aid of a Haglof 1 3 hypsometer. The variables were measured seven times for each tree (11/01/2018, 12/06/2018, 01/22/2019, 03/29/2019, 05/10/2019, 10/30/2019, and 11/28/2019). Figure 1 shows the climatic conditions in the experimental area during these assessments.
The characteristics related to high-throughput phenotyping were obtained on the same dates using the senseFly eBee (Cheseaux-sur-Lausanne, Switzerland) RTK fixed-wing unmanned aerial vehicle (UAV) with autonomous flight control (Fig. 2). We used a Parrot Sequoia sensor (Prilly, Switzerland), a multispectral camera for agriculture that uses a sunshine sensor and an additional RGB camera 16 MP for scouting. Radiometric calibration was performed for the entire scene, based on calibrated reflective surfaces. The Pix4D mapper (Prilly, Switzerland) software was used to correcting the parameters of solar irradiation, and the reflective target of the camera with reflectance calibration plate was individualized for each device. For the entire scene, information was obtained on the reflectance rates for each spectral band measured by the multispectral sensor. This procedure is performed in the field immediately using senseFly e-Motion software (Cheseaux-sur-Lausanne, Switzerland) before the flight. Because the flight has a maximum duration of 15 min, there was no need to repeat the calibration after the flight in the field.
The multispectral sensor had a horizontal field of view (HFOV) of 61.9°, vertical field of view (VFOV) of 48.5°, and diagonal field of view (DFOV) of 73.7°. Multispectral reflectance images were obtained for green (550 nm ± 40 nm), red (660 nm ± 40 nm), red-edge (735 nm ± 10 nm) and near-infrared (Nir, 790 nm ± 40 nm) spectral bands. The obtained values were then used to calculate the following VIs according to Table 1: normalized difference vegetation index (NDVI), soil-adjusted vegetation index (SAVI), green normalized difference vegetation index (GNDVI), normalized difference red-edge (NDRE), simplified canopy chlorophyll content index (SCCCI), enhanced vegetation index (EVI) and modified soil-adjusted vegetation index (MSAVI). These VIs are the main ones that can be calculated with the Qi et al. (1994) 1 3 spectral bands obtained. In addition, they have already been used for high yield phenotyping in agricultural species, due to their correlation with plant biomass Ramos et al. 2020).
The overflights were performed with 80% lateral and 85% longitudinal overlap of the images, and the same area was imaged twice using perpendicular flight lines. The increase in the overlap between the images was necessary to obtain a greater number of scenes containing the same control points, allowing greater accuracy in the mosaic of the images by the Pix4Dmapper software. This need is based on plant height, because the stem oscillates as a function of the wind, and regardless of the speed, interferes in the mosaicking process. The overflight was performed at 100 m altitude, allowing a spatial image resolution of 0.10 m. The overflights were carried out near the zenith due to the minimization of the shadows of the trees at 11 a.m., given that the multispectral sensor is a passive type (i.e., dependent on solar luminosity).

Statistical analyses
To verify the existence of plant size differences among the Eucalyptus species, a joint analysis of variance (ANOVA) was performed according to the statistical model in Eq. 1.
where Y ijk is the observation in the k-th block evaluated in the i-th species and j-th measurement, µ is the overall mean, B k is the block effect considered as fixed, E i is the species effect considered as fixed, M j is the measurement effect considered as random, EM ij is the random effect of the interaction between species i and measurement j, and e ijk is the random error associated with Y ijk observation. (1) Pearson's correlation coefficients (r) were estimated to verify the association between grown traits with spectral bands and VIs according to Eq. 2: where COV XY is the covariance between the traits X and Y, ̂2 x is the phenotypic variance of trait X, and ̂2 y is the phenotypic variance of trait Y. We used the correlation network to graphically express the results. In this procedure, green lines link variables that are positively correlated and red lines to join negatively correlated variables. The line thickness is proportional to the magnitude of the correlation.
Subsequently, for developing and training the ANNs, the Intelligent Problem Solver tool of stattista STATIS-TICA 7.0 (Hamburg, Germany) was used. This tool performs data mining, i.e., normalizes data in the 0-1 range, tests different network architectures (multilayer merceptron [MLP] or radial basis function [RBF]), and selects the best performing networks. In MLP networks, the input layer was composed of spectral bands and vegetation indices, besides the categorical variable species; the output layer was composed of two layers corresponding to DBH and Ht at stand level. For intermediate layer(s), we used the activation function f(x) = logistic (Eq. 3) applied to each neuron, which uses the scalar product of the input vector (x) and the weight vector (w) associated with this node. For these networks, topologies containing two and three layers with one to 15 neurons each and a high degree of connectivity between the neurons were tested, which is defined by synaptic weights (Fig. 3).
(2) where x is a binary value that represents the activation of the neuron (1) versus non-activation (0). The training used was feedforward by the supervised method. Therefore, 3600 ANN topologies were tested, composed of the following combinations: MLP with two hidden layers (15 × 15 possibilities) and MLP with three hidden layers (15 × 15 × 15 possibilities). Only the 10 ANNs with the highest linear correlation between observed versus predicted volumes in the training step were saved. For training, 80% of the data was used, and the remaining 20% were used to validate the 10 best neural networks.
For selecting the best ANNs in each step (training and validation), Pearson's correlation coefficient ( r XY ; Eq. 4) and root mean square error (RMSE; Eq. 5) were used.
where COV XY is the covariance between the observed (X) and predicted (Y) values; ̂2 x is the variance of observed values; ̂2 y is the variance of predicted values.
where Ŷ i is the mean of the observed values; n is the total number of observations.

Results and discussion
The joint ANOVA identified that a statistically significant difference between the species for all the traits evaluated ( Table 2). The effect of measurements was significant (p ≤ 0.05) for most traits, except for red and red-edge spectral bands and NDVI, SCCCI and EVI. There was no species × measurements interaction for DBH or Ht at the stand level, showing that the species E. grandis was the tallest at each date. It is important to note that the estimates of the coefficient of variation were less than 20% for all traits evaluated, which is considered high precision for forest experiments (Garcia 1989).
Among the species studied, E. grandis was largest for DBH and Ht at the stand level at all measurements, and E. camaldulensis was the smallest for both variables (Fig. 4). E. grandis, besides having the largest size, also had a smaller standard error, i.e., the plants had a distribution closer to the mean. E. urograndis was not as large as E. grandis, but larger than the other species. However, E. urograndis grows well in the study region, so the higher means for E. grandis proves that is better adapted to the conditions. Macedo et al. (2006)  The study site has adequate pluviometric rates, resulting in a satisfactory plant size of E. grandis, which has low drought resistance (Embrapa 2010). E. camaldulensis, however, is susceptible to the insect pest psilidus (Camargo 2011), which is present in the experimental region, and the trees were smaller as a consequence of psilidus attack. Considering the great variability obtained between species in different measurements, this data set is robust for training and validating ANN models.
The correlation network in Fig. 5 shows a high and positive correlation between the plant size traits (DBH and Ht). The red-edge and NIR spectral bands had a positive linear correlation with the plant size traits, so these spectral bands stood out for estimating DBH and Ht. The vegetation indices NDVI, GNVDI, EVI, NDRE, MSAVI, SAVI, and SCCCI were poorly correlation with the plant 1 3 size traits, which can be explained by the relationship between these variables with the canopy of the Eucalyptus species. The higher the leaf area index (LAI), the higher will be the VI estimates. However, for forest species, the presence of abundant leaves indicates the presence of trees, hence a lower commercial wood volume. Marsden et al. (2010) also found low correlations between the NDVI and the mean annual increment (MAI), where this correlation was progressively increasing throughout the plant size and establishment of the forest. Bikindou et al. (2012) compared multiple regressions related to soil properties, and the NIR index was correlated with dominant height for productive capacity classification. According to these authors, the NIR provided a better correlation and a low mean square error for better predicting index variations in the studied area. Maire et al. (2011) used MODIS reflectance and NIR to estimate LAI and found significant correlations; however, NIR overestimates the variable at young ages.
Forest tree species produce more stems and leaves at lower Ht and DBH (Almeida et al. 2015). LAI is also larger at younger ages when the development in Ht and DBH is lower. With increasing age, DBH and Ht increase, and leaf area decreases. The VIs follow the leaf mass, i.e., the greater the leaf production, the higher the vegetation index, thus explaining the negative and low magnitude correlation between vegetation indices and plant size traits at the stand The low association between spectral bands and VIs with the plant size traits demonstrates that nonlinear techniques must be used to predict these variables. In this sense, ANNs stand out because they have the ability to learn in a nonlinear way, and through the input data, they are able to generate their architecture and free parameters that can predict the output data with high accuracy in the training process (Costa et al. 2019). According to Nieto et al. (2012), ANNs have been widely used to solve regression problems with good performance in forest studies and are suitable for more complex modeling situations.
The statistical performance of the 10 best ANNs is presented in Table 3. The results show a good ability to simultaneously predict plant size traits (DBH and Ht) at the stand level using the variables obtained with high-throughput phenotyping (spectral bands and VI's). Overall, persons correlation coefficient (r) values between the estimated and predicted data increased at the validation step compared to the training step, but RMSE was reduced at the validation step. In general, artificial neural networks containing three hidden layers showed better statistical performance (higher estimate of r and lower estimate of RMSE) due to their greater capacity for self-learning. These results show that there was no overfitting for the 10 best ANNs. Vendruscolo et al. (2015) found similar results when using ANNs to estimate Ht as a function of DBH, with r close to 0.90 and the RMSE less than 10%.
The high-throughput phenotyping associated with ANNs made it possible to accurately predict (high correlation, low error) the plant size at the stand level of Eucalyptus species. Thus, the ANNs trained in this study make it possible to predict DBH and Ht of eucalyptus trees simultaneously at stand level, that is, when feeding the ANN with spectral bands and vegetation indices. Thus, the costs of forest inventories are reduced, since only VIs and spectral bands are needed to predict the variables measured in the inventory process.

Conclusions
In general, artificial neural networks (ANNs) containing three hidden layers showed better statistical performance (higher estimate of r, lower estimate of RMSE) due to their greater capacity for self-learning. Among these ANNs, the best contains eight neurons in the first layer, seven in the second, and five in the third (8 − 7 − 5). Thus, the models associated with ANNs can provide accurate forest inventories using spectral bands and VIs obtained with a multispectral sensor on a UAV, reducing time and labor.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.