Background

Relative growth rate (RGR) and fruit number are two essential parameters of plant performance and fitness [1,2,3]. Proper estimation of RGR is achieved with the destructive measurement of plant biomass across several individuals sequentially harvested [4, 5]. However, sequential harvesting is space and time consuming, which makes this approach inappropriate for large-scale studies. Furthermore, it is problematic for evaluating measurement error, as well as to compare growth dynamics and fitness-related traits, like fruit production, on the same individuals. Thus, a variety of platforms and equipment have been developed in the last decade for high-throughput phenotyping of plant growth from image analysis, specifically in crops [6,7,8,9,10] and in the model species A. thaliana [11,12,13,14]. Because commercial technologies are powerful but generally expensive [6, 8, 11, 13], low-cost methods have been proposed, for instance to estimate rosette expansion rate from sequential imaging of A. thaliana individuals [14,15,16]. These methods can be adapted to a variety of lab conditions, but they do not allow the quantification of complex traits like biomass accumulation, RGR and fruit production.

Strong variation in RGR has been reported across and within plant species [17,18,19,20,21,22], which has been assumed to reflect the inherent diversity of strategies to cope with contrasting levels of resource availability [3, 23, 24]. For instance, species from resource-scarce environments generally show a lower RGR than species from resource-rich environments, even when they are grown in non-limiting resource conditions [25, 26]. Ecophysiological studies [18, 26] have shown that plant RGR depends on morphological traits (e.g. leaf mass fraction, leaf dry mass per area) and physiological rates (e.g. net assimilation rate) that differ between species, genotypes, or ontogenetic stages. For instance, plants become less efficient to accumulate biomass as they get larger and older, resulting in a decline of RGR during ontogeny [4]. This is due to developmental and allometric constraints such as self-shading and increasing allocation of biomass to supporting structures, like stems, in growing individuals.

To assess plant performance, response to environment, or genetic effects, it is important to link individual’s growth trajectory to productivity, yield or reproductive success. However, while several methods have been proposed to estimate growth dynamics from image analysis [8, 11,12,13,14,15,16], methodologies for automated, high-throughput phenotyping of fruit number per plant remain surprisingly scarce [27, 28]. Yet, the analysis of inflorescence images in A. thaliana could offer a valuable tool to connect growth dynamics and plant fitness. Because of its small size, inflorescences can easily be collected, imaged and analyzed with simple equipment. Furthermore, the genetic resources available in this species enable large-scale analyses (mutants screening, quantitative trait loci mapping and genome-wide association studies). For instance, the recent analysis of 1135 natural accessions with complete genomic sequences [29] allows conducting large comparative analysis of phenotypic variation within the species [30, 31].

With the methods proposed here, we aimed at developing flexible and customizable tools based on the automated computerization and analysis of plant images to estimate fruit number and growth dynamics, including RGR throughout the life cycle. We focused on A. thaliana because it is a widely used model in plant science and also increasingly being used in ecology, although the same approach could be performed on other rosette-shaped species. The estimation of biomass accumulation was semi-invasive, as it requires sacrificing some individuals to train a predictive model. This approach considerably reduced the number of plants needed to estimate RGR during ontogeny, from seedling establishment to fruiting. Furthermore, the estimation of fruit number from automated image analysis of A. thaliana inflorescences could greatly help link growth variation to plant performance and fitness, in various genotypes and environmental conditions.

Results

Estimation of biomass accumulation, RGR and growth dynamics

Description

The method for growth analysis requires a set of plants on which we want to non-destructively measure dry mass, and a set of individuals harvested to train a predictive model (Fig. 1). In the case study presented here, we evaluated the method on 472 genotypes of A. thaliana grown in trays using a growth chamber equipped with Raspberry Pi Automated Phenotyping Array (hereafter RAPA) built at the Max Planck Institute (MPI) of Tübingen. We partitioned the whole population (n = 1920) in two subpopulations: the focal population (n = 960) on which growth dynamics (and fruit production) were measured, and the training population (n = 960) on which a predictive model of plant dry mass was developed.

Fig. 1
figure 1

Estimation of plant dry mass from image analysis and statistical modeling. a Example of sequential tray images, analyzed with ImageJ to extract individual rosette shape descriptors during ontogeny. b Dry rosettes weighed at 16 DAG in the training population. c Series of cross-validation performed for different predictive models with different training population size (x axis). Dots represent mean prediction accuracy, measured as Pearson’s coefficient of correlation (r2) between observed and predicted values. Error bars represent 95% confidence interval (CI) across 100 random permutations of the training dataset. d Correlation between observed and predicted values from the best model obtained after stepwise regression, performed 60 individuals to train the model, and tested on 300 individuals not used to train the model

Individuals of the focal population were daily photographed during ontogeny (Fig. 1a), and harvested at the end of reproduction when the first fruits (siliques) were yellowing (stage 8.00 according to Boyes et al. [32]). Top-view images were manually taken during the first 25 days of plant growth (Additional file 6: Fig. S1). Plants of the training population were harvested at 16 days after germination (DAG), dried and weighed for building a predictive model of rosette biomass with top-view images (Fig. 1b). Predictive models were trained and evaluated with a cross-validation approach (Fig. 1c). Once a predictive model has been chosen and validated, rosette dry mass can be non-destructively estimated on all individuals of the focal population, which allows modeling growth trajectory, biomass accumulation and RGR throughout the plant life cycle.

Implementation

We developed an ImageJ [33] macro (Additional file 1) to extract shape descriptors of the rosette from tray or individual pot images (Fig. 1a). The macro guides users in the different steps of image analysis to label plant individuals, perform segmentation and measure rosette shape descriptors. It processes all images (trays or individual pots) present in an input folder, and returns shape descriptors of individual rosettes in an output folder defined by users. Shape descriptors include individual rosette area (RA) and perimeter (Perim) in pixels, rosette circularity \(\left(Circ = 4\pi \times \left( {\frac{RA}{{Perim^{2} }}} \right) \right)\), aspect ratio \(\left(AR = \frac{Major\;axis\;length}{Minor\;axis\;length}\right)\), and roundness \(\left(Round = \frac{4 \times RA}{{\pi \times Major\;axis\;length^{2} }} \right)\). Rosette area and perimeter can be converted into cm2 and cm, respectively, by measuring the area and perimeter of a surface calibrator defined by users.

Predictive models of plant dry mass from shape descriptors were tested against measurements in the training population (R code in Additional file 2). Depending on the training population size, we observed variable prediction accuracy for different models, as measured by the coefficient of correlation (r2) between measured and predicted rosette dry mass in individuals not used to train the model (Fig. 1c). LASSO and RIDGE models reached high prediction accuracy even with very small training population size (< 20 individuals). However, with a minimum of 50 training individuals, lm and RIDGE/LASSO performed equally, with a prediction accuracy > 90%. Using stepwise regression, we showed that using only rosette area and circularity as predictors in a simple linear model framework can reach high prediction accuracy (r2 = 0.91, Fig. 1d). Thus, the final equation we used to estimate rosette dry mass from rosette pictures was \(Rosette\;DM = - 0.00133 + 0.00134 \times RA + 0.00274 \times Circ\) (cross-validation r2 = 0.91, Fig. 1d).

Application

From estimated rosette dry mass during the ontogeny and final rosette dry mass measured at the end of the life cycle (maturity), we modeled sigmoid growth curves of biomass accumulation (mg), M(t), for all individuals in the focal population with a three-parameter logistic function [4, 34] (Fig. 2a, b), as in Eq. 1:

$$M\left( t \right) = \frac{A}{{1 + e^{{\frac{{t_{inf} - t}}{B}}} }}$$
(1)

where A, B and tinf are the parameters characterizing the shape of the curve, which differ between individuals depending on the genotypes and/or environmental conditions. A is the upper asymptote of the sigmoid curve, which was measured as rosette dry mass (mg) at maturity. The duration of growth was estimated as the time in days between the beginning of growth after vernalization (t0) and maturity. B controls the steepness of the curve, as the inverse of the exponential growth coefficient r (r = 1/B). tinf is the inflection point that, by definition, corresponds to the point where the rosette is half the final dry mass. Both B and tinf were estimated for every individual by fitting a logistic growth function to the data in R (Additional file 3).

Fig. 2
figure 2

Application of the dry mass estimation method to model growth dynamics in A. thaliana. Statistical modeling of rosette dry mass during ontogeny, M(t), with three-parameter logistic growth curve, on one individual (a) and 451 natural accession accessions (b); absolute growth rate during ontogeny, GR(t), on one individuals (c) and the 451 accessions (d); relative growth rate during ontogeny, RGR(t), on one individuals (e) and the 451 accessions (f). tinf (red dashed line) represents point of growth curve inflection. Individuals on the right panels are colored by duration (days) of plant life cycle. (gi) Variation of M(tinf), GR(tinf) and RGR(tinf) across the 451 accessions phenotyped, with broad-sense heritability (H2) on the top-left corner of each panel. Dots represent genotypic mean ± standard error (n = 2)

Growth dynamics variables were computed from the fitted parameters, such as GR(t), the derivative of the logistic growth function (Fig. 2c, d), as in Eq. 2:

$$GR\left( t \right) = rM\left( t \right) \times \left( {1 - \frac{M\left( t \right)}{A}} \right)$$
(2)

and the relative growth rate (mg d−1 g−1), RGR(t), measured as the ratio GR(t)/M (t) (Fig. 2e, f), as in Eq. 3:

$$RGR\left( t \right) = r \times \left( {1 - \frac{M\left( t \right)}{A}} \right)$$
(3)

Comparing growth traits measured at tinf, i.e. when GR is maximal for all individuals [4], revealed important variations between accessions (Fig. 2g–i), with an important part of phenotypic variance accounted by genetic variability, as measured by broad-sense heritability (H2 = 0.93, 0.90 and 0.65 for M(tinf), GR(tinf) and RGR(tinf), respectively). To evaluate the robustness of the method, we repeated an experiment on 18 accessions selected for their highly contrasted phenotypes (Additional file 6: Fig. S2). Results showed a good correlation between the rosette dry mass at the inflection point estimated in the first experiment and the dry mass destructively measured in the second experiment (r2 = 0.67; Additional file 6: Fig. S3a).

Estimation of fruit number from inflorescence images

Description

The method to estimate fruit number from inflorescence images requires to manually counting fruits on a fraction of individuals in order to train predictive models (Fig. 3). All individuals were harvested at the same stage, when the first fruits started to dry. Inflorescence and rosette of individuals of the focal population were separated and both photographed (Fig. 3a). Fruits were manually counted on the inflorescence images of 352 out of 856 plants harvested (Fig. 3b). In parallel, we analyzed the inflorescence skeletons of all the 856 harvested plants with a dedicated ImageJ macro (Additional file 4). Using the skeleton descriptors computed with the macro and manual measurements in the population subset, we evaluated the accuracy of different models to predict the number of fruit per individual (Fig. 3c), and applied the best model to the whole focal population.

Fig. 3
figure 3

Estimation of fruit number from image analysis and statistical modeling. a Example of inflorescence images, analyzed with ImageJ to extract individual skeleton descriptors after segmentation and 2D skeletonization. b Manual counting (purple dots) of fruit number on a subset of inflorescence images. c Series of cross-validation performed for different predictive models with different training population size (x axis). Dots represent mean prediction accuracy, measured as Pearson’s coefficient of correlation (r2) between observed and predicted values. Error bars represent 95% CI across 100 random permutations of the training dataset. d Correlation between observed and predicted values for cross-validation of the best model obtained with stepwise regression, performed 60 individuals to train the model, and tested on 100 individuals not used to train the model

Implementation

For all images present in the input folder, the “RAPAmacro_InflorescenceSkeleton.txt” macro (Additional file 4) automatically performs image segmentation, skeletonization, and computation of 2D skeleton parameters of inflorescence (Fig. 3a). 2D skeletons analysis with ImageJ returns nine vectors of variables for each plant (described in Fig. 3), which were automatically saved as.xls files by the macro (in an output folder defined by user). The sums of these nine vectors per individual were used as nine predictors of fruit number.

Using the same approach as for estimating rosette dry mass, we tested different models and different training population size with cross-validation (R code in Additional file 5). As for rosette dry mass, results showed that the nine skeletons descriptors predict > 90% of fruit number in 100 individuals not used to train the model (Fig. 3c). With a training population size > 30 individuals, lm performed equally than LASSO and RIDGE regressions. As for dry mass estimation, quadratic models performed poorly. For small training population size, LASSO and RIDGE regressions reached higher prediction accuracy than linear or quadratic models. Using stepwise regression, we showed that the best model to estimate fruit number in a linear model framework is: \(Fruit\;Nb = 0.181 \times Nb\;actual\;junctions + 0.003 \times Nb\;slab\;pixels + 0.226 \times Nb\;triple\;points\) (cross-validation r2 = 0.91, Fig. 3d).

Application

The model to estimate fruit number from inflorescence images was applied on all individuals of the focal population (Fig. 4a). We measured a relatively high broad-sense heritability for fruit production across accessions (H2 = 0.68), compared to H2 estimates of morphological and physiological traits measured in previous studies [35]. In addition, fruit number estimated from image analysis was well correlated with fruit number manually counted on 18 genotypes phenotyped in a second experiment (r2 = 0.70; Additional file 6: Fig. S3b). To further validate the method, we applied the predictive model on an independent set of inflorescence images taken at the Center for Plant Molecular Biology (ZMBP, University of Tübingen) on the rev5 knock-out mutant. Compared to wild-type Col-0, rev5 produced less fruits due to the effect of the mutation on branching pattern and flower development [36]. This was well captured by the predictive model (Fig. 4b), yet trained on the natural accessions.

Fig. 4
figure 4

Application of the method to estimate fruit number in natural accessions and rev5 mutant of A. thaliana. a Variability in fruit number across 441 natural accessions, with broad-sense heritability (H2) on the top-left corner. Dots represent genotypic mean ± standard error (n = 2). b Prediction of fruit number (mean ± 95% CI) from model trained on accessions and applied to rev5 mutant and Col-0 wild-type (n = 5). Results are compared to observed fruit number manually counted at harvesting

Discussion

Arabidopsis thaliana is the most widely used plant species in molecular biology, ecology and evolution, but we still largely ignore how growth dynamics is related to individual performance and fitness [37]. This is mainly because traits like RGR and fruit number remain difficult to measure in large scale experiments. Thus, our goal was to develop a set of tools for biologists to analyze these traits with low-cost equipment. Rather than the development of a new methodology or algorithm, we propose an application guide for the implementation of image computerization with free software (R, ImageJ). From simple top-view imaging of rosette and inflorescence, we built robust predictive models of plant dry mass and fruit number. Based on a semi-invasive approach and two computing steps—one to analyze images with ImageJ and one to model data with R –, the method allows a single experimenter to simultaneously measure biomass accumulation, RGR and fruit production over thousands of plants.

For rosette-shaped species like A. thaliana, top view pot or tray imaging can easily be done in any laboratory or facilities. In this study, we used pictures of trays manually taken during ontogeny with a regular camera. The same approach has been proposed in low-cost systems for high-throughput phenotyping in A. thaliana, using projected rosette area to measure growth during several hours or days [14,15,16]. Comparatively, our method allows measuring the absolute and relative rate of biomass accumulation during the whole life cycle of a plant. The time lapse and frequency of tray imaging is important for proper fitting of the growth curve. We used daily imaging during the 25 first days of growth after vernalization, although growth curves can be fitted with only one picture every 2–3 days. The ImageJ macro we developed here automatically processes tray images when plants are young and do not overlap. When they become too large (20–25 DAG in our study), the macro offers the possibility to spatially separate plants (manual mode). We estimated that, on a desktop computer, the macro takes approximately 20–25 s per tray (30 individuals) when running on automatic mode, and between 1 and 2 min in manual mode (depending on the number and amplitude of corrections to make).

The semi-invasive approach drastically reduces the number of replicates necessary to measure the growth dynamics, or the time needed for manual measurement of fruit number. Furthermore, it allows experimenter to compute biomass accumulation non-destructively until the end of the life cycle, and thus, to compare growth and reproductive success on the same individuals. We showed that the method is robust and reproducible across experiments. Moreover, the model for fruit prediction correctly predicted the decrease in rev5 due to flower abortion in a complete independent experiment. However, we recommend making a new predictive model of plant biomass with cross-validation for each experiment (code example available in Supplementary File 2), specifically if growth conditions change, as the relationship between rosette morphology and rosette biomass is expected to differ depending on genotypes and environments. Furthermore, our approach for estimating growth dynamics was powerful in A. thaliana, a rosette-shaped species for which size can be estimated from on 2D images. Although our method must be efficient in other rosette-shaped species, biomass estimation in plants with complex 3D architecture requires more sophisticated image analysis. A recent study in maize offers a nice example of 3D reconstruction and biomass prediction with a dedicated phenotyping platform [8]. The same limitations hold true for the estimation of fruit number: our image-based method can only be performed on species with inflorescences that can be imaged on a 2D plan.

In this study, we propose flexible methods and customizable tools for researchers to characterize plant phenotype in their own facilities. This should lower the barriers of high-throughput phenotyping, and help dissect the relationships between growth dynamics and reproductive success in various laboratory conditions. Methods were developed for A. thaliana, which is the favorite model in plant genetics and molecular biology, and it is also becoming a model in evolutionary biology and ecology [30, 31, 37,38,39]. We hope that these tools will encourage researchers to analyze complex traits and fitness components in various conditions and genotypes, thus participating to the effort to better understand the physiological bases of plant adaptation.

Methods

Plant material

472 natural accessions of A. thaliana were selected from the initial germplasm of the 1001 Genomes project [29] (http://1001genomes.org/; Additional file 7: Table S1). Seeds used in this study were obtained from parental plants propagated under similar conditions in greenhouse. All seeds were stored overnight at − 80 °C and surface-sterilized with 100% ethanol before sowing. A transgenic line of A. thaliana affecting in branching pattern and fruit production was used: rev5, which is a strong ethyl-methylsulfonate (A260 V) knock-out mutation of REVOLUTA in the Col-0 background [36].

Growth conditions

Plants were cultivated in hydroponics, on inorganic solid media (rockwool cubes) watered with nutrient solution [40]. Four replicates of 472 accessions were grown, with pots randomly distributed in 64 trays of 30 pots each. Seeds were sowed on 3.6 cm × 3.6 cm  × 3 cm depth rockwool cubes (Grodan cubes, Rockwool International A/S, Denmark) fitted in 4.6 cm (diameter) × 5 cm (depth) circular pots (Pöppelmann GmbH and Co., Germany). The pots were covered with a black foam disk pierced in the center (5–10 mm hole manually made with a puncher). Before sowing, the dry rockwool cubes were watered with 75% strength nutrient solution. The chemical composition of the nutrient solution was obtained from Conn et al. [40].

After sowing, trays were incubated for 2 days in the dark at 4 °C for seed stratification, and then transferred for 6 days to 23 °C for germination. After germination, all plants, having two cotyledons, were vernalized at 4 °C during 41 days to maximize the flowering of all the different accessions. Plants were thinned to one plant per pot, and trays were moved to the RAPA room, set to 16 °C with a temperature variability of close to ± 0.1 °C, air humidity at 65%, and 12 h day length, with a PPFD of 125–175 µmol m−2 s−1 provided by a 1:1 mixture of Cool White and Gro-Lux Wide Spectrum fluorescent lights (Luxline plus F36 W/840, Sylvania, Germany). All trays were randomly positioned in the room, and watered every day with 100% strength nutrient solution.

Replicates 1 and 2 (the focal population, n = 960) were harvested when the first fruits started to dry. Due to germination failure, mortality or missing data, only 451 accessions were phenotyped for growth, and 441 for fruit number. Replicates 3 and 4 (the training population, n = 960) were harvested at 16 DAG for dry mass measurement.

A second experiment was performed on a set of 18 contrasted accessions (Additional file 6: Fig. S2), grown in the same conditions. Three replicates per genotype were harvested at the estimated inflection point for rosette dry mass measurement (inflection point estimated from the first experiment), and five replicates were harvested at the end of the life cycle for manual fruit counting.

rev5 and Col-0 were cultivated in the Center for Plant Molecular Biology (ZMBP, University of Tübingen, Germany). Plants were grown on standard soil (9:1 soil and sand) under controlled conditions: in long days (16 h day; 8 h night), low light (70–80 µE m−2 s−1) and an ambient temperature of 21 °C (see [41] for details).

Plant imaging and harvesting

All trays were manually imaged every day during the first 25 days after vernalization with a high-resolution camera (Canon EOS-1, Canon Inc., Japan). Individual labeling (i.e. genotype, replicate and date of measurement) was performed with ImageJ [33] during the image analysis process with the “RAPAmacro_RosetteShape.txt” macro. Image segmentation was performed on rosette and inflorescence, after inverting images and adjusting color saturation between 35 and 255. However, it is important to note that color threshold for segmentation depends on light conditions during imaging, and thus, needs to be adjusted by users on a set of template images. To clean segmented images, undesirable dots that remained after segmentation were removed with the ‘Remove outliers’ function in ImageJ. After segmentation, inflorescence skeletonization and 2D skeleton analysis were automatically performed with the corresponding functions in ImageJ (see code in Additional file 4). Skeletons were not pruned for loops. Extracted rosette shape and inflorescence skeleton parameters were automatically saved as.xls files.

Plants in the training population were harvested at 16 days after vernalization, rosette were dried at 65 °C for three days, and separately weighed with a microbalance (XA52/2X, A. Rauch GmbH, Graz, Austria). All individual rosette parameters extracted after segmentation were saved as.xls files, each row corresponding to a specific date, tray label and pot coordinates.

At the end of the life cycle, inflorescence and rosette of the focal population were harvested and separately photographed. They were dried at 65 °C for at least three days, and weighed with a microbalance (XA52/2X). In the experiment at ZMBP, whole plants of rev5 and Col-0 were photographed at the end of the life cycle (first yellowish fruits) by taking side pictures of each pot separately (n = 5).

Statistical analyses

Different predictive models were evaluated for both the estimation of rosette dry mass and fruit number. We notably compared linear models, quadratic models—where each predictor was fitted as a two-order polynomial function, RIDGE and LASSO regressions (Additional files 2 and 5). Prediction accuracy was tested by cross-validation on 100 individuals not used to train the model, using the Pearson’s coefficient of correlation (r2) between observed and predicted trait values. For each model, we tested prediction accuracy according to training population size across 100 random permutations of the training dataset (R code in Additional files 2 and 5). Training population size varied between 10 and 250 for dry mass estimation, and between 10 and 120 for fruit number estimation. Stepwise regression, using step function in R, was used to identify the best model, with minimum predictors, of rosette dry mass and fruit number.

Non-linear fitting of individual growth curves (Eq. 1) were performed with the nls function in R (Additional file 3). Since some plants germinated during or, for a few, after vernalization, we considered the first day of growth (t0) for each individual of the focal population as the day at which it had a minimum size. For convenience, we used the size of the biggest measured plant across all individuals at the first day of growth following vernalization, which corresponded to a plant with first true leaves just emerged. Growth was expressed as a function of days after germination (DAG, starting at t0). This procedure allowed for normalization of growth trajectories from the same starting point between individuals that differ in germination speed [42]. Growth dynamics variables were computed from the fitted parameters, such as absolute growth rate, GR(t), the derivative of the logistic growth function (Eq. 2), and RGR(t) (Eq. 3).

Broad-sense heritability (H2) was calculated with a Bayesian approach implemented in a MCMCglmm model performed in R, considering the accession as a random factor, as:

$${\text{y}}_{ik} = {\text{G}}_{i} + {\text{e}}_{ik}$$

where y is trait of interest in individual k of genotype i, Gi is accession i, and eik is the residual error. H2 was calculated at the proportion of genotypic variance (\(\sigma_{G}^{2}\)) over total variance (\(\sigma_{G}^{2} + \sigma_{E}^{2}\)):

$$H^{2} = \frac{{\sigma_{G}^{2} }}{{\sigma_{G}^{2} + \sigma_{E}^{2} }}.$$