Our system for high-throughput phenotyping is illustrated in Fig. 1. The seedling is surrounded by multiple cameras, which observe the object from different perspectives (Fig. 1a). The system can deal with a variable number of cameras. The quality of the 3D reconstruction generally improves when more cameras are used. However, as the gain in quality diminishes with each additional camera while the computation time increases linearly with the number of cameras, a trade-off between accuracy and speed needs to be made. In our experiments, we used 10 cameras as a good compromise. The silhouettes of the seedling in the acquired camera images (Fig. 1b) are used to reconstruct the object in 3D through a shape-from-silhouette method (Fig. 1c). Next, the reconstruction of the whole plant is segmented into stem and leaves (Fig. 1d). Based on the whole-plant reconstruction and the segmented stem and leaves, different quality features are calculated.
The 3D reconstruction method is described in more detail in Sect. 2.1. The leaf/stem segmentation is outlined in Sect. 2.2, and the methods for measuring the different quality features are described in Sects. 2.3 and 2.4.
High-throughput 3D reconstruction
We developed a shape-from-silhouette [38] method that calculates a 3D reconstruction of the sample from the silhouettes in all camera images. To perform the reconstruction, the precise position and orientation of all cameras with respect to the workspace need to be known; these are estimated through a calibration procedure. This procedure is outlined in Sect. 2.1.1, followed by a description of the fast 3D reconstruction method in Sect. 2.1.2 and a short discussion of the method in Sect. 2.1.3.
Calibration
To be able to perform the shape-from-silhouette method, the position of all cameras must be known with respect to the workspace: for every voxel in the 3D workspace, we need to know to which pixel(s) it maps in each of the camera images. To calculate this mapping, we need to determine the internal and external parameters of the cameras, a procedure known as multi-camera calibration [39]. The internal parameters describe the camera and the lens, and include the focal distance of the lens, the size of a pixel on the sensor, the coordinates of the optical centre of the sensor and the radial distortion coefficient. The external parameters define the position and orientation of the camera in space with respect to a chosen origin, in our case the centre of the workspace.
To calibrate the system, a regular dot-pattern calibration plate is used. The plate is placed in a number of different orientations and positions and is observed synchronously by all cameras. In each camera image, a number of unique points can be extracted automatically. From the corresponding points between different cameras, and by knowing the true dimensions of the plate, the internal and external parameters can be estimated. In general, the estimation improves when the calibration plate is placed in more poses, but typically around 20–25 observations are sufficient for a good calibration. We use the position and orientation of one of the calibration plate poses to determine the origin of the voxel space.
The whole multi-camera calibration procedure was developed in LabVIEW (Fig. 2) and is based on the single-camera procedures provided by Halcon. Once the internal and external camera parameters are determined for each camera, one specific plate position is chosen as the real-world reference. Through a chain of affine transformations, the correspondence of all camera positions to the real-world coordinates can be calculated. The software optimizes this chain by minimizing the overall RMS error.
3D reconstruction
By applying the calibration, i.e. the projection matrices for each camera resulting from the external parameters and the corrected camera model, the mapping between the voxels in the 3D workspace and the pixels in the 2D camera images can be determined.
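To make this mapping concrete, the following minimal sketch projects the centre of a voxel onto a camera image through a 3 × 4 projection matrix. The projectVoxel helper and the matrix layout are our own illustration, and the radial distortion correction from the calibrated camera model is omitted for brevity.

```cpp
#include <array>
#include <cmath>

// Hypothetical 3x4 projection matrix P_k = K [R | t] from the calibration.
using ProjectionMatrix = std::array<std::array<double, 4>, 3>;

// Projects the centre of a voxel (world coordinates) onto a camera image.
// Returns false when the voxel is behind the camera or outside the image.
// Note: lens distortion is ignored; a full implementation would additionally
// apply the calibrated radial distortion coefficient to (u, v).
bool projectVoxel(const ProjectionMatrix& P,
                  double X, double Y, double Z,
                  int width, int height, int& u, int& v)
{
    const double x = P[0][0]*X + P[0][1]*Y + P[0][2]*Z + P[0][3];
    const double y = P[1][0]*X + P[1][1]*Y + P[1][2]*Z + P[1][3];
    const double w = P[2][0]*X + P[2][1]*Y + P[2][2]*Z + P[2][3];
    if (w <= 0.0) return false;                 // voxel behind the camera
    u = static_cast<int>(std::lround(x / w));   // homogeneous division
    v = static_cast<int>(std::lround(y / w));
    return u >= 0 && u < width && v >= 0 && v < height;
}
```

Since the cameras and the workspace are fixed between calibrations, these projections can be computed once and stored in lookup tables, so that no projection needs to be evaluated during reconstruction (see also Sect. 2.1.3).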
Knowing the projection of the voxels on pixels in the images, the object under observation can be reconstructed from the silhouettes in the camera images. The method for reconstruction is given in pseudo code in Fig. 3. All camera images are first segmented into foreground and background, resulting in binary silhouette images B\(_{\mathrm{k}}\). This is done using a procedure known as background subtraction, where the segmentation is obtained by subtracting an image containing only the background from an image containing the seedling. A pixel is part of the foreground when the Euclidean distance in RGB space between the two images is larger than a threshold value. The optimal value of this threshold is set manually for each camera to correct for small differences in apertures and lighting conditions.
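As a minimal sketch of this segmentation step, assuming 8-bit RGB images stored as flat arrays (the segmentSilhouette helper and its types are ours):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB { std::uint8_t r, g, b; };

// Classifies pixels as foreground by thresholding the Euclidean distance in
// RGB space between the live image and a background-only reference image.
// The threshold is set manually per camera, as described in the text.
std::vector<std::uint8_t> segmentSilhouette(const std::vector<RGB>& image,
                                            const std::vector<RGB>& background,
                                            double threshold)
{
    std::vector<std::uint8_t> mask(image.size(), 0);
    const double t2 = threshold * threshold;      // compare squared distances
    for (std::size_t i = 0; i < image.size(); ++i) {
        const double dr = double(image[i].r) - background[i].r;
        const double dg = double(image[i].g) - background[i].g;
        const double db = double(image[i].b) - background[i].b;
        if (dr*dr + dg*dg + db*db > t2)
            mask[i] = 1;                          // foreground (silhouette)
    }
    return mask;
}
```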
Next, all voxels in the voxel space V are initially set to ‘occupied’. The occupancy of each voxel is then investigated by looking at the corresponding pixels in all camera images, which are determined using the camera parameters P\(_{\mathrm{k}}\). If all corresponding pixels are labelled as foreground, the voxel must be part of the object and remains occupied. Otherwise, the voxel is set to ‘empty’.
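The carving loop itself can be sketched as follows; lookup stands in for the precomputed voxel-to-pixel tables, and carving voxels that project outside an image is one possible convention, not prescribed by the method description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// voxels:  1 = occupied, 0 = empty (initially all 1)
// lookup:  lookup[k][v] is the pixel index that voxel v projects to in
//          camera k, or -1 if it falls outside that image (precomputed)
// masks:   binary silhouette images B_k from the background subtraction
void carveVoxels(std::vector<std::uint8_t>& voxels,
                 const std::vector<std::vector<std::int32_t>>& lookup,
                 const std::vector<std::vector<std::uint8_t>>& masks)
{
    for (std::size_t v = 0; v < voxels.size(); ++v) {
        for (std::size_t k = 0; k < masks.size(); ++k) {
            const std::int32_t p = lookup[k][v];
            if (p < 0 || masks[k][p] == 0) {  // background seen by camera k
                voxels[v] = 0;                // carve the voxel away
                break;                        // remaining cameras irrelevant
            }
        }
    }
}
```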
Figure 4 shows three different views of a 3D reconstruction of a tomato seedling. The plant is reconstructed well: stem and leaf shapes are clearly visible, and even smaller structures such as the petioles are included in the model.
Discussion on the 3D reconstruction method
We intentionally chose the fastest and simplest implementation of space carving. This approach is known to have minor drawbacks: a voxel is eliminated from the reconstruction volume as soon as one of the image pixels covered by that voxel shows background. Since a voxel usually covers multiple pixels in the same image, a voxel is thus not guaranteed to be kept even when at least one corresponding pixel in every image contains foreground. In some cases, this may lead to discontinuities in the reconstruction. To avoid this, the finest structures need to have a diameter of at least \(2\sqrt{3}\approx 3.5\) voxels for the worst-case scenario in which the structures are oriented diagonally. In our experiments, we used a voxel resolution of 0.25 mm/voxel, whilst the diameter of the leaf stems was generally well above 1 mm. See also the description in Sect. 3.1.
Another known disadvantage of the shape-from-silhouette method is that only those parts of the object that are visible in the silhouette images can be reconstructed well, which means that occluded parts and concavities cannot be recovered. However, since plants, and especially seedlings, are relatively open structures, all relevant parts are visible and this is hardly a drawback in practice. Moreover, issues with occlusion are not unique to this method.
One of the strengths of the method is its high flexibility. The number of cameras, their viewpoints and their optics can be altered, requiring only a recalibration of the system, which is easy to perform. The dimensions of the workspace and the voxel resolution can also easily be changed. The workspace can be adapted to the size and structure of the plant, allowing workspaces ranging from a few millimetres to several metres. There is, however, a practical limit to the number of voxels in the voxel space, as the method needs to store lookup tables in memory relating all voxels to the corresponding pixels in all images. The flexibility to easily adjust the size and resolution of the workspace allows the system to work with various types and sizes of plants, as long as the plant structure is relatively open and does not create too many occlusions and concavities.
A major advantage of the space-carving method is that it can be used in-line at high speed on relatively low-cost hardware, making it suitable for large-scale phenotyping experiments.
Stem and leaf segmentation
To be able to calculate the stem and leaf features, the complete 3D representation of the plant needs to be segmented into stem and leaves. We developed a segmentation method that exploits the structural layout of tomato seedlings (see Fig. 5 for a 2D illustration). The algorithm is based on the breadth-first flood-fill algorithm with a 26-connected neighbourhood, which iteratively fills a structure. In our case, we start with the lowest point in the voxel representation, the bottom of the main stem of the plant (red square in the figure). In every iteration of the flood fill, the neighbouring points that are not yet filled are added, and the iteration number is stored for these voxels (illustrated by the decreasing brightness of the squares). As long as the main stem is being filled, the points added in each generation are located close together in space. At the point where the first side branches and leaves appear, however, the spread of the newly added points increases. When this spread exceeds a given threshold, that iteration is labelled as the end of the main stem (yellow square). The threshold depends on the resolution of the voxel space and the characteristics of the plant type; in our experiments, we used a threshold of 4 mm. When the flood fill is completed, we start a leaf segment at the last added point (green square), which will be one of the leaf tips, and backtrack the flood fill until the end point of the main stem is reached, adding all visited voxels to the leaf segment (squares with white borders). We perform the same procedure from the next leaf tip (blue square), resulting in another leaf segment, and repeat this until all voxels have been labelled as either stem or leaf. If a leaf consists of a number of lobes, this algorithm separates the leaf into different segments. To correct this, segments that connect at a place other than the end of the main stem are merged. A sketch of the stem-detection part of this procedure is given below.
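In this sketch, the definition of the 'spread' of a generation is approximated by the diagonal of its bounding box, and the dense Grid layout is our own simplification; the backtracking of the leaf segments is omitted for brevity.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Grid {                       // dense voxel grid, 1 = occupied
    int nx, ny, nz;
    std::vector<std::uint8_t> occ;  // indexed as (z * ny + y) * nx + x
    bool at(int x, int y, int z) const {
        return x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz &&
               occ[(std::size_t(z) * ny + y) * nx + x] != 0;
    }
};

// Breadth-first flood fill from the lowest voxel (sx, sy, sz) over the
// 26-neighbourhood. Returns the iteration at which the spread of the newly
// added generation first exceeds spreadThreshold (the end of the main stem)
// and records each voxel's generation number in gen.
int findStemEnd(const Grid& g, int sx, int sy, int sz,
                double spreadThreshold, std::vector<int>& gen)
{
    gen.assign(g.occ.size(), -1);
    std::vector<std::array<int, 3>> frontier;
    frontier.push_back({sx, sy, sz});
    gen[(std::size_t(sz) * g.ny + sy) * g.nx + sx] = 0;
    for (int it = 1; !frontier.empty(); ++it) {
        std::vector<std::array<int, 3>> next;
        int lo[3] = {INT_MAX, INT_MAX, INT_MAX};
        int hi[3] = {INT_MIN, INT_MIN, INT_MIN};
        for (const auto& v : frontier)
            for (int dx = -1; dx <= 1; ++dx)
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dz = -1; dz <= 1; ++dz) {
                        const int x = v[0] + dx, y = v[1] + dy, z = v[2] + dz;
                        if (!g.at(x, y, z)) continue;
                        const std::size_t i =
                            (std::size_t(z) * g.ny + y) * g.nx + x;
                        if (gen[i] != -1) continue;   // already filled
                        gen[i] = it;
                        next.push_back({x, y, z});
                        for (int a = 0; a < 3; ++a) {
                            lo[a] = std::min(lo[a], next.back()[a]);
                            hi[a] = std::max(hi[a], next.back()[a]);
                        }
                    }
        if (!next.empty()) {        // spread = bounding-box diagonal (voxels)
            const double d = std::sqrt(double(hi[0]-lo[0])*(hi[0]-lo[0]) +
                                       double(hi[1]-lo[1])*(hi[1]-lo[1]) +
                                       double(hi[2]-lo[2])*(hi[2]-lo[2]));
            if (d > spreadThreshold) return it;       // main stem ends here
        }
        frontier = std::move(next);
    }
    return -1;                      // fill completed without branching
}
```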
Measuring leaf features
After stem and leaves have been segmented, we calculate relevant phenotypic features of the leaves, specifically leaf length, leaf width, and leaf area. These features are highly predictive of plant quality, as the size of the leaves plays an important role in growth through photosynthesis.
Leaf length
We define the length of a leaf as the length of the midrib from the stipule (where the leaf connects to the stem) to the apex (the tip of the leaf). The leaf tip is determined by the segmentation method (see Fig. 5, blue and green points): it is the point on the surface of the leaf that is furthest from the stem end point. The stipule is determined in reverse order, as the point on the leaf surface that is furthest from the leaf tip. This method assumes an elongated leaf shape. Both points are marked in Fig. 6a by a purple cross. The Euclidean distance between these two points would be a very crude approximation of the leaf length and always an underestimation, as a leaf is typically curved in 3D. Instead, we estimate the leaf length using an additional point in the middle of the leaf (green cross in Fig. 6a). To determine this midpoint, a band of points halfway between the begin and end points of the leaf is selected (marked light blue in Fig. 6a). The midpoint is set to the centroid of this band of points. The leaf length is then calculated as:
$$\begin{aligned} l^{\mathrm{leaf}}=\left| {\overrightarrow{\mathbf{m}-\mathbf{s}}} \right| +\left| {\overrightarrow{\mathbf{a}-\mathbf{m}}} \right| , \end{aligned}$$
where \(\mathbf{s}\) is the stipule, \(\mathbf{m}\) the midpoint and \(\mathbf{a}\) the apex, and \(\vert \mathbf{x}\vert \) gives the length of the vector \(\mathbf{x}\).
In a similar fashion, the length of the midrib in 3D could be determined more precisely by adding additional points between the stipule and the apex. However, we approximate the length by this three-point polyline to keep the computational costs low, as we aim to develop a high-throughput phenotyping system.
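A sketch of this estimate follows; the half-width of the midpoint band (5% of the stipule–apex axis below) is our assumption, as the text only specifies a band of points halfway between the begin and end points.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Three-point polyline estimate of the leaf length: the midpoint m is the
// centroid of all leaf points whose projection onto the stipule-apex axis
// falls in a narrow band around the halfway mark.
double leafLength(const std::vector<Vec3>& leaf, const Vec3& s, const Vec3& a)
{
    const Vec3 axis = sub(a, s);
    const double len2 = dot(axis, axis);
    Vec3 m{0, 0, 0};
    int count = 0;
    for (const Vec3& p : leaf) {
        const double t = dot(sub(p, s), axis) / len2;  // 0 at stipule, 1 at apex
        if (t > 0.45 && t < 0.55) {                    // band around the middle
            m.x += p.x; m.y += p.y; m.z += p.z;
            ++count;
        }
    }
    if (count == 0) return norm(axis);                 // degenerate: straight line
    m = {m.x / count, m.y / count, m.z / count};
    return norm(sub(m, s)) + norm(sub(a, m));          // |m - s| + |a - m|
}
```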
Leaf width
The algorithm for finding the width of the leaf aims to find the widest part of the leaf perpendicular to the axis through the begin and end points of the leaf, indicated by the purple crosses (Fig. 6a). The widest part is defined as the part where the Euclidean distance between the blue crosses is maximal. To determine the position of the blue crosses, the orthogonal projection of every leaf point on the line through the leaf's begin and end points (purple crosses) is calculated. The distance between the purple crosses is divided into a number (20) of equidistant sections. The leaf points having their projection in a section are selected, indicated in blue; they form a band across the leaf. The outermost points in this band on the left, \(\mathbf{m}^{\mathrm{l}}\), and on the right, \(\mathbf{m}^{\mathrm{r}}\), are used to approximate the width of that section. This is repeated for all sections, and the width of the leaf is defined as the maximum over all sections:
$$\begin{aligned} w^{\mathrm{leaf}}=\max \left\{ \left| {\overrightarrow{\mathbf{m}^{\mathrm{l}}-\mathbf{m}^{\mathrm{r}}}} \right| \right\} \end{aligned}$$
As with the leaf length, this is an approximation that results in a slight underestimation, because the Euclidean distance is used instead of the 'true' distance over the surface of the leaf.
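A sketch of this procedure, reusing the Vec3 helpers from the leaf-length sketch above; taking the per-section maximum pairwise distance, as done below, is a simple stand-in for selecting the outermost points \(\mathbf{m}^{\mathrm{l}}\) and \(\mathbf{m}^{\mathrm{r}}\).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
// Vec3, sub, dot and norm as defined in the leaf-length sketch above.

// Projects every leaf point onto the stipule-apex axis, bins the projections
// into equidistant sections, and returns the largest within-section distance.
double leafWidth(const std::vector<Vec3>& leaf, const Vec3& s, const Vec3& a,
                 int sections = 20)
{
    const Vec3 axis = sub(a, s);
    const double len2 = dot(axis, axis);
    std::vector<std::vector<Vec3>> bands(sections);
    for (const Vec3& p : leaf) {
        const double t = dot(sub(p, s), axis) / len2;
        if (t >= 0.0 && t < 1.0)
            bands[int(t * sections)].push_back(p);    // assign to a section
    }
    double width = 0.0;
    for (const auto& band : bands)                    // widest band wins
        for (std::size_t i = 0; i < band.size(); ++i)
            for (std::size_t j = i + 1; j < band.size(); ++j)
                width = std::max(width, norm(sub(band[i], band[j])));
    return width;
}
```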
Leaf area
Growth of a plant is largely driven by photosynthesis. The amount of photosynthesis, and therefore the rate of growth, is related to the total leaf area of the plant, which is the sum of the surface areas of all leaves. Based on our 3D voxel representation of the leaves, we can accurately determine the leaf area (see Fig. 6b).
A leaf is reconstructed as a set of voxels, with a thickness of generally at least two voxels. To obtain the leaf area, we first determine the set of all surface points, that is, all occupied voxels that neighbour one or more non-occupied voxels. This set contains points on both the top and the bottom of the leaf. The leaf area is defined as the area of the top surface of the leaf, and we obtain its value as:
$$\begin{aligned} a^{{\mathrm{leaf}}}=\frac{1}{2}\left| {S^\mathrm{leaf}} \right| \cdot r^2, \end{aligned}$$
where \(S^{\mathrm{leaf}}\) is the set of all surface voxels of the leaf, \(\vert \cdot \vert \) denotes the size of the set, and r is the voxel resolution (mm per voxel). The surface of a voxel is thus approximated by the square of the voxel resolution.
We have chosen to work with this approximation because of its simplicity, but it should be noted that it is only fully correct for horizontally or vertically oriented surfaces. For tilted surfaces, the method will give an underestimation of the area, which is worst for a 45\(^{\circ }\) angle, when the actual area will be underestimated by a factor \(\sqrt{2} \). For the set of seedlings used in this experiment, this approximation worked rather well (see Sect. 4.4), but it is a limitation to be considered.
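A sketch of this computation, reusing the Grid structure from the segmentation sketch; the description above does not fix which neighbourhood defines 'neighbouring', so the 26-neighbourhood used below is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <vector>
// Grid as defined in the segmentation sketch above.

// Counts the surface voxels of a leaf segment (occupied voxels with at least
// one empty neighbour; voxels at the grid border also count as surface) and
// converts the count to an area. The factor 1/2 accounts for the surface set
// containing both the top and the bottom of the leaf.
double leafArea(const Grid& g,
                const std::vector<std::array<int, 3>>& leafVoxels,
                double r /* voxel resolution, mm per voxel */)
{
    std::size_t surface = 0;
    for (const auto& v : leafVoxels) {
        bool onSurface = false;
        for (int dx = -1; dx <= 1 && !onSurface; ++dx)
            for (int dy = -1; dy <= 1 && !onSurface; ++dy)
                for (int dz = -1; dz <= 1 && !onSurface; ++dz)
                    if ((dx || dy || dz) && !g.at(v[0]+dx, v[1]+dy, v[2]+dz))
                        onSurface = true;
        if (onSurface) ++surface;
    }
    return 0.5 * double(surface) * r * r;   // a_leaf = |S_leaf| / 2 * r^2
}
```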
Measuring stem length
The stem length is another important quality feature of a seedling. Our measure of the stem length is based on the 3D stem segment resulting from the stem/leaf segmentation. The midline, or skeleton, of the stem is determined in 3D by a midline-tracking algorithm that finds a number of points on the midline (see red dots in Fig. 6c). This iterative algorithm starts from the lowest point, m\(_{0}\). Each consecutive point on the midline is found by searching for the N nearest points connected to the current point that have not yet been visited by the algorithm. The centroid of this set of N points defines the new point on the midline, m\(_{i}\). Starting from that new point, the algorithm continues iteratively until all points in the stem segment have been visited. The length of the stem is then determined as:
$$\begin{aligned} l^{\mathrm{stem}}=\sum _{i=1}^n \left| {\mathbf{m}_i -\mathbf{m}_{i-1}} \right| ,\quad M=\left\{ {\mathbf{m}_0 ,\ldots ,\mathbf{m}_n} \right\} \end{aligned}$$
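A sketch of the tracking loop, reusing the Vec3 helpers from the leaf-length sketch. The description above does not specify how 'connecting' points are found or the value of N; the sketch simply takes the N nearest unvisited points (N >= 1), which is our simplification.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
// Vec3, sub and norm as defined in the leaf-length sketch above.

// Tracks the stem midline from the lowest point and sums the Euclidean
// distances between consecutive midline points m_0 ... m_n.
double stemLength(std::vector<Vec3> stem, std::size_t N)
{
    // m_0 is the lowest stem point (assuming z points upwards)
    std::sort(stem.begin(), stem.end(),
              [](const Vec3& a, const Vec3& b) { return a.z < b.z; });
    std::vector<bool> visited(stem.size(), false);
    visited[0] = true;
    Vec3 m = stem[0];
    double length = 0.0;
    for (;;) {
        std::vector<std::size_t> idx;              // unvisited candidates
        for (std::size_t i = 0; i < stem.size(); ++i)
            if (!visited[i]) idx.push_back(i);
        if (idx.empty()) break;                    // all stem points visited
        std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
            return norm(sub(stem[a], m)) < norm(sub(stem[b], m));
        });
        if (idx.size() > N) idx.resize(N);         // keep the N nearest
        Vec3 c{0, 0, 0};
        for (std::size_t i : idx) {
            c.x += stem[i].x; c.y += stem[i].y; c.z += stem[i].z;
            visited[i] = true;
        }
        c = {c.x / idx.size(), c.y / idx.size(), c.z / idx.size()};
        length += norm(sub(c, m));                 // |m_i - m_{i-1}|
        m = c;
    }
    return length;
}
```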
Table 1 Processing times of all parts of the processing and analysis
Processing time
Throughout the development, the aim has been to create a system that can actually act as a high-throughput phenotyping system, with sufficient speed as well as accuracy. To achieve the necessary speed, approximations were used where needed.
The actual processing speed depends on the size of the voxel space, the number of cameras, and the size and complexity of the plant. All experiments were done using a voxel grid of \(240 \times 240 \times 300\) voxels, 10 cameras, and a PC with an Intel Core i7 processor (3.2 GHz). The implementation was done in C++ and LabVIEW.
Table 1 gives an indication of the time needed for all steps of the process.